Large Language Models feel intelligent in conversation, but their behavior is fundamentally alien to human cognition. They don't think like we do, reason like we do, or understand context like we do. Yet we keep trying to interact with them as if they were human collaborators.
This mismatch leads to fragile, confusing, and often overconfident prompt engineering. We write prompts that work sometimes, fail mysteriously, and leave us scratching our heads about what went wrong.
The solution isn't better prompt templates or more sophisticated techniques. It's thinking more clearly about what LLMs actually are and how they actually work.
This post will walk through mental models that accurately describe how LLMs behave in practice, cognitive pitfalls that distort our reasoning when writing prompts, and practical frameworks for building more reliable AI systems.
Whether you're building AI features into your products or just trying to understand these systems better, this guide will help you avoid the traps that lead to brittle, confusing, or overconfident prompt engineering.
LLMs Don't Think Like We Do
The fundamental problem is that we anthropomorphize LLMs. We treat them like they have intentions, understanding, or agency. But they don't.
An LLM is a statistical pattern matcher trained on vast amounts of text. It predicts the next token based on what came before, nothing more. Yet we keep trying to reason with it as if it were a human collaborator. Emily Bender and colleagues captured the machine side of this mismatch in their influential paper by describing LLMs as "stochastic parrots."
This mismatch creates fragile prompts that work in some contexts but fail in others, confusing interactions where the model seems to understand but then behaves unexpectedly, overconfidence in the model's capabilities, and inefficient development cycles spent debugging "unpredictable" behavior.
The solution is to develop accurate mental models of how LLMs actually work and to recognize the cognitive pitfalls that lead us astray.
Durable Mental Models for Working with LLMs
The Stochastic Parrot
Core insight: LLMs are sophisticated pattern matchers, not reasoning engines.
At their heart, LLMs predict the next token based on statistical patterns in their training data. They don't "understand" concepts in the way humans do, but they recognize patterns and continue them.
This means ambiguity is problematic because vague prompts can produce unpredictable results. Precision matters, with specific, well-structured prompts working better. Hallucination is inevitable, as models will confidently generate plausible but false information. And context is everything, as the model's behavior depends heavily on whatever is in its context window at that moment.
The practical takeaway? Write prompts that are explicit, specific, and leave little room for interpretation. Don't assume the model will "figure out" what you mean.
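To make that concrete, here's a minimal sketch contrasting a vague prompt with an explicit one. The call_llm function is a placeholder for whatever client you use, and the prompt wording is purely illustrative.

```python
# A vague prompt leaves the model to guess intent, audience, and format.
vague_prompt = "Summarize this article."

# An explicit prompt pins down audience, length, format, and what to leave out.
explicit_prompt = (
    "Summarize the article below for a non-technical manager.\n"
    "- Exactly 3 bullet points, each under 20 words.\n"
    "- Cover only decisions and deadlines; omit background detail.\n"
    "- If no deadline is stated, write 'no deadline given'.\n\n"
    "Article:\n{article_text}"
)

def call_llm(prompt: str) -> str:
    """Placeholder for your actual LLM client call."""
    raise NotImplementedError

# Usage (hypothetical):
# summary = call_llm(explicit_prompt.format(article_text=article))
```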
The Simulator
Core insight: LLMs can simulate roles, workflows, or behaviors based on the context you provide.
When you write a prompt like "Act as a helpful assistant," the model doesn't become an assistant. It samples from patterns in its training data that match how helpful assistants tend to speak and behave. In effect, you're constructing a lightweight simulation that persists for the duration of the prompt.
This is powerful. Specifying a role like developer, analyst, or teacher can shift the model's tone, structure, and output format. Roles help set expectations and improve consistency, especially across multiple turns. However, if you mix conflicting roles or instructions, the simulation can break down or become incoherent.
The practical takeaway? Be intentional about the role you want the model to play. Specify the persona, goals, and constraints clearly.
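One way to do this is to spell out the persona, goal, and constraints as separate, labelled parts of the prompt. The sketch below is illustrative; the field names and wording are assumptions, not a required format.

```python
# Making the simulated role explicit: persona, goal, and constraints as
# separate, named parts instead of a bare "act as X".
role_prompt = """\
Role: Senior Python code reviewer.
Goal: Review the diff below for correctness and readability only.
Constraints:
- Do not comment on formatting; assume an auto-formatter handles it.
- Flag at most 5 issues, ordered by severity.
- If the diff looks correct, say so explicitly rather than inventing issues.

Diff:
{diff}
"""

# Usage (hypothetical): review = call_llm(role_prompt.format(diff=my_diff))
```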
The Conversational Mirror
Core insight: LLMs echo your tone, ambiguity, and structure.
The model's output quality and style directly reflects the quality and style of your input. If your prompt is vague, the response will be vague. If your prompt is precise, the response will be more precise.
This means input quality maps directly to output quality, following the classic "garbage in, garbage out" principle. The model adopts your communication style through tone matching. Structure matters, with well-structured prompts producing well-structured responses. And each interaction builds on the previous ones for iterative improvement.
The practical takeaway? Write prompts as if you're writing for a very smart but literal colleague. Be clear, specific, and well-structured.
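For example, labelled sections and delimiters make it obvious which part of the prompt is instruction and which is data, and they tend to invite similarly structured answers. The layout below is just one common convention, not a required format.

```python
# Structure in, structure out: labelled sections and delimiters separate
# instructions from data and make the expected answer format unambiguous.
structured_prompt = """\
### Task
Classify the customer message below as one of: billing, technical, other.

### Rules
- Answer with the single category word, in lowercase, and nothing else.
- If the message fits several categories, pick the dominant one.

### Message
<<<
{message}
>>>
"""
```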
The Prompt is the Program
Core insight: Prompting is a declarative, dynamic form of programming, with text as the universal interface.
Your prompt is the "code" that runs on the LLM. It defines the inputs, outputs, constraints, and behavior. Like any program, it needs to be well-structured, modular, and maintainable. This aligns with the broader shift toward what Andrej Karpathy called Software 2.0, where programs are defined not just by explicit logic, but by data, models, and context.
Complex prompts often create layered, role-specific micro-worlds, with each layer having its own role, goals, and constraints. The model navigates these layers to produce appropriate responses. This layering can create sophisticated behavior, but role conflicts can arise when different layers have conflicting goals.
Since most interaction with an LLM happens through text, the prompt acts as its user interface. And just like any UI, clarity, structure, and consistency go a long way. A clear, well-structured prompt makes the model easier to work with and more likely to respond reliably. When your prompts follow consistent patterns, you get more predictable outputs. Over time, you'll spot feedback loops: what works, what breaks, where ambiguity creeps in. That's what helps you refine things, and it's why modularity matters. Break complex prompts into smaller, reusable parts, and keep track of what you change and why, just as you would with code.
The practical takeaway? Treat prompt engineering like software engineering. Use version control, test the model, document your approach, and design your prompts with the same care you'd use for a user interface.
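As a sketch of what that can look like in practice, here's a prompt assembled from reusable, versioned parts with a trivial test. The names, output format, and structure are illustrative assumptions, not a prescribed pattern.

```python
# Treating prompts as versioned, composable modules rather than one-off strings.
PROMPT_VERSION = "support-triage/v3"  # tracked in version control alongside the code

ROLE = "You are a support triage assistant for an internal IT helpdesk."
OUTPUT_FORMAT = 'Respond with JSON: {"category": <string>, "urgency": "low" | "medium" | "high"}.'
GUARDRAILS = "If the ticket is not an IT issue, set category to 'out_of_scope'."

def build_prompt(ticket_text: str) -> str:
    """Compose the full prompt from reusable parts, like functions in a program."""
    return "\n\n".join([ROLE, OUTPUT_FORMAT, GUARDRAILS, "Ticket:\n" + ticket_text])

def test_build_prompt_contains_all_parts():
    """A trivial check you can run in CI whenever a part changes."""
    prompt = build_prompt("Laptop won't boot after update")
    for part in (ROLE, OUTPUT_FORMAT, GUARDRAILS, "Laptop won't boot after update"):
        assert part in prompt
```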
Cognitive Pitfalls in Prompt Engineering
Of these, I've found the Einstellung Effect and the Type III error particularly important to know about when prompting LLMs. Being aware of these two pitfalls goes a long way toward framing and structuring your prompts correctly.
Type III Error: Solving the Wrong Problem
The pitfall: Framing mistakes lead to elegant failures.
In statistical reasoning, most people are familiar with Type I errors (false positives) and Type II errors (false negatives). Less well known, but just as important in prompt engineering, is the Type III error: solving the wrong problem.
This happens when a prompt is well-crafted and followed precisely by the model, but the output is irrelevant or unhelpful because the underlying task was misunderstood. The model does exactly what it was asked to do, but the prompt was aimed at the wrong goal. The failure is not in the execution, but in the framing.
You might assume the model can reason when it is really just predicting. You might misread what the user needs, focus on the wrong aspect of the workflow, or design an elegant prompt that misses the point entirely. Sometimes, the entire interaction is shaped more by the capabilities of the LLM than by what the broader system or product actually requires.
To avoid this pitfall, start with the actual problem rather than jumping to a solution. Clarify the task intent before writing any prompts. Test your assumptions about the problem at hand and about what the model can actually do. Before writing a single token, step back and define the problem clearly. Only then should you start designing the prompt.
Start with the problem. Then design the prompt. Not the other way around.
The Einstellung Effect: Prompt Fixation
The pitfall: Reuse bias leads to suboptimal prompts.
The Einstellung effect is a cognitive bias where a familiar solution blocks recognition of a better one. In the context of prompt engineering, this often appears when a previously successful prompt becomes a default, even when it no longer fits the task.
Common mistakes include reusing the same prompt structure for different problems, not experimenting with different approaches, getting stuck in familiar patterns, or ignoring evidence that suggests a different approach.
To avoid this pitfall, stay flexible and iterative. Experiment with different prompt structures. Question your assumptions regularly. And look for evidence that suggests different approaches might work better.
Prompt Overfitting
The pitfall: A prompt that works on one example might fail everywhere else.
Prompt overfitting happens when you design and test a prompt using only one or two inputs, then assume it will behave the same way in general. It feels like success, but it's often an illusion. The prompt may have worked because the input was easy, familiar, or unintentionally aligned with the model's defaults. Once the input shifts (longer content, edge cases, a different tone), the output breaks down or drifts unpredictably.
This is especially risky when you're working with limited test data or optimizing prompts by trial and error on a small set of examples. You end up with something that looks robust, but actually performs well only in narrow conditions.
To avoid this, evaluate prompts across a wide range of realistic inputs. Include examples that vary in structure, tone, length, or ambiguity. Treat your prompt like a function: it should be predictable, consistent, and stable under real-world conditions, not just on the happy path.
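A minimal evaluation harness can look something like the sketch below: one prompt, a deliberately varied set of inputs, and an automatic check on each output. The call_llm function and the validity check are placeholders for your own setup.

```python
# Run one prompt over deliberately varied inputs and apply an automatic check,
# instead of eyeballing a single happy-path example.
test_cases = [
    ("short, clean input", "Meeting moved to 3pm Friday."),
    ("long, rambling input", "So basically what happened was that " * 50),
    ("ambiguous input", "Can you sort this out before the thing?"),
    ("non-English input", "La réunion est reportée à vendredi."),
]

def call_llm(prompt: str) -> str:
    """Placeholder for your actual LLM client call."""
    raise NotImplementedError

def is_valid(output: str) -> bool:
    """Example check: at most 3 non-empty lines, each formatted as a bullet."""
    lines = [line for line in output.splitlines() if line.strip()]
    return 0 < len(lines) <= 3 and all(line.lstrip().startswith("-") for line in lines)

def evaluate(prompt_template: str) -> None:
    for name, text in test_cases:
        output = call_llm(prompt_template.format(input=text))
        print(f"{name}: {'PASS' if is_valid(output) else 'FAIL'}")
```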
The Fluency Illusion
The pitfall: Coherent ≠ correct.
LLMs are excellent at producing coherent, plausible-sounding text. But coherence doesn't guarantee accuracy, especially for factual or logical tasks. This is what researchers call the "fluency illusion", where models generate convincing but incorrect information.
Common mistakes include trusting the model's confidence, not fact-checking important information, assuming coherence means correctness, or not verifying logical consistency.
To avoid this pitfall, always verify important factual claims. Check for logical consistency. Don't trust the model's confidence level. And use the model for what it's good at (generation) while verifying separately.
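One way to separate generation from verification is to check any figures in the model's draft against a source you control before using it. The sketch below is illustrative; the reference data, numbers, and helper function are made up for the example.

```python
import re

# Hypothetical trusted reference data; in a real system this would come from
# your own database or a vetted document, not from the model.
TRUSTED_FIGURES = {"2023_revenue_eur_m": 412.0}

def contains_figure(draft: str, expected: float, tolerance: float) -> bool:
    """Return True if at least one number in the draft is close to the trusted value."""
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", draft)]
    return any(abs(n - expected) <= tolerance for n in numbers)

# Imagine this sentence came back from the model.
draft = "Revenue in 2023 was roughly 410 million euros."

if not contains_figure(draft, TRUSTED_FIGURES["2023_revenue_eur_m"], tolerance=5.0):
    print("Draft failed verification; do not publish without human review.")
```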
Anthropomorphic Drift
The pitfall: Attributing agency where there is none.
It's easy to start thinking of the LLM as having intentions, understanding, or agency. This is a particularly dangerous cognitive trap with LLMs because the outputs sound human, even though they result from statistical inference. Anthropomorphizing is a natural response to fluent language, but it leads to overtrusting the model's output or falsely assuming it has reasoning abilities. This bias is well documented in human-AI interaction research.
Common mistakes include thinking the model "understands" your intent, attributing reasoning to statistical pattern matching, trusting the model's "judgment," or treating the model like a human collaborator.
To avoid this pitfall, remember that the model is a statistical pattern matcher. Don't attribute understanding or agency. Verify important outputs independently. And keep the model's limitations in mind.
Prompting as Debugging
The pitfall: Not treating prompt engineering as an experimental, iterative process.
Prompt engineering is more like debugging than traditional programming. It requires hypothesizing, testing, and refining based on results. This iterative approach is essential for building reliable AI systems.
Common mistakes include expecting prompts to work perfectly on the first try, not iterating based on results, not isolating variables when testing, or giving up too quickly when things don't work.
To avoid this pitfall, treat prompting as iterative and experimental. Hypothesize, isolate, and refine systematically. Test one change at a time. And learn from failures and unexpected results.
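In practice, isolating variables can be as simple as running two prompt variants that differ in exactly one instruction over the same inputs, so any difference in output is attributable to that change. The sketch below assumes a call_llm placeholder for your client; the prompts are illustrative.

```python
# Two prompt variants that differ in exactly one instruction, run over the same
# inputs so any difference in output is attributable to that single change.
BASE = "Summarize the ticket below in one sentence.\n\nTicket:\n{ticket}"
VARIANT = (
    "Summarize the ticket below in one sentence. "
    "Mention the affected system by name.\n\nTicket:\n{ticket}"
)

def call_llm(prompt: str) -> str:
    """Placeholder for your actual LLM client call."""
    raise NotImplementedError

def compare(tickets: list[str]) -> None:
    for ticket in tickets:
        base_out = call_llm(BASE.format(ticket=ticket))
        variant_out = call_llm(VARIANT.format(ticket=ticket))
        print(f"--- {ticket[:40]}")
        print(f"base:    {base_out}")
        print(f"variant: {variant_out}")
```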
Conclusion: Mental Clarity Beats Prompt Tinkering
The most important skill in working with LLMs isn't knowing the latest prompt techniques or having the most sophisticated templates. It's thinking clearly about what these systems actually are and how they actually work.
When you understand that LLMs are statistical pattern matchers, not reasoning engines, you write different prompts. When you recognize that coherence doesn't equal correctness, you build different systems. When you see prompting as declarative programming, you approach it with different tools and practices.
The clearer you think about LLMs, the more reliable your outcomes will be. Mental models help you reason about these systems. Prompts are interfaces, not dialogues. And the better you understand the cognitive pitfalls that distort your thinking, the more effectively you can avoid them.
The future of AI development isn't about writing better prompts; it's about thinking more clearly about what we're actually building and how these systems actually work.