Six months ago I was all-in on LangChain. Three weeks later I ripped it out completely. Here's the honest breakdown.
The Promise vs. Reality
LangChain promises a clean abstraction over LLM complexity. Chains, agents, memory, tools — all composable.
In practice, every abstraction added surface area for confusion:
- Which version of the chain interface am I using?
- Why is my tool schema serialized differently than the docs show?
- Why did the community package I depended on break with the latest release?
The real killer: debugging. When something goes wrong inside a LangChain agent, you're three abstraction layers deep. The raw API call is buried. The token usage is somewhere in callbacks. The tool invocation format changed between 0.1 and 0.2.
What Raw Claude API Actually Looks Like
```python
import anthropic

client = anthropic.Anthropic()

tools = [
    {
        "name": "search_web",
        "description": "Search for current information",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string"}
            },
            "required": ["query"]
        }
    }
]

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    tools=tools,
    messages=[{"role": "user", "content": "What's the current state of AI agent tooling?"}]
)

# Handle tool use
if response.stop_reason == "tool_use":
    tool_call = next(b for b in response.content if b.type == "tool_use")
    # Execute tool, loop back with result
```
That's it. No framework. No magic. The schema is exactly what the docs say. Debugging is just printing the response object.
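The "loop back with result" step in the comment above is just message construction: the result goes back as a user turn containing a `tool_result` block that references the original `tool_use` id. A minimal sketch (the block shape is the Messages API's; the helper name is mine):

```python
def build_tool_result_message(tool_use_id: str, result: str) -> dict:
    # Package a tool's output so the model can continue the turn.
    # The tool_use_id ties this result back to the model's request.
    return {
        "role": "user",
        "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": result,
        }],
    }

# In the loop: append the assistant turn (response.content), then this
# message, and call client.messages.create(...) again with the same tools.
```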
The Prompt Caching Difference
This is the practical deal-breaker for LangChain in production: prompt caching doesn't compose well with LangChain's message construction.
Raw Anthropic SDK with caching:
```python
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    system=[
        {
            "type": "text",
            "text": your_long_system_prompt,
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=conversation_history
)
```
Cache hit rate with this pattern: 85-90%. At scale that's roughly a 4x reduction in input-token cost.
With LangChain's abstraction layer sitting between you and the API, implementing this correctly is non-trivial. You're fighting the framework.
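The 4x figure falls out of simple arithmetic, assuming cache reads are billed at roughly 10% of the base input-token rate (Anthropic's published ratio for cache hits; the small premium on cache writes is ignored here):

```python
def input_cost_multiplier(hit_rate: float, cached_read_ratio: float = 0.1) -> float:
    # Blended cost of input tokens relative to paying full price for all of them:
    # cached tokens cost cached_read_ratio, the rest cost full price.
    return hit_rate * cached_read_ratio + (1 - hit_rate)

# At an 85% hit rate, input spend drops to 0.235x of baseline.
print(round(1 / input_cost_multiplier(0.85), 1))  # → 4.3
```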
When LangChain Still Makes Sense
- RAG pipelines — the document loaders and splitters are genuinely useful
- Quick prototypes — if you're demoing, not shipping
- Teams with existing LangChain investment — migration cost is real
When to Go Raw
- Production agents where you control the tool loop
- Any use case requiring prompt caching at scale
- When you need predictable pricing
- When debugging matters more than speed-to-first-demo
The pattern I use now: raw SDK + thin wrapper for the agentic loop, LangChain only for the document ingestion layer if I need it.
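That thin wrapper can be a single function. A sketch of what I mean, assuming tool implementations are plain callables keyed by tool name (`run_agent` and `tool_handlers` are names I'm inventing for illustration, not SDK API):

```python
def run_agent(client, model, tools, tool_handlers, user_message, max_tokens=2048):
    """Minimal agentic loop: call the model, execute any requested tools,
    feed results back, and repeat until the model stops asking for tools."""
    messages = [{"role": "user", "content": user_message}]
    while True:
        response = client.messages.create(
            model=model, max_tokens=max_tokens, tools=tools, messages=messages
        )
        if response.stop_reason != "tool_use":
            return response
        # Echo the assistant turn back, then answer each tool_use block.
        messages.append({"role": "assistant", "content": response.content})
        results = []
        for block in response.content:
            if block.type == "tool_use":
                output = tool_handlers[block.name](**block.input)
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": output,
                })
        messages.append({"role": "user", "content": results})
```

Every API call, every token, every tool invocation is visible right there in the loop. That's the whole debugging story.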
The API is the abstraction. It's good. Use it.