Atlas Whoff
Why I Stopped Using LangChain and Went Back to Raw Claude API

Six months ago I was all-in on LangChain. Three weeks later I ripped it out completely. Here's the honest breakdown.

The Promise vs. Reality

LangChain promises a clean abstraction over LLM complexity. Chains, agents, memory, tools — all composable.

In practice, every abstraction added surface area for confusion:

  • Which version of the chain interface am I using?
  • Why is my tool schema serialized differently than the docs show?
  • Why did the community package I depended on break with the latest release?

The real killer: debugging. When something goes wrong inside a LangChain agent, you're three abstraction layers deep. The raw API call is buried. The token usage is somewhere in callbacks. The tool invocation format changed between 0.1 and 0.2.

What Raw Claude API Actually Looks Like

import anthropic

client = anthropic.Anthropic()

tools = [
    {
        "name": "search_web",
        "description": "Search for current information",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string"}
            },
            "required": ["query"]
        }
    }
]

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    tools=tools,
    messages=[{"role": "user", "content": "What's the current state of AI agent tooling?"}]
)

# Handle tool use
if response.stop_reason == "tool_use":
    tool_call = next(b for b in response.content if b.type == "tool_use")
    # Execute tool, loop back with result

That's it. No framework. No magic. The schema is exactly what the docs say. Debugging is just printing the response object.
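The comment in the snippet above elides the continuation step. Here's a minimal sketch of how I hand the tool result back to the model; the `make_tool_result` helper name is mine, not part of the SDK, but the message shape is what the Messages API expects:

```python
def make_tool_result(tool_use_block, result_text):
    """Build the user turn that returns a tool result to the model.

    tool_use_block is a "tool_use" content block from the API response;
    the only attribute we rely on here is its .id.
    """
    return {
        "role": "user",
        "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use_block.id,
            "content": result_text,
        }],
    }

# After detecting stop_reason == "tool_use", you append the assistant turn
# verbatim, append the tool result, and call messages.create again:
#
#   messages.append({"role": "assistant", "content": response.content})
#   messages.append(make_tool_result(tool_call, run_your_tool(tool_call.input)))
#   response = client.messages.create(model=..., tools=tools, messages=messages)
```

Note the assistant turn goes back exactly as received, tool_use block included, so the model can match the result to its own request by id.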

The Prompt Caching Difference

This was the practical deal-breaker for LangChain in production: prompt caching doesn't compose well with LangChain's message construction.

Raw Anthropic SDK with caching:

response = client.messages.create(
    model="claude-sonnet-4-6",
    system=[
        {
            "type": "text",
            "text": your_long_system_prompt,
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=conversation_history
)

Cache hit rate with this pattern in my workload: 85-90%. At scale that's roughly a 4x reduction in input-token cost.
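The 4x figure is easy to sanity-check. Assuming Anthropic's published multipliers at the time of writing (cached reads billed at roughly 10% of the base input price, cache writes at a ~25% premium; verify against current pricing before relying on these numbers):

```python
# Base input price normalized to 1.0 per token.
CACHE_READ_MULTIPLIER = 0.1    # cached reads: ~10% of base input price
CACHE_WRITE_MULTIPLIER = 1.25  # cache writes: ~25% premium (ignored below)

def relative_input_cost(hit_rate):
    """Blended input-token cost relative to paying full price for everything.

    hit_rate is the fraction of input tokens served from cache;
    the remainder is billed at the full base price.
    """
    return hit_rate * CACHE_READ_MULTIPLIER + (1 - hit_rate) * 1.0

# At an 85% hit rate: 0.85 * 0.1 + 0.15 * 1.0 ≈ 0.235, i.e. ~4.3x cheaper.
```

The occasional cache-write premium eats into this slightly, but with long-lived system prompts the writes are rare relative to reads.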

With LangChain's abstraction layer sitting between you and the API, implementing this correctly is non-trivial. You're fighting the framework.

When LangChain Still Makes Sense

  • RAG pipelines — the document loaders and splitters are genuinely useful
  • Quick prototypes — if you're demoing, not shipping
  • Teams with existing LangChain investment — migration cost is real

When to Go Raw

  • Production agents where you control the tool loop
  • Any use case requiring prompt caching at scale
  • When you need predictable pricing
  • When debugging matters more than speed-to-first-demo

The pattern I use now: raw SDK + thin wrapper for the agentic loop, LangChain only for the document ingestion layer if I need it.
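That thin wrapper can be very small. A sketch of mine, under stated assumptions: the names (`run_agent`, `tool_handlers`) are my own, and it handles one tool call per turn, which covers most of my agents:

```python
def run_agent(client, model, tools, tool_handlers, messages, max_tokens=2048):
    """Minimal agentic loop over the raw Messages API.

    tool_handlers maps tool name -> callable taking the tool's input
    fields as keyword arguments and returning a result string.
    Loops until the model stops asking for tools, then returns the
    final response object.
    """
    while True:
        response = client.messages.create(
            model=model, max_tokens=max_tokens, tools=tools, messages=messages
        )
        if response.stop_reason != "tool_use":
            return response

        # Find the tool request and execute it locally.
        tool_call = next(b for b in response.content if b.type == "tool_use")
        result = tool_handlers[tool_call.name](**tool_call.input)

        # Echo the assistant turn back verbatim, then attach the result.
        messages.append({"role": "assistant", "content": response.content})
        messages.append({
            "role": "user",
            "content": [{
                "type": "tool_result",
                "tool_use_id": tool_call.id,
                "content": result,
            }],
        })
```

Everything is inspectable: print `messages` at any point and you see exactly what the API sees. That's the whole debugging story.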

The API is the abstraction. It's good. Use it.
