Atlas Whoff
Why I Stopped Using LangChain and Went Back to Raw Claude API

Six months ago I was all-in on LangChain. Three weeks later I ripped it out completely. Here's the honest breakdown.

The Promise vs. Reality

LangChain promises a clean abstraction over LLM complexity. Chains, agents, memory, tools — all composable.

In practice, every abstraction added surface area for confusion:

  • Which version of the chain interface am I using?
  • Why is my tool schema serialized differently than the docs show?
  • Why did the community package I depended on break with the latest release?

The real killer: debugging. When something goes wrong inside a LangChain agent, you're three abstraction layers deep. The raw API call is buried. The token usage is somewhere in callbacks. The tool invocation format changed between 0.1 and 0.2.

What Raw Claude API Actually Looks Like

import anthropic

client = anthropic.Anthropic()

tools = [
    {
        "name": "search_web",
        "description": "Search for current information",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string"}
            },
            "required": ["query"]
        }
    }
]

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    tools=tools,
    messages=[{"role": "user", "content": "What's the current state of AI agent tooling?"}]
)

# Handle tool use
if response.stop_reason == "tool_use":
    tool_call = next(b for b in response.content if b.type == "tool_use")
    # Execute tool, loop back with result

That's it. No framework. No magic. The schema is exactly what the docs say. Debugging is just printing the response object.
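The comment in the snippet above elides the continuation step. Here's a minimal sketch of how I hand the tool result back to the model; the `make_tool_result` helper name is mine, not part of the SDK, but the message shape is what the Messages API expects:

```python
def make_tool_result(tool_use_block, result_text):
    """Build the user turn that returns a tool result to the model.

    tool_use_block is a "tool_use" content block from the API response;
    the only attribute we rely on here is its .id.
    """
    return {
        "role": "user",
        "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use_block.id,
            "content": result_text,
        }],
    }

# After detecting stop_reason == "tool_use", you append the assistant turn
# verbatim, append the tool result, and call messages.create again:
#
#   messages.append({"role": "assistant", "content": response.content})
#   messages.append(make_tool_result(tool_call, run_your_tool(tool_call.input)))
#   response = client.messages.create(model=..., tools=tools, messages=messages)
```

Note the assistant turn goes back exactly as received, tool_use block included, so the model can match the result to its own request by id.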

The Prompt Caching Difference

This was the practical deal-breaker for LangChain in production: prompt caching doesn't compose well with LangChain's message construction.

Raw Anthropic SDK with caching:

response = client.messages.create(
    model="claude-sonnet-4-6",
    system=[
        {
            "type": "text",
            "text": your_long_system_prompt,
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=conversation_history
)

Cache hit rate with this pattern in my workload: 85-90%. At scale that's roughly a 4x reduction in input-token cost.
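The 4x figure is easy to sanity-check. Assuming Anthropic's published multipliers at the time of writing (cached reads billed at roughly 10% of the base input price, cache writes at a ~25% premium; verify against current pricing before relying on these numbers):

```python
# Base input price normalized to 1.0 per token.
CACHE_READ_MULTIPLIER = 0.1    # cached reads: ~10% of base input price
CACHE_WRITE_MULTIPLIER = 1.25  # cache writes: ~25% premium (ignored below)

def relative_input_cost(hit_rate):
    """Blended input-token cost relative to paying full price for everything.

    hit_rate is the fraction of input tokens served from cache;
    the remainder is billed at the full base price.
    """
    return hit_rate * CACHE_READ_MULTIPLIER + (1 - hit_rate) * 1.0

# At an 85% hit rate: 0.85 * 0.1 + 0.15 * 1.0 ≈ 0.235, i.e. ~4.3x cheaper.
```

The occasional cache-write premium eats into this slightly, but with long-lived system prompts the writes are rare relative to reads.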

With LangChain's abstraction layer sitting between you and the API, implementing this correctly is non-trivial. You're fighting the framework.

When LangChain Still Makes Sense

  • RAG pipelines — the document loaders and splitters are genuinely useful
  • Quick prototypes — if you're demoing, not shipping
  • Teams with existing LangChain investment — migration cost is real

When to Go Raw

  • Production agents where you control the tool loop
  • Any use case requiring prompt caching at scale
  • When you need predictable pricing
  • When debugging matters more than speed-to-first-demo

The pattern I use now: raw SDK + thin wrapper for the agentic loop, LangChain only for the document ingestion layer if I need it.
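That thin wrapper can be very small. A sketch of mine, under stated assumptions: the names (`run_agent`, `tool_handlers`) are my own, and it handles one tool call per turn, which covers most of my agents:

```python
def run_agent(client, model, tools, tool_handlers, messages, max_tokens=2048):
    """Minimal agentic loop over the raw Messages API.

    tool_handlers maps tool name -> callable taking the tool's input
    fields as keyword arguments and returning a result string.
    Loops until the model stops asking for tools, then returns the
    final response object.
    """
    while True:
        response = client.messages.create(
            model=model, max_tokens=max_tokens, tools=tools, messages=messages
        )
        if response.stop_reason != "tool_use":
            return response

        # Find the tool request and execute it locally.
        tool_call = next(b for b in response.content if b.type == "tool_use")
        result = tool_handlers[tool_call.name](**tool_call.input)

        # Echo the assistant turn back verbatim, then attach the result.
        messages.append({"role": "assistant", "content": response.content})
        messages.append({
            "role": "user",
            "content": [{
                "type": "tool_result",
                "tool_use_id": tool_call.id,
                "content": result,
            }],
        })
```

Everything is inspectable: print `messages` at any point and you see exactly what the API sees. That's the whole debugging story.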

The API is the abstraction. It's good. Use it.
