LangChain was the right answer in 2023. It abstracted away a messy ecosystem of half-baked provider APIs, gave you a unified LLM interface, and let you stitch agents together with a few dozen lines of Python. We used it everywhere — including in production on Vettio, our AI recruitment platform.
In April 2026, we ripped it out.
This post is about why we made that call, what replaced it, and the metrics that justified the migration.
## The symptoms
LangChain's abstractions started leaking the moment we went beyond happy-path demos. Three things kept biting us:
- **Stack traces from hell.** A single `AgentExecutor.invoke()` call crossed 14 frames of LangChain internals before reaching our code. Debugging a malformed tool call felt like archaeology.
- **Version churn.** Every minor bump renamed, relocated, or deprecated something we depended on. Our CI was pinned to a specific LangChain SHA for six months just to stay green.
- **Abstracted-away observability.** We couldn't cleanly trace token usage, cache hits, or per-tool latencies without monkey-patching internal classes.
Meanwhile, Anthropic's native SDK was getting better. Native tool calling, prompt caching, extended thinking, streaming — all first-class and documented.
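Prompt caching, for instance, is a single field on the request rather than a framework feature. A minimal sketch of the request shape (the `cache_control` field is from Anthropic's documented API; the prompt text and model choice here are placeholders, not our production values):

```python
# Sketch: enabling Anthropic prompt caching on a large, stable system prompt.
# Built as a plain dict here so the structure is visible; in practice it is
# passed as client.messages.create(**request).
request = {
    "model": "claude-opus-4-7",
    "max_tokens": 4096,
    "system": [
        {
            "type": "text",
            "text": "You are an interview assistant...",  # large, stable prefix
            "cache_control": {"type": "ephemeral"},       # cache this block
        }
    ],
    "messages": [{"role": "user", "content": "Summarize the candidate."}],
}
```

Subsequent calls that share the cached prefix read it back at a fraction of the input-token cost, with no wrapper layer involved.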
## The refactor
The logic we were using LangChain for wasn't complicated:
- Build a system prompt from templates
- Call Claude with a list of tools
- Route tool calls to our internal handlers
- Return the result
We replaced ~800 lines of LangChain glue with this:
```python
from anthropic import Anthropic

client = Anthropic()

def run_agent(user_input: str, tools: list[dict], tool_handlers: dict):
    messages = [{"role": "user", "content": user_input}]
    while True:
        response = client.messages.create(
            model="claude-opus-4-7",
            max_tokens=4096,
            tools=tools,
            messages=messages,
            system=SYSTEM_PROMPT,
        )
        if response.stop_reason == "end_turn":
            return response.content[0].text

        # Handle tool use: route each tool_use block to its handler,
        # then feed the results back as a user turn.
        tool_calls = [b for b in response.content if b.type == "tool_use"]
        messages.append({"role": "assistant", "content": response.content})
        tool_results = []
        for call in tool_calls:
            result = tool_handlers[call.name](**call.input)
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": call.id,
                "content": str(result),
            })
        messages.append({"role": "user", "content": tool_results})
```
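The tool schemas and handlers are plain data. A hypothetical wiring, just to show the shape — the `lookup_candidate` tool and its handler are illustrative, not from Vettio's actual code:

```python
# Hypothetical tool definition in Anthropic's tool-schema format.
tools = [{
    "name": "lookup_candidate",
    "description": "Fetch a candidate profile by ID.",
    "input_schema": {
        "type": "object",
        "properties": {"candidate_id": {"type": "string"}},
        "required": ["candidate_id"],
    },
}]

# Handlers are keyed by tool name; run_agent routes each tool_use block
# to the matching callable with the model-supplied arguments.
tool_handlers = {
    "lookup_candidate": lambda candidate_id: {"id": candidate_id, "stage": "screen"},
}

# answer = run_agent("Where is candidate c-42 in the pipeline?", tools, tool_handlers)
```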
That's it. No `AgentExecutor`, no callbacks, no `ConversationBufferMemory`. Just the model and our code.
## The metrics
We ran the old and new paths side-by-side for two weeks on Vettio's interview-bot service. Results:
- p50 latency: 2.1s → 1.4s (−33%)
- p95 latency: 4.8s → 3.2s (−33%)
- Error rate: 0.9% → 0.2%
- Stack trace depth on errors: 14 → 4 frames
- Lines of integration code: 812 → 187
The latency win came mostly from eliminating LangChain's implicit retry-and-retry-again behavior on tool-use mismatches. With direct SDK calls, a malformed tool schema fails loudly instead of silently retrying three times.
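One way to keep failures loud is to validate tool schemas once at startup rather than discovering problems mid-conversation. A minimal stdlib-only sketch — `validate_tools` is our own helper, not an SDK function; the required fields match Anthropic's tool format:

```python
def validate_tools(tools: list[dict]) -> None:
    """Fail fast on malformed tool schemas before any API call is made."""
    for tool in tools:
        for field in ("name", "description", "input_schema"):
            if field not in tool:
                raise ValueError(f"tool {tool.get('name', '?')!r} missing {field!r}")
        if tool["input_schema"].get("type") != "object":
            raise ValueError(f"tool {tool['name']!r}: input_schema.type must be 'object'")
```

Called once at startup, a typo'd schema raises immediately instead of surfacing as silent retries deep inside an agent loop.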
## When LangChain still makes sense
This isn't a blanket "don't use LangChain" post. It still wins if you need:
- Multi-provider abstraction. Swapping between Claude, GPT-4, and Gemini behind a stable interface.
- LangGraph workflows for graph-based agent topologies you'd otherwise build from scratch.
- LangSmith observability you don't want to rebuild.
For a team that's already committed to one provider (we're all-in on Claude) and wants full control over prompts, tool schemas, and observability — the native SDK is the right tool in 2026.
## The lesson
Abstractions pay for themselves when the underlying APIs are bad. Anthropic's API isn't bad. It's clean, well-documented, and stable. The abstraction tax was real; the abstraction benefit had quietly evaporated.
If you're still on LangChain in a production Claude app, benchmark a direct-SDK rewrite of your hot path. You might be surprised.