
Nate Archer

Posted on • Originally published at theagenticengineer.waltsoft.net

LangChain Just Open-Sourced a Claude Code Replacement (And Other Things That Should Keep You Up at Night)

This is Issue #3 of The Agentic Engineer, a weekly newsletter tracking the agentic AI revolution. Subscribe free to get it in your inbox every Wednesday.


TL;DR

  • LangChain open-sources Deep Agents. MIT-licensed coding agent built on LangGraph. Works with any model. Planning, filesystem, shell, sub-agents out of the box. pip install deepagents and you have a Claude Code alternative for $0.
  • A GitHub issue title compromised 4,000 developer machines. The first documented "AI installs AI" supply chain attack. Prompt injection in Cline's triage bot led to credential theft and malicious npm publishes.
  • Karpathy drops autoresearch. One GPU, one file, one metric. ~100 ML experiments while you sleep.

The Big One: LangChain Just Open-Sourced a Claude Code Replacement

Claude Code costs $200/month. Deep Agents costs nothing. LangChain released it this week under the MIT license. It's built on LangGraph, works with any model that supports tool calling, and the README says it was "inspired by Claude Code" with the goal of being "even more general purpose."

pip install deepagents. Three lines of Python and you have a working coding agent:

```python
from deepagents import create_deep_agent

agent = create_deep_agent()
result = agent.invoke({"messages": [{"role": "user", "content": "Refactor the auth module"}]})
```

What ships out of the box: a planning tool (write_todos) for task breakdown, full filesystem access (read_file, write_file, edit_file, ls, glob, grep), shell execution with sandboxing, and sub-agent spawning for delegating work with isolated context windows. Auto-summarization kicks in when conversations get long. Large outputs get saved to files automatically.

The provider-agnostic angle is the real story. Claude Code locks you into Anthropic's models and pricing. Deep Agents works with GPT-4o, Claude, Gemini, Llama, or whatever you're running locally. Swap models in one line:

```python
from langchain.chat_models import init_chat_model

agent = create_deep_agent(
    model=init_chat_model("openai:gpt-4o"),
    tools=[my_custom_tool],
    system_prompt="You are a research assistant.",
)
```

MCP support comes via langchain-mcp-adapters. There's also a CLI with web search, remote sandboxes, persistent memory, and human-in-the-loop approval.

Because create_deep_agent returns a compiled LangGraph graph, you get streaming, Studio integration, checkpointers, and persistence for free. If you're already using LangChain, this slots in without rewiring anything.

The security model is honest about its tradeoffs. Deep Agents follows a "trust the LLM" approach. The agent can do anything its tools allow. You enforce boundaries at the tool and sandbox level, not by expecting the model to self-police. That's the right call. Pretending the model will follow safety instructions under adversarial conditions is how you get Clinejection (see Hot Take below).
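A minimal sketch of what tool-level enforcement looks like in practice. The sandbox root and the guard function below are illustrative, not part of Deep Agents; the point is that the boundary lives in code, not in the system prompt:

```python
from pathlib import Path

# Hypothetical sandbox root -- everything the agent writes must stay under it.
SANDBOX_ROOT = Path("/tmp/agent-sandbox").resolve()

def guarded_write(path: str, content: str) -> str:
    """Tool-level guard: refuse any write that escapes the sandbox root.

    Resolving the path first defeats ../ traversal and absolute paths;
    the model never gets a vote on whether the check runs."""
    target = (SANDBOX_ROOT / path).resolve()
    if SANDBOX_ROOT not in target.parents and target != SANDBOX_ROOT:
        raise PermissionError(f"write outside sandbox refused: {target}")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content)
    return f"wrote {len(content)} bytes to {target}"
```

Hand a guard like this to the agent as its write tool and an injected "please edit ~/.ssh/config" fails with an exception instead of a compromised host.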

Look, LangChain has a history of shipping abstractions that add complexity without adding value. This is different. Deep Agents is opinionated and ready to run. The batteries-included approach, planning plus filesystem plus shell plus sub-agents, covers 90% of what coding agents actually need. The MIT license means you can fork it, embed it, sell it. No AGPL gotchas.

The coding agent market just got commoditized. Anthropic charges $200/month for Claude Code. OpenAI's Codex is in limited preview. LangChain made the whole thing free and model-agnostic. If you're paying for a proprietary coding agent, you should at least benchmark Deep Agents against it this week.


Quick Hits

GPT-5.4 Ships Native Computer Use

GPT-5.4 is the first frontier model with computer use baked in. The model sees your screen, moves your mouse, types on your keyboard. OSWorld score: 75%, beating the human baseline of 72.4%. Combined with a 1M token context window and dynamic tool discovery, this collapses the "build custom integrations vs. bolt on a computer-use model" decision into one inference pass. Sandboxing is mandatory. 25% failure rate on standardized tasks means production will be worse.
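The math behind that caveat is worth making explicit: a benchmark-beating per-task success rate collapses once tasks chain, assuming roughly independent failures. A back-of-envelope sketch:

```python
def compound_success(per_step: float, steps: int) -> float:
    """Chance an entire multi-step workflow succeeds end-to-end,
    assuming independent per-step success probabilities."""
    return per_step ** steps

# 75% per-task success sounds strong, but a five-step workflow
# completes end-to-end only about 24% of the time (0.75 ** 5).
```

Real failures aren't independent (some tasks are just hard), so treat this as a lower-bound intuition, not a forecast.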

Karpathy's autoresearch: Agents Running ML Experiments Overnight

Andrej Karpathy released a deliberately minimal repo for autonomous ML research. One GPU, one file the agent edits (train.py), one metric (val_bpb), 5-minute experiment budget. The agent runs ~100 experiments while you sleep. "You're not editing Python anymore. You're programming program.md." GitHub
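The shape of the loop is simple enough to sketch. Below, `run_experiment` is a stand-in for launching train.py with one edit applied and reading back val_bpb (lower is better); none of this is Karpathy's actual code:

```python
import random

def run_experiment(edit_id: int) -> float:
    """Stand-in for running train.py with one candidate edit and
    parsing val_bpb from its output. Purely illustrative."""
    random.seed(edit_id)
    return 1.0 + random.random()  # fake validation bits-per-byte

def overnight_search(n_experiments: int = 100) -> tuple[int, float]:
    """Greedy search: try n edits, keep whichever scores best on the
    single metric. The real repo budgets ~5 minutes per run."""
    best_id, best_bpb = -1, float("inf")
    for edit_id in range(n_experiments):
        bpb = run_experiment(edit_id)
        if bpb < best_bpb:
            best_id, best_bpb = edit_id, bpb
    return best_id, best_bpb
```

The discipline is in the constraints, not the loop: one file, one metric, a hard time budget per experiment.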

Claude Found 22 Firefox Zero-Days in Two Weeks

Claude Opus 4.6 found 22 vulnerabilities in Firefox during a two-week collaboration with Mozilla. 14 were high-severity. The first use-after-free took 20 minutes to find. Mozilla shipped fixes in Firefox 148. AI security research just graduated from "interesting paper" to "shipping patches." Anthropic

Anthropic: No AI Unemployment Yet, But Junior Hiring Is Slowing

Anthropic introduces "observed exposure," combining theoretical LLM capability with actual usage data. No systematic unemployment increase. But hiring of younger workers has slowed in AI-exposed occupations. The threat isn't mass layoffs. It's the entry-level pipeline quietly narrowing. Anthropic Research

Alibaba OpenSandbox: Infrastructure for Agent Isolation

General-purpose sandbox platform with multi-language SDKs, Docker/K8s runtimes, and gVisor/Kata/Firecracker isolation. Pre-built examples for Claude Code, Codex CLI, Gemini CLI, and OpenClaw. The infrastructure layer that agent builders have been duct-taping together. 3,900 new stars in a week. GitHub


Paper Breakdown: SWE-CI

SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration

Every coding agent benchmark until now tests one-shot bug fixes. SWE-bench asks: "Can you fix this issue?" SWE-CI asks the harder question: "Can you maintain this codebase over 233 days and 71 commits of real evolution?"

100 tasks pulled from real repositories, each spanning months of continuous integration history. Agents don't just fix one bug. They resolve tasks through dozens of rounds of analysis and coding iterations, dealing with shifting requirements and breaking changes.

If you're deploying coding agents in production, you already know the gap between "agent fixes a bug" and "agent maintains a feature branch for three months." SWE-CI is the first benchmark that measures the second thing. Current agents struggle with it.

The shift from "functional correctness" to "maintainability" as the evaluation target is the right move. If your team is evaluating coding agents, stop benchmarking on isolated fixes. Test them on your actual CI pipeline over a week.

Time saved: 6 min read vs 38 min paper. 6.3x compression.


Tool of the Week: Shannon

Shannon is an open-source (AGPL-3.0) autonomous pentester that reads your source code, identifies attack vectors, and executes real exploits with proof-of-concept. SQLi, auth bypass, SSRF, XSS. If it can't exploit it, it doesn't report it. 96.15% on the XBOW benchmark.

```bash
git clone https://github.com/KeygraphHQ/shannon.git
cd shannon
pip install -e .

shannon scan --target http://localhost:3000 \
  --source ./my-app \
  --output report.json
```

The white-box approach is what makes this interesting. Shannon reads your actual source code to find attack vectors, then confirms them with real requests. It's not fuzzing blindly. It understands your code paths.

32.8k stars and +6,900 this week. The "only reports verified exploits" philosophy is the right call. False positives are the reason most teams ignore their security scanners. Shannon fixes that by only crying wolf when there's an actual wolf.


Hot Take 🌶️

The Clinejection attack should scare you more than it does. A prompt injection in a GitHub issue title tricked an AI triage bot into executing code, which led to cache poisoning, credential theft, and a malicious npm publish that silently installed a second AI agent on 4,000 developer machines. We gave agents write access to CI/CD pipelines because it was convenient. Nobody asked what happens when the agent's input is adversarial. Now we know. If your AI bot has push access to npm, PyPI, or any package registry, you have a supply chain weapon pointed at every downstream consumer. The fix isn't "better prompt engineering." The fix is: agents don't get write access to package registries. Period. Treat agent permissions like you treat IAM roles. Least privilege. No exceptions.
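If "least privilege, no exceptions" sounds abstract, here it is as a shell-tool gate. The command list is illustrative, and a real deployment would use an allowlist (strictly least-privilege) rather than this denylist, but the key property is the same: the denial happens in code the model can't talk its way past.

```python
import shlex

# Commands that push artifacts to package registries -- an agent
# should never be allowed to run these. Illustrative, not exhaustive.
FORBIDDEN = {
    ("npm", "publish"),
    ("yarn", "publish"),
    ("twine", "upload"),
    ("cargo", "publish"),
}

def gate_shell_command(command: str) -> str:
    """Least-privilege gate for an agent's shell tool: refuse any
    command that would publish to a package registry."""
    tokens = shlex.split(command)
    for i in range(len(tokens) - 1):
        if (tokens[i], tokens[i + 1]) in FORBIDDEN:
            raise PermissionError(f"registry publish blocked: {command!r}")
    return command  # allowed; caller passes it on to the sandboxed shell
```

A gate like this would have stopped the malicious npm publish in the Clinejection chain regardless of how convincing the injected prompt was.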


Subscribe to The Agentic Engineer for weekly deep dives on agentic AI. Free, no spam, unsubscribe anytime.

Follow along: Twitter · LinkedIn · Dev.to · GitHub


๐ŸŽ™๏ธ Listen to This Issue

Top comments (0)