DEV Community: Leo Han

LangGraph: Engineering Controllable Enterprise Agents

Leo Han — Mon, 15 Jun 2026 10:44:56 +0000

1. Why enterprise agents need more than a single LLM call

In early prototypes, an AI application may look like a simple prompt-response loop. A user asks a question, the model returns an answer. In production, this pattern quickly reaches its limits.

LLMs do not automatically know real-time business data, internal database records, or operational context. They also do not reliably execute long-running workflows by themselves. On the other hand, fully autonomous agents can become unpredictable: they may loop, call the wrong tool, produce hallucinated decisions, or perform unsafe actions.

Enterprise AI needs an orchestration layer that gives models controlled autonomy. LangGraph provides this layer by modeling agent workflows as graphs with explicit state, nodes, edges, persistence, and human oversight.

2. From chains to graphs

A chain is a fixed sequence:

Start -> Step 1 -> Step 2 -> Step 3 -> End

This is reliable but rigid. A graph is more expressive:

Start
  -> Agent Node
  -> Tool Node
  -> Agent Node
  -> Human Review Node
  -> End

With LangGraph, the next step can be selected dynamically based on the current state. This turns an agent from a linear script into a controlled state machine.

3. The three core concepts

LangGraph workflows are built from three primitives.

State is the shared data structure of the workflow. It may contain messages, user context, task IDs, tool results, approval status, risk level, retry count, and final outputs.

Node is a unit of work. A node can call an LLM, execute a tool, validate a rule, retrieve documents, wait for human approval, or format a result.

Edge controls what happens next. Normal edges represent fixed transitions. Conditional edges route execution based on state.
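The three primitives can be sketched in plain Python without LangGraph itself. This is only an illustration of the idea, not the LangGraph API: the node names (classify, answer, escalate), the risk field, and the run_graph helper are all hypothetical.

```python
from typing import Callable, Dict, TypedDict


class State(TypedDict):
    """Shared state that every node reads and updates."""
    request: str
    risk: str
    output: str


# Nodes: plain functions that take the state and return an updated copy.
def classify(state: State) -> State:
    risk = "high" if "delete" in state["request"].lower() else "low"
    return {**state, "risk": risk}


def answer(state: State) -> State:
    return {**state, "output": f"Handled: {state['request']}"}


def escalate(state: State) -> State:
    return {**state, "output": "Sent to human review"}


NODES: Dict[str, Callable[[State], State]] = {
    "classify": classify,
    "answer": answer,
    "escalate": escalate,
}


# Edges: a normal edge has a fixed successor, a conditional edge inspects state.
def next_node(current: str, state: State) -> str:
    if current == "classify":  # conditional edge
        return "escalate" if state["risk"] == "high" else "answer"
    return "END"  # normal edges: answer -> END, escalate -> END


def run_graph(request: str) -> State:
    state: State = {"request": request, "risk": "", "output": ""}
    current = "classify"
    while current != "END":
        state = NODES[current](state)
        current = next_node(current, state)
    return state
```

Calling run_graph("Delete customer 7") takes the escalate branch, while a harmless request goes straight to answer. The routing decision lives in one small function that can be unit-tested without a model.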

4. A production-oriented architecture

A practical enterprise agent can be structured as:

User Request
  -> Input Validation
  -> Intent Router
  -> Agent Reasoning
  -> Tool Selection
  -> Tool Execution
  -> Result Normalization
  -> Risk Check
  -> Human Review, optional
  -> Final Response
  -> Audit Log / Metrics

This separates model reasoning from operational control. The LLM interprets and plans. Tool nodes access external systems. Risk nodes enforce policies. Human review nodes approve high-risk actions. State and checkpoints make the workflow recoverable and auditable.

5. Tool results should go back to the agent

A common mistake is to return raw tool output directly to the user. Tool outputs are often JSON payloads, database rows, API responses, or error codes. The better pattern is:

Agent decides a tool is needed
  -> Tool executes
  -> Tool result is written to State
  -> Agent reads State again
  -> Agent produces a business-readable answer

This keeps the model responsible for explaining tool results in context.

6. Persistence and checkpoints

Production agents cannot assume every task finishes in a single request. Workflows may pause for approval, fail due to external systems, or resume after service restarts.

Checkpoints allow the graph state to be persisted and resumed. This enables long-running workflows, human approval flows, failure recovery, and detailed audit trails.
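To make the pause-and-resume idea concrete, here is a minimal sketch that stores one JSON snapshot per thread. It is an assumption-laden stand-in, not LangGraph's checkpointer: FileCheckpointer, the STEPS list, and the run function are hypothetical names invented for this example.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


class FileCheckpointer:
    """Hypothetical helper: one JSON snapshot per thread_id."""

    def __init__(self, directory: str) -> None:
        self.directory = Path(directory)

    def save(self, thread_id: str, state: dict) -> None:
        (self.directory / f"{thread_id}.json").write_text(json.dumps(state))

    def load(self, thread_id: str) -> Optional[dict]:
        path = self.directory / f"{thread_id}.json"
        return json.loads(path.read_text()) if path.exists() else None


STEPS = ["validate", "plan", "await_approval", "execute"]


def run(thread_id: str, checkpointer: FileCheckpointer, approved: bool) -> dict:
    # Resume from the last checkpoint if one exists, otherwise start fresh.
    state = checkpointer.load(thread_id) or {"done": []}
    for step in STEPS:
        if step in state["done"]:
            continue  # completed before the pause or crash
        if step == "await_approval" and not approved:
            checkpointer.save(thread_id, state)
            return state  # pause here; a later call resumes
        state["done"].append(step)
        checkpointer.save(thread_id, state)
    return state


with tempfile.TemporaryDirectory() as tmp:
    cp = FileCheckpointer(tmp)
    paused = run("thread-1", cp, approved=False)   # stops before approval
    resumed = run("thread-1", cp, approved=True)   # continues from the snapshot
```

The second call does not repeat validate or plan, because the snapshot already records them as done. A production setup would use a database-backed store rather than local files.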

7. Human-in-the-loop

Human oversight is not a weakness. It is what makes high-impact AI automation deployable.

Human review is recommended for irreversible operations, low-confidence decisions, compliance-sensitive actions, tool parameter changes, and conflicts between model plans and business rules.

In graph form:

risk_check
  -> low_risk: execute_tool
  -> high_risk: human_review
human_review
  -> approved: execute_tool
  -> edited: execute_tool_with_new_args
  -> rejected: final_reject_response
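The routing above reduces to two small functions. The sketch below reuses the node names from the diagram; the function names and the decision strings are assumptions for illustration, and treating unknown decisions as rejections is a design choice of this sketch rather than something LangGraph prescribes.

```python
from typing import Optional


def route_after_risk_check(risk: str) -> str:
    """Conditional edge leaving risk_check."""
    return "human_review" if risk == "high" else "execute_tool"


def route_after_review(decision: str, edited_args: Optional[dict] = None) -> str:
    """Conditional edge leaving human_review."""
    if decision == "approved":
        return "execute_tool"
    if decision == "edited" and edited_args is not None:
        return "execute_tool_with_new_args"
    # Rejections and anything unrecognized fail closed.
    return "final_reject_response"
```

Failing closed means a malformed reviewer response can never trigger a tool call by accident.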

8. Self-correction loops

Graphs can express review-and-retry patterns:

generate_plan
  -> review_plan
  -> if pass: execute_plan
  -> if fail: generate_plan

This is useful for code generation, SQL generation, document writing, compliance review, and RAG answer validation. Production systems must set loop limits, cost limits, timeout limits, and fallback behavior.
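A bounded review-and-retry loop might look like the sketch below. The generator and reviewer are stand-ins (a real system would call an LLM), and generate_with_review, max_attempts, and the ESCALATE_TO_HUMAN fallback value are hypothetical names chosen for this example.

```python
from typing import Callable, Tuple


def generate_with_review(
    generate: Callable[[str], str],
    review: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 3,
    fallback: str = "ESCALATE_TO_HUMAN",
) -> Tuple[str, int]:
    """Retry generation until review passes or the loop limit is reached."""
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        plan = generate(feedback)
        passed, feedback = review(plan)
        if passed:
            return plan, attempt
    return fallback, max_attempts  # loop limit hit: use the fallback behavior


# Stand-in generator: adds a LIMIT clause once it has received feedback.
def fake_generate(feedback: str) -> str:
    return "SELECT id FROM orders LIMIT 10" if feedback else "SELECT id FROM orders"


# Stand-in reviewer: rejects unbounded queries.
def fake_review(plan: str) -> Tuple[bool, str]:
    return ("LIMIT" in plan, "add a LIMIT clause")


result, attempts = generate_with_review(fake_generate, fake_review)
```

Here the first draft fails review and the second passes, so the loop ends after two attempts. Cost and timeout limits would be added alongside max_attempts in the same place.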

9. Adoption roadmap

Start by converting an existing prompt feature into a graph. Then add read-only tools. Next, introduce checkpointing and thread-level memory. After that, add write operations behind human approval. Finally, standardize common capabilities such as tool registries, state schemas, approval components, tracing, regression tests, and evaluation datasets.

10. Final takeaway

LangGraph is not about giving agents unlimited freedom. It is about giving them structured freedom. For engineering teams, the real shift is from prompt engineering to state-machine engineering, workflow engineering, and runtime engineering.

References

LangGraph overview: https://docs.langchain.com/oss/python/langgraph/overview
LangGraph Graph API: https://docs.langchain.com/oss/python/langgraph/graph-api
LangGraph persistence: https://docs.langchain.com/oss/python/langgraph/persistence
LangChain overview: https://docs.langchain.com/oss/python/langchain/overview

LangChain Agents, Tools, and Memory: An Enterprise Engineering Guide

Leo Han — Mon, 15 Jun 2026 10:44:37 +0000

1. The role of LangChain in enterprise AI

If a model API is the engine, LangChain is the framework that helps engineering teams install that engine into real applications. It provides a standard model interface, agent construction, tool wrapping, message objects, memory, middleware, and observability integrations.

The key point is that LangChain is not only about calling LLMs. It helps teams shape model behavior, extend model capabilities, and place LLMs inside testable and debuggable application structures.

2. Agent = Model + Harness

An enterprise agent is not just an LLM. It needs:

A model interface.
A system prompt.
Tools.
Message objects.
Memory.
Middleware.
Observability.

The model is the reasoning core. The harness around it determines whether the agent can be safely used in production.

3. Why agents need tools

Without tools, a model can mainly answer based on its training data and provided context. With tools, it can access real business systems.

Typical tools include current-time lookup, database queries, file search, business APIs, order lookup, ticket operations, approval triggers, and controlled code execution.

Tools are what turn an AI assistant from a chatbot into a business workflow participant.

4. Engineering rules for tools

Tools should be small, typed, well-described, and auditable. Avoid large generic tools such as handle_customer_issue(anything). Prefer explicit tools such as get_order_status(order_id) or create_refund_request(order_id, reason).

Tool outputs should be structured. Write operations should have audit logs, idempotency keys, permission checks, and human approval when needed.
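As a rough sketch of these rules, the write tool below is typed, checks permissions, is safe to replay, and records an audit entry. It is framework-independent Python, not a LangChain tool definition; the RefundRequest fields, the "refund:create" role string, and the in-memory stores are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class RefundRequest:
    order_id: str
    reason: str
    idempotency_key: str


AUDIT_LOG: List[dict] = []          # stand-in for a durable audit store
_PROCESSED: Dict[str, dict] = {}    # results keyed by idempotency key


def create_refund_request(req: RefundRequest, caller_roles: Set[str]) -> dict:
    """Small, typed write tool that returns structured output."""
    if "refund:create" not in caller_roles:      # permission check
        return {"status": "denied", "order_id": req.order_id}
    if req.idempotency_key in _PROCESSED:        # replayed call: same result, no new side effect
        return _PROCESSED[req.idempotency_key]
    result = {"status": "pending_approval", "order_id": req.order_id}
    _PROCESSED[req.idempotency_key] = result
    AUDIT_LOG.append({
        "tool": "create_refund_request",
        "order_id": req.order_id,
        "reason": req.reason,
        "key": req.idempotency_key,
    })
    return result


request = RefundRequest(order_id="A1", reason="damaged", idempotency_key="k-1")
first = create_refund_request(request, {"refund:create"})
second = create_refund_request(request, {"refund:create"})  # retried by the agent
```

The retry returns the stored result and writes no second audit entry, which is the property that makes agent retries safe for write operations.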

5. Standard model interface

LangChain helps teams integrate and switch among different model providers. Model configuration usually includes model name, temperature, max tokens, timeout, retry policy, and API credentials.

For enterprise use, it is better to place a model gateway or model configuration service above individual agents. This enables cost control, rate limiting, fallback, provider switching, and audit.

6. Messages: prompt is not a single string

In production, a prompt is a set of messages:

System messages.
Human messages.
AI messages.
Tool messages.
Summaries of previous context.

Many production issues are caused by poor message construction: conflicting instructions, missing tool results, overly long history, broken tool-call ordering, or missing thread isolation.

7. Short-term memory

Short-term memory means preserving context within a conversation thread. With a checkpointer and a thread_id, an agent can remember previous interactions in the same thread.

This is useful for customer support, task execution, form filling, coding assistants, and multi-step workflows. However, long histories increase cost and may distract the model, so teams need trimming, deletion, summarization, and filtering strategies.
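One simple trimming strategy keeps system messages and the most recent turns that fit a budget. The sketch below uses plain dictionaries and counts characters as a stand-in for tokens; trim_history and max_chars are hypothetical names, and a real system would count tokens and keep tool calls paired with their results.

```python
from typing import Dict, List

Message = Dict[str, str]


def trim_history(messages: List[Message], max_chars: int) -> List[Message]:
    """Keep system messages plus the newest turns that fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    budget = max_chars - sum(len(m["content"]) for m in system)
    kept: List[Message] = []
    for message in reversed(rest):  # walk from newest to oldest
        if len(message["content"]) > budget:
            break
        kept.append(message)
        budget -= len(message["content"])
    return system + list(reversed(kept))


history = [
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "My order number is 1234."},
    {"role": "assistant", "content": "Thanks, checking order 1234."},
    {"role": "user", "content": "Any update?"},
]
trimmed = trim_history(history, max_chars=25)
```

With this small budget only the system message and the latest user turn survive, which also shows the risk: the order number is gone, so trimming is usually paired with summarization.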

8. Long-term memory

Long-term memory stores information across sessions, such as user preferences, recurring constraints, historical task summaries, and organization knowledge.

It usually requires embeddings, a vector or structured store, retrieval logic, write policies, and deletion policies. Sensitive data must be governed carefully. Not everything should be remembered.

9. Middleware

Middleware is where teams should place cross-cutting concerns:

Prompt-injection checks.
PII redaction.
Message summarization.
Token budget control.
Tool filtering.
Simulated tool calls for testing.
Human approval.
Risk scoring.
Output compliance checks.
Audit logging.

Middleware keeps business-specific agent code simpler and makes platform capabilities reusable.
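The pattern can be illustrated with ordinary function wrappers. This is a conceptual sketch, not LangChain's middleware interface: the Handler and Middleware types, the email regex, and the echo_agent stand-in are assumptions for this example.

```python
import re
from typing import Callable, List

Handler = Callable[[str], str]
Middleware = Callable[[Handler], Handler]

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact_pii(next_handler: Handler) -> Handler:
    """Mask email addresses before the request reaches the agent."""
    def wrapped(text: str) -> str:
        return next_handler(EMAIL.sub("[REDACTED_EMAIL]", text))
    return wrapped


def audit_log(log: List[str]) -> Middleware:
    """Record every request exactly as the agent will see it."""
    def middleware(next_handler: Handler) -> Handler:
        def wrapped(text: str) -> str:
            log.append(text)
            return next_handler(text)
        return wrapped
    return middleware


def build(agent: Handler, stack: List[Middleware]) -> Handler:
    """Wrap the agent so the first middleware in the list runs first."""
    for middleware in reversed(stack):
        agent = middleware(agent)
    return agent


def echo_agent(text: str) -> str:  # stand-in for a real agent call
    return f"agent saw: {text}"


seen: List[str] = []
pipeline = build(echo_agent, [redact_pii, audit_log(seen)])
reply = pipeline("Contact me at jane@example.com please")
```

Because redaction runs before logging, the audit log never contains the raw email address. Ordering in the stack is itself a policy decision.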

10. Observability and LangSmith

A production agent must be traceable. Teams need to know which model was used, which messages were sent, why a tool was selected, what arguments were passed, what the tool returned, which middleware changed the state, where latency came from, and how the run can be replayed.

LangSmith or an equivalent observability platform turns agents from black boxes into auditable systems.

11. Recommended enterprise architecture

Frontend / API
  -> Auth & Tenant Context
  -> Agent Gateway
  -> LangChain Agent
      -> Model Adapter
      -> Tool Registry
      -> Memory Layer
      -> Middleware Stack
      -> Human Approval
  -> Business Systems
  -> Observability / Audit

This architecture separates authentication, model access, tool permissions, memory, safety controls, and monitoring.

12. Adoption guidance

Start with read-only tools. Add write tools only after tool selection, parameter validation, tracing, and error handling are stable. For high-risk actions, require human approval first and automate gradually based on operational data.

Treat prompts as code: version them, test them, review them, and make them rollback-friendly. Build evaluation datasets that cover normal cases, edge cases, malicious inputs, tool failures, memory behavior, latency, and cost.

13. Final takeaway

LangChain turns model capability into engineering capability. The right goal is not to build one-off chatbots, but to build a reusable agent engineering foundation: standard model access, a tool registry, managed memory, reusable middleware, and end-to-end observability.

References

LangChain overview: https://docs.langchain.com/oss/python/langchain/overview
LangChain agents: https://docs.langchain.com/oss/python/langchain/agents
LangChain tools: https://docs.langchain.com/oss/python/langchain/tools
LangChain short-term memory: https://docs.langchain.com/oss/python/langchain/short-term-memory
LangChain long-term memory: https://docs.langchain.com/oss/python/langchain/long-term-memory
LangChain middleware: https://docs.langchain.com/oss/python/langchain/middleware/overview

LangChain-Core-Components-Guide

Leo Han — Fri, 12 Jun 2026 15:52:07 +0000

LangChain Architect's Guide: Building LLM Applications from First Principles

Author's Note: This guide is based on a 9-episode LangChain tutorial series (~5 hours total). Every slide, code demo, and architecture diagram from the videos has been analyzed frame by frame, validated against the latest LangChain API, and rewritten as a comprehensive technical reference.

Introduction: Why LangChain Matters
Episode 1: LangChain Overview & the LLM Landscape
Episode 2: Hello World & ConversationChain
Episode 3: Model I/O — Prompt Engineering at Scale
Episode 4: Data Connection — Teaching LLMs to Read Your Data
Episode 5: Chains — The Art of Orchestration
Episode 6: Agents — Autonomous LLM Reasoning
Episode 7: Hands-on PDF Q&A System
Episode 8: Hands-on Advanced Search Agent
Episode 9: Retrospective & Best Practices
Appendix: API Migration Guide & Troubleshooting

1. Introduction: Why LangChain Matters

1.1 The LLM Revolution and Its Bottlenecks

In November 2022, OpenAI released text-davinci-003, the first GPT-3.5 model available through its API, and everything changed. Within months, GPT-4 arrived (widely rumored, though never confirmed by OpenAI, to use a Mixture of Experts, or MoE, architecture), Meta released the LLaMA weights, and Zhipu AI launched ChatGLM. LLMs were no longer just research papers; they were programmable infrastructure.

But building applications directly on raw API calls hits four walls immediately:

Wall 1: Context Constraints. The original GPT-3.5 models top out at roughly 4,096 tokens, shared between the prompt and the completion. A 20-page PDF simply doesn't fit.

Wall 2: Capability Boundaries. An LLM is a text predictor, not an agent. It can't search the web, execute code, read files, or call external APIs.

Wall 3: Amnesia by Design. Every API call is a blank slate. State management must be built from scratch.

Wall 4: Prompt Sprawl. Prompts get scattered across dozens of files as raw strings. There's no templating, versioning, or testing.

LangChain was built specifically to tear down these four walls.

1.2 What LangChain Actually Is

LangChain is not a new LLM. It's an orchestration framework that provides standardized abstractions:

User Input → [Prompt Template] → [LLM Call] → [Output Parser] → Result
                      ↑                ↑
                  [Memory]        [Tools / APIs]

The six-layer architecture:

Layer	Module	Problem It Solves
L1	Model I/O	Unified interface across LLM providers
L2	Data Connection	Reading external documents
L3	Chains	Composing multiple LLM calls
L4	Memory	Retaining conversation state
L5	Agents	LLM autonomously decides which tools to use
L6	Callbacks	Monitoring, logging, debugging

1.3 The LLM Landscape

Model	Provider	Architecture	Best For
GPT-4	OpenAI	Undisclosed; rumored MoE, 8×220B experts	Complex reasoning
GPT-3.5	OpenAI	Undisclosed; often assumed 175B dense, like GPT-3	Price-performance
LLaMA 2	Meta	7B/13B/70B open-source	Local deployment
ChatGLM	Zhipu AI	Chinese-English bilingual	Chinese scenarios

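What is MoE? A Mixture of Experts model contains several "expert" sub-networks, and a router activates only a subset of them for each token, like a dispatch system routing each question to the best-qualified specialists. GPT-4 is widely rumored to work this way, but OpenAI has not confirmed it.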

2. Episode 1: LangChain Overview & the LLM Landscape

Source: 1.mp4 (~30 min)

2.1 Key Questions

What are LLMs? — From GPT-3 (June 2020) through GPT-3.5 API (November 2022), to GPT-4 and open-source.
What is LangChain? — A Python framework for composing LLM calls like building blocks.
Why use LangChain? — Raw API calls work for demos; products require engineering.

2.2 The Raw API Problem

const response = await createCompletion({
  model: "text-davinci-003",
  prompt: "Who are you?",
  temperature: 0.8,
  max_tokens: 100,
});

Missing: prompt management, context injection, structured output, reliability, observability.

2.3 LangChain's Answer

LangChain is a framework for developing applications powered by language models.

It provides standardized abstractions above the LLM layer so you focus on business logic, not plumbing.

3. Episode 2: Hello World & ConversationChain

Source: 2.mp4 (~48 min)

3.1 Environment Setup

pip install langchain langchain-openai langchain-community python-dotenv

3.2 First LangChain Call

import os
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

load_dotenv()

llm = ChatOpenAI(
    model="deepseek-chat",
    temperature=0.7,
    api_key=os.getenv("DEEPSEEK_API_KEY"),
    base_url="https://api.deepseek.com/v1"
)

result = llm.invoke([HumanMessage(content="Explain LangChain in one sentence.")])
print(result.content)

Key insight: base_url enables hot-swappable model infrastructure.

3.3 Temperature: The Creativity Knob

temperature = 0.0  →  Near-deterministic: math, code, factual Q&A
temperature = 0.7  →  Balanced: conversation, summarization
temperature = 1.0  →  Creative: storytelling, brainstorming

Under the hood: temperature modulates the probability distribution over vocabulary.
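A small worked example may help here. In the standard formulation, each logit is divided by the temperature before the softmax, so low temperatures sharpen the distribution and high temperatures flatten it. The logits below are made-up scores for three candidate tokens, and the function name is hypothetical.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after scaling them by 1 / temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; T = 0 is handled as argmax")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.2)  # almost all mass on the top token
hot = softmax_with_temperature(logits, 1.0)   # mass spread across candidates
```

At temperature 0.2 the top token takes more than 99% of the probability mass, while at 1.0 it takes about two thirds. That is why low temperatures behave almost deterministically.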

3.4 Jupyter Notebook for LLM Development

The Cell mechanism is uniquely suited to LLM development — iteratively build prompts, observe outputs, tune parameters.

3.5 ConversationChain: Giving LLMs Memory

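# Legacy API: ConversationChain and ConversationBufferMemory are deprecated in
# recent LangChain releases. Depending on your installed version, these imports
# may need to come from the langchain-classic package instead.
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain

# Reuses the `llm` created in section 3.2
conversation = ConversationChain(
    llm=llm,
    memory=ConversationBufferMemory(),  # stores the full transcript and replays it on every call
    verbose=True
)

conversation.predict(input="My name is Alice, I'm 25")
conversation.predict(input="What's my name and age?")
# AI: Your name is Alice and you're 25 years old.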

3.6 How Prompts Pass Information to LLMs

System Message: Sets behavioral boundaries ("You are a professional Python developer")
Human Message: User's direct input
AI Message: LLM's response — appended as history in multi-turn conversations

4. Episode 3: Model I/O — Prompt Engineering at Scale

Source: 3.mp4 (~31 min)

4.1 The Anti-Pattern: Raw String Concatenation

# Anti-pattern
prompt = "Translate: " + text

4.2 PromptTemplate

from langchain_core.prompts import PromptTemplate

template = PromptTemplate.from_template(
    "You are a {role}. Translate to {target_lang}:\n{text}"
)
prompt_str = template.format(role="translator", target_lang="English", text="Hello")

4.3 Few-Shot Prompting

from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate

examples = [
    {"input": "happy", "output": "Positive"},
    {"input": "sad",   "output": "Negative"},
]

few_shot = FewShotPromptTemplate(
    examples=examples,
    example_prompt=PromptTemplate.from_template("Input: {input}\nSentiment: {output}"),
    prefix="Classify sentiment:",
    suffix="Input: {input}\nSentiment:",
    input_variables=["input"],
)

Few-Shot vs Fine-Tuning:

Dimension	Few-Shot	Fine-Tuning
Cost	Zero training cost	Training data + GPU
Flexibility	Change instantly	Requires retraining
Effectiveness	Format control	Domain knowledge

Rule of thumb: Start with Few-Shot, fine-tune only when stable.

4.4 Example Selector

from langchain_core.example_selectors import SemanticSimilarityExampleSelector
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

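# all_examples is assumed to be a list of {"input": ..., "output": ...} dicts, like `examples` in 4.3
selector = SemanticSimilarityExampleSelector.from_examples(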
    examples=all_examples,
    embeddings=OpenAIEmbeddings(),
    vectorstore_cls=Chroma,
    k=3,
)

How it works: 1) Embed all examples → 2) Embed user input → 3) Cosine similarity → 4) Inject top-K

4.5 Output Parsers

CommaSeparatedListOutputParser: CSV-style output
StructuredOutputParser: JSON Schema compliance
PydanticOutputParser: Direct Pydantic model parsing (most powerful)

5. Episode 4: Data Connection — Teaching LLMs to Read Your Data

Source: 4.mp4 (~35 min)

This is LangChain's most important module — complete RAG infrastructure.

5.1 The Core Problem

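LLMs have a training cutoff. Your proprietary documents are invisible to them. RAG bridges this gap by retrieving relevant passages at query time and placing them in the prompt. The indexing side of that pipeline has four steps: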

Step 1: Load → Step 2: Split → Step 3: Embed → Step 4: Store

5.2 Document Loaders

from langchain_community.document_loaders import (
    PyPDFLoader, WebBaseLoader, YoutubeLoader,
    UnstructuredPowerPointLoader, TextLoader, CSVLoader,
)

loader = PyPDFLoader("report.pdf")
pages = loader.load()
# pages[0].page_content → text
# pages[0].metadata → {"source": "...", "page": 0}  (page numbers are zero-based)

Every loader returns Document with page_content + metadata.

5.3 Text Splitters

from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    separators=["\n\n", "\n", ".", " ", ""]
)
splits = splitter.split_documents(pages)

Why overlap? Neighboring chunks share a margin of text, so a sentence or idea that straddles a chunk boundary still appears intact in at least one chunk.
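To see what chunk_size and chunk_overlap do, here is a deliberately simplified character-window splitter. It is not the algorithm RecursiveCharacterTextSplitter uses (that one prefers to break on the listed separators); the function name and the sample text are assumptions for illustration.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Fixed-size character chunks where each chunk repeats the previous tail."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
```

Each chunk starts with the last character of the previous one ("abcd", "defg", "ghij"), which is the same sharing that chunk_overlap=50 provides at a larger scale.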

5.4 Word Embeddings: The Math of Meaning

"cat" → [0.12, -0.34, 0.56, ...]
"dog" → [0.14, -0.31, 0.58, ...]  ← close to "cat"
"car" → [-0.78, 0.45, -0.12, ...]  ← far from both

Cosine similarity: cos(θ) = (A·B) / (|A|×|B|) — range [-1, 1]
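The formula is easy to check by hand. The sketch below applies it to only the three components shown for each word above; real embeddings have hundreds or thousands of dimensions, so the numbers are illustrative only.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(theta) = (A . B) / (|A| * |B|)"""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Truncated toy vectors taken from the example above
cat = [0.12, -0.34, 0.56]
dog = [0.14, -0.31, 0.58]
car = [-0.78, 0.45, -0.12]

cat_dog = cosine_similarity(cat, dog)  # close to 1: similar direction
cat_car = cosine_similarity(cat, car)  # negative: pointing away from each other
```

Vector stores rank documents by exactly this kind of score, or by a closely related distance, when answering a similarity search.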

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vec = embeddings.embed_query("What is RAG?")

Model guide: text-embedding-3-small (1536d, best value), text-embedding-3-large (3072d)

5.5 Vector Stores

from langchain_community.vectorstores import FAISS

vectorstore = FAISS.from_documents(splits, OpenAIEmbeddings())
results = vectorstore.similarity_search("MoE architecture pros and cons?", k=4)

DB guide: FAISS (dev), Chroma (small projects), Pinecone (production), Weaviate/Qdrant (enterprise)

6. Episode 5: Chains — The Art of Orchestration

Source: 5.mp4 (~25 min)

6.1 What Is a Chain?

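# Without a chain: wire each step by hand
prompt_value = template.format(input=x)
response = llm.invoke(prompt_value)       # chat models return an AIMessage
parsed = parser.parse(response.content)   # parsers expect the text, not the message object

# With LCEL: compose the same steps with the pipe operator
chain = template | llm | parser
result = chain.invoke({"input": x})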

6.2 Chain Types

LLMChain: Atomic unit — Prompt + LLM.

RouterChain: Auto-dispatches to specialized handlers:

Input → Router → Math? → MathChain
               → Code? → CodeChain  
               → General? → GeneralChain

SequentialChain: Pipeline processing:

Generate Outline → Expand → Polish → Output

TransformChain: Runs a plain Python function as a chain step, typically to post-process output (clean, translate, filter).

6.3 Document Chains (RAG Core)

Four strategies for feeding retrieved chunks to the LLM:

Stuff: All chunks in one prompt. Simple, context-limited.
Map-Reduce: Process each chunk independently, then aggregate. Parallel, scalable.
Refine: Iteratively improve answer with each chunk. High quality, sequential.
Map-Rerank: Score each chunk's answer, pick best. For relevance ranking.

7. Episode 6: Agents — Autonomous LLM Reasoning

Source: 6.mp4 (~25 min)

7.1 Chains vs Agents

Chain = passive (defined workflow). Agent = active (chooses tools autonomously).

7.2 ReAct Pattern

Q: Who is Leo DiCaprio's girlfriend? Her age ^ 0.43?

Thought: Find girlfriend first
Action: Search("Leo DiCaprio girlfriend")
Observation: Vittoria Ceretti

Thought: Need her age
Action: Search("Vittoria Ceretti age")  
Observation: 26 years old

Thought: Calculate 26^0.43
Action: Calculator("26^0.43")
Observation: ~4.06

Final Answer: Vittoria Ceretti, 26. 26^0.43 ≈ 4.06

7.3 Agent Implementation

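import os
from langchain.agents import initialize_agent, AgentType  # legacy agent API
from langchain_community.agent_toolkits.load_tools import load_tools
from langchain_openai import ChatOpenAI

# Same DeepSeek configuration as section 3.2, with temperature 0 for tool use
llm = ChatOpenAI(
    model="deepseek-chat",
    temperature=0,
    api_key=os.getenv("DEEPSEEK_API_KEY"),
    base_url="https://api.deepseek.com/v1",
)
tools = load_tools(["serpapi", "llm-math"], llm=llm)  # serpapi needs SERPAPI_API_KEY
agent = initialize_agent(
    tools, llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True, max_iterations=5,
)
agent.invoke({"input": "Who is Leo DiCaprio's girlfriend? Her age ^ 0.43?"})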

7.4 Agent Types

Type	Pattern	Best For
Zero-shot ReAct	Decide on the fly	Simple tasks
Structured Chat	Multi-parameter tools	Complex tools
OpenAI Functions	Function Calling API	GPT models
Plan-and-Execute	Plan then execute	Multi-step tasks

7.5 Tuning

max_iterations: Prevent infinite loops
handle_parsing_errors: Retry on malformed output
early_stopping_method: "force" or "generate" best guess

8. Episode 7: Hands-on PDF Q&A System

Source: 7.mp4 (~38 min)

import os
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Index: load the PDF, split it into overlapping chunks, embed, and store
loader = PyPDFLoader("report.pdf")
splits = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100).split_documents(loader.load())
# OpenAIEmbeddings reads OPENAI_API_KEY; the chat model below uses DeepSeek
vectorstore = FAISS.from_documents(splits, OpenAIEmbeddings(model="text-embedding-3-small"))

# Query: retrieve the top 4 chunks and stuff them into a single prompt
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(
        model="deepseek-chat",
        temperature=0,
        api_key=os.getenv("DEEPSEEK_API_KEY"),
        base_url="https://api.deepseek.com/v1",
    ),
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={"k": 4}),
    return_source_documents=True,
)
result = qa.invoke({"query": "What are the key findings?"})
print(result["result"])

chain_type: stuff (<4 chunks), map_reduce (many), refine (sequential), map_rerank (score).

9. Episode 8: Hands-on Advanced Search Agent

Source: 8.mp4 (~53 min)

from langchain.agents import initialize_agent, AgentType  # legacy agent API
from langchain_core.tools import tool

@tool
def get_stock_price(symbol: str) -> str:
    """Get current stock price. Input: ticker like AAPL, TSLA."""
    prices = {"AAPL": "189.30", "TSLA": "242.84"}  # hard-coded demo data, not live quotes
    return prices.get(symbol.upper(), f"Symbol {symbol} not found")

# `llm` is the ChatOpenAI instance configured earlier
agent = initialize_agent([get_stock_price], llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
agent.invoke({"input": "What's Apple's stock price?"})

Key: tool's docstring IS what the Agent reads to decide when to use it.

10. Episode 9: Retrospective & Best Practices

Source: 9.mp4 (~20 min)

Production Checklist

Security: Never expose keys in prompts. Sandbox agent execution.
Performance: Cache embeddings. Persistent vector stores (Chroma/Pinecone).
Cost: Cheap models first. Set max_iterations. Cache deterministic responses.
Observability: LangSmith tracing. Log token consumption.

API Migration (2 Years)

Old	New
`langchain.llms.OpenAI`	`langchain_openai.ChatOpenAI`
`llm.predict("text")`	`llm.invoke([HumanMessage("text")])`
`Chain.run(input)`	`Chain.invoke({"key": value})`

11. Appendix: Troubleshooting

Error	Solution
`ModuleNotFoundError: langchain.llms`	Use `langchain-openai`
`jupyter-lab` not found	Add Scripts to PATH
DeepSeek 401	Check `.env` API key
Agent infinite loop	Improve tool docstrings, set max_iterations
FAISS OOM	Switch to Chroma or Pinecone

Learning Path

Week 1: Hello World → Week 2: Prompts → Week 3: RAG → Week 4: Agent + Tools → Ongoing: Docs

Epilogue

LangChain's core philosophy — modular, composable, engineered — endures. Package names change; architectural patterns don't.

The most valuable thing LangChain offers isn't its code — it's the paradigm of composing LLM applications like building blocks.

Disclaimer: Code adapted for LangChain latest API (2025-2026). Images from original tutorial screenshots for educational reference only.

why-we-dropped-langchain

Leo Han — Tue, 09 Jun 2026 15:09:07 +0000

Why We Dropped LangChain: When Abstractions Do More Harm Than Good

A 12-Month Lesson Learned

In early 2023, we put LangChain into production. In 2024, we removed it entirely.

LangChain seemed like the best choice for building LLM-powered applications in 2023. It had an impressive list of components and tools, its popularity was soaring, and it promised to "enable developers to go from an idea to working code in an afternoon." But as our project progressed, the cracks began to show.

LangChain's inflexibility gradually surfaced: we found ourselves constantly diving into LangChain internals to modify lower-level behavior. And because LangChain intentionally abstracts away those internals, doing so was extremely painful. Whenever we needed to do something the framework didn't natively support, we had to "translate" our requirements into LangChain-appropriate solutions — instead of just writing code.

This post shares the real reasons we abandoned LangChain, and why replacing its rigid high-level abstractions with modular building blocks simplified our codebase, made our team happier, and made us more productive.

The Core Problem: The Perils of Being an Early Framework

LLMs are a rapidly changing field, with new concepts and ideas emerging weekly. When a framework like LangChain is built around multiple emerging technologies, designing abstractions that will stand the test of time is nearly impossible.

Crafting well-designed abstractions is hard — even when the requirements are well-understood and stable. But when you're modeling components in such a state of flux, by the time you finish designing the abstraction, the underlying technology has already changed.

This isn't the LangChain team's fault. Anyone attempting to build such a framework at that point in time wouldn't have done any better. Everyone was doing their best.

Problem 1: Simple Tasks Become Complicated

Consider the simplest possible task: a translation app. Using the native OpenAI SDK:

import os
from openai import OpenAI

os.environ["OPENAI_API_KEY"] = "<your_api_key>"

client = OpenAI()
text = "hello!"
language = "Italian"

messages = [
    {"role": "system", "content": "You are an expert translator"},
    {"role": "user", "content": f"Translate the following from English into {language}"},
    {"role": "user", "content": f"{text}"},
]

response = client.chat.completions.create(model="gpt-4o", messages=messages)
result = response.choices[0].message.content

Clean, direct, no hidden logic. Any Python developer can understand it at a glance.

Now the same task with LangChain:

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

model = ChatOpenAI(model="gpt-4o", temperature=0)
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an expert translator"),
    ("user", "Translate the following from English into {language}"),
    ("user", "{text}")
])

parser = StrOutputParser()
chain = prompt | model | parser
result = chain.invoke({"language": language, "text": text})

You now need to understand ChatPromptTemplate, StrOutputParser, the pipe operator |, and the invoke method — all LangChain-specific concepts. They don't make your code better; they just make your code more LangChain.

The problem isn't writing a few extra lines. The problem is that every abstraction you introduce adds a layer of cognitive overhead and debugging difficulty. When something breaks, you're not debugging your business logic — you're debugging LangChain's framework code.

Problem 2: The `http.client` vs. `requests` Analogy

Imagine you have a choice:

Option A: Use http.client to make a request

import http.client, json

conn = http.client.HTTPSConnection("api.example.com")
conn.request("GET", "/data")
response = conn.getresponse()
data = json.loads(response.read().decode())
conn.close()

Option B: Use requests to make a request

import requests

response = requests.get("https://api.example.com/data")
data = response.json()

Which is better? Obviously B. requests isn't "too much abstraction" — it's the right level of abstraction. It encapsulates the genuine complexity (connection management, encoding) without hiding what you actually care about (URL, response data).

LangChain's problem is that it's often neither A nor B. It neither simplifies the truly hard parts (complex Agent orchestration) nor leaves the simple parts simple.

A Reddit comment captured it perfectly:

"Code quality isn't great and structure is pretty iffy. Really hate the piping language structure. Docs are way outdated, deprecation warnings implemented poorly. And when you need to dig under the surface to fix something, you see the ugly. But it gets the job done. I can see what they want to do, but it's bloated very quickly, probably because of its popularity."

Problem 3: Agents Become Black Boxes

When we wanted to move from a single sequential Agent architecture to something more complex, LangChain became our biggest obstacle.

We needed to externally observe the Agent's state, dynamically control available tools, and flexibly orchestrate interactions between multiple Agents. But LangChain's Agent abstractions encapsulate all of this behind an opaque surface — it provides no method for externally observing an Agent's state, forcing us to reduce the scope of our implementation to fit into the limited functionality available to LangChain Agents.

In one instance, we needed to dynamically change the availability of tools our Agents could access, based on business logic. In native code, this is an if statement and an append/remove on a list. In LangChain, you're expected to declare tools upfront during the framework's initialization flow, and dynamic modification requires working around layers of encapsulation.
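For comparison, the "native code" version of that requirement is roughly the sketch below. The tool functions, the role name, and the maintenance flag are hypothetical examples, not our production rules.

```python
from typing import Callable, List


def lookup_order(order_id: str) -> str:
    return f"order {order_id}: shipped"


def issue_refund(order_id: str) -> str:
    return f"refund issued for {order_id}"


def available_tools(user_role: str, maintenance_mode: bool) -> List[Callable[[str], str]]:
    """Build the tool list for this request from plain business rules."""
    tools: List[Callable[[str], str]] = [lookup_order]  # read-only tool, always on
    if user_role == "support_lead" and not maintenance_mode:
        tools.append(issue_refund)  # write tool only for trusted roles
    return tools


names = [t.__name__ for t in available_tools("support_lead", maintenance_mode=False)]
```

The list is rebuilt on every request, so a change in role or system status takes effect on the next call with no framework hooks involved.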

Once we removed LangChain, we no longer had to translate our requirements into LangChain-appropriate solutions. We could just code.

What LangChain's Architecture Diagram Really Shows

LangChain's official architecture reveals its ambition:

        LLMs and Prompts
              |
           CHAINS
              |
          LANGCHAIN
         /          \
  MEMORY              DOCUMENT LOADERS
(Vector DBs)            AND UTILS

The problem: every single module is in constant flux. LLM interfaces change. Prompt best practices evolve. Chain orchestration patterns shift. Memory implementations keep moving. When your framework tries to abstract all of these rapidly changing components at once, the only stable thing is the instability itself.

Do You Really Need a Framework for Building AI Applications?

LangChain's long list of components gives the impression that building LLM-powered applications is complicated and requires a framework. But here's the reality:

LLM Calls: The OpenAI / Anthropic SDKs are already clean enough
Prompt Management: Python f-strings or a Jinja2 template will do
Chains / Orchestration: Pure Python functions and loops, more readable than any DSL
Memory: A dictionary or a database table, under your full control
Vector Stores: Chroma, Pinecone, Qdrant all have clean native APIs
Document Loaders: Mature, independent libraries exist for PDF, web parsing, etc.

LangChain adds one more abstraction layer on top of all of these. And the value of that layer, in most scenarios, falls far short of the complexity it introduces.

Our Alternative: Modular Building Blocks

After dropping LangChain, our tech stack became:

OpenAI / Anthropic SDK — LLM calls
Chroma / Qdrant — Vector storage (using native APIs directly)
Simple custom orchestration — Python functions + type annotations
Standard Python logging and monitoring — no framework-specific callback system

The core principle: every component is replaceable, every abstraction is your own, and there are no black boxes.
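
A minimal sketch of what "Python functions + type annotations" orchestration might look like. The names `build_prompt`, `answer`, and `fake_llm` are illustrative assumptions, and the model is injected as a plain callable so the example runs without any SDK or network access:

```python
from typing import Callable

# Any function that maps a prompt to a completion. In a real system this
# would wrap an SDK call; here it is injected so the sketch runs offline.
LLM = Callable[[str], str]

def build_prompt(question: str, context: list[str]) -> str:
    """Prompt management with a plain f-string."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"

def answer(question: str, context: list[str], llm: LLM, memory: dict[str, str]) -> str:
    """Orchestration as an ordinary function: cache lookup, prompt, call, store."""
    if question in memory:  # memory is just a dictionary
        return memory[question]
    reply = llm(build_prompt(question, context))
    memory[question] = reply
    return reply

def fake_llm(prompt: str) -> str:
    """Stand-in model used only for this example."""
    return f"(stub reply to a {len(prompt)}-character prompt)"

memory: dict[str, str] = {}
print(answer("What is our refund window?", ["Refunds are accepted within 30 days."], fake_llm, memory))
```

Swapping the model, the prompt template, or the memory store means replacing one function or one argument, which is the replaceability the principle above describes.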

This approach may require a few dozen more lines of boilerplate than LangChain, but the trade-off is:

Fully controllable execution flow
Zero framework debugging overhead
Team members don't need to learn yet another DSL
No lock-in to a framework's version upgrades

When Should You Use LangChain?

To be fair, LangChain still has value in certain scenarios:

Rapid prototyping: you want a working RAG demo in 30 minutes
Teaching / learning: understanding RAG and related concepts through a structured approach
Standardized simple workflows: your requirements happen to perfectly match its Chain pattern

But if you're building a production-grade system, LangChain is more likely to become technical debt than an accelerator.

Conclusion

LangChain did something few dared to do: attempt to provide a unified framework during the most chaotic period of the LLM ecosystem. That courage deserves respect.

But the experience of 2024–2026 has shown: for production AI Agent systems, simple, direct code beats complex framework abstractions. LLMs themselves are already complex enough — you don't need a framework adding another layer of complexity on top.

The biggest feeling our team had after dropping LangChain wasn't "we lost features" — it was "we're finally free." We can write the code we want in the most direct way, instead of figuring out how to make the framework allow us to write it.

If you're starting a new LLM project, my advice is: try going framework-free first. Write code with the native SDKs for a few weeks. Then decide whether you truly need those abstractions. The answer will likely be no.

This article is adapted from the video "Why We Dropped LangChain," drawing on a team's real experience of using LangChain in production for 12 months before removing it, analyzing the cost of framework abstractions and the advantages of a modular building-block approach.

rag-explained-how-it-works

Leo Han — Tue, 09 Jun 2026 15:06:58 +0000

RAG Explained: How Retrieval-Augmented Generation Actually Works

What Is RAG?

RAG (Retrieval-Augmented Generation) is one of the most important architectural patterns in LLM applications from 2024–2025. The core idea is simple: before the LLM generates an answer, retrieve relevant information from an external knowledge base, inject the retrieval results into the context, and then have the model generate an answer based on that information.

Why is RAG needed? Large language models have three inherent limitations: knowledge cutoff dates (the temporal boundary of training data), hallucination (fabricating non-existent facts), and insufficient domain expertise (lacking enterprise-internal or specialized data). RAG circumvents the model's internal knowledge constraints by adopting a "retrieve first, generate later" approach, allowing the LLM to reference the latest and most accurate private data.

The Core RAG Workflow

A standard RAG system follows this pipeline:

Document Library → Chunking → Embedding → Vector Database Storage
                                                    ↓
User Query → Query Embedding → Similarity Search → Retrieve Relevant Chunks
                                            ↓
                        LLM Generates Answer (based on retrieved results + original query)

This pipeline can be decomposed into two phases: the offline indexing phase (document processing and storage) and the online query phase (real-time retrieval and generation).

The Offline Indexing Phase

1. Document Parsing & Chunking

Raw documents (PDFs, web pages, database records, etc.) are typically too long for direct vector retrieval. They need to be split into appropriately sized chunks.

Chunking strategy directly impacts retrieval quality. Common approaches include:

Fixed-size chunking: split by token count or character count (e.g., 512 tokens per chunk)
Semantic chunking: split along natural boundaries like paragraphs and sections
Recursive chunking: start with coarse separators (chapter headers), then progressively refine
Overlapping chunking: maintain overlap between adjacent chunks (e.g., 10–20%) to prevent key information from being severed at boundaries

Chunk size involves a trade-off: too small and the semantics are incomplete; too large and retrieval precision degrades.
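
Fixed-size chunking with overlap can be sketched as follows. This is a simplified illustration that counts whitespace-separated words instead of model tokens (an assumption; a real pipeline would use the embedding model's tokenizer), and `chunk_words` is a hypothetical helper name:

```python
def chunk_words(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into chunks of `chunk_size` words; adjacent chunks share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks

sample = " ".join(f"w{i}" for i in range(10))
print(chunk_words(sample, chunk_size=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

The defaults (512 words with 64 shared) correspond to a 12.5% overlap, inside the 10–20% range mentioned above.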

2. Embedding

After chunking, an embedding model converts each text chunk into a fixed-dimensional vector. These vectors are points in high-dimensional space — semantically similar texts are closer together in vector space.

Choosing the right embedding model is critical. Currently mainstream models include:

Model	Dimensions	Max Tokens	Characteristics
text-embedding-3-large	3072	8191	OpenAI recommended, excellent value
text-embedding-3-small	1536	8191	Lightweight, fast
multilingual-e5-large	1024	512	Strong multilingual support
GTE-Qwen2-7B-instruct	3584	32768	Open-source SOTA, long text support
BGE-M3	1024	8192	Multilingual + sparse-dense hybrid

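For example, a user query like "How tall is the Empire State Building?" gets converted through embedding into a dense vector such as [1.0, 2.5, 3.7, 5.8, 2.8]. The five dimensions here are purely illustrative; real embeddings have hundreds to thousands of dimensions, as the table above shows.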

3. Vector Database Storage

The embedded document chunks are stored in a vector database (Pinecone, Weaviate, Milvus, Qdrant, Chroma, etc.). The core capability of a vector database is Approximate Nearest Neighbor (ANN) search, which can find the K most similar results from millions or even billions of vectors in milliseconds.

The Online Query Phase

1. Query Embedding

The user's question is first converted into a vector using the same embedding model. Queries and documents must use the same embedding model — otherwise, they lie in different vector spaces and similarity calculations become meaningless.

2. Similarity Search

The query vector performs a similarity search against the vector database. Common similarity measures include:

Cosine Similarity: measures how close two vectors are in direction; range [-1, 1], unaffected by vector magnitude
Euclidean Distance: the straight-line distance in space; smaller values indicate greater similarity
Dot Product: cheap to compute; equivalent to cosine similarity when the vectors are normalized to unit length

For instance, suppose document chunk A has the vector [1.3, 1.5, 3.3, 5.7, 4.9] and the query vector is [1.0, 2.5, 3.7, 5.8, 2.8]. Their cosine similarity is approximately 0.96. Meanwhile, document chunk B with vector [4.8, 3.7, 1.5, 5.2, 6.0] has a similarity of about 0.83 to the same query, indicating that A is semantically closer and should rank higher.
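
The cosine similarities for the example vectors above can be checked directly. This is a plain-Python sketch; in practice a vector database or a numerical library would do this computation:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query = [1.0, 2.5, 3.7, 5.8, 2.8]
chunk_a = [1.3, 1.5, 3.3, 5.7, 4.9]
chunk_b = [4.8, 3.7, 1.5, 5.2, 6.0]

print(round(cosine_similarity(query, chunk_a), 2))  # 0.96
print(round(cosine_similarity(query, chunk_b), 2))  # 0.83
```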

3. Re-ranking

The results from the initial retrieval (typically top 10–50) are not always optimal. The re-ranking stage uses a more precise (but slower) model to re-sort the candidates.

Cross-Encoders are the standard method for re-ranking. Unlike Bi-Encoders that encode the query and document independently, Cross-Encoders concatenate the query and document together before feeding them into the model, capturing finer-grained interaction patterns between them. The ranking accuracy is significantly higher, though at greater computational cost.

# Bi-Encoder (initial retrieval): fast but lower precision
# Query and document are encoded independently
query_vec = embed(query)
doc_vec = embed(document)
similarity = cosine(query_vec, doc_vec)

# Cross-Encoder (re-ranking): slow but high precision
# Query and document are concatenated and jointly encoded
score = cross_encoder(query, document)

In production, a two-stage retrieval approach is standard: first use a Bi-Encoder to quickly recall the top-N from massive candidates, then use a Cross-Encoder to precisely re-rank the top-N and select the top-K to feed into the LLM.
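
The two-stage flow can be sketched with the scorers passed in as functions. The toy scorers below (`word_overlap`, `phrase_match`) are stand-ins invented for this example so it runs without any model; a real system would plug in a Bi-Encoder similarity and a Cross-Encoder score instead:

```python
from typing import Callable

Scorer = Callable[[str, str], float]  # (query, document) -> relevance score

def two_stage_retrieve(query: str, docs: list[str], fast_score: Scorer,
                       precise_score: Scorer, top_n: int = 20, top_k: int = 3) -> list[str]:
    """Recall top_n with the cheap scorer, then re-rank only those with the expensive one."""
    recalled = sorted(docs, key=lambda d: fast_score(query, d), reverse=True)[:top_n]
    reranked = sorted(recalled, key=lambda d: precise_score(query, d), reverse=True)
    return reranked[:top_k]

def word_overlap(query: str, doc: str) -> float:
    """Toy first-stage scorer: number of shared words."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))

def phrase_match(query: str, doc: str) -> float:
    """Toy second-stage scorer: 1.0 if the whole query appears in the document."""
    return 1.0 if query.lower() in doc.lower() else 0.0

docs = [
    "building height records",
    "the empire state building height is 381 meters",
    "state of the empire",
    "cooking with tomatoes",
]
print(two_stage_retrieve("empire state building height", docs,
                         word_overlap, phrase_match, top_n=3, top_k=1))
# ['the empire state building height is 381 meters']
```

The expensive scorer only ever sees `top_n` candidates, which is what keeps the second stage affordable.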

Evaluating RAG Systems

RAG system quality can be measured across multiple dimensions:

Retrieval quality metrics:

Recall@K: whether the correct answer appears in the top-K results
MRR (Mean Reciprocal Rank): the average reciprocal rank of the first correct answer
NDCG (Normalized Discounted Cumulative Gain): a weighted score accounting for ranking position
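
Two of the retrieval metrics above can be computed in a few lines. This sketch assumes document IDs are unique within a result list and uses the fraction-of-relevant-items form of Recall@K, which reduces to the hit-or-miss definition above when each query has exactly one correct answer:

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)

def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """Average of 1/rank of the first relevant result per query (0 if none is found)."""
    total = 0.0
    for retrieved, relevant in runs:
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(runs)

runs = [
    (["d3", "d1", "d7"], {"d1"}),        # first relevant hit at rank 2
    (["d2", "d9", "d4"], {"d2", "d4"}),  # first relevant hit at rank 1
]
print(recall_at_k(*runs[0], k=1))  # 0.0
print(recall_at_k(*runs[1], k=3))  # 1.0
print(mean_reciprocal_rank(runs))  # 0.75
```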

Generation quality metrics:

Faithfulness: whether the generated content faithfully reflects the retrieved context
Answer Relevance: whether the answer addresses the question
Context Relevance: whether the retrieved content is relevant to the question

The RAGAS (RAG Assessment) framework is widely used for automated evaluation and provides systematic scores for metrics such as faithfulness and answer relevance. Embedding models themselves are usually compared on standard benchmarks like MTEB (Massive Text Embedding Benchmark).

Advanced RAG Patterns

Query Rewriting

Raw user questions are often imprecise. Rewriting queries before retrieval — expanding synonyms, supplementing context, decomposing complex questions — can significantly boost recall.

Hybrid Search

Fuse results from dense retrieval (vector similarity) and sparse retrieval (keyword matching, e.g., BM25). Dense retrieval excels at semantic matching; sparse retrieval excels at exact matching. The two are complementary.
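
One common way to fuse the two result lists is Reciprocal Rank Fusion (RRF). The article does not say which fusion method it has in mind, so RRF is an assumption here, as is the conventional constant k = 60:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each document scores sum(1 / (k + rank)) over the lists that contain it."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

dense = ["d1", "d2", "d3"]   # ranking from vector similarity
sparse = ["d3", "d1", "d4"]  # ranking from keyword matching, e.g. BM25
print(reciprocal_rank_fusion([dense, sparse]))  # ['d1', 'd3', 'd2', 'd4']
```

RRF uses only rank positions, so the dense and sparse scores never need to be on the same scale.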

Multi-hop Retrieval

For questions requiring multi-step reasoning, the first round of retrieval results can generate new queries for a second (or more) round of retrieval, progressively approaching the answer.

Self-RAG

Allow the LLM to self-assess during generation whether retrieval is needed, whether the retrieved results are relevant, and whether the generated content is grounded — achieving "on-demand retrieval" rather than indiscriminate retrieval.

Conclusion

Through its "retrieve → augment → generate" architecture, RAG effectively addresses the three major challenges of LLMs: knowledge staleness, hallucination control, and domain adaptation. A production-grade RAG system involves multiple critical decisions: chunking strategy selection, embedding model choice, vector database configuration, similarity measure design, and re-ranking mechanism integration.

As long-context models advance, some might ask: "Why not just stuff all the documents into the context window?" But practice shows that RAG's value lies not in "how much you can fit," but in precisely finding the most relevant pieces of information — retrieval quality determines the ceiling of answer quality.

This article is adapted from the video "How RAG Works," covering the definition of RAG, the two-phase indexing and query pipeline, embedding and vector retrieval principles, chunking strategies, re-ranking mechanisms, and advanced patterns.

building-ai-agents-right-way

Leo Han — Tue, 09 Jun 2026 15:06:48 +0000

Building AI Agents the Right Way: Lessons from Anthropic Engineering

Why Following a Tutorial Will Make Your System Worse

2025–2026 has seen an explosion in AI Agent adoption, yet most teams keep making the same mistakes. As the Anthropic Engineering team has learned through extensive real-world experience: blindly "following a tutorial to build an Agent" will only produce a worse system.

This guide distills the core principles and practical methods for building effective, reliable Agent systems.

Principle 1: Don't Use Agents Just to Use Agents

This is the most important principle of all. Agents are not a silver bullet. Many teams jump straight to handing every task to an Agent, only to discover that latency skyrockets, costs spiral out of control, and system stability falls below what their original deterministic logic achieved.

Agents are appropriate for:

Open-ended reasoning tasks (multi-step, non-deterministic paths)
Tasks requiring interaction with multiple external tools
Tasks where the objective is clear but the execution path needs dynamic adjustment

Agents are not appropriate for:

Simple CRUD operations — use a direct API call
Deterministic data processing — use a traditional pipeline
Problems solvable by a single model call — no need to wrap it in an Agent framework

In one sentence: if an API call can do it, don't bring in an Agent.

Principle 2: Plan → Execute, Step by Step

An Agent is not as simple as "throw an instruction at the model and wait for the result." A robust Agent architecture follows a Plan → Execute two-phase model:

Plan phase: The Agent first thinks holistically and devises an execution plan. During this phase, it calls no external tools — it relies purely on reasoning to break the task into executable steps.

Execute phase: Execute step by step according to the plan, with each step having clear inputs, outputs, and validation conditions.

Step 1 → Step 2 → Step 3
  ↓         ↓        ↓
Observe   Observe   Final Result

This "plan first, execute later" pattern is far more stable than "think while doing," because the planning phase provides the Agent with a global perspective, preventing it from getting stuck in local optima or drifting off course.

Principle 3: Memory Is the Soul of an Agent

The Agent's core challenge isn't "intelligence" — it's "memory." Memory operates across two dimensions:

Internal Memory: The Agent's context within a single execution — including the task description, completed steps, observations, and intermediate reasoning. This is essentially the LLM's context window. Managing the context well (not losing critical information, not retaining redundancy) is the foundation of Agent stability.

External Storage: Persistent memory that spans multiple executions. This includes:

Task history and result caching
Lessons learned from experience
User preferences and feedback

External storage enables the Agent to learn from history, avoid repeating mistakes, and gradually optimize its behavior over time.

Principle 4: Embrace Sub-Agents

Complex tasks should not be handled by a single monolithic Agent. The Sub-Agent pattern is standard equipment for production-grade Agent systems:

Master Agent (Orchestrator): responsible for task understanding, decomposition, and scheduling
Sub-Agents: each handles a specialized sub-task (search, code generation, data analysis, format conversion, etc.)

Benefits of this architecture:

Each sub-agent has more focused instructions, leading to fewer hallucinations
Multiple sub-tasks can run in parallel
Sub-agents are decoupled, making problems easier to isolate
Different models can be used for different sub-agents (expensive ones for reasoning, cheap ones for formatting)

Principle 5: Avoid Premature Correction

Errors during Agent execution are normal, but your error-handling strategy determines the final system quality.

The common mistake: the Agent takes one step, finds the result unsatisfactory, immediately self-corrects, revises the plan, and re-executes. This causes "oscillation" — the Agent ping-pongs between directions and never finishes.

Wrong pattern:
Original Bug → AI Auto-Fixed → Introduces New Bug → Fix Again → ...

Right pattern:
Original Bug → Complete Full Execution First → Evaluate Holistically → One-Time Fix

The correct approach: let the Agent complete a full execution first, then make a one-time correction based on the overall result — rather than "fine-tuning" at every step. In other words, give the Agent room to make mistakes, but limit how often it can correct itself.

Principle 6: Tool Chain Design

More tools doesn't mean a better Agent. The key principles for tool chain design:

Atomicity: each tool does one thing, and does it well. Don't design "all-purpose" tools.
Standardized I/O: all tools use a unified input/output format to reduce the Agent's cognitive load.
Clear error returns: when a tool call fails, the return must precisely describe "what went wrong" and "possible correction paths" — not a vague "Error."
Least privilege: each sub-agent is granted only the minimum toolset needed to complete its task.
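
A minimal sketch of the atomicity, standardized I/O, and clear-error principles. The `ToolResult` shape and its field names are a hypothetical convention made up for this example, not a standard from any library:

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class ToolResult:
    """Uniform return shape shared by every tool (a hypothetical convention)."""
    ok: bool
    output: str = ""
    error: str = ""  # what went wrong
    hint: str = ""   # a possible correction path for the Agent

def read_file(file_path: str) -> ToolResult:
    """Atomic tool: read one text file and nothing else."""
    path = Path(file_path)
    if not path.is_file():
        return ToolResult(
            ok=False,
            error=f"No file at {file_path!r}",
            hint="Check the path or list the directory before retrying.",
        )
    return ToolResult(ok=True, output=path.read_text(encoding="utf-8"))

result = read_file("definitely_missing_file.txt")
print(result.ok, "|", result.error, "|", result.hint)
```

Least privilege then becomes a matter of which of these tool functions each sub-agent is handed.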

Principle 7: Start from Mature Prompts

Don't write an Agent's system prompt from scratch. Search for and draw on mature prompts that have been thoroughly tested in the community, then adapt them to your specific scenario.

A good Agent prompt typically includes:

Clear role definition and capability boundaries
Explicit tool-calling formats with examples
Error-handling strategies
Termination conditions and output format requirements

Anthropic Engineering's practice shows that prompt engineering is far more critical in the Agent context than in ordinary chat scenarios — a well-structured prompt can reduce failure rates from 40% to below 5%.

The Complete Agent System Pipeline

User Input
    ↓
Task Parsing & Classification (Orchestrator Agent)
    ↓
Plan Phase: Devise Execution Plan
    ↓
Distribute Sub-tasks to Sub-Agents
    ↓
Sub-Agent 1      Sub-Agent 2      Sub-Agent 3
(Search)         (Code)           (Analysis)
    ↓                ↓                ↓
Aggregate & Validate Results (Orchestrator Agent)
    ↓
(If Necessary) One-Time Correction
    ↓
Final Output

Conclusion

Building an AI Agent is not about buying a framework and tweaking a few parameters. A truly effective Agent system requires thoughtful design decisions at the architectural level. Remember these seven core principles: use Agents sparingly, plan before executing, manage internal and external memory, decompose complexity with sub-agents, avoid premature correction, design tool chains carefully, and start from mature prompts.

The most important lesson is this: the goal of an Agent is not to look "intelligent" — it's to complete work stably and predictably. If following a tutorial verbatim produces a working solution, that task likely never needed an Agent in the first place.

This article is adapted from the video "Building AI Agents — Follow the Tutorial and Your System Will Only Get Worse," drawing on practical experience from the Anthropic Engineering team and covering the seven core principles of Agent construction along with the complete pipeline design.

agents-concepts-principles-patterns

Leo Han — Tue, 09 Jun 2026 15:06:13 +0000

AI Agents: Concepts, Principles, and Patterns

What Is an AI Agent?

In 2025–2026, AI Agents have become the dominant paradigm for building applications on top of large language models. Simply put, an Agent is an AI system that can autonomously perceive its environment, reason about next steps, take actions, and continuously adjust based on feedback. The fundamental difference from a traditional chatbot is this: an Agent doesn't just answer questions — it actively uses tools, executes multi-step tasks, and self-corrects when things go wrong.

If a large language model (LLM) is the "brain," then an Agent is that brain equipped with "hands and feet" — it reads and writes files, runs terminal commands, searches the web, calls APIs, and transforms the model's reasoning capabilities into real-world actions.

The Core Principle: The ReAct Paradigm

The philosophical foundation of Agent operation comes from a landmark paper: ReAct: Synergizing Reasoning and Acting in Language Models (Yao et al.).

The paper identified a critical problem: before ReAct, the LLM's reasoning abilities (e.g., Chain-of-Thought prompting) and acting abilities (e.g., generating action plans) were studied as two separate topics. Reasoning happened only "in the head," and acting happened only "externally," with no synergy between them.

ReAct's core contribution is interleaving reasoning and acting in a looping fashion. Specifically:

Reasoning helps the model induce, track, and update action plans, and handle exceptions
Acting allows the model to interface with external resources (knowledge bases, code environments, web pages, etc.) and gather additional information

This synergy delivers significant advantages: on question answering (HotpotQA) and fact verification (FEVER), ReAct effectively overcomes the hallucination and error propagation issues of pure Chain-of-Thought reasoning by interacting with a Wikipedia API. On interactive decision-making benchmarks (ALFWorld), ReAct substantially outperforms prior approaches.

The Agent Work Loop

A typical Agent operates according to the following cycle:

Task
  ↓
Thought: Analyze the current state, decide what to do next
  ↓
Action: Invoke a tool to perform an operation
  ↓
Observation: Receive the tool's output
  ↓
(Repeat until a final answer can be given)
  ↓
Final Answer

This loop may look simple, but it gives Agents tremendous adaptability. If an action doesn't produce the expected result, the Agent can adjust its strategy in the next thought — in essence, trading runtime reasoning for robustness.

A Question-Answering Example

Task: How tall is the Empire State Building?

Thought: I need to look up the height of the Empire State Building.
Action: get_height("Empire State Building")
Observation: Roof height 381 meters, antenna tip 443 meters

Thought: I now have the height data and can answer the user.
Final Answer: The Empire State Building has a roof height of 381 meters
and a total height of 443 meters including the antenna.
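
The loop behind a trace like the one above can be sketched in a few lines. Everything here is illustrative: `run_agent` and `scripted_model` are hypothetical names, the model is a scripted stand-in rather than a real LLM, and the plain-text `Action: tool("arg")` format is an assumption chosen to match the trace:

```python
import re
from typing import Callable

def run_agent(task: str, model: Callable[[str], str],
              tools: dict[str, Callable[[str], str]], max_steps: int = 5) -> str:
    """Thought -> Action -> Observation loop with a hard step limit."""
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        reply = model(transcript)
        transcript += reply + "\n"
        final = re.search(r"Final Answer:\s*(.*)", reply, re.DOTALL)
        if final:
            return final.group(1).strip()
        action = re.search(r'Action:\s*(\w+)\("([^"]*)"\)', reply)
        if action:
            name, arg = action.groups()
            tool = tools.get(name)
            observation = tool(arg) if tool else f"Unknown tool {name!r}"
            transcript += f"Observation: {observation}\n"
    return "Stopped: step limit reached without a final answer."

def scripted_model(transcript: str) -> str:
    """Stand-in for an LLM: acts first, then answers once an observation exists."""
    if "Observation:" not in transcript:
        return 'Thought: I need to look up the height.\nAction: get_height("Empire State Building")'
    return "Thought: I have the data.\nFinal Answer: The roof height is 381 meters."

tools = {"get_height": lambda name: "Roof height 381 meters, antenna tip 443 meters"}
print(run_agent("How tall is the Empire State Building?", scripted_model, tools))
# The roof height is 381 meters.
```

The `max_steps` bound guarantees the loop ends even if the model never produces a final answer.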

A Cooking Task Example

Task: I need to make tomato scrambled eggs

Thought: First, let me find a recipe for tomato scrambled eggs.
Action: find_recipe(dish="tomato scrambled eggs")
Observation: Needs: 3 eggs, 2 tomatoes, salt, sugar, oil, scallions

Thought: I should check if these ingredients are in the fridge.
Action: check_fridge(item="eggs")
Observation: Eggs are in the fridge

Thought: Eggs are confirmed, but the other ingredients still need to be checked.
Final Answer: Key ingredients confirmed: eggs in stock.
Need to verify tomatoes...

The Agent Tool System

An Agent's capabilities are bounded by the tools it can use. From the demonstrations, a typical Agent toolset includes:

read_file(file_path) — read the contents of a specified file
write_to_file(filename, content) — create or overwrite a file
run_terminal_command(command) — execute a command in the terminal

These tools are defined in the System Prompt using XML format, and the Agent issues structured tool calls with XML tags:

<action>write_to_file("test.txt", "a\nb\nc")</action>

By convention, newlines inside string arguments are written as the escape sequence \n, so a multi-line parameter stays on one line and the tool call can be parsed reliably.
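
On the receiving side, an `<action>` tag like the one above has to be turned back into a tool name and arguments. The sketch below is one possible parser, not the one used in the video; it relies on Python's `ast` module so that only literal arguments are accepted and no model-written code is executed:

```python
import ast
import re

ACTION_RE = re.compile(r"<action>(.*?)</action>", re.DOTALL)

def parse_action(reply: str) -> tuple[str, list]:
    """Extract the tool name and literal positional arguments from the first <action> tag."""
    match = ACTION_RE.search(reply)
    if match is None:
        raise ValueError("no <action> tag found")
    call = ast.parse(match.group(1).strip(), mode="eval").body
    if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Name):
        raise ValueError("action must be a simple call such as tool(arg1, arg2)")
    # literal_eval accepts only literals (strings, numbers, lists, ...).
    args = [ast.literal_eval(arg) for arg in call.args]
    return call.func.id, args

name, args = parse_action(r'<action>write_to_file("test.txt", "a\nb\nc")</action>')
print(name)  # write_to_file
print(args)  # ['test.txt', 'a\nb\nc']
```

With this parser, a raw line break inside the quoted argument would be rejected as an unterminated string, which is one practical reason for the `\n` convention.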

Three Agent Construction Patterns

The video demonstrated three mainstream Agent construction patterns through real-world examples:

1. General-Purpose Agent Platform: Manus

Manus is a general-purpose AI Agent capable of autonomously completing complex, multi-step research tasks. The video showed it executing the task "iPhone 15 Pro Max vs Galaxy S24 Ultra vs Pixel 8 Pro comparison report" in its entirety:

Autonomously searched for specifications and performance data for all three phones
Collected visual assets and reference images
Generated a comprehensive comparison website
Produced a structured report (including executive summary, detailed comparison tables, etc.)

Manus's defining characteristic is high autonomy — the user only needs to provide a task description, and the Agent independently plans, searches, organizes, and outputs, with no human intervention required.

2. Code Agent: Claude

Claude as a code agent demonstrated building a Snake game using HTML/CSS/JavaScript:

Received the task: "Write a Snake game using HTML, CSS, and JS"
Planned the file structure: index.html, style.css, script.js
Created files one by one and wrote the code
Delivered a runnable game

Claude's pattern illustrates how Agents can execute deterministic coding tasks in a controlled environment — each step has clear inputs and outputs, and errors can be detected and corrected promptly.

3. Open-Instruction Agent: DeepSeek

DeepSeek's demonstration focused more on following extremely detailed system instructions. The video showed its Agent-mode prompt structure:

Strictly defined XML tags for <thought>, <action>, <observation>, <final_answer>
Specified the operating system environment (macOS 15.5) and working directory
Provided complete documentation of tool definitions and calling formats
Also executed the Snake game construction task

DeepSeek's case illustrates that through meticulous prompt engineering, even a general-purpose chat model can be shaped into an agent that follows a specific Agent protocol.

Key Design Decisions for Building Agents

Drawing from these cases, building an effective Agent involves several critical decisions:

1. Prompt structure design. The Agent's system prompt must precisely describe its role, available tools, output format, and reasoning steps. XML tags may seem tedious, but they provide a structured "grammar" for the model's output, reducing the probability of parsing failures.

2. Tool interface granularity. Tools should be sufficiently atomic (e.g., read_file, write_to_file) so the Agent can flexibly compose them, rather than offering overly monolithic "do-everything" functions.

3. Quality of observation feedback. The Observation returned by tools is the Agent's sole basis for adjusting its next strategy. If the return information is too terse or ambiguous, the Agent's reasoning chain breaks.

4. Termination conditions. The Agent needs a clear stopping signal (<final_answer>), or it may fall into an endless "think-act" loop. In practice, a maximum step limit is typically set as a safety net.

5. Error handling and recovery. The core advantage of the ReAct paradigm is its ability to handle exceptions — when a tool call fails or returns unexpected results, the Agent can reassess the situation in the next thought and try alternative approaches.

Conclusion

AI Agents represent a critical leap from "language models" to "action models." The ReAct paradigm, by interleaving reasoning and acting, transforms the LLM from a system that can only "speak" into an agent that can "do."

From Manus's autonomous research, to Claude's code generation, to DeepSeek's precise instruction execution, we see three distinct implementation paths for Agents — but they all share the same core philosophy: thinking guides action, and action feeds back into thinking.

As the tool ecosystem matures and model reasoning capabilities advance, Agents are moving from experimental prototypes to production-grade applications. Understanding the ReAct paradigm and mastering Agent construction patterns will become essential skills for engineers in the AI era.

This article is adapted from the video "Agent Concepts, Principles, and Construction Patterns," covering the definition of Agents, the core ideas of the ReAct paper, the Agent work loop, tool system design, and a comparative analysis of three mainstream construction patterns.

How Do AI Models Operate?

Leo Han — Tue, 26 May 2026 11:53:27 +0000

The Rise of AI
Since 2025, artificial intelligence has developed at a rapid pace and become a major trend. AI, intelligent agents, and related technologies now appear everywhere in daily life. AI is being applied in countless scenarios and can seem unrivaled, capable of accomplishing almost anything. Nearly everyone has used large language models such as GPT, DeepSeek, Doubao, and Yuanbao.
Back in 2025, we believed AI programs were little more than chatbots, useful only for tasks related to the internet and information technology. Today we realize they are far more sophisticated.
The traditional internet industry has long been marked by bubble economics and by profit models that are decoupled from real industries and rely heavily on venture capital. For this reason, many people once thought AI could never reach traditional trades such as hairdressing, plumbing, and maintenance work, and that these occupations would remain largely unaffected. Yet with advances in robotics and smart hardware, these possibilities have become tangible. From a short-term economic perspective, the main limitation today is the high cost of applying AI to simple problems.
To understand AI programs at their core, it helps to learn some basic terminology and see how they actually operate.

The Evolution of AI
In the early days of generative AI, products mainly took the form of chatbots. The early version of ChatGPT is a typical example: people interacted with AI by entering text on a website and receiving automated responses. Today, AI agents can handle tasks across a wide range of specialized scenarios, such as generating reports and creating videos.
Accordingly, AI can be divided into the following major categories:

Demystifying AI
We can elaborate on these categories as follows:
ANI (Artificial Narrow Intelligence)
Also known as narrow AI. It is designed for specific scenarios, with autonomous driving models as a typical example.
Generative AI
It refers to generative artificial intelligence. Tools like ChatGPT fall into this category and can be applied to numerous scenarios.
AGI (Artificial General Intelligence)
This represents the ultimate goal of AI: a system that would be able to accomplish any task humans are capable of, and perhaps even handle work beyond human ability.
Machine Learning (ML)
Machine learning is a core technical pillar driving the development of AI. A key branch is supervised learning, whose fundamental objective is to learn a mapping from given inputs to the desired outputs using labeled examples.

Here are some simple examples. Suppose we build an email program that uses AI to decide whether a message is spam: the input is an email and the output is a spam or not-spam label. This function is called spam filtering. If we feed in audio and get a text transcript as output, the technology is speech recognition. If the input is one piece of text (the source text) and the output is another piece of text (the target text), this is how a chatbot works: it generates text by predicting what should come next.
LLM (Large Language Models)
Large Language Models are built with machine learning, specifically supervised learning, and are trained to predict the next word in a sequence. To put it simply, a sentence is first broken into words, much as Elastic's word segmentation does, and the model learns to predict each word from the words that come before it. Here is a straightforward illustration:
Take the input sentence: My most commonly used database is Elastic.

Input	Output
My most commonly used	database
My most commonly used database is	Elastic
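
The way one sentence becomes many next-word training examples can be sketched as follows. Splitting on whitespace is a simplification made for this example; real models operate on sub-word tokens, and `next_word_pairs` is a hypothetical helper name:

```python
def next_word_pairs(sentence: str) -> list[tuple[str, str]]:
    """Turn one sentence into (input prefix, next word) training examples."""
    words = sentence.split()
    return [(" ".join(words[:i]), words[i]) for i in range(1, len(words))]

for prefix, target in next_word_pairs("My most commonly used database is Elastic"):
    print(f"{prefix!r} -> {target!r}")
# The final line printed is:
# 'My most commonly used database is' -> 'Elastic'
```

Each prefix is the input and the following word is the label, so the text supplies its own labels.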

When a model is trained this way on massive datasets, it evolves into a system like ChatGPT: given an initial prompt, it can generate a relevant response. In practice, GPT models do more than predict the next word. They are further trained, for example with human feedback, to follow instructions and to deliver more accurate, helpful replies.
Over the past two years, many companies have been engaged in data annotation for model training. They hire staff to label large volumes of images and texts. Such work is essential to build up the recognition capabilities of models under supervised learning. Thanks to the advancement of computers and the internet, vast and diverse data resources are readily available, which has led to the remarkable leap in AI model performance in recent years.
Lastly, let's talk about a hugely popular concept in modern AI: neural networks.

As data volume keeps increasing, the performance of traditional AI models first improves and then levels off: they cannot keep getting smarter just because more data is available. Neural networks, by contrast, continue to benefit from more data, which is why high performance relies so heavily on big data. Meanwhile, advances in GPUs have greatly boosted large-scale computing power, providing the resources needed for model training.
The core concepts of artificial intelligence are machine learning and supervised learning, which essentially map inputs to corresponding outputs.

DataSet
Datasets are fundamental to AI systems. Here is a simple real-life example.
We can create a basic dataset consisting of delivery distance (kilometers) and estimated delivery time (minutes):

Delivery Distance (km) | Estimated Time (min)
1 | 15
2 | 20
3 | 25
4 | 30
5 | 35

In this case, the delivery distance serves as the input, and the estimated delivery time is the output. We can also add more input features, such as the number of traffic lights, to build an extended dataset:

Number of Traffic Lights | Delivery Distance (km) | Estimated Time (min)
1 | 1 | 15
1 | 2 | 20
2 | 3 | 30
2 | 4 | 35
2 | 5 | 34

Likewise, we can set delivery time as the input to predict whether a delivery route can meet the time requirement.
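As a rough sketch of what "learning" from the first dataset means, the snippet below fits a straight line to the distance and time pairs with ordinary least squares, then reuses the result to check a deadline. The helper names (`fit_line`, `estimate_minutes`, `meets_deadline`) are assumptions for this illustration, and a straight line is only one possible model.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input feature: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# The basic dataset: delivery distance (km) -> estimated time (min).
distances_km = [1, 2, 3, 4, 5]
minutes = [15, 20, 25, 30, 35]

slope, intercept = fit_line(distances_km, minutes)


def estimate_minutes(distance_km):
    """Predict delivery time for a distance using the fitted line."""
    return slope * distance_km + intercept


def meets_deadline(distance_km, limit_minutes):
    """Use the predicted time to decide whether a route fits the time limit."""
    return estimate_minutes(distance_km) <= limit_minutes


print(slope, intercept)       # 5.0 10.0
print(estimate_minutes(6))    # 40.0
print(meets_deadline(6, 45))  # True
```

On this data the fit is exact: every extra kilometer adds 5 minutes on top of a fixed 10 minutes. Real delivery data would be noisier, and the fitted line would only approximate it.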
Apart from numerical data, there are many other application scenarios. For instance, security verification is commonly required when accessing certain websites, such as the widely discussed:

Messy Data
Many enterprises consider leveraging their existing operational data to conduct AI predictive analysis. However, there is a harsh reality.
A great number of CEOs assume that with abundant user and production data and an AI team, they can easily carry out industrial predictive analysis and generate tangible value. This idea is actually questionable, for the reasons listed below:
First, the data lacks continuity. When building big data systems, most internet companies store data in data warehouses via stream messages and other approaches. Such data reflects various business metrics but often has little practical value for model training.
Second, the data contains a large amount of junk. A large portion of it is useless and unfit for AI model training.
Third, the data is incomplete. Issues like missing values, unknown fields and even manually altered business data make it unusable for training a regression model.
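To get a feel for how much data survives even a basic quality check, here is a small sketch. The records, field names, and the rule that values must be present and positive are all hypothetical; real pipelines need domain-specific validation.

```python
# Fields every record must have before it can be used for training (assumed).
REQUIRED_FIELDS = ("distance_km", "minutes")

# Hypothetical raw records as they might come out of a data warehouse.
raw_records = [
    {"distance_km": 1, "minutes": 15},
    {"distance_km": 2, "minutes": None},  # missing value
    {"distance_km": 3},                   # missing field
    {"distance_km": 4, "minutes": 30},
    {"distance_km": 5, "minutes": -1},    # placeholder entered by hand
]


def is_usable(record):
    """Keep a record only if every required field is present and positive."""
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or value <= 0:
            return False
    return True


clean_records = [r for r in raw_records if is_usable(r)]
usable_share = len(clean_records) / len(raw_records)

print(len(clean_records), usable_share)  # 2 0.4
```

In this made-up sample only two of five records are usable, which is the kind of shrinkage that surprises teams who count raw rows rather than clean ones.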
Neural Net
From the above, we can clearly see the performance gap between traditional AI and neural networks. Do not be misled by the terminology: artificial neural networks only draw inspiration from the human nervous system and are composed of numerous artificial neurons. We can explain their characteristics with the previous delivery distance example:

This scenario reminds me of the regression process used in M&V (measurement and verification) for energy consumption forecasting. It uses linear regression to make phased predictions. To be specific, it performs discrete regression using data from Phase A to Phase B to produce a linear regression formula, which is then applied to forecast outcomes for Phase C. The method is used only for subsequent predictive analysis, and the resulting regression formula is displayed in charts. Implementing it requires calculations involving numerous factors. We can regard these factors as neurons, each acting as a channel between input and output with its own computing logic.
Simply put, a neural network is a set of conditions made up of many neurons. The more neurons there are, the larger the network and the more complex the input-to-output mappings it can represent, which generally leads to more accurate results when enough training data is available.
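The picture of neurons as small computing units between input and output can be sketched as follows. This is a hand-wired toy, not a trained model: the weights, the bias of 10 minutes, and the interpretation of the two hidden neurons as "travel time" and "waiting time" are assumptions chosen for illustration. In a real network these numbers are learned from data.

```python
def relu(x):
    """A common activation function: pass positive values, clip negatives to 0."""
    return max(0.0, x)


def neuron(inputs, weights, bias):
    """One neuron: weighted sum of its inputs plus a bias, then the activation."""
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return relu(total)


def forward(distance_km, traffic_lights):
    """Two hidden neurons feed one output neuron (all weights are made up)."""
    inputs = [distance_km, traffic_lights]
    travel = neuron(inputs, [5.0, 0.0], 0.0)   # reacts only to distance
    waiting = neuron(inputs, [0.0, 2.0], 0.0)  # reacts only to traffic lights
    # The output neuron combines both intermediate factors into minutes.
    return neuron([travel, waiting], [1.0, 1.0], 10.0)


print(forward(3, 2))  # 15 travel + 4 waiting + 10 base = 29.0
```

Adding more neurons means adding more such intermediate factors, which is what lets larger networks capture more complicated relationships.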
What Machine Learning Is Good At
At first, I thought the rise of AI would have little impact on traditional industries such as manual labor and the service sector. The reality turns out to be quite different. For now, businesses are reluctant to bear the high cost of replacing low-wage jobs with AI. However, as AI continues to evolve toward AGI, it is likely to gradually take over more human work on a large scale.
Meanwhile, many people hold overly high expectations for AI. We need to realize that AI is not a panacea. The distinction is easy to understand. For example, asking AI to generate reports and process data works well, because humans can give clear and complete instructions with definite operating logic. In contrast, it is unrealistic to expect AI to predict stock market trends or winning lottery numbers. Even humans cannot accomplish such tasks. Even if we feed in massive, complex data for reference, including historical market trends, corporate operation reports and traffic statistics, the strong randomness still cannot be eliminated.
The same applies to popular smart vehicles and autonomous driving. Cars are equipped with cameras and radars to capture driving information, which is a typical input-output application. Yet they still struggle to recognize human body movements. For instance, autonomous vehicles can plan routes and detect obstacles via sensors, but may fail to identify a passenger hailing the car by waving. Body gestures are so diverse that it is hard to form a fixed mapping from input to output.
Likewise, many hospitals have introduced AI to analyze medical reports such as CT scans. Such systems work well only when scans follow standard rules: all images are placed correctly, and CT slices of the heart are positioned uniformly during supervised training. If applied to non-standard scans, for example images taken when the patient lies sideways, the recognition error will increase dramatically.
To sum up, we can judge the applicable scenarios of machine learning by the following rules:
Scenarios where ML performs well:
- Learning relatively simple concepts
- Having a large volume of available data
Scenarios where ML performs poorly:
- Learning complex concepts with limited data
- Processing unseen and brand-new types of data

LangChain Agents, Tools, and Memory: An Enterprise Engineering Guide

LangChain Agents, Tools, and Memory: An Enterprise Engineering Guide

1. The role of LangChain in enterprise AI

2. Agent = Model + Harness

3. Why agents need tools

4. Engineering rules for tools

5. Standard model interface

6. Messages: prompt is not a single string

7. Short-term memory

8. Long-term memory

9. Middleware

10. Observability and LangSmith

11. Recommended enterprise architecture

12. Adoption guidance

13. Final takeaway

References

LangChain-Core-Components-Guide

LangChain Architect's Guide: Building LLM Applications from First Principles

Table of Contents

1. Introduction: Why LangChain Matters

1.1 The LLM Revolution and Its Bottlenecks

1.2 What LangChain Actually Is

1.3 The LLM Landscape

2. Episode 1: LangChain Overview & the LLM Landscape

2.1 Key Questions

2.2 The Raw API Problem

2.3 LangChain's Answer

3. Episode 2: Hello World & ConversationChain

3.1 Environment Setup

3.2 First LangChain Call

3.3 Temperature: The Creativity Knob

3.4 Jupyter Notebook for LLM Development

3.5 ConversationChain: Giving LLMs Memory

3.6 How Prompts Pass Information to LLMs

4. Episode 3: Model I/O — Prompt Engineering at Scale

4.1 The Anti-Pattern: Raw String Concatenation

4.2 PromptTemplate

4.3 Few-Shot Prompting

4.4 Example Selector

4.5 Output Parsers

5. Episode 4: Data Connection — Teaching LLMs to Read Your Data

5.1 The Core Problem

5.2 Document Loaders

5.3 Text Splitters

5.4 Word Embeddings: The Math of Meaning

5.5 Vector Stores

6. Episode 5: Chains — The Art of Orchestration

6.1 What Is a Chain?

6.2 Chain Types

6.3 Document Chains (RAG Core)

7. Episode 6: Agents — Autonomous LLM Reasoning

7.1 Chains vs Agents

7.2 ReAct Pattern

7.3 Agent Implementation

7.4 Agent Types

7.5 Tuning

8. Episode 7: Hands-on PDF Q&A System

9. Episode 8: Hands-on Advanced Search Agent

10. Episode 9: Retrospective & Best Practices

Production Checklist

API Migration (2 Years)

11. Appendix: Troubleshooting

Learning Path

Epilogue

why-we-dropped-langchain

Why We Dropped LangChain: When Abstractions Do More Harm Than Good

Problem 2: The `http.client` vs. `requests` Analogy