<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Leo Han</title>
    <description>The latest articles on DEV Community by Leo Han (@leo_han_02060526).</description>
    <link>https://dev.to/leo_han_02060526</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3952043%2F449912c6-c1f9-4007-b04f-f06dd82c1663.jpg</url>
      <title>DEV Community: Leo Han</title>
      <link>https://dev.to/leo_han_02060526</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/leo_han_02060526"/>
    <language>en</language>
    <item>
      <title>LangGraph: Engineering Controllable Enterprise Agents</title>
      <dc:creator>Leo Han</dc:creator>
      <pubDate>Mon, 15 Jun 2026 10:44:56 +0000</pubDate>
      <link>https://dev.to/leo_han_02060526/langgraph-engineering-controllable-enterprise-agents-525a</link>
      <guid>https://dev.to/leo_han_02060526/langgraph-engineering-controllable-enterprise-agents-525a</guid>
      <description>&lt;h1&gt;
  
  
  LangGraph: Engineering Controllable Enterprise Agents
&lt;/h1&gt;

&lt;h3&gt;
  
  
  1. Why enterprise agents need more than a single LLM call
&lt;/h3&gt;

&lt;p&gt;In early prototypes, an AI application may look like a simple prompt-response loop. A user asks a question, the model returns an answer. In production, this pattern quickly reaches its limits.&lt;/p&gt;

&lt;p&gt;LLMs do not automatically know real-time business data, internal database records, or operational context. They also do not reliably execute long-running workflows by themselves. On the other hand, fully autonomous agents can become unpredictable: they may loop, call the wrong tool, produce hallucinated decisions, or perform unsafe actions.&lt;/p&gt;

&lt;p&gt;Enterprise AI needs an orchestration layer that gives models controlled autonomy. LangGraph provides this layer by modeling agent workflows as graphs with explicit state, nodes, edges, persistence, and human oversight.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. From chains to graphs
&lt;/h3&gt;

&lt;p&gt;A chain is a fixed sequence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Start -&amp;gt; Step 1 -&amp;gt; Step 2 -&amp;gt; Step 3 -&amp;gt; End
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is reliable but rigid. A graph is more expressive:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Start
  -&amp;gt; Agent Node
  -&amp;gt; Tool Node
  -&amp;gt; Agent Node
  -&amp;gt; Human Review Node
  -&amp;gt; End
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With LangGraph, the next step can be selected dynamically based on the current state. This turns an agent from a linear script into a controlled state machine.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. The three core concepts
&lt;/h3&gt;

&lt;p&gt;LangGraph workflows are built from three primitives.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;State&lt;/code&gt; is the shared data structure of the workflow. It may contain messages, user context, task IDs, tool results, approval status, risk level, retry count, and final outputs.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Node&lt;/code&gt; is a unit of work. A node can call an LLM, execute a tool, validate a rule, retrieve documents, wait for human approval, or format a result.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Edge&lt;/code&gt; controls what happens next. Normal edges represent fixed transitions. Conditional edges route execution based on state.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. A production-oriented architecture
&lt;/h3&gt;

&lt;p&gt;A practical enterprise agent can be structured as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Request
  -&amp;gt; Input Validation
  -&amp;gt; Intent Router
  -&amp;gt; Agent Reasoning
  -&amp;gt; Tool Selection
  -&amp;gt; Tool Execution
  -&amp;gt; Result Normalization
  -&amp;gt; Risk Check
  -&amp;gt; Human Review, optional
  -&amp;gt; Final Response
  -&amp;gt; Audit Log / Metrics
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This separates model reasoning from operational control. The LLM interprets and plans. Tool nodes access external systems. Risk nodes enforce policies. Human review nodes approve high-risk actions. State and checkpoints make the workflow recoverable and auditable.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Tool results should go back to the agent
&lt;/h3&gt;

&lt;p&gt;A common mistake is to return raw tool output directly to the user. Tool outputs are often JSON payloads, database rows, API responses, or error codes. The better pattern is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Agent decides a tool is needed
  -&amp;gt; Tool executes
  -&amp;gt; Tool result is written to State
  -&amp;gt; Agent reads State again
  -&amp;gt; Agent produces a business-readable answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This keeps the model responsible for explaining tool results in context.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Persistence and checkpoints
&lt;/h3&gt;

&lt;p&gt;Production agents cannot assume every task finishes in a single request. Workflows may pause for approval, fail due to external systems, or resume after service restarts.&lt;/p&gt;

&lt;p&gt;Checkpoints allow the graph state to be persisted and resumed. This enables long-running workflows, human approval flows, failure recovery, and detailed audit trails.&lt;/p&gt;

&lt;h3&gt;
  
  
  7. Human-in-the-loop
&lt;/h3&gt;

&lt;p&gt;Human oversight is not a weakness. It is what makes high-impact AI automation deployable.&lt;/p&gt;

&lt;p&gt;Human review is recommended for irreversible operations, low-confidence decisions, compliance-sensitive actions, tool parameter changes, and conflicts between model plans and business rules.&lt;/p&gt;

&lt;p&gt;In graph form:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;risk_check
  -&amp;gt; low_risk: execute_tool
  -&amp;gt; high_risk: human_review
human_review
  -&amp;gt; approved: execute_tool
  -&amp;gt; edited: execute_tool_with_new_args
  -&amp;gt; rejected: final_reject_response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  8. Self-correction loops
&lt;/h3&gt;

&lt;p&gt;Graphs can express review-and-retry patterns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;generate_plan
  -&amp;gt; review_plan
  -&amp;gt; if pass: execute_plan
  -&amp;gt; if fail: generate_plan
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is useful for code generation, SQL generation, document writing, compliance review, and RAG answer validation. Production systems must set loop limits, cost limits, timeout limits, and fallback behavior.&lt;/p&gt;

&lt;h3&gt;
  
  
  9. Adoption roadmap
&lt;/h3&gt;

&lt;p&gt;Start by converting an existing prompt feature into a graph. Then add read-only tools. Next, introduce checkpointing and thread-level memory. After that, add write operations behind human approval. Finally, standardize common capabilities such as tool registries, state schemas, approval components, tracing, regression tests, and evaluation datasets.&lt;/p&gt;

&lt;h3&gt;
  
  
  10. Final takeaway
&lt;/h3&gt;

&lt;p&gt;LangGraph is not about giving agents unlimited freedom. It is about giving them structured freedom. For engineering teams, the real shift is from prompt engineering to state-machine engineering, workflow engineering, and runtime engineering.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;LangGraph overview: &lt;a href="https://docs.langchain.com/oss/python/langgraph/overview" rel="noopener noreferrer"&gt;https://docs.langchain.com/oss/python/langgraph/overview&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;LangGraph Graph API: &lt;a href="https://docs.langchain.com/oss/python/langgraph/graph-api" rel="noopener noreferrer"&gt;https://docs.langchain.com/oss/python/langgraph/graph-api&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;LangGraph persistence: &lt;a href="https://docs.langchain.com/oss/python/langgraph/persistence" rel="noopener noreferrer"&gt;https://docs.langchain.com/oss/python/langgraph/persistence&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;LangChain overview: &lt;a href="https://docs.langchain.com/oss/python/langchain/overview" rel="noopener noreferrer"&gt;https://docs.langchain.com/oss/python/langchain/overview&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>llm</category>
    </item>
    <item>
      <title>LangChain Agents, Tools, and Memory: An Enterprise Engineering Guide</title>
      <dc:creator>Leo Han</dc:creator>
      <pubDate>Mon, 15 Jun 2026 10:44:37 +0000</pubDate>
      <link>https://dev.to/leo_han_02060526/langchain-agents-tools-and-memory-an-enterprise-engineering-guide-2op5</link>
      <guid>https://dev.to/leo_han_02060526/langchain-agents-tools-and-memory-an-enterprise-engineering-guide-2op5</guid>
      <description>&lt;h1&gt;
  
  
  LangChain Agents, Tools, and Memory: An Enterprise Engineering Guide
&lt;/h1&gt;

&lt;h3&gt;
  
  
  1. The role of LangChain in enterprise AI
&lt;/h3&gt;

&lt;p&gt;If a model API is the engine, LangChain is the framework that helps engineering teams install that engine into real applications. It provides a standard model interface, agent construction, tool wrapping, message objects, memory, middleware, and observability integrations.&lt;/p&gt;

&lt;p&gt;The key point is that LangChain is not only about calling LLMs. It helps teams shape model behavior, extend model capabilities, and place LLMs inside testable and debuggable application structures.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Agent = Model + Harness
&lt;/h3&gt;

&lt;p&gt;An enterprise agent is not just an LLM. It needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A model interface.&lt;/li&gt;
&lt;li&gt;A system prompt.&lt;/li&gt;
&lt;li&gt;Tools.&lt;/li&gt;
&lt;li&gt;Message objects.&lt;/li&gt;
&lt;li&gt;Memory.&lt;/li&gt;
&lt;li&gt;Middleware.&lt;/li&gt;
&lt;li&gt;Observability.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model is the reasoning core. The harness around it determines whether the agent can be safely used in production.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Why agents need tools
&lt;/h3&gt;

&lt;p&gt;Without tools, a model can mainly answer based on its training data and provided context. With tools, it can access real business systems.&lt;/p&gt;

&lt;p&gt;Typical tools include current-time lookup, database queries, file search, business APIs, order lookup, ticket operations, approval triggers, and controlled code execution.&lt;/p&gt;

&lt;p&gt;Tools are what turn an AI assistant from a chatbot into a business workflow participant.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Engineering rules for tools
&lt;/h3&gt;

&lt;p&gt;Tools should be small, typed, well-described, and auditable. Avoid large generic tools such as &lt;code&gt;handle_customer_issue(anything)&lt;/code&gt;. Prefer explicit tools such as &lt;code&gt;get_order_status(order_id)&lt;/code&gt; or &lt;code&gt;create_refund_request(order_id, reason)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Tool outputs should be structured. Write operations should have audit logs, idempotency keys, permission checks, and human approval when needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Standard model interface
&lt;/h3&gt;

&lt;p&gt;LangChain helps teams integrate and switch among different model providers. Model configuration usually includes model name, temperature, max tokens, timeout, retry policy, and API credentials.&lt;/p&gt;

&lt;p&gt;For enterprise use, it is better to place a model gateway or model configuration service above individual agents. This enables cost control, rate limiting, fallback, provider switching, and audit.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Messages: prompt is not a single string
&lt;/h3&gt;

&lt;p&gt;In production, a prompt is a set of messages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;System messages.&lt;/li&gt;
&lt;li&gt;Human messages.&lt;/li&gt;
&lt;li&gt;AI messages.&lt;/li&gt;
&lt;li&gt;Tool messages.&lt;/li&gt;
&lt;li&gt;Summaries of previous context.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Many production issues are caused by poor message construction: conflicting instructions, missing tool results, overly long history, broken tool-call ordering, or missing thread isolation.&lt;/p&gt;

&lt;h3&gt;
  
  
  7. Short-term memory
&lt;/h3&gt;

&lt;p&gt;Short-term memory means preserving context within a conversation thread. With a checkpointer and a &lt;code&gt;thread_id&lt;/code&gt;, an agent can remember previous interactions in the same thread.&lt;/p&gt;

&lt;p&gt;This is useful for customer support, task execution, form filling, coding assistants, and multi-step workflows. However, long histories increase cost and may distract the model, so teams need trimming, deletion, summarization, and filtering strategies.&lt;/p&gt;

&lt;h3&gt;
  
  
  8. Long-term memory
&lt;/h3&gt;

&lt;p&gt;Long-term memory stores information across sessions, such as user preferences, recurring constraints, historical task summaries, and organization knowledge.&lt;/p&gt;

&lt;p&gt;It usually requires embeddings, a vector or structured store, retrieval logic, write policies, and deletion policies. Sensitive data must be governed carefully. Not everything should be remembered.&lt;/p&gt;

&lt;h3&gt;
  
  
  9. Middleware
&lt;/h3&gt;

&lt;p&gt;Middleware is where teams should place cross-cutting concerns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prompt-injection checks.&lt;/li&gt;
&lt;li&gt;PII redaction.&lt;/li&gt;
&lt;li&gt;Message summarization.&lt;/li&gt;
&lt;li&gt;Token budget control.&lt;/li&gt;
&lt;li&gt;Tool filtering.&lt;/li&gt;
&lt;li&gt;Simulated tool calls for testing.&lt;/li&gt;
&lt;li&gt;Human approval.&lt;/li&gt;
&lt;li&gt;Risk scoring.&lt;/li&gt;
&lt;li&gt;Output compliance checks.&lt;/li&gt;
&lt;li&gt;Audit logging.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Middleware keeps business-specific agent code simpler and makes platform capabilities reusable.&lt;/p&gt;

&lt;h3&gt;
  
  
  10. Observability and LangSmith
&lt;/h3&gt;

&lt;p&gt;A production agent must be traceable. Teams need to know which model was used, which messages were sent, why a tool was selected, what arguments were passed, what the tool returned, which middleware changed the state, where latency came from, and how the run can be replayed.&lt;/p&gt;

&lt;p&gt;LangSmith or an equivalent observability platform turns agents from black boxes into auditable systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  11. Recommended enterprise architecture
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Frontend / API
  -&amp;gt; Auth &amp;amp; Tenant Context
  -&amp;gt; Agent Gateway
  -&amp;gt; LangChain Agent
      -&amp;gt; Model Adapter
      -&amp;gt; Tool Registry
      -&amp;gt; Memory Layer
      -&amp;gt; Middleware Stack
      -&amp;gt; Human Approval
  -&amp;gt; Business Systems
  -&amp;gt; Observability / Audit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This architecture separates authentication, model access, tool permissions, memory, safety controls, and monitoring.&lt;/p&gt;

&lt;h3&gt;
  
  
  12. Adoption guidance
&lt;/h3&gt;

&lt;p&gt;Start with read-only tools. Add write tools only after tool selection, parameter validation, tracing, and error handling are stable. For high-risk actions, require human approval first and automate gradually based on operational data.&lt;/p&gt;

&lt;p&gt;Treat prompts as code: version them, test them, review them, and make them rollback-friendly. Build evaluation datasets that cover normal cases, edge cases, malicious inputs, tool failures, memory behavior, latency, and cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  13. Final takeaway
&lt;/h3&gt;

&lt;p&gt;LangChain turns model capability into engineering capability. The right goal is not to build one-off chatbots, but to build a reusable agent engineering foundation: standard model access, a tool registry, managed memory, reusable middleware, and end-to-end observability.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;LangChain overview: &lt;a href="https://docs.langchain.com/oss/python/langchain/overview" rel="noopener noreferrer"&gt;https://docs.langchain.com/oss/python/langchain/overview&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;LangChain agents: &lt;a href="https://docs.langchain.com/oss/python/langchain/agents" rel="noopener noreferrer"&gt;https://docs.langchain.com/oss/python/langchain/agents&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;LangChain tools: &lt;a href="https://docs.langchain.com/oss/python/langchain/tools" rel="noopener noreferrer"&gt;https://docs.langchain.com/oss/python/langchain/tools&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;LangChain short-term memory: &lt;a href="https://docs.langchain.com/oss/python/langchain/short-term-memory" rel="noopener noreferrer"&gt;https://docs.langchain.com/oss/python/langchain/short-term-memory&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;LangChain long-term memory: &lt;a href="https://docs.langchain.com/oss/python/langchain/long-term-memory" rel="noopener noreferrer"&gt;https://docs.langchain.com/oss/python/langchain/long-term-memory&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;LangChain middleware: &lt;a href="https://docs.langchain.com/oss/python/langchain/middleware/overview" rel="noopener noreferrer"&gt;https://docs.langchain.com/oss/python/langchain/middleware/overview&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>llm</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>LangChain-Core-Components-Guide</title>
      <dc:creator>Leo Han</dc:creator>
      <pubDate>Fri, 12 Jun 2026 15:52:07 +0000</pubDate>
      <link>https://dev.to/leo_han_02060526/langchain-core-components-guide-11jg</link>
      <guid>https://dev.to/leo_han_02060526/langchain-core-components-guide-11jg</guid>
      <description>&lt;h1&gt;
  
  
  LangChain Architect's Guide: Building LLM Applications from First Principles
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Author's Note&lt;/strong&gt;: This guide is based on a 9-episode LangChain tutorial series (~5 hours total). Every slide, code demo, and architecture diagram from the videos has been analyzed frame by frame, validated against the latest LangChain API, and rewritten as a comprehensive technical reference.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Introduction: Why LangChain Matters&lt;/li&gt;
&lt;li&gt;Episode 1: LangChain Overview &amp;amp; the LLM Landscape&lt;/li&gt;
&lt;li&gt;Episode 2: Hello World &amp;amp; ConversationChain&lt;/li&gt;
&lt;li&gt;Episode 3: Model I/O — Prompt Engineering at Scale&lt;/li&gt;
&lt;li&gt;Episode 4: Data Connection — Teaching LLMs to Read Your Data&lt;/li&gt;
&lt;li&gt;Episode 5: Chains — The Art of Orchestration&lt;/li&gt;
&lt;li&gt;Episode 6: Agents — Autonomous LLM Reasoning&lt;/li&gt;
&lt;li&gt;Episode 7: Hands-on PDF Q&amp;amp;A System&lt;/li&gt;
&lt;li&gt;Episode 8: Hands-on Advanced Search Agent&lt;/li&gt;
&lt;li&gt;Episode 9: Retrospective &amp;amp; Best Practices&lt;/li&gt;
&lt;li&gt;Appendix: API Migration Guide &amp;amp; Troubleshooting&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  1. Introduction: Why LangChain Matters
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1.1 The LLM Revolution and Its Bottlenecks
&lt;/h3&gt;

&lt;p&gt;In November 2022, OpenAI released the GPT-3.5 API (text-davinci-003), and everything changed. Within months, GPT-4 arrived with its MoE (Mixture of Experts) architecture, Meta open-sourced LLaMA, and Zhipu AI launched ChatGLM. LLMs were no longer just research papers — they were programmable infrastructure.&lt;/p&gt;

&lt;p&gt;But building applications directly on raw API calls hits four walls immediately:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wall 1: Context Constraints&lt;/strong&gt;. GPT-3.5 maxes out at 4,096 tokens. A 20-page PDF simply doesn't fit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wall 2: Capability Boundaries&lt;/strong&gt;. An LLM is a text predictor, not an agent. It can't search the web, execute code, read files, or call external APIs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wall 3: Amnesia by Design&lt;/strong&gt;. Every API call is a blank slate. State management must be built from scratch.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wall 4: Prompt Sprawl&lt;/strong&gt;. Prompts get scattered across dozens of files as raw strings. There's no templating, versioning, or testing.&lt;/p&gt;

&lt;p&gt;LangChain was built specifically to tear down these four walls.&lt;/p&gt;

&lt;h3&gt;
  
  
  1.2 What LangChain Actually Is
&lt;/h3&gt;

&lt;p&gt;LangChain is not a new LLM. It's an &lt;strong&gt;orchestration framework&lt;/strong&gt; that provides standardized abstractions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Input → [Prompt Template] → [LLM Call] → [Output Parser] → Result
                      ↑                ↑
                  [Memory]        [Tools / APIs]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The six-layer architecture:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Module&lt;/th&gt;
&lt;th&gt;Problem It Solves&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;L1&lt;/td&gt;
&lt;td&gt;Model I/O&lt;/td&gt;
&lt;td&gt;Unified interface across LLM providers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;L2&lt;/td&gt;
&lt;td&gt;Data Connection&lt;/td&gt;
&lt;td&gt;Reading external documents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;L3&lt;/td&gt;
&lt;td&gt;Chains&lt;/td&gt;
&lt;td&gt;Composing multiple LLM calls&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;L4&lt;/td&gt;
&lt;td&gt;Memory&lt;/td&gt;
&lt;td&gt;Retaining conversation state&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;L5&lt;/td&gt;
&lt;td&gt;Agents&lt;/td&gt;
&lt;td&gt;LLM autonomously decides which tools to use&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;L6&lt;/td&gt;
&lt;td&gt;Callbacks&lt;/td&gt;
&lt;td&gt;Monitoring, logging, debugging&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffz5pe2pr3jzbkwd6yw19.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffz5pe2pr3jzbkwd6yw19.jpg" alt="LangChain Architecture" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1.3 The LLM Landscape
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Architecture&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GPT-4&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;MoE, 8×220B experts&lt;/td&gt;
&lt;td&gt;Complex reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-3.5&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;175B Dense&lt;/td&gt;
&lt;td&gt;Price-performance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLaMA 2&lt;/td&gt;
&lt;td&gt;Meta&lt;/td&gt;
&lt;td&gt;7B/13B/70B open-source&lt;/td&gt;
&lt;td&gt;Local deployment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ChatGLM&lt;/td&gt;
&lt;td&gt;Zhipu AI&lt;/td&gt;
&lt;td&gt;Chinese-English bilingual&lt;/td&gt;
&lt;td&gt;Chinese scenarios&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;What is MoE?&lt;/strong&gt; GPT-4 is a collection of "expert" sub-models. For each inference, only a subset activates — like a dispatch system routing each question to the best-qualified specialists.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Episode 1: LangChain Overview &amp;amp; the LLM Landscape
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Source&lt;/strong&gt;: 1.mp4 (~30 min)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  2.1 Key Questions
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;What are LLMs?&lt;/strong&gt; — From GPT-3 (June 2020) through GPT-3.5 API (November 2022), to GPT-4 and open-source.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What is LangChain?&lt;/strong&gt; — A Python framework for composing LLM calls like building blocks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why use LangChain?&lt;/strong&gt; — Raw API calls work for demos; products require engineering.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  2.2 The Raw API Problem
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;createCompletion&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text-davinci-003&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Who are you?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Missing: prompt management, context injection, structured output, reliability, observability.&lt;/p&gt;

&lt;h3&gt;
  
  
  2.3 LangChain's Answer
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;LangChain is a framework for developing applications powered by language models.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It provides standardized abstractions above the LLM layer so you focus on business logic, not plumbing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk87v2hndd7rnxpcow770.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk87v2hndd7rnxpcow770.jpg" alt="LLM Ecosystem Overview" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  3. Episode 2: Hello World &amp;amp; ConversationChain
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Source&lt;/strong&gt;: 2.mp4 (~48 min)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  3.1 Environment Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;langchain langchain-openai langchain-community python-dotenv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3.2 First LangChain Call
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.messages&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HumanMessage&lt;/span&gt;

&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deepseek-chat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DEEPSEEK_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.deepseek.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nc"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain LangChain in one sentence.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key insight&lt;/strong&gt;: &lt;code&gt;base_url&lt;/code&gt; enables hot-swappable model infrastructure.&lt;/p&gt;

&lt;h3&gt;
  
  
  3.3 Temperature: The Creativity Knob
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;temperature = 0.0  →  Deterministic: math, code, factual Q&amp;amp;A
temperature = 0.7  →  Balanced: conversation, summarization
temperature = 1.0  →  Creative: storytelling, brainstorming
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Under the hood: temperature modulates the probability distribution over vocabulary.&lt;/p&gt;

&lt;h3&gt;
  
  
  3.4 Jupyter Notebook for LLM Development
&lt;/h3&gt;

&lt;p&gt;The Cell mechanism is uniquely suited to LLM development — iteratively build prompts, observe outputs, tune parameters.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhw45cuqbhwmt859jmlpn.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhw45cuqbhwmt859jmlpn.jpg" alt="Jupyter Example" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  3.5 ConversationChain: Giving LLMs Memory
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.memory&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ConversationBufferMemory&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.chains&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ConversationChain&lt;/span&gt;

&lt;span class="n"&gt;conversation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConversationChain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;ConversationBufferMemory&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;My name is Alice, I&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;m 25&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s my name and age?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# AI: Your name is Alice and you're 25 years old.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3.6 How Prompts Pass Information to LLMs
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;System Message&lt;/strong&gt;: Sets behavioral boundaries ("You are a professional Python developer")&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human Message&lt;/strong&gt;: User's direct input&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI Message&lt;/strong&gt;: LLM's response — appended as history in multi-turn conversations&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  4. Episode 3: Model I/O — Prompt Engineering at Scale
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Source&lt;/strong&gt;: 3.mp4 (~31 min)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  4.1 The Anti-Pattern: Raw String Concatenation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Anti-pattern
&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Translate: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4.2 PromptTemplate
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.prompts&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PromptTemplate&lt;/span&gt;

&lt;span class="n"&gt;template&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;PromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a {role}. Translate to {target_lang}:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;{text}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;prompt_str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;template&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;translator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target_lang&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;English&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4.3 Few-Shot Prompting
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.prompts&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FewShotPromptTemplate&lt;/span&gt;

&lt;span class="n"&gt;examples&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;happy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Positive&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sad&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Negative&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;few_shot&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FewShotPromptTemplate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;examples&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;examples&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;example_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;PromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Input: {input}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Sentiment: {output}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;prefix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Classify sentiment:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;suffix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Input: {input}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Sentiment:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;input_variables&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Few-Shot vs Fine-Tuning&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Few-Shot&lt;/th&gt;
&lt;th&gt;Fine-Tuning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cost&lt;/td&gt;
&lt;td&gt;Zero training cost&lt;/td&gt;
&lt;td&gt;Training data + GPU&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flexibility&lt;/td&gt;
&lt;td&gt;Change instantly&lt;/td&gt;
&lt;td&gt;Requires retraining&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Effectiveness&lt;/td&gt;
&lt;td&gt;Format control&lt;/td&gt;
&lt;td&gt;Domain knowledge&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Rule of thumb&lt;/strong&gt;: Start with Few-Shot, fine-tune only when stable.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg1p8cggcen988nel6cy3.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg1p8cggcen988nel6cy3.jpg" alt="Prompt Templates" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  4.4 Example Selector
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.example_selectors&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SemanticSimilarityExampleSelector&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAIEmbeddings&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.vectorstores&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Chroma&lt;/span&gt;

&lt;span class="n"&gt;selector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SemanticSimilarityExampleSelector&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_examples&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;examples&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;all_examples&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;OpenAIEmbeddings&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;vectorstore_cls&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Chroma&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;How it works: 1) Embed all examples → 2) Embed user input → 3) Cosine similarity → 4) Inject top-K&lt;/p&gt;

&lt;h3&gt;
  
  
  4.5 Output Parsers
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CommaSeparatedListOutputParser&lt;/strong&gt;: CSV-style output&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;StructuredOutputParser&lt;/strong&gt;: JSON Schema compliance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PydanticOutputParser&lt;/strong&gt;: Direct Pydantic model parsing (most powerful)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  5. Episode 4: Data Connection — Teaching LLMs to Read Your Data
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Source&lt;/strong&gt;: 4.mp4 (~35 min)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is LangChain's most important module — complete RAG infrastructure.&lt;/p&gt;

&lt;h3&gt;
  
  
  5.1 The Core Problem
&lt;/h3&gt;

&lt;p&gt;LLMs have a training cutoff. Your proprietary documents are invisible to them. RAG bridges this gap:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Step 1: Load → Step 2: Split → Step 3: Embed → Step 4: Store
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5.2 Document Loaders
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.document_loaders&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;PyPDFLoader&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;WebBaseLoader&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;YoutubeLoader&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;UnstructuredPowerPointLoader&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TextLoader&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CSVLoader&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;loader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PyPDFLoader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;report.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;pages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;loader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="c1"&gt;# pages[0].page_content → text
# pages[0].metadata → {"source": "...", "page": 1}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every loader returns &lt;code&gt;Document&lt;/code&gt; with &lt;code&gt;page_content&lt;/code&gt; + &lt;code&gt;metadata&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  5.3 Text Splitters
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_text_splitters&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;

&lt;span class="n"&gt;splitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;chunk_overlap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;separators&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;splits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;splitter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why overlap? Prevents cutting sentences at chunk boundaries.&lt;/p&gt;

&lt;h3&gt;
  
  
  5.4 Word Embeddings: The Math of Meaning
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"cat" → [0.12, -0.34, 0.56, ...]
"dog" → [0.14, -0.31, 0.58, ...]  ← close to "cat"
"car" → [-0.78, 0.45, -0.12, ...]  ← far from both
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cosine similarity: &lt;code&gt;cos(θ) = (A·B) / (|A|×|B|)&lt;/code&gt; — range [-1, 1]&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAIEmbeddings&lt;/span&gt;

&lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAIEmbeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text-embedding-3-small&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;vec&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embed_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is RAG?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Model guide&lt;/strong&gt;: &lt;code&gt;text-embedding-3-small&lt;/code&gt; (1536d, best value), &lt;code&gt;text-embedding-3-large&lt;/code&gt; (3072d)&lt;/p&gt;

&lt;h3&gt;
  
  
  5.5 Vector Stores
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.vectorstores&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FAISS&lt;/span&gt;

&lt;span class="n"&gt;vectorstore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;FAISS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;splits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;OpenAIEmbeddings&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vectorstore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;similarity_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MoE architecture pros and cons?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;DB guide&lt;/strong&gt;: FAISS (dev), Chroma (small projects), Pinecone (production), Weaviate/Qdrant (enterprise)&lt;/p&gt;




&lt;h2&gt;
  
  
  6. Episode 5: Chains — The Art of Orchestration
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Source&lt;/strong&gt;: 5.mp4 (~25 min)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  6.1 What Is a Chain?
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Without Chain:
&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;template&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;parsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# With LCEL:
&lt;/span&gt;&lt;span class="n"&gt;chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;parser&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  6.2 Chain Types
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;LLMChain&lt;/strong&gt;: Atomic unit — Prompt + LLM.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RouterChain&lt;/strong&gt;: Auto-dispatches to specialized handlers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Input → Router → Math? → MathChain
               → Code? → CodeChain  
               → General? → GeneralChain
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;SequentialChain&lt;/strong&gt;: Pipeline processing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Generate Outline → Expand → Polish → Output
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;TransformationChain&lt;/strong&gt;: Post-process output (clean, translate, filter).&lt;/p&gt;

&lt;h3&gt;
  
  
  6.3 Document Chains (RAG Core)
&lt;/h3&gt;

&lt;p&gt;Four strategies for feeding retrieved chunks to LLM:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stuff&lt;/strong&gt;: All chunks in one prompt. Simple, context-limited.&lt;br&gt;
&lt;strong&gt;Map-Reduce&lt;/strong&gt;: Process each chunk independently, then aggregate. Parallel, scalable.&lt;br&gt;
&lt;strong&gt;Refine&lt;/strong&gt;: Iteratively improve answer with each chunk. High quality, sequential.&lt;br&gt;
&lt;strong&gt;Map-Rerank&lt;/strong&gt;: Score each chunk's answer, pick best. For relevance ranking.&lt;/p&gt;


&lt;h2&gt;
  
  
  7. Episode 6: Agents — Autonomous LLM Reasoning
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Source&lt;/strong&gt;: 6.mp4 (~25 min)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;
  
  
  7.1 Chains vs Agents
&lt;/h3&gt;

&lt;p&gt;Chain = passive (defined workflow). Agent = active (chooses tools autonomously).&lt;/p&gt;
&lt;h3&gt;
  
  
  7.2 ReAct Pattern
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Q: Who is Leo DiCaprio's girlfriend? Her age ^ 0.43?

Thought: Find girlfriend first
Action: Search("Leo DiCaprio girlfriend")
Observation: Vittoria Ceretti

Thought: Need her age
Action: Search("Vittoria Ceretti age")  
Observation: 26 years old

Thought: Calculate 26^0.43
Action: Calculator("26^0.43")
Observation: ~4.06

Final Answer: Vittoria Ceretti, 26. 26^0.43 ≈ 4.06
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  7.3 Agent Implementation
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;initialize_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AgentType&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;

&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deepseek-chat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_tools&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;serpapi&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm-math&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;initialize_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;AgentType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ZERO_SHOT_REACT_DESCRIPTION&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_iterations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Who is Leo DiCaprio&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s girlfriend? Her age ^ 0.43?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  7.4 Agent Types
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Zero-shot ReAct&lt;/td&gt;
&lt;td&gt;Decide on the fly&lt;/td&gt;
&lt;td&gt;Simple tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Structured Chat&lt;/td&gt;
&lt;td&gt;Multi-parameter tools&lt;/td&gt;
&lt;td&gt;Complex tools&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI Functions&lt;/td&gt;
&lt;td&gt;Function Calling API&lt;/td&gt;
&lt;td&gt;GPT models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Plan-and-Execute&lt;/td&gt;
&lt;td&gt;Plan then execute&lt;/td&gt;
&lt;td&gt;Multi-step tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h3&gt;
  
  
  7.5 Tuning
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;max_iterations&lt;/strong&gt;: Prevent infinite loops&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;handle_parsing_errors&lt;/strong&gt;: Retry on malformed output&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;early_stopping_method&lt;/strong&gt;: "force" or "generate" best guess&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  8. Episode 7: Hands-on PDF Q&amp;amp;A System
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Source&lt;/strong&gt;: 7.mp4 (~38 min)&lt;br&gt;
&lt;/p&gt;


&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.document_loaders&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PyPDFLoader&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_text_splitters&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAIEmbeddings&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.vectorstores&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FAISS&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.chains&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RetrievalQA&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;

&lt;span class="n"&gt;loader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PyPDFLoader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;report.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;splits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;800&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk_overlap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;split_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;loader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="n"&gt;vectorstore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;FAISS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;splits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;OpenAIEmbeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text-embedding-3-small&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;qa&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;RetrievalQA&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_chain_type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deepseek-chat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;chain_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stuff&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;retriever&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;vectorstore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;as_retriever&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;search_kwargs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;k&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="n"&gt;return_source_documents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;qa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What are the key findings?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;chain_type&lt;/strong&gt;: &lt;code&gt;stuff&lt;/code&gt; (&amp;lt;4 chunks), &lt;code&gt;map_reduce&lt;/code&gt; (many), &lt;code&gt;refine&lt;/code&gt; (sequential), &lt;code&gt;map_rerank&lt;/code&gt; (score).&lt;/p&gt;




&lt;h2&gt;
  
  
  9. Episode 8: Hands-on Advanced Search Agent
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Source&lt;/strong&gt;: 8.mp4 (~53 min)&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;initialize_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AgentType&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_stock_price&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;symbol&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Get current stock price. Input: ticker like AAPL, TSLA.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;prices&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AAPL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;189.30&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TSLA&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;242.84&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;prices&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;symbol&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upper&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Symbol &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;symbol&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; not found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;initialize_agent&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;get_stock_price&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;AgentType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ZERO_SHOT_REACT_DESCRIPTION&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s Apple&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s stock price?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key: tool's docstring IS what the Agent reads to decide when to use it.&lt;/p&gt;




&lt;h2&gt;
  
  
  10. Episode 9: Retrospective &amp;amp; Best Practices
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Source&lt;/strong&gt;: 9.mp4 (~20 min)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Production Checklist
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Security&lt;/strong&gt;: Never expose keys in prompts. Sandbox agent execution.&lt;br&gt;
&lt;strong&gt;Performance&lt;/strong&gt;: Cache embeddings. Persistent vector stores (Chroma/Pinecone).&lt;br&gt;
&lt;strong&gt;Cost&lt;/strong&gt;: Cheap models first. Set max_iterations. Cache deterministic responses.&lt;br&gt;
&lt;strong&gt;Observability&lt;/strong&gt;: LangSmith tracing. Log token consumption.&lt;/p&gt;

&lt;h3&gt;
  
  
  API Migration (2 Years)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Old&lt;/th&gt;
&lt;th&gt;New&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;langchain.llms.OpenAI&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;langchain_openai.ChatOpenAI&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;llm.predict("text")&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;llm.invoke([HumanMessage("text")])&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Chain.run(input)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Chain.invoke({"key": value})&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  11. Appendix: Troubleshooting
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Error&lt;/th&gt;
&lt;th&gt;Solution&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ModuleNotFoundError: langchain.llms&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Use &lt;code&gt;langchain-openai&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;jupyter-lab&lt;/code&gt; not found&lt;/td&gt;
&lt;td&gt;Add Scripts to PATH&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek 401&lt;/td&gt;
&lt;td&gt;Check &lt;code&gt;.env&lt;/code&gt; API key&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent infinite loop&lt;/td&gt;
&lt;td&gt;Improve tool docstrings, set max_iterations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FAISS OOM&lt;/td&gt;
&lt;td&gt;Switch to Chroma or Pinecone&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Learning Path
&lt;/h3&gt;

&lt;p&gt;Week 1: Hello World → Week 2: Prompts → Week 3: RAG → Week 4: Agent + Tools → Ongoing: Docs&lt;/p&gt;




&lt;h2&gt;
  
  
  Epilogue
&lt;/h2&gt;

&lt;p&gt;LangChain's core philosophy — &lt;strong&gt;modular, composable, engineered&lt;/strong&gt; — endures. Package names change; architectural patterns don't.&lt;/p&gt;

&lt;p&gt;The most valuable thing LangChain offers isn't its code — it's the paradigm of composing LLM applications like building blocks.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Disclaimer&lt;/strong&gt;: Code adapted for LangChain latest API (2025-2026). Images from original tutorial screenshots for educational reference only.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
    </item>
    <item>
      <title>why-we-dropped-langchain</title>
      <dc:creator>Leo Han</dc:creator>
      <pubDate>Tue, 09 Jun 2026 15:09:07 +0000</pubDate>
      <link>https://dev.to/leo_han_02060526/why-we-dropped-langchain-5532</link>
      <guid>https://dev.to/leo_han_02060526/why-we-dropped-langchain-5532</guid>
      <description>&lt;h1&gt;
  
  
  Why We Dropped LangChain: When Abstractions Do More Harm Than Good
&lt;/h1&gt;

&lt;h2&gt;
  
  
  A 12-Month Lesson Learned
&lt;/h2&gt;

&lt;p&gt;In early 2023, we put LangChain into production. In 2024, we removed it entirely.&lt;/p&gt;

&lt;p&gt;LangChain seemed like the best choice for building LLM-powered applications in 2023. It had an impressive list of components and tools, its popularity was soaring, and it promised to "enable developers to go from an idea to working code in an afternoon." But as our project progressed, the cracks began to show.&lt;/p&gt;

&lt;p&gt;LangChain's &lt;strong&gt;inflexibility&lt;/strong&gt; gradually surfaced: we found ourselves constantly diving into LangChain internals to modify lower-level behavior. And because LangChain intentionally abstracts away those internals, doing so was extremely painful. Whenever we needed to do something the framework didn't natively support, we had to "translate" our requirements into LangChain-appropriate solutions — instead of just writing code.&lt;/p&gt;

&lt;p&gt;This post shares the real reasons we abandoned LangChain, and why replacing its rigid high-level abstractions with modular building blocks simplified our codebase, made our team happier, and made us more productive.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Core Problem: The Perils of Being an Early Framework
&lt;/h2&gt;

&lt;p&gt;LLMs are a rapidly changing field, with new concepts and ideas emerging weekly. When a framework like LangChain is built around multiple emerging technologies, designing abstractions that will stand the test of time is nearly impossible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Crafting well-designed abstractions is hard&lt;/strong&gt; — even when the requirements are well-understood and stable. But when you're modeling components in such a state of flux, by the time you finish designing the abstraction, the underlying technology has already changed.&lt;/p&gt;

&lt;p&gt;This isn't the LangChain team's fault. Anyone attempting to build such a framework at that point in time wouldn't have done any better. Everyone was doing their best.&lt;/p&gt;




&lt;h2&gt;
  
  
  Problem 1: Simple Tasks Become Complicated
&lt;/h2&gt;

&lt;p&gt;Consider the simplest possible task: a translation app. Using the native OpenAI SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;your_api_key&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hello!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;language&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Italian&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are an expert translator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Translate the following from English into &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Clean, direct, no hidden logic. Any Python developer can understand it at a glance.&lt;/p&gt;

&lt;p&gt;Now the same task with LangChain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.prompts&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.output_parsers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StrOutputParser&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-turbo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_messages&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are an expert translator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Translate the following from English into {language}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{text}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;parser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StrOutputParser&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;parser&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;language&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You now need to understand &lt;code&gt;ChatPromptTemplate&lt;/code&gt;, &lt;code&gt;StrOutputParser&lt;/code&gt;, the pipe operator &lt;code&gt;|&lt;/code&gt;, and the &lt;code&gt;invoke&lt;/code&gt; method — all LangChain-specific concepts. They don't make your code better; they just make your code more LangChain.&lt;/p&gt;

&lt;p&gt;The problem isn't writing a few extra lines. The problem is that &lt;strong&gt;every abstraction you introduce adds a layer of cognitive overhead and debugging difficulty&lt;/strong&gt;. When something breaks, you're not debugging your business logic — you're debugging LangChain's framework code.&lt;/p&gt;




&lt;h2&gt;
  
  
  Problem 2: The &lt;code&gt;http.client&lt;/code&gt; vs. &lt;code&gt;requests&lt;/code&gt; Analogy
&lt;/h2&gt;

&lt;p&gt;Imagine you have a choice:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option A&lt;/strong&gt;: Use &lt;code&gt;http.client&lt;/code&gt; to make a request&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;http.client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;HTTPSConnection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;api.example.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GET&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getresponse&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Option B&lt;/strong&gt;: Use &lt;code&gt;requests&lt;/code&gt; to make a request&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.example.com/data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Which is better? Obviously B. &lt;code&gt;requests&lt;/code&gt; isn't "too much abstraction" — it's the &lt;strong&gt;right level of abstraction&lt;/strong&gt;. It encapsulates the genuine complexity (connection management, encoding) without hiding what you actually care about (URL, response data).&lt;/p&gt;

&lt;p&gt;LangChain's problem is that it's often neither A nor B. It neither simplifies the truly hard parts (complex Agent orchestration) nor leaves the simple parts simple.&lt;/p&gt;

&lt;p&gt;A Reddit comment captured it perfectly:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Code quality isn't great and structure is pretty iffy. Really hate the piping language structure. Docs are way outdated, deprecation warnings implemented poorly. And when you need to dig under the surface to fix something, you see the ugly. But it gets the job done. I can see what they want to do, but it's bloated very quickly, probably because of its popularity."&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Problem 3: Agents Become Black Boxes
&lt;/h2&gt;

&lt;p&gt;When we wanted to move from a single sequential Agent architecture to something more complex, LangChain became our biggest obstacle.&lt;/p&gt;

&lt;p&gt;We needed to externally observe the Agent's state, dynamically control available tools, and flexibly orchestrate interactions between multiple Agents. But LangChain's Agent abstractions encapsulate all of this behind an opaque surface — it provides no method for externally observing an Agent's state, forcing us to reduce the scope of our implementation to fit into the limited functionality available to LangChain Agents.&lt;/p&gt;

&lt;p&gt;In one instance, we needed to &lt;strong&gt;dynamically change the availability of tools&lt;/strong&gt; our Agents could access, based on business logic. In native code, this is an &lt;code&gt;if&lt;/code&gt; statement and an &lt;code&gt;append&lt;/code&gt;/&lt;code&gt;remove&lt;/code&gt; on a list. In LangChain, you're expected to declare tools upfront during the framework's initialization flow, and dynamic modification requires working around layers of encapsulation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Once we removed LangChain, we no longer had to translate our requirements into LangChain-appropriate solutions. We could just code.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What LangChain's Architecture Diagram Really Shows
&lt;/h2&gt;

&lt;p&gt;LangChain's official architecture reveals its ambition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;        LLMs and Prompts
              |
           CHAINS
              |
          LANGCHAIN
         /          \
  MEMORY              DOCUMENT LOADERS
(Vector DBs)            AND UTILS
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The problem: &lt;strong&gt;every single module is in constant flux&lt;/strong&gt;. LLM interfaces change. Prompt best practices evolve. Chain orchestration patterns shift. Memory implementations keep moving. When your framework tries to abstract all of these rapidly changing components at once, the only stable thing is the instability itself.&lt;/p&gt;




&lt;h2&gt;
  
  
  Do You Really Need a Framework for Building AI Applications?
&lt;/h2&gt;

&lt;p&gt;LangChain's long list of components gives the impression that building LLM-powered applications is complicated and requires a framework. But here's the reality:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;LLM Calls&lt;/strong&gt;: The OpenAI / Anthropic SDKs are already clean enough&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompt Management&lt;/strong&gt;: Python f-strings or a Jinja2 template will do&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chains / Orchestration&lt;/strong&gt;: Pure Python functions and loops, more readable than any DSL&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory&lt;/strong&gt;: A dictionary or a database table, under your full control&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector Stores&lt;/strong&gt;: Chroma, Pinecone, Qdrant all have clean native APIs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Document Loaders&lt;/strong&gt;: Mature, independent libraries exist for PDF, web parsing, etc.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;LangChain adds one more abstraction layer on top of all of these. And the value of that layer, in most scenarios, falls far short of the complexity it introduces.&lt;/p&gt;




&lt;h2&gt;
  
  
  Our Alternative: Modular Building Blocks
&lt;/h2&gt;

&lt;p&gt;After dropping LangChain, our tech stack became:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI / Anthropic SDK&lt;/strong&gt; — LLM calls&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chroma / Qdrant&lt;/strong&gt; — Vector storage (using native APIs directly)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simple custom orchestration&lt;/strong&gt; — Python functions + type annotations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Standard Python logging and monitoring&lt;/strong&gt; — no framework-specific callback system&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The core principle: &lt;strong&gt;every component is replaceable, every abstraction is your own, and there are no black boxes.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This approach may require a few dozen more lines of boilerplate than LangChain, but the trade-off is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fully controllable execution flow&lt;/li&gt;
&lt;li&gt;Zero framework debugging overhead&lt;/li&gt;
&lt;li&gt;Team members don't need to learn yet another DSL&lt;/li&gt;
&lt;li&gt;No lock-in to a framework's version upgrades&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  When Should You Use LangChain?
&lt;/h2&gt;

&lt;p&gt;To be fair, LangChain still has value in certain scenarios:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Rapid prototyping&lt;/strong&gt;: you want a working RAG demo in 30 minutes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Teaching / learning&lt;/strong&gt;: understanding RAG and related concepts through a structured approach&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Standardized simple workflows&lt;/strong&gt;: your requirements happen to perfectly match its Chain pattern&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But if you're building a &lt;strong&gt;production-grade system&lt;/strong&gt;, LangChain is more likely to become technical debt than an accelerator.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;LangChain did something few dared to do: attempt to provide a unified framework during the most chaotic period of the LLM ecosystem. That courage deserves respect.&lt;/p&gt;

&lt;p&gt;But the experience of 2024–2026 has shown: &lt;strong&gt;for production AI Agent systems, simple, direct code beats complex framework abstractions.&lt;/strong&gt; LLMs themselves are already complex enough — you don't need a framework adding another layer of complexity on top.&lt;/p&gt;

&lt;p&gt;The biggest feeling our team had after dropping LangChain wasn't "we lost features" — it was &lt;strong&gt;"we're finally free."&lt;/strong&gt; We can write the code we want in the most direct way, instead of figuring out how to make the framework allow us to write it.&lt;/p&gt;

&lt;p&gt;If you're starting a new LLM project, my advice is: try going framework-free first. Write code with the native SDKs for a few weeks. Then decide whether you truly need those abstractions. The answer will likely be no.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article is adapted from the video "Why We Dropped LangChain," drawing on a team's real experience of using LangChain in production for 12 months before removing it, analyzing the cost of framework abstractions and the advantages of a modular building-block approach.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>discuss</category>
      <category>llm</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>rag-explained-how-it-works</title>
      <dc:creator>Leo Han</dc:creator>
      <pubDate>Tue, 09 Jun 2026 15:06:58 +0000</pubDate>
      <link>https://dev.to/leo_han_02060526/rag-explained-how-it-works-36g7</link>
      <guid>https://dev.to/leo_han_02060526/rag-explained-how-it-works-36g7</guid>
      <description>&lt;h1&gt;
  
  
  RAG Explained: How Retrieval-Augmented Generation Actually Works
&lt;/h1&gt;

&lt;h2&gt;
  
  
  What Is RAG?
&lt;/h2&gt;

&lt;p&gt;RAG (Retrieval-Augmented Generation) is one of the most important architectural patterns in LLM applications from 2024–2025. The core idea is simple: &lt;strong&gt;before the LLM generates an answer, retrieve relevant information from an external knowledge base, inject the retrieval results into the context, and then have the model generate an answer based on that information.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Why is RAG needed? Large language models have three inherent limitations: knowledge cutoff dates (the temporal boundary of training data), hallucination (fabricating non-existent facts), and insufficient domain expertise (lacking enterprise-internal or specialized data). RAG circumvents the model's internal knowledge constraints by adopting a "retrieve first, generate later" approach, allowing the LLM to reference the latest and most accurate private data.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Core RAG Workflow
&lt;/h2&gt;

&lt;p&gt;A standard RAG system follows this pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Document Library → Chunking → Embedding → Vector Database Storage
                                                    ↓
User Query → Query Embedding → Similarity Search → Retrieve Relevant Chunks
                                            ↓
                        LLM Generates Answer (based on retrieved results + original query)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pipeline can be decomposed into two phases: the &lt;strong&gt;offline indexing phase&lt;/strong&gt; (document processing and storage) and the &lt;strong&gt;online query phase&lt;/strong&gt; (real-time retrieval and generation).&lt;/p&gt;




&lt;h2&gt;
  
  
  The Offline Indexing Phase
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Document Parsing &amp;amp; Chunking
&lt;/h3&gt;

&lt;p&gt;Raw documents (PDFs, web pages, database records, etc.) are typically too long for direct vector retrieval. They need to be split into appropriately sized chunks.&lt;/p&gt;

&lt;p&gt;Chunking strategy directly impacts retrieval quality. Common approaches include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fixed-size chunking&lt;/strong&gt;: split by token count or character count (e.g., 512 tokens per chunk)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic chunking&lt;/strong&gt;: split along natural boundaries like paragraphs and sections&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recursive chunking&lt;/strong&gt;: start with coarse separators (chapter headers), then progressively refine&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Overlapping chunking&lt;/strong&gt;: maintain overlap between adjacent chunks (e.g., 10–20%) to prevent key information from being severed at boundaries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Chunk size involves a trade-off: too small and the semantics are incomplete; too large and retrieval precision degrades.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Embedding
&lt;/h3&gt;

&lt;p&gt;After chunking, an embedding model converts each text chunk into a fixed-dimensional vector. These vectors are points in high-dimensional space — semantically similar texts are closer together in vector space.&lt;/p&gt;

&lt;p&gt;Choosing the right embedding model is critical. Currently mainstream models include:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Dimensions&lt;/th&gt;
&lt;th&gt;Max Tokens&lt;/th&gt;
&lt;th&gt;Characteristics&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;text-embedding-3-large&lt;/td&gt;
&lt;td&gt;3072&lt;/td&gt;
&lt;td&gt;8191&lt;/td&gt;
&lt;td&gt;OpenAI recommended, excellent value&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;text-embedding-3-small&lt;/td&gt;
&lt;td&gt;1536&lt;/td&gt;
&lt;td&gt;8191&lt;/td&gt;
&lt;td&gt;Lightweight, fast&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;multilingual-e5-large&lt;/td&gt;
&lt;td&gt;1024&lt;/td&gt;
&lt;td&gt;512&lt;/td&gt;
&lt;td&gt;Strong multilingual support&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GTE-Qwen2-7B-instruct&lt;/td&gt;
&lt;td&gt;3584&lt;/td&gt;
&lt;td&gt;32768&lt;/td&gt;
&lt;td&gt;Open-source SOTA, long text support&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BGE-M3&lt;/td&gt;
&lt;td&gt;1024&lt;/td&gt;
&lt;td&gt;8192&lt;/td&gt;
&lt;td&gt;Multilingual + sparse-dense hybrid&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For example, a user query like "How tall is the Empire State Building?" gets converted through embedding into a dense vector like &lt;code&gt;[1.0, 2.5, 3.7, 5.8, 2.8]&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Vector Database Storage
&lt;/h3&gt;

&lt;p&gt;The embedded document chunks are stored in a vector database (Pinecone, Weaviate, Milvus, Qdrant, Chroma, etc.). The core capability of a vector database is &lt;strong&gt;Approximate Nearest Neighbor (ANN) search&lt;/strong&gt;, which can find the K most similar results from millions or even billions of vectors in milliseconds.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Online Query Phase
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Query Embedding
&lt;/h3&gt;

&lt;p&gt;The user's question is first converted into a vector using the same embedding model. &lt;strong&gt;Queries and documents must use the same embedding model&lt;/strong&gt; — otherwise, they lie in different vector spaces and similarity calculations become meaningless.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Similarity Search
&lt;/h3&gt;

&lt;p&gt;The query vector performs a similarity search against the vector database. Common similarity measures include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cosine Similarity&lt;/strong&gt;: measures how close two vectors are in direction; range [-1, 1], unaffected by vector magnitude&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Euclidean Distance&lt;/strong&gt;: the straight-line distance in space; smaller values indicate greater similarity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dot Product&lt;/strong&gt;: suitable for normalized vectors&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For instance, suppose document chunk A has the vector &lt;code&gt;[1.3, 1.5, 3.3, 5.7, 4.9]&lt;/code&gt; and the query vector is &lt;code&gt;[1.0, 2.5, 3.7, 5.8, 2.8]&lt;/code&gt;. Their cosine similarity is approximately &lt;strong&gt;0.47&lt;/strong&gt;. Meanwhile, document chunk B with vector &lt;code&gt;[4.8, 3.7, 1.5, 5.2, 6.0]&lt;/code&gt; has a similarity of about &lt;strong&gt;0.51&lt;/strong&gt; to the same query — indicating that B is semantically closer and should rank higher.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Re-ranking
&lt;/h3&gt;

&lt;p&gt;The results from the initial retrieval (typically top 10–50) are not always optimal. The re-ranking stage uses a more precise (but slower) model to re-sort the candidates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cross-Encoders&lt;/strong&gt; are the standard method for re-ranking. Unlike Bi-Encoders that encode the query and document independently, Cross-Encoders concatenate the query and document together before feeding them into the model, capturing finer-grained interaction patterns between them. The ranking accuracy is significantly higher, though at greater computational cost.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Bi-Encoder (initial retrieval): fast but lower precision
# Query and document are encoded independently
&lt;/span&gt;&lt;span class="n"&gt;query_vec&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;doc_vec&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;similarity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;cosine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_vec&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;doc_vec&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Cross-Encoder (re-ranking): slow but high precision
# Query and document are concatenated and jointly encoded
&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;cross_encoder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In production, a &lt;strong&gt;two-stage retrieval&lt;/strong&gt; approach is standard: first use a Bi-Encoder to quickly recall the top-N from massive candidates, then use a Cross-Encoder to precisely re-rank the top-N and select the top-K to feed into the LLM.&lt;/p&gt;




&lt;h2&gt;
  
  
  Evaluating RAG Systems
&lt;/h2&gt;

&lt;p&gt;RAG system quality can be measured across multiple dimensions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Retrieval quality metrics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Recall@K: whether the correct answer appears in the top-K results&lt;/li&gt;
&lt;li&gt;MRR (Mean Reciprocal Rank): the average reciprocal rank of the first correct answer&lt;/li&gt;
&lt;li&gt;NDCG (Normalized Discounted Cumulative Gain): a weighted score accounting for ranking position&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Generation quality metrics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Faithfulness: whether the generated content faithfully reflects the retrieved context&lt;/li&gt;
&lt;li&gt;Answer Relevance: whether the answer addresses the question&lt;/li&gt;
&lt;li&gt;Context Relevance: whether the retrieved content is relevant to the question&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The RAGAS (RAG Assessment) framework is widely used for automated evaluation, providing a systematic scoring system atop standard benchmarks like MTEB (Massive Text Embedding Benchmark).&lt;/p&gt;




&lt;h2&gt;
  
  
  Advanced RAG Patterns
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Query Rewriting
&lt;/h3&gt;

&lt;p&gt;Raw user questions are often imprecise. Rewriting queries before retrieval — expanding synonyms, supplementing context, decomposing complex questions — can significantly boost recall.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hybrid Search
&lt;/h3&gt;

&lt;p&gt;Fuse results from &lt;strong&gt;dense retrieval&lt;/strong&gt; (vector similarity) and &lt;strong&gt;sparse retrieval&lt;/strong&gt; (keyword matching, e.g., BM25). Dense retrieval excels at semantic matching; sparse retrieval excels at exact matching. The two are complementary.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multi-hop Retrieval
&lt;/h3&gt;

&lt;p&gt;For questions requiring multi-step reasoning, the first round of retrieval results can generate new queries for a second (or more) round of retrieval, progressively approaching the answer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Self-RAG
&lt;/h3&gt;

&lt;p&gt;Allow the LLM to self-assess during generation whether retrieval is needed, whether the retrieved results are relevant, and whether the generated content is grounded — achieving "on-demand retrieval" rather than indiscriminate retrieval.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Through its "retrieve → augment → generate" architecture, RAG effectively addresses the three major challenges of LLMs: knowledge staleness, hallucination control, and domain adaptation. A production-grade RAG system involves multiple critical decisions: chunking strategy selection, embedding model choice, vector database configuration, similarity measure design, and re-ranking mechanism integration.&lt;/p&gt;

&lt;p&gt;As long-context models advance, some might ask: "Why not just stuff all the documents into the context window?" But practice shows that RAG's value lies not in "how much you can fit," but in &lt;strong&gt;precisely finding the most relevant pieces of information&lt;/strong&gt; — retrieval quality determines the ceiling of answer quality.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article is adapted from the video "How RAG Works," covering the definition of RAG, the two-phase indexing and query pipeline, embedding and vector retrieval principles, chunking strategies, re-ranking mechanisms, and advanced patterns.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>rag</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>building-ai-agents-right-way</title>
      <dc:creator>Leo Han</dc:creator>
      <pubDate>Tue, 09 Jun 2026 15:06:48 +0000</pubDate>
      <link>https://dev.to/leo_han_02060526/building-ai-agents-right-way-50pa</link>
      <guid>https://dev.to/leo_han_02060526/building-ai-agents-right-way-50pa</guid>
      <description>&lt;h1&gt;
  
  
  Building AI Agents the Right Way: Lessons from Anthropic Engineering
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Why Following a Tutorial Will Make Your System Worse
&lt;/h2&gt;

&lt;p&gt;2025–2026 has seen an explosion in AI Agent adoption, yet most teams keep making the same mistakes. As the Anthropic Engineering team has learned through extensive real-world experience: &lt;strong&gt;blindly "following a tutorial to build an Agent" will only produce a worse system.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This guide distills the core principles and practical methods for building effective, reliable Agent systems.&lt;/p&gt;




&lt;h2&gt;
  
  
  Principle 1: Don't Use Agents Just to Use Agents
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Stop Using Agents Just to Use Agents.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the most important principle of all. Agents are not a silver bullet. Many teams jump straight to handing every task to an Agent, only to discover that latency skyrockets, costs spiral out of control, and system stability falls below what their original deterministic logic achieved.&lt;/p&gt;

&lt;p&gt;Agents are appropriate for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open-ended reasoning tasks (multi-step, non-deterministic paths)&lt;/li&gt;
&lt;li&gt;Tasks requiring interaction with multiple external tools&lt;/li&gt;
&lt;li&gt;Tasks where the objective is clear but the execution path needs dynamic adjustment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Agents are &lt;strong&gt;not appropriate&lt;/strong&gt; for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Simple CRUD operations — use a direct API call&lt;/li&gt;
&lt;li&gt;Deterministic data processing — use a traditional pipeline&lt;/li&gt;
&lt;li&gt;Problems solvable by a single model call — no need to wrap it in an Agent framework&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In one sentence: &lt;strong&gt;if an API call can do it, don't bring in an Agent.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Principle 2: Plan → Execute, Step by Step
&lt;/h2&gt;

&lt;p&gt;An Agent is not as simple as "throw an instruction at the model and wait for the result." A robust Agent architecture follows a &lt;strong&gt;Plan → Execute&lt;/strong&gt; two-phase model:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Plan phase&lt;/strong&gt;: The Agent first thinks holistically and devises an execution plan. During this phase, it calls no external tools — it relies purely on reasoning to break the task into executable steps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Execute phase&lt;/strong&gt;: Execute step by step according to the plan, with each step having clear inputs, outputs, and validation conditions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Step 1 → Step 2 → Step 3
  ↓         ↓        ↓
Observe   Observe   Final Result
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This "plan first, execute later" pattern is far more stable than "think while doing," because the planning phase provides the Agent with a global perspective, preventing it from getting stuck in local optima or drifting off course.&lt;/p&gt;




&lt;h2&gt;
  
  
  Principle 3: Memory Is the Soul of an Agent
&lt;/h2&gt;

&lt;p&gt;The Agent's core challenge isn't "intelligence" — it's "memory." Memory operates across two dimensions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Internal Memory&lt;/strong&gt;: The Agent's context within a single execution — including the task description, completed steps, observations, and intermediate reasoning. This is essentially the LLM's context window. Managing the context well (not losing critical information, not retaining redundancy) is the foundation of Agent stability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;External Storage&lt;/strong&gt;: Persistent memory that spans multiple executions. This includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Task history and result caching&lt;/li&gt;
&lt;li&gt;Lessons learned from experience&lt;/li&gt;
&lt;li&gt;User preferences and feedback&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;External storage enables the Agent to learn from history, avoid repeating mistakes, and gradually optimize its behavior over time.&lt;/p&gt;




&lt;h2&gt;
  
  
  Principle 4: Embrace Sub-Agents
&lt;/h2&gt;

&lt;p&gt;Complex tasks should not be handled by a single monolithic Agent. The &lt;strong&gt;Sub-Agent pattern&lt;/strong&gt; is standard equipment for production-grade Agent systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Master Agent (Orchestrator)&lt;/strong&gt;: responsible for task understanding, decomposition, and scheduling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sub-Agents&lt;/strong&gt;: each handles a specialized sub-task (search, code generation, data analysis, format conversion, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Benefits of this architecture:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Each sub-agent has more focused instructions, leading to fewer hallucinations&lt;/li&gt;
&lt;li&gt;Multiple sub-tasks can run in parallel&lt;/li&gt;
&lt;li&gt;Sub-agents are decoupled, making problems easier to isolate&lt;/li&gt;
&lt;li&gt;Different models can be used for different sub-agents (expensive ones for reasoning, cheap ones for formatting)&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Principle 5: Avoid Premature Correction
&lt;/h2&gt;

&lt;p&gt;Errors during Agent execution are normal, but &lt;strong&gt;your error-handling strategy determines the final system quality&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The common mistake: the Agent takes one step, finds the result unsatisfactory, immediately self-corrects, revises the plan, and re-executes. This causes "oscillation" — the Agent ping-pongs between directions and never finishes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Wrong pattern:
Original Bug → AI Auto-Fixed → Introduces New Bug → Fix Again → ...

Right pattern:
Original Bug → Complete Full Execution First → Evaluate Holistically → One-Time Fix
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The correct approach: &lt;strong&gt;let the Agent complete a full execution first, then make a one-time correction based on the overall result&lt;/strong&gt; — rather than "fine-tuning" at every step. In other words, give the Agent room to make mistakes, but limit how often it can correct itself.&lt;/p&gt;




&lt;h2&gt;
  
  
  Principle 6: Tool Chain Design
&lt;/h2&gt;

&lt;p&gt;More tools doesn't mean a better Agent. The key principles for tool chain design:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Atomicity&lt;/strong&gt;: each tool does one thing, and does it well. Don't design "all-purpose" tools.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Standardized I/O&lt;/strong&gt;: all tools use a unified input/output format to reduce the Agent's cognitive load.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clear error returns&lt;/strong&gt;: when a tool call fails, the return must precisely describe "what went wrong" and "possible correction paths" — not a vague "Error."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Least privilege&lt;/strong&gt;: each sub-agent is granted only the minimum toolset needed to complete its task.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Principle 7: Start from Mature Prompts
&lt;/h2&gt;

&lt;p&gt;Don't write an Agent's system prompt from scratch. Search for and draw on &lt;strong&gt;mature prompts&lt;/strong&gt; that have been thoroughly tested in the community, then adapt them to your specific scenario.&lt;/p&gt;

&lt;p&gt;A good Agent prompt typically includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clear role definition and capability boundaries&lt;/li&gt;
&lt;li&gt;Explicit tool-calling formats with examples&lt;/li&gt;
&lt;li&gt;Error-handling strategies&lt;/li&gt;
&lt;li&gt;Termination conditions and output format requirements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Anthropic Engineering's practice shows that prompt engineering is far more critical in the Agent context than in ordinary chat scenarios — a well-structured prompt can reduce failure rates from 40% to below 5%.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Complete Agent System Pipeline
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Input
    ↓
Task Parsing &amp;amp; Classification (Orchestrator Agent)
    ↓
Plan Phase: Devise Execution Plan
    ↓
Distribute Sub-tasks to Sub-Agents
    ↓
Sub-Agent 1      Sub-Agent 2      Sub-Agent 3
(Search)         (Code)           (Analysis)
    ↓                ↓                ↓
Aggregate &amp;amp; Validate Results (Orchestrator Agent)
    ↓
(If Necessary) One-Time Correction
    ↓
Final Output
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Building an AI Agent is not about buying a framework and tweaking a few parameters. A truly effective Agent system requires thoughtful design decisions at the architectural level. Remember these seven core principles: &lt;strong&gt;use Agents sparingly, plan before executing, manage internal and external memory, decompose complexity with sub-agents, avoid premature correction, design tool chains carefully, and start from mature prompts.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The most important lesson is this: the goal of an Agent is not to look "intelligent" — it's to &lt;strong&gt;complete work stably and predictably&lt;/strong&gt;. If following a tutorial verbatim produces a working solution, that task likely never needed an Agent in the first place.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article is adapted from the video "Building AI Agents — Follow the Tutorial and Your System Will Only Get Worse," drawing on practical experience from the Anthropic Engineering team and covering the seven core principles of Agent construction along with the complete pipeline design.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>agents-concepts-principles-patterns</title>
      <dc:creator>Leo Han</dc:creator>
      <pubDate>Tue, 09 Jun 2026 15:06:13 +0000</pubDate>
      <link>https://dev.to/leo_han_02060526/agents-concepts-principles-patterns-bfk</link>
      <guid>https://dev.to/leo_han_02060526/agents-concepts-principles-patterns-bfk</guid>
      <description>&lt;h1&gt;
  
  
  AI Agents: Concepts, Principles, and Patterns
&lt;/h1&gt;

&lt;h2&gt;
  
  
  What Is an AI Agent?
&lt;/h2&gt;

&lt;p&gt;In 2025–2026, AI Agents have become the dominant paradigm for building applications on top of large language models. Simply put, an Agent is an AI system that can &lt;strong&gt;autonomously perceive its environment, reason about next steps, take actions, and continuously adjust based on feedback&lt;/strong&gt;. The fundamental difference from a traditional chatbot is this: an Agent doesn't just answer questions — it &lt;strong&gt;actively uses tools, executes multi-step tasks, and self-corrects when things go wrong&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If a large language model (LLM) is the "brain," then an Agent is that brain equipped with "hands and feet" — it reads and writes files, runs terminal commands, searches the web, calls APIs, and transforms the model's reasoning capabilities into real-world actions.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Core Principle: The ReAct Paradigm
&lt;/h2&gt;

&lt;p&gt;The philosophical foundation of Agent operation comes from a landmark paper: &lt;strong&gt;ReAct: Synergizing Reasoning and Acting in Language Models&lt;/strong&gt; (Yao et al.).&lt;/p&gt;

&lt;p&gt;The paper identified a critical problem: before ReAct, the LLM's &lt;strong&gt;reasoning abilities&lt;/strong&gt; (e.g., Chain-of-Thought prompting) and &lt;strong&gt;acting abilities&lt;/strong&gt; (e.g., generating action plans) were studied as two separate topics. Reasoning happened only "in the head," and acting happened only "externally," with no synergy between them.&lt;/p&gt;

&lt;p&gt;ReAct's core contribution is &lt;strong&gt;interleaving reasoning and acting in a looping fashion&lt;/strong&gt;. Specifically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reasoning&lt;/strong&gt; helps the model induce, track, and update action plans, and handle exceptions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Acting&lt;/strong&gt; allows the model to interface with external resources (knowledge bases, code environments, web pages, etc.) and gather additional information&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This synergy delivers significant advantages: on question-answering (HotpotQA) and fact verification (Fever), ReAct effectively overcomes the hallucination and error propagation issues of pure Chain-of-Thought reasoning by interacting with a Wikipedia API. On interactive decision-making benchmarks (ALFWorld), ReAct substantially outperforms prior approaches.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Agent Work Loop
&lt;/h2&gt;

&lt;p&gt;A typical Agent operates according to the following cycle:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Task
  ↓
Thought: Analyze the current state, decide what to do next
  ↓
Action: Invoke a tool to perform an operation
  ↓
Observation: Receive the tool's output
  ↓
(Repeat until a final answer can be given)
  ↓
Final Answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This loop may look simple, but it gives Agents tremendous adaptability. If an action doesn't produce the expected result, the Agent can adjust its strategy in the next thought — in essence, &lt;strong&gt;trading runtime reasoning for robustness&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  A Question-Answering Example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Task: How tall is the Empire State Building?

Thought: I need to look up the height of the Empire State Building.
Action: get_height("Empire State Building")
Observation: Roof height 381 meters, antenna tip 443 meters

Thought: I now have the height data and can answer the user.
Final Answer: The Empire State Building has a roof height of 381 meters
and a total height of 443 meters including the antenna.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  A Cooking Task Example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Task: I need to make tomato scrambled eggs

Thought: First, let me find a recipe for tomato scrambled eggs.
Action: find_recipe(dish="tomato scrambled eggs")
Observation: Needs: 3 eggs, 2 tomatoes, salt, sugar, oil, scallions

Thought: I should check if these ingredients are in the fridge.
Action: check_fridge(item="eggs")
Observation: Eggs are in the fridge

Thought: Ingredients confirmed, ready to start cooking.
Final Answer: Key ingredients confirmed: eggs in stock.
Need to verify tomatoes...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Agent Tool System
&lt;/h2&gt;

&lt;p&gt;An Agent's capabilities are bounded by the tools it can use. From the demonstrations, a typical Agent toolset includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;read_file(file_path)&lt;/strong&gt; — read the contents of a specified file&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;write_to_file(filename, content)&lt;/strong&gt; — create or overwrite a file&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;run_terminal_command(command)&lt;/strong&gt; — execute a command in the terminal&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These tools are defined in the System Prompt using &lt;strong&gt;XML format&lt;/strong&gt;, and the Agent issues structured tool calls with XML tags:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;action&amp;gt;&lt;/span&gt;write_to_file("test.txt", "a\nb\nc")&lt;span class="nt"&gt;&amp;lt;/action&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Formatting conventions typically use &lt;code&gt;\n&lt;/code&gt; for newlines to ensure tool call parameters are correctly transmitted.&lt;/p&gt;




&lt;h2&gt;
  
  
  Three Agent Construction Patterns
&lt;/h2&gt;

&lt;p&gt;The video demonstrated three mainstream Agent construction patterns through real-world examples:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. General-Purpose Agent Platform: Manus
&lt;/h3&gt;

&lt;p&gt;Manus is a general-purpose AI Agent capable of autonomously completing complex, multi-step research tasks. The video showed it executing the task "iPhone 15 Pro Max vs Galaxy S24 Ultra vs Pixel 8 Pro comparison report" in its entirety:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Autonomously searched for specifications and performance data for all three phones&lt;/li&gt;
&lt;li&gt;Collected visual assets and reference images&lt;/li&gt;
&lt;li&gt;Generated a comprehensive comparison website&lt;/li&gt;
&lt;li&gt;Produced a structured report (including executive summary, detailed comparison tables, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manus's defining characteristic is &lt;strong&gt;high autonomy&lt;/strong&gt; — the user only needs to provide a task description, and the Agent independently plans, searches, organizes, and outputs, with no human intervention required.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Code Agent: Claude
&lt;/h3&gt;

&lt;p&gt;Claude as a code agent demonstrated building a Snake game using HTML/CSS/JavaScript:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Received the task: "Write a Snake game using HTML, CSS, and JS"&lt;/li&gt;
&lt;li&gt;Planned the file structure: index.html, style.css, script.js&lt;/li&gt;
&lt;li&gt;Created files one by one and wrote the code&lt;/li&gt;
&lt;li&gt;Delivered a runnable game&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Claude's pattern illustrates how Agents can execute deterministic coding tasks in a &lt;strong&gt;controlled environment&lt;/strong&gt; — each step has clear inputs and outputs, and errors can be detected and corrected promptly.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Open-Instruction Agent: DeepSeek
&lt;/h3&gt;

&lt;p&gt;DeepSeek's demonstration focused more on &lt;strong&gt;following extremely detailed system instructions&lt;/strong&gt;. The video showed its Agent-mode prompt structure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Strictly defined XML tags for &lt;code&gt;&amp;lt;thought&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;action&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;observation&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;final_answer&amp;gt;&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Specified the operating system environment (macOS 15.5) and working directory&lt;/li&gt;
&lt;li&gt;Provided complete documentation of tool definitions and calling formats&lt;/li&gt;
&lt;li&gt;Also executed the Snake game construction task&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;DeepSeek's case illustrates that through &lt;strong&gt;meticulous prompt engineering&lt;/strong&gt;, even a general-purpose chat model can be shaped into an agent that follows a specific Agent protocol.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Design Decisions for Building Agents
&lt;/h2&gt;

&lt;p&gt;Drawing from these cases, building an effective Agent involves several critical decisions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Prompt structure design.&lt;/strong&gt; The Agent's system prompt must precisely describe its role, available tools, output format, and reasoning steps. XML tags may seem tedious, but they provide a structured "grammar" for the model's output, reducing the probability of parsing failures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Tool interface granularity.&lt;/strong&gt; Tools should be sufficiently atomic (e.g., &lt;code&gt;read_file&lt;/code&gt;, &lt;code&gt;write_to_file&lt;/code&gt;) so the Agent can flexibly compose them, rather than offering overly monolithic "do-everything" functions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Quality of observation feedback.&lt;/strong&gt; The Observation returned by tools is the Agent's sole basis for adjusting its next strategy. If the return information is too terse or ambiguous, the Agent's reasoning chain breaks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Termination conditions.&lt;/strong&gt; The Agent needs a clear stopping signal (&lt;code&gt;&amp;lt;final_answer&amp;gt;&lt;/code&gt;), or it may fall into an endless "think-act" loop. In practice, a maximum step limit is typically set as a safety net.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Error handling and recovery.&lt;/strong&gt; The core advantage of the ReAct paradigm is its ability to handle exceptions — when a tool call fails or returns unexpected results, the Agent can reassess the situation in the next &lt;code&gt;thought&lt;/code&gt; and try alternative approaches.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;AI Agents represent a critical leap from "language models" to "action models." The ReAct paradigm, by interleaving reasoning and acting, transforms the LLM from a system that can only "speak" into an agent that can "do."&lt;/p&gt;

&lt;p&gt;From Manus's autonomous research, to Claude's code generation, to DeepSeek's precise instruction execution, we see three distinct implementation paths for Agents — but they all share the same core philosophy: &lt;strong&gt;thinking guides action, and action feeds back into thinking&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;As the tool ecosystem matures and model reasoning capabilities advance, Agents are moving from experimental prototypes to production-grade applications. Understanding the ReAct paradigm and mastering Agent construction patterns will become essential skills for engineers in the AI era.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article is adapted from the video "Agent Concepts, Principles, and Construction Patterns," covering the definition of Agents, the core ideas of the ReAct paper, the Agent work loop, tool system design, and a comparative analysis of three mainstream construction patterns.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>llm</category>
    </item>
    <item>
      <title>In what way do AI models operate</title>
      <dc:creator>Leo Han</dc:creator>
      <pubDate>Tue, 26 May 2026 11:53:27 +0000</pubDate>
      <link>https://dev.to/leo_han_02060526/in-what-way-do-ai-models-operate-3p23</link>
      <guid>https://dev.to/leo_han_02060526/in-what-way-do-ai-models-operate-3p23</guid>
      <description>&lt;p&gt;&lt;strong&gt;The Rise of AI&lt;/strong&gt;&lt;br&gt;
Over the past nearly two years since 2025, artificial intelligence has developed at a rapid pace and become a major trend. AI, intelligent agents and related technologies can be seen everywhere in daily life. It seems that AI is being applied in countless scenarios, with seemingly no rivals and capable of accomplishing almost anything. Nearly everyone has used large language models such as GPT, DeepSeek, Doubao and Yuanbao.&lt;br&gt;
Back in 2025, we believed AI programs were merely based on chatbots, designed only for tasks related to the internet and information technology. Today, however, we realize they are far more sophisticated.&lt;br&gt;
In the traditional internet industry, we have long witnessed the internet bubble economy and profit models decoupled from real industries, which rely heavily on venture capital. For this reason, many once thought AI could never be integrated into traditional trades like hairdressers, plumbers and maintenance workers, and that these occupations would remain largely unaffected by AI. Yet with the advancement of robots and smart hardware, all these possibilities have become tangible. From a short-term economic perspective, the only current limitation is the high cost of applying AI to solve simple problems.&lt;br&gt;
To truly understand AI programs at their core, it is necessary to learn some basic professional terms and figure out how they actually operate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Evolution of AI&lt;/strong&gt;&lt;br&gt;
In its early days, AI existed mainly in the form of chatbots — intelligent conversational robots. The early version of ChatGPT is a typical example. Back then, people interacted with AI by entering text on websites to get automated responses. Nowadays, modern AI agent models are able to handle tasks for a wide range of specialized scenarios, such as generating reports and creating videos.&lt;br&gt;
Accordingly, AI can be divided into the following major categories:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fea379ulhqxxs6twimh8l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fea379ulhqxxs6twimh8l.png" alt=" " width="800" height="171"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Demystifying AI&lt;/strong&gt;&lt;br&gt;
We can elaborate on these categories as follows:&lt;br&gt;
ANI (Artificial Narrow Intelligence)&lt;br&gt;
Also known as narrow AI. It is designed for specific scenarios, with autonomous driving models as a typical example.&lt;br&gt;
&lt;strong&gt;Generative AI&lt;/strong&gt;&lt;br&gt;
It refers to generative artificial intelligence. Tools like ChatGPT fall into this category and can be applied to numerous scenarios.&lt;br&gt;
&lt;strong&gt;AGI (Artificial General Intelligence)&lt;/strong&gt;&lt;br&gt;
This represents the ultimate goal of AI. It can accomplish all tasks that humans are capable of, and even handle work beyond human ability.&lt;br&gt;
&lt;strong&gt;Machine Learning (ML)&lt;/strong&gt;&lt;br&gt;
Machine learning serves as a core technical pillar driving the development of AI. A key branch is Supervised Learning. Its fundamental objective is to convert given inputs into the desired outputs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwarpsc27dxlrbyu1za1g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwarpsc27dxlrbyu1za1g.png" alt=" " width="282" height="46"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here are simple examples. Suppose we develop an email program empowered by AI to detect spam emails. This AI-powered function is called spam filtering. If we feed audio as input and get text transcripts as output, the corresponding AI technology is speech recognition. When you input text, known as the source text, and expect another piece of text as the result, namely the target text, this is how a chatbot works — it generates text by making accurate predictions.&lt;br&gt;
LLM (Large Language Models)&lt;br&gt;
Large Language Models are trained based on machine learning, specifically supervised learning, to predict the next word in a sequence. To put it simply, you can draw parallels to the word segmentation mechanism of Elastic. As I mentioned before, here is a straightforward illustration:&lt;br&gt;
Take the input sentence: My most commonly used database is Elastic.&lt;/p&gt;

&lt;p&gt;Input   Output&lt;br&gt;
My most commonly used   database&lt;br&gt;
My most commonly used database is   Elastic&lt;/p&gt;

&lt;p&gt;When a model is trained on massive datasets, it evolves into systems like ChatGPT. Given an initial prompt, it can generate relevant responses. In fact, GPT models do far more than just predicting the next single word. They also filter and refine language to deliver more accurate replies.&lt;br&gt;
Over the past two years, many companies have been engaged in data annotation for model training. They hire staff to label large volumes of images and texts. Such work is essential to build up the recognition capabilities of models under supervised learning. Thanks to the advancement of computers and the internet, vast and diverse data resources are readily available, which has led to the remarkable leap in AI model performance in recent years.&lt;br&gt;
Lastly, let's talk about a hugely popular concept in modern AI: neural networks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fueuclyf9mbqquq4yl4cv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fueuclyf9mbqquq4yl4cv.png" alt=" " width="799" height="297"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As data volume keeps increasing, model performance will first improve and then level off. Traditional AI models cannot grow smarter endlessly with more data. Nevertheless, high performance relies heavily on big data. Meanwhile, the advancement of GPUs has greatly boosted large-scale computing power, providing stronger resources for model training.&lt;br&gt;
The core concepts of artificial intelligence are machine learning and supervised learning, which essentially map inputs to corresponding outputs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DataSet&lt;/strong&gt;&lt;br&gt;
Datasets are fundamental to AI systems. Here is a simple real-life example.&lt;br&gt;
We can create a basic dataset consisting of delivery distance (kilometers) and estimated delivery time (minutes):&lt;/p&gt;

&lt;p&gt;Delivery Distance (km)  Estimated Time (min)&lt;br&gt;
1   15&lt;br&gt;
2   20&lt;br&gt;
3   25&lt;br&gt;
4   30&lt;br&gt;
5   35&lt;/p&gt;

&lt;p&gt;In this case, the delivery distance serves as the input, and the estimated delivery time is the output. We can also add more input features, such as the number of traffic lights, to build an extended dataset:&lt;/p&gt;

&lt;p&gt;Number of Traffic Lights    Delivery Distance (km)  Estimated Time (min)&lt;br&gt;
1   1   15&lt;br&gt;
1   2   20&lt;br&gt;
2   3   30&lt;br&gt;
2   4   35&lt;br&gt;
2   5   34&lt;/p&gt;

&lt;p&gt;Likewise, we can set delivery time as the input to predict whether a delivery route can meet the time requirement.&lt;br&gt;
Apart from numerical data, there are many other application scenarios. For instance, security verification is commonly required when accessing certain websites, such as the widely discussed:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1r1djl4s8o6httn7pw0p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1r1djl4s8o6httn7pw0p.png" alt=" " width="693" height="693"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Messy Data&lt;/strong&gt;&lt;br&gt;
Many enterprises consider leveraging their existing operational data to conduct AI predictive analysis. However, there is a harsh reality.&lt;br&gt;
A great number of CEOs assume that with abundant user and production data and an AI team, they can easily carry out industrial predictive analysis and generate tangible value. This idea is actually questionable, for the reasons listed below:&lt;br&gt;
First, the data lacks continuity. When building big data systems, most internet companies store data in data warehouses via stream messages and other approaches. Such data reflects various business metrics but often has little practical value.&lt;br&gt;
Second, the data contains massive junk content. A large portion of the data is useless and unfit for AI model training.&lt;br&gt;
Third, the data is incomplete and discontinuous. Issues like missing values, unknown fields and even manually tampered business data make the data invalid for regression training.&lt;br&gt;
&lt;strong&gt;Neural Net&lt;/strong&gt;&lt;br&gt;
From the above content, we can clearly see the performance gap between traditional AI and neural networks. Do not be misled by the terminology. Artificial neural networks draw inspiration from the nervous system of the human body and are composed of numerous neurons. We can explain its characteristics with the previous delivery distance example:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc11z16vjfaxriow92p7y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc11z16vjfaxriow92p7y.png" alt=" " width="799" height="261"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This scenario reminds me of the regression process in M&amp;amp;V for energy consumption forecasting. It adopts linear regression to make phased predictions. To be specific, it performs discrete regression using data from Phase A to B to generate a linear regression formula, which is then applied to forecast outcomes for Phase C. This method is only used for subsequent predictive analysis, and the corresponding regression formula is displayed in charts. Its implementation requires calculations involving numerous factors. We can regard these factors as neurons, each acting as a channel between input and output with its own computing logic.&lt;br&gt;
Simply put, a neural network is a set of conditions made up of various neurons. The more neurons there are, the larger the neural network will be, leading to more accurate output results.&lt;br&gt;
&lt;strong&gt;What Machine Learning Is Good At&lt;/strong&gt;&lt;br&gt;
At first, I thought the rise of AI would have little impact on traditional industries such as manual labor and service sectors. But the reality turns out to be quite different. Currently, businesses are reluctant to spend huge costs replacing low-wage jobs with AI. However, as AGI continues to evolve, it will gradually take over more human work on a large scale.&lt;br&gt;
Meanwhile, many people hold overly high expectations for AI. We need to realize that AI is not a panacea. The distinction is easy to understand. For example, asking AI to generate reports and process data works well, because humans can give clear and complete instructions with definite operating logic. In contrast, it is unrealistic to expect AI to predict stock market trends or winning lottery numbers. Even humans cannot accomplish such tasks. Though we can feed massive complex data — including historical market trends, corporate operation reports and traffic statistics — for reference, strong randomness still cannot be eliminated.&lt;br&gt;
The same applies to popular smart vehicles and autonomous driving. Cars are equipped with cameras and radars to capture driving information, which is a typical input-output application. Yet they cannot effectively recognize human body movements. For instance, autonomous vehicles can plan routes and detect obstacles via sensors, but fail to identify passengers hailing cars by waving. Diverse body gestures make it impossible to form a fixed mapping from input to output.&lt;br&gt;
Likewise, many hospitals have introduced AI to analyze medical reports such as CT scans. Such systems work well only when scans follow standard rules: all images are placed correctly, and CT slices of the heart are positioned uniformly during supervised training. If applied to non-standard scans, for example images taken when the patient lies sideways, the recognition error will increase dramatically.&lt;br&gt;
To sum up, we can judge the applicable scenarios of machine learning by the following rules:&lt;br&gt;
Scenarios where ML performs well&lt;br&gt;
Learning relatively simple concepts&lt;br&gt;
Having a large volume of available data&lt;br&gt;
Scenarios where ML performs poorly&lt;br&gt;
Learning complex concepts with limited data&lt;br&gt;
Processing unseen and brand-new types of data&lt;/p&gt;

</description>
      <category>ai</category>
      <category>deeplearning</category>
    </item>
  </channel>
</rss>
