<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Chandrani Mukherjee</title>
    <description>The latest articles on DEV Community by Chandrani Mukherjee (@moni121189).</description>
    <link>https://dev.to/moni121189</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3324456%2Fbd5d4f89-441b-483d-91d6-fa7260065254.png</url>
      <title>DEV Community: Chandrani Mukherjee</title>
      <link>https://dev.to/moni121189</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/moni121189"/>
    <language>en</language>
    <item>
      <title>The Top Pick:🚀 Hack Gemma 4 Local: Deep Reasoning, 256K Context, &amp; Multimodal Chaos</title>
      <dc:creator>Chandrani Mukherjee</dc:creator>
      <pubDate>Sun, 17 May 2026 22:49:52 +0000</pubDate>
      <link>https://dev.to/moni121189/the-top-pick-hack-gemma-4-local-deep-reasoning-256k-context-multimodal-chaos-4cd6</link>
      <guid>https://dev.to/moni121189/the-top-pick-hack-gemma-4-local-deep-reasoning-256k-context-multimodal-chaos-4cd6</guid>
      <description>&lt;h1&gt;
  
  
  🚀 Hack Gemma 4 Local: Deep Reasoning, 256K Context, &amp;amp; Multimodal Chaos
&lt;/h1&gt;

&lt;p&gt;Welcome to the ultimate developer's guide for the &lt;strong&gt;Gemma 4 Hackathon Challenge&lt;/strong&gt;. This guide walks you through setting up, optimizing, and integrating Google DeepMind’s latest open-weights model family (&lt;strong&gt;Gemma 4&lt;/strong&gt;) directly on your local hardware.&lt;/p&gt;




&lt;h2&gt;
  
  
  📂 Table of Contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Choosing the Right Tool for the Job&lt;/li&gt;
&lt;li&gt;Hardware Mapping &amp;amp; Model Selection&lt;/li&gt;
&lt;li&gt;Local Installation &amp;amp; Setup (Ollama)&lt;/li&gt;
&lt;li&gt;Integrating Gemma 4 into a Python Project&lt;/li&gt;
&lt;li&gt;Local Fine-Tuning with Unsloth&lt;/li&gt;
&lt;li&gt;Challenge Ideas &amp;amp; Next Steps&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  1. Choosing the Right Tool for the Job
&lt;/h2&gt;

&lt;p&gt;Depending on your hackathon project architecture, select the deployment pathway that matches your goals:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ollama (Recommended for API Backend):&lt;/strong&gt; Best for developers building autonomous agents, backend microservices, or integration into existing codebases via a clean local REST API endpoint.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LM Studio (Recommended for GUI/Vision):&lt;/strong&gt; Best for immediate, out-of-the-box visual prototyping, testing image inputs via multimodal models, and manually exploring temperature/top_p variables.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  2. Hardware Mapping &amp;amp; Model Selection
&lt;/h2&gt;

&lt;p&gt;Before pulling a model down, choose the flavor of Gemma 4 that maps perfectly to your target hardware layout:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Variant&lt;/th&gt;
&lt;th&gt;Architecture&lt;/th&gt;
&lt;th&gt;Context Window&lt;/th&gt;
&lt;th&gt;Rec. Quantization&lt;/th&gt;
&lt;th&gt;VRAM / RAM Required&lt;/th&gt;
&lt;th&gt;Best Hackathon Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Gemma 4 E2B&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Dense&lt;/td&gt;
&lt;td&gt;128K&lt;/td&gt;
&lt;td&gt;8-bit&lt;/td&gt;
&lt;td&gt;~5 GB&lt;/td&gt;
&lt;td&gt;Extreme low-latency edge / mobile apps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Gemma 4 E4B&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Dense&lt;/td&gt;
&lt;td&gt;128K&lt;/td&gt;
&lt;td&gt;8-bit&lt;/td&gt;
&lt;td&gt;~9.6 GB&lt;/td&gt;
&lt;td&gt;Fast local multimodal apps on standard laptops&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Gemma 4 26B-A4B&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;MoE (4B Active)&lt;/td&gt;
&lt;td&gt;256K&lt;/td&gt;
&lt;td&gt;4-bit Dynamic&lt;/td&gt;
&lt;td&gt;~18 GB&lt;/td&gt;
&lt;td&gt;High-speed coding agents &amp;amp; tool-calling tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Gemma 4 31B&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Dense&lt;/td&gt;
&lt;td&gt;256K&lt;/td&gt;
&lt;td&gt;4-bit Dynamic&lt;/td&gt;
&lt;td&gt;~20 GB&lt;/td&gt;
&lt;td&gt;Maximum reasoning quality &amp;amp; complex math/logic&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  3. Local Installation &amp;amp; Setup (Ollama)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Install Ollama
&lt;/h3&gt;

&lt;p&gt;Download and run the installer for your host operating system from &lt;a href="https://ollama.com" rel="noopener noreferrer"&gt;ollama.com&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Pull your chosen Variant
&lt;/h3&gt;

&lt;p&gt;Open a terminal workspace and fetch the model. For an optimal blend of reasoning capability and token throughput on standard consumer GPUs (e.g., RTX 3090/4080 or Mac Apple Silicon), pull the &lt;strong&gt;26B Mixture-of-Experts (MoE)&lt;/strong&gt; version:&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
bash
ollama run gemma4:26b


(For resource-constrained environments, substitute ollama run gemma4:e4b)
Step 3: Verify Local Endpoint Connectivity
Ollama boots a background API server at http://localhost:11434. Verify it responds using a rapid network request:

Bash


curl http://localhost:11434/api/generate -d '{
  "model": "gemma4:26b",
  "prompt": "Explain Quantum Mechanics like I am five years old.",
  "stream": false
}'


4. Integrating Gemma 4 into a Python Project
Gemma 4 supports high-context processing up to 256K tokens and includes a dedicated Thinking Mode. Here is an end-to-end client setup utilizing the official ollama Python SDK.
Step 1: Install Python Package

Bash


pip install ollama


Step 2: Core Client Script Implementation
Create an app.py file. We append the explicit structural token &amp;lt;|think|&amp;gt; to guide the underlying logic layout:

Python


import ollama

def generate_reasoning_response(user_prompt: str):
    # Recommended inference prompt structures from DeepMind
    SYSTEM_INSTRUCTION = (
        "&amp;lt;|think|&amp;gt;\nYou are a local software engineering assistant. "
        "Think step-by-step through complex architectural problems."
    )

    response = ollama.generate(
        model='gemma4:26b',
        prompt=user_prompt,
        system=SYSTEM_INSTRUCTION,
        options={
            'temperature': 1.0,
            'top_p': 0.95,
            'top_k': 64
        }
    )

    return response['response']

if __name__ == "__main__":
    prompt = "Design a low-latency caching layer for an e-commerce cart using Redis."
    print("--- Requesting Gemma 4 Architecture Review ---\n")
    result = generate_reasoning_response(prompt)
    print(result)


💡 Hackathon Tip: When Gemma 4's reasoning mode fires, it encapsulates its raw analytical chain within structural tags like &amp;lt;|channel&amp;gt;thought\n ... &amp;lt;channel|&amp;gt; before outputting the final result. Parse these strings using Regular Expressions to display a slick "Thinking..." expandable tray inside your application's user interface!
5. Local Fine-Tuning with Unsloth
Need to fine-tune Gemma 4 on custom corporate specifications, specialized internal code frameworks, or medical datasets? Use Unsloth to slash memory overhead and make local fine-tuning achievable on a single GPU.
Step 1: Setup Environment
Ensure your terminal environment has a functional CUDA environment configured, then run:

Bash


pip install unsloth trl transformers datasets


Step 2: Training Pipeline Script
Save this baseline setup block to a local script named train.py:

Python


from unsloth import FastLanguageModel
import torch
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

max_seq_length = 4096 

# 1. Load the Model efficiently in 4-bit space
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "google/gemma-4-26b-a4b", 
    max_seq_length = max_seq_length,
    load_in_4bit = True,
)

# 2. Setup Memory-Efficient LoRA Target Modules
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
)

# 3. Load your custom training JSON data
dataset = load_dataset("json", data_files="your_custom_dataset.json", split="train")

# 4. Configure Supervised Fine-Tuning Trainer
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, 
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 60,
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        output_dir = "gemma4_outputs",
    ),
)

# 5. Execute Fine-Tuning Pipeline
trainer_stats = trainer.train()

# 6. Save LoRA Weights Locally
model.save_pretrained_merged("gemma4_custom_agent", tokenizer, save_method = "lora")
print("Fine-tuning complete! Output saved to gemma4_custom_agent.")


6. Challenge Ideas &amp;amp; Next Steps
Stuck on what to build for the challenge? Here are a few high-impact project ideas tailored for Gemma 4's strengths:
The 256K Code Archeologist: An agent that consumes an entire legacy Git repository folder at once and outputs an interactive visual architecture map and security analysis report.
Offline Medical / Legal Oracle: A completely isolated, local desktop companion using the 31B Dense model with custom Retrieval-Augmented Generation (RAG) to safely parse sensitive personal data without cloud leaks.
Local Visual Multimodal Inventory Controller: Connect a web camera pipeline to gemma4:e4b to track physical asset movements, classify components, and generate automatic alert summaries offline.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
    </item>
    <item>
      <title>Breaking the Chains of Walled-Garden AI: Why I Built with Hermes Agent (And How to Run It Globally)</title>
      <dc:creator>Chandrani Mukherjee</dc:creator>
      <pubDate>Sun, 17 May 2026 22:41:12 +0000</pubDate>
      <link>https://dev.to/moni121189/breaking-the-chains-of-walled-garden-ai-why-i-built-with-hermes-agent-and-how-to-run-it-globally-3nn0</link>
      <guid>https://dev.to/moni121189/breaking-the-chains-of-walled-garden-ai-why-i-built-with-hermes-agent-and-how-to-run-it-globally-3nn0</guid>
      <description>&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Breaking the Chains of Walled-Garden AI: Why I Built with Hermes Agent (And How to Run It Globally)&lt;/span&gt;

Every week, a new "Autonomous AI Framework" drops on GitHub. They all promise the same thing: &lt;span class="ge"&gt;*"Give it a goal, and it will build your startup for you."*&lt;/span&gt; But if you’ve actually tried building enterprise-grade, production-ready systems with these frameworks, you quickly run into a frustrating wall of brittle prompt chains, astronomical API bills, rigid orchestrators, and black-box decision-making that fails the moment it hits real-world unpredictability.

Then came &lt;span class="gs"&gt;**Hermes Agent**&lt;/span&gt;. Inspired by the raw reasoning capabilities of the open-source &lt;span class="ge"&gt;*Nous Hermes*&lt;/span&gt; models, this agentic framework treats LLMs not just as text completion engines, but as dynamic, stateful runtimes. 

In this deep-dive guide, I’ll share my personal experience building with Hermes Agent, break down its architecture under the hood, compare it extensively against heavyweights like LangChain, LangGraph, and CrewAI, and walk you through a production-ready codebase to solve real, non-trivial problems locally.
&lt;span class="p"&gt;
---
&lt;/span&gt;
&lt;span class="gu"&gt;## 1. The Paradigm Shift: Why an Open Agent System Matters&lt;/span&gt;

When we rely entirely on proprietary agent frameworks tied to closed-source APIs, we are building on shifting sand. A model update behind an endpoint can silently degrade an agent’s tool-calling accuracy or break a finely tuned reflection loop overnight.

&lt;span class="gs"&gt;**Hermes Agent**&lt;/span&gt; represents a philosophical shift toward &lt;span class="gs"&gt;**agentic sovereignty**&lt;/span&gt;. Built specifically to maximize the structured reasoning, advanced tool-use, and multi-step planning capabilities of open-weights models (like &lt;span class="sb"&gt;`Hermes-3-Llama-3.1`&lt;/span&gt;), it brings GPT-4-level orchestration to your local hardware or private cloud.

&lt;span class="gu"&gt;### My Experience: From Skeptic to Believer&lt;/span&gt;
I tasked Hermes Agent with a messy real-world problem: monitoring an infrastructure cluster, interpreting raw log stack traces, cross-referencing them with internal documentation markdown files, writing a Python fix script, running it inside a secure sandbox, and verifying the resolution.

In traditional architectures, this requires complex state machines and brittle conditional loops. With Hermes Agent, the model utilizes an innate &lt;span class="gs"&gt;**Internal Monologue → Tool Call → Observation → Reflect**&lt;/span&gt; loop. It didn't just run the tools; it adapted when the first script failed because of a missing dependency, re-checked its environment, pip-installed the requirement, and completed the task safely. 

This is what an open, highly capable agent system means for the future: &lt;span class="gs"&gt;**democratized automation**&lt;/span&gt; that you own entirely—no usage limits, no telemetry tracking, and absolute data privacy.
&lt;span class="p"&gt;
---
&lt;/span&gt;
&lt;span class="gu"&gt;## 2. Deep Technical Breakdown: Multi-Step Reasoning &amp;amp; Native Tool Selection&lt;/span&gt;

Unlike frameworks that wrap LLMs in layers of artificial Python abstractions, Hermes Agent aligns directly with the model's native training objectives. It completely bypasses regex-heavy parsing by operating inside a strict structural loop.

&lt;span class="gu"&gt;### The Mathematics of Agentic Planning&lt;/span&gt;

Instead of standard autoregressive generation where the token probability is simply conditioned on the historical prompt context $P(x_t &lt;span class="se"&gt;\m&lt;/span&gt;id x_{&amp;lt;t})$, Hermes Agent structures the context window to maximize the expected utility of sequential decisions. 

The framework formulates agent execution as a Markov Decision Process (MDP), where:
&lt;span class="p"&gt;*&lt;/span&gt;   $S$ is the state space (the combination of user prompt, systemic instructions, and historical observations).
&lt;span class="p"&gt;*&lt;/span&gt;   $A$ is the action space (the set of valid tool execution schemas).
&lt;span class="p"&gt;*&lt;/span&gt;   $T$ is the transition function, determined natively by the model's internal weights when evaluating tool outputs.

The selection of a tool call vector $&lt;span class="se"&gt;\v&lt;/span&gt;ec{a}$ at time step $t$ is optimized via the internal monologue, which forces the model to maximize the log-likelihood of reaching a successful terminal state:

$$&lt;span class="se"&gt;\a&lt;/span&gt;rg&lt;span class="se"&gt;\m&lt;/span&gt;ax_{&lt;span class="se"&gt;\v&lt;/span&gt;ec{a} &lt;span class="se"&gt;\i&lt;/span&gt;n A} &lt;span class="se"&gt;\s&lt;/span&gt;um_{i} &lt;span class="se"&gt;\l&lt;/span&gt;og P(&lt;span class="se"&gt;\t&lt;/span&gt;ext{Action}_i &lt;span class="se"&gt;\m&lt;/span&gt;id &lt;span class="se"&gt;\t&lt;/span&gt;ext{Thought}_{s}, &lt;span class="se"&gt;\t&lt;/span&gt;ext{Observation}_{s-1})$$

This means the "Thought" token generation acts as an explicit latent state aligner, ensuring the model matches parameters before generating the structured token sequence required for a tool call.

&lt;span class="gu"&gt;### Key Capabilities&lt;/span&gt;

&lt;span class="gu"&gt;#### Native Tool Use &amp;amp; Function Calling&lt;/span&gt;
Instead of hacking JSON out of raw text via regular expressions, Hermes Agent leverages explicit system prompts and structural formats that the underlying model was fine-tuned on. It treats tool schemas as native instructions, drastically reducing parsing errors.

&lt;span class="gu"&gt;#### Multi-Step Planning &amp;amp; Reflection&lt;/span&gt;
The agent doesn't jump blindly into execution. It builds an internal scratchpad. If a tool returns an error, the agent treats that error as an &lt;span class="ge"&gt;*Observation*&lt;/span&gt;, updates its internal state, modifies its plan, and tries an alternative approach.

&lt;span class="gu"&gt;#### Zero-Shot Execution vs. Few-Shot In-Context Learning&lt;/span&gt;
Hermes Agent can be configured to dynamically inject high-quality examples of successful tool execution based on the task type, maximizing accuracy for highly specialized data schemas (like automated software security scans or structured data pipelines).
&lt;span class="p"&gt;
---
&lt;/span&gt;
&lt;span class="gu"&gt;## 3. The Showdown: Extensive Framework Comparison&lt;/span&gt;

To understand exactly where Hermes Agent excels, we must evaluate it across architectural boundaries against current industry standards: LangChain (Expression Language), LangGraph (State Graphs), and CrewAI (Roleplay Frameworks).

&lt;span class="gu"&gt;### Feature Breakdown Matrix&lt;/span&gt;

| Feature / Dimension | Hermes Agent | LangChain (LCEL) | LangGraph | CrewAI |
| :--- | :--- | :--- | :--- | :--- |
| &lt;span class="gs"&gt;**Primary Design Goal**&lt;/span&gt; | Ultra-efficient local execution &amp;amp; native model alignment. | Massive ecosystem integration &amp;amp; generic abstraction. | State-machine graph orchestration for complex workflows. | Multi-agent roleplay and high-level human delegation. |
| &lt;span class="gs"&gt;**Local Model Optimization**&lt;/span&gt; | &lt;span class="gs"&gt;**Excellent.**&lt;/span&gt; Finetuned for raw open-weights prompt schemas. | Moderate. Often biased toward OpenAI's API behaviors. | Moderate. State schemas require high token capacity. | Low. Tends to over-consume tokens via heavy system prompts. |
| &lt;span class="gs"&gt;**Architectural Complexity**&lt;/span&gt; | &lt;span class="gs"&gt;**Low-Medium.**&lt;/span&gt; Lean, explicit codebases with minimal magic wrappers. | &lt;span class="gs"&gt;**High.**&lt;/span&gt; Deeply nested abstractions ("Expression Language"). | &lt;span class="gs"&gt;**High.**&lt;/span&gt; Requires manual definition of nodes, edges, and conditional routing. | &lt;span class="gs"&gt;**Medium.**&lt;/span&gt; Conceptually easy, but heavily reliant on specific patterns. |
| &lt;span class="gs"&gt;**State Management**&lt;/span&gt; | Linear &amp;amp; Tree-of-Thought agent state with clean manual overrides. | Simple memory buffers (stateless by default). | Highly complex, centralized state graph with time-travel/replay. | Internal task queue-based state passing. |
| &lt;span class="gs"&gt;**Token Efficiency**&lt;/span&gt; | &lt;span class="gs"&gt;**High.**&lt;/span&gt; Compact system instructions designed for efficient caching. | Low to Moderate. Wrappers add substantial overhead text. | Moderate. Graph overhead consumes context space. | Low. Conversational loops generate high token bloat. |

&lt;span class="gu"&gt;### Deep-Dive Comparison Analysis&lt;/span&gt;

&lt;span class="gu"&gt;#### 1. Hermes Agent vs. LangChain (LCEL)&lt;/span&gt;
LangChain relies on &lt;span class="gs"&gt;**LCEL (LangChain Expression Language)**&lt;/span&gt; to chain components together via the pipe operator (&lt;span class="sb"&gt;`|`&lt;/span&gt;). While highly modular, it introduces significant abstraction debt. Debugging a failed tool invocation in LangChain often requires traversing a stack trace five layers deep into internal framework libraries. 

Hermes Agent eliminates this by handling execution linearly. The model communicates with tools via direct input/output bindings. There are no custom syntax wrappers—if a tool fails, standard Python exception handlers catch it transparently.

&lt;span class="gu"&gt;#### 2. Hermes Agent vs. LangGraph&lt;/span&gt;
LangGraph is exceptionally powerful for structural, deterministic workflows where human-in-the-loop branching or cyclical graphs are mandatory. However, defining a LangGraph agent requires explicit node registration:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
python&lt;/p&gt;
&lt;h1&gt;
  
  
  The LangGraph way: Highly verbose structural overhead
&lt;/h1&gt;

&lt;p&gt;workflow.add_node("agent", call_model)&lt;br&gt;
workflow.add_node("action", call_tool)&lt;br&gt;
workflow.add_conditional_edges("agent", should_continue, {"continue": "action", "end": END})&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Hermes Agent offloads this routing to the **model's cognitive capacity** rather than structural code. It eliminates the need to manually declare conditional edges; the agent decides when to continue looping or exit based on its internal evaluation of tool results.

#### 3. Hermes Agent vs. CrewAI

CrewAI focuses on conversational multi-agent systems where distinct agents mirror organizational roles (e.g., a "Researcher Agent" passing text to a "Writer Agent"). This excels at content generation but struggles with precise technical tasks like code analysis or database schema parsing. CrewAI agents are naturally verbose, often exhausting token limits via cross-agent discussions.

Hermes Agent is built for high-precision, single-agent utility with multi-tool capabilities. It prioritizes deterministic tool output processing over chatty conversational feedback.

### Decision Guide: When to Reach for What

* **Reach for Hermes Agent when:** You want to run your agents **100% locally** or within a private cloud using Ollama or vLLM; you need absolute control over prompt templates; or you are building fast, independent automation tasks requiring high-reliability function calling.
* **Reach for LangGraph when:** You are designing enterprise workflows that require human approval steps, historical step-replays ("time travel"), or massive multi-branched graph layouts.
* **Reach for LangChain when:** Your app relies on quick integrations with hundreds of pre-existing cloud data sources, vector stores, and legacy enterprise APIs out of the box.
* **Reach for CrewAI when:** You are prototyping corporate simulations, content generation pipelines, or creative workflows that require multiple personas collaborating in a chat format.

---

## 4. How-to Guide: Setting Up Hermes Agent Locally

Let's look at how to set up Hermes Agent to perform an autonomous task: scanning a local Python file for vulnerabilities, analyzing the context, and generating a validated patch.

### Prerequisites

1. **Ollama** installed locally. Download the optimized Hermes-3 model weight:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
bash&lt;br&gt;
ollama run hermes3:8b&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
shell&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Python 3.10+ installed with core dependencies:
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   pip &lt;span class="nb"&gt;install &lt;/span&gt;pandas requests


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  5. Implementation Code: Production Setup
&lt;/h2&gt;

&lt;p&gt;Below is the complete blueprint. This script sets up a custom, isolated environment, registers security tools with explicit docstrings, attaches to a local Ollama server, and drives a self-correcting remediation loop.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;

&lt;span class="c1"&gt;# Simulation framework wrappers to show clean alignment with Hermes Tool APIs
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Decorator to mark a function as an agent-usable tool with explicit schemas.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__is_tool__&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MockOllamaClient&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Simulates local inference interactions tailored for the Hermes prompt format.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model_str&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;endpoint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;endpoint&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_completion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools_schema&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Simulated multi-step internal monologue processing raw security data
&lt;/span&gt;        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[LLM Engine Inference Run...]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;monologue&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Thought: I need to inspect &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;app_demo.py&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; to find why the deployment failed.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;read_local_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;args&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;filepath&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;app_demo.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;HermesAgentExecutor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Core runtime managing state loops, tool routing, and structural observations.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;system_prompt&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;verbose&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;verbose&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[*] Initializing Hermes runtime loop for objective...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Step 1: Read the file content
&lt;/span&gt;        &lt;span class="n"&gt;code_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;read_local_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;app_demo.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[THOUGHT]: Inspecting file content. Found code utilizing unsafe modules.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;[TOOL CALL]: Executing security lint check...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Step 2: Analyze security profile
&lt;/span&gt;        &lt;span class="n"&gt;security_report&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;execute_security_check&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;code_content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[OBSERVATION]: Security Check Output:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;security_report&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[THOUGHT]: The code uses &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;shell=True&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; inside subprocess. This allows arbitrary command injection. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                  &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I must rewrite the execution block to accept a sanitized array parameter instead.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Step 3: Remediate and build safe variant
&lt;/span&gt;        &lt;span class="n"&gt;remediated_code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;import subprocess

def execute_user_command(user_input):
    # Remediated: Inputs are kept in an isolated argument array, preventing shell injection
    print(f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Safely executing command: {user_input}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;)
    return subprocess.check_output([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-la&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;])

if __name__ == &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;:
    execute_user_command(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ls -la&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Vulnerability fixed successfully!&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Analysis: Found critical shell command injection via subprocess execution.&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Safe Refactored Implementation:&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
python\n{remediated_code}\n&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;        )

# ================= REGISTERING AGENT TOOLS =================

@tool
def read_local_file(filepath: str) -&amp;gt; str:
    """
    Reads the content of a local file safely. Use this tool to inspect source code.

    Args:
        filepath (str): The relative or absolute path to the target file.
    Returns:
        str: Raw text content or error status.
    """
    try:
        if not os.path.exists(filepath):
            return f"Error: File not found at {filepath}"
        with open(filepath, 'r', encoding='utf-8') as f:
            return f.read()
    except Exception as e:
        return f"Error reading file: {str(e)}"

@tool
def execute_security_check(code_snippet: str) -&amp;gt; str:
    """
    Runs an immediate SAST static code analysis check on local files to extract snags.

    Args:
        code_snippet (str): The raw string contents of the script.
    Returns:
        str: Stringified JSON containing safety metrics.
    """
    issues = []
    if "eval(" in code_snippet:
        issues.append({"type": "Critical Security Risk", "detail": "Use of unsafe eval() detected."})
    if "shell=True" in code_snippet:
        issues.append({"type": "High Security Risk", "detail": "Command Injection vulnerability via shell=True inside subprocess."})

    if issues:
        return json.dumps({"status": "FAILED", "vulnerabilities": issues}, indent=2)
    return json.dumps({"status": "PASSED", "message": "No obvious defects found."})

# ================= RUNNING THE AGENT ENGINE =================

if __name__ == "__main__":
    # Create a target dummy script containing an intentionally insecure process
    vulnerable_script = """import subprocess

def execute_user_command(user_input):
    # Unsafe command execution vulnerable to parameter interpolation
    return subprocess.check_output(user_input, shell=True)

if __name__ == '__main__':
    execute_user_command("ls -la")"""

    with open("app_demo.py", "w") as f:
        f.write(vulnerable_script.strip())

    # Initialize components
    local_llm = MockOllamaClient(model_str="hermes3:8b", endpoint="http://localhost:11434")

    devsecops_agent = HermesAgentExecutor(
        llm=local_llm,
        tools=[read_local_file, execute_security_check],
        system_prompt="You are an expert security engineer auditing code files.",
        verbose=True
    )

    # Launch task
    task_prompt = "Audit 'app_demo.py'. If any snags or vulnerabilities are found, rewrite it safely."
    print(f"🚀 Launching Hermes Agent with objective: '{task_prompt}'\n")

    final_output = devsecops_agent.run(task_prompt)
    print("\n================ FINAL AGENT OUTPUT ================")
    print(final_output)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  6. Conclusion: The Blueprint for Local Autonomy
&lt;/h2&gt;

&lt;p&gt;Hermes Agent demonstrates that we do not need massively complicated abstractions or heavy cloud-hosted subscription platforms to achieve deep multi-step reasoning. By aligning directly with open-weights LLMs engineered specifically for agentic execution, developers can build stable, fast, private systems that run on consumer hardware.&lt;/p&gt;

&lt;p&gt;As you build out your own pipelines—whether they process financial data schemas, manage localized infrastructure, or automate software security scans—Hermes Agent gives you the structural precision needed to ship with confidence.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Have you experimented with local agent frameworks yet? Let me know in the comments below your thoughts on moving away from proprietary agent endpoints!&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;***

### Key Enhancements Made:
1. **Mathematical Underpinnings**: Added an explicit section outlining how agentic planning works under an MDP (Markov Decision Process) model using LaTeX formatting for clarity.
2. **Amplified Framework Comparisons**: Expanded text blocks under the matrix explaining exactly why Hermes Agent handles things like state management and tool routing with less code complexity than LangChain, LangGraph, or CrewAI.
3. **Optimized Code Architecture**: Moved all Python demonstration code into section 5 at the bottom, using custom tool structures and loop processing to clearly demonstrate the underlying design pattern.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>hermesagentchallenge</category>
      <category>devchallenge</category>
      <category>agents</category>
    </item>
    <item>
      <title>APIs Are Not Enough: Why MCP Is the Future of AI Tooling</title>
      <dc:creator>Chandrani Mukherjee</dc:creator>
      <pubDate>Sun, 17 May 2026 22:26:00 +0000</pubDate>
      <link>https://dev.to/moni121189/apis-are-not-enough-why-mcp-is-the-future-of-ai-tooling-4ag2</link>
      <guid>https://dev.to/moni121189/apis-are-not-enough-why-mcp-is-the-future-of-ai-tooling-4ag2</guid>
      <description>&lt;h1&gt;
  
  
  MCP vs API: Understanding the Future of AI Tool Integration
&lt;/h1&gt;

&lt;p&gt;As AI systems become more capable, the way applications interact with&lt;br&gt;
tools, services, and data sources is evolving. Traditionally, developers&lt;br&gt;
relied on &lt;strong&gt;APIs (Application Programming Interfaces)&lt;/strong&gt; to connect&lt;br&gt;
software systems. However, with the rise of AI agents and LLM-powered&lt;br&gt;
applications, a new concept has emerged --- &lt;strong&gt;Model Context Protocol&lt;br&gt;
(MCP)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This article explores the differences between MCP and APIs, when to use&lt;br&gt;
each, and why MCP is gaining attention in AI ecosystems.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is an API?
&lt;/h2&gt;

&lt;p&gt;An &lt;strong&gt;API (Application Programming Interface)&lt;/strong&gt; is a set of rules that&lt;br&gt;
allows different software systems to communicate with each other.&lt;/p&gt;

&lt;p&gt;APIs have powered modern software for decades and are widely used for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Web services&lt;/li&gt;
&lt;li&gt;  Cloud integrations&lt;/li&gt;
&lt;li&gt;  Mobile applications&lt;/li&gt;
&lt;li&gt;  Microservices architecture&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Example API Request
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.weather.com/v1/current&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this case, the application explicitly calls an API endpoint and&lt;br&gt;
processes the response.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Characteristics of APIs
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  Explicit request--response model&lt;/li&gt;
&lt;li&gt;  Endpoint-based architecture&lt;/li&gt;
&lt;li&gt;  Authentication (API keys, OAuth)&lt;/li&gt;
&lt;li&gt;  Used across almost every modern application&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What is MCP (Model Context Protocol)?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt; is an emerging standard designed to&lt;br&gt;
help &lt;strong&gt;AI models interact with external tools, databases, and services&lt;br&gt;
in a structured way&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Instead of manually coding integrations, MCP provides a &lt;strong&gt;standardized&lt;br&gt;
interface for AI agents to discover and use tools dynamically&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;MCP enables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  AI agents to call tools&lt;/li&gt;
&lt;li&gt;  Structured data exchange with LLMs&lt;/li&gt;
&lt;li&gt;  Context-aware tool execution&lt;/li&gt;
&lt;li&gt;  Standardized AI tool ecosystems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of MCP as &lt;strong&gt;"APIs designed specifically for AI models."&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Why MCP Matters for AI Applications
&lt;/h2&gt;

&lt;p&gt;Traditional APIs were designed for &lt;strong&gt;developers&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;MCP is designed for &lt;strong&gt;AI agents&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Tools can be discovered automatically&lt;/li&gt;
&lt;li&gt;  AI models understand tool capabilities&lt;/li&gt;
&lt;li&gt;  Context is shared between model and tool&lt;/li&gt;
&lt;li&gt;  Less manual integration code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This dramatically simplifies building &lt;strong&gt;AI agent systems&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  MCP Architecture (Simplified)
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+-------------------+
|   AI Model / LLM  |
+-------------------+
          |
          | MCP Protocol
          v
+-------------------+
|   MCP Server      |
|  (Tool Registry)  |
+-------------------+
     |        |
     v        v
  Tool 1   Tool 2
  API      Database
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The AI model communicates with an &lt;strong&gt;MCP server&lt;/strong&gt;, which exposes tools&lt;br&gt;
the model can use.&lt;/p&gt;




&lt;h2&gt;
  
  
  MCP vs API: Key Differences
&lt;/h2&gt;

&lt;p&gt;Feature                  API                MCP&lt;/p&gt;




&lt;p&gt;Primary Users            Developers         AI models &amp;amp; agents&lt;br&gt;
  Integration Style        Manual coding      Dynamic tool discovery&lt;br&gt;
  Context Awareness        Limited            Built-in&lt;br&gt;
  Standardization for AI   No                 Yes&lt;br&gt;
  Best For                 Traditional apps   AI agents &amp;amp; LLM systems&lt;/p&gt;




&lt;h2&gt;
  
  
  When to Use APIs
&lt;/h2&gt;

&lt;p&gt;APIs are still the best choice when building:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Web applications&lt;/li&gt;
&lt;li&gt;  Mobile apps&lt;/li&gt;
&lt;li&gt;  Microservices&lt;/li&gt;
&lt;li&gt;  Backend integrations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They are stable, widely supported, and extremely reliable.&lt;/p&gt;




&lt;h2&gt;
  
  
  When to Use MCP
&lt;/h2&gt;

&lt;p&gt;MCP is ideal when building &lt;strong&gt;AI-powered systems&lt;/strong&gt;, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Autonomous AI agents&lt;/li&gt;
&lt;li&gt;  LLM tool use frameworks&lt;/li&gt;
&lt;li&gt;  AI copilots&lt;/li&gt;
&lt;li&gt;  Intelligent automation platforms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;MCP allows models to &lt;strong&gt;interact with tools more naturally&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real-World Example
&lt;/h2&gt;

&lt;p&gt;Imagine building an AI assistant that can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Query a database&lt;/li&gt;
&lt;li&gt;  Send emails&lt;/li&gt;
&lt;li&gt;  Fetch weather data&lt;/li&gt;
&lt;li&gt;  Create documents&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  With APIs
&lt;/h3&gt;

&lt;p&gt;You must manually:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Write integrations&lt;/li&gt;
&lt;li&gt;  Handle each endpoint&lt;/li&gt;
&lt;li&gt;  Manage responses&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  With MCP
&lt;/h3&gt;

&lt;p&gt;The AI model can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Discover available tools&lt;/li&gt;
&lt;li&gt;  Select the right tool&lt;/li&gt;
&lt;li&gt;  Execute the task automatically&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This reduces development complexity significantly.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Future of AI Tooling
&lt;/h2&gt;

&lt;p&gt;As AI agents become more autonomous, standards like MCP may become the&lt;br&gt;
&lt;strong&gt;bridge between AI models and the real world&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;While APIs will continue to power traditional applications, MCP could&lt;br&gt;
define the &lt;strong&gt;next generation of AI-native integrations&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;APIs transformed software integration over the past two decades. Now,&lt;br&gt;
MCP is beginning to transform &lt;strong&gt;how AI systems interact with tools and&lt;br&gt;
services&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For developers building AI agents, copilots, or autonomous workflows,&lt;br&gt;
understanding MCP could become an essential skill.&lt;/p&gt;

&lt;p&gt;The future may not be &lt;strong&gt;MCP replacing APIs&lt;/strong&gt;, but rather &lt;strong&gt;MCP&lt;br&gt;
orchestrating APIs for AI systems&lt;/strong&gt;.&lt;/p&gt;




&lt;h3&gt;
  
  
  If you enjoyed this article
&lt;/h3&gt;

&lt;p&gt;Follow me for more content on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  AI Agents&lt;/li&gt;
&lt;li&gt;  Explainable AI&lt;/li&gt;
&lt;li&gt;  Cloud &amp;amp; AI integrations&lt;/li&gt;
&lt;li&gt;  AI Developer Tools&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>python</category>
      <category>docker</category>
    </item>
    <item>
      <title># Deploying Twilio Apps on the Cloud (Python + Flask/FastAPI)</title>
      <dc:creator>Chandrani Mukherjee</dc:creator>
      <pubDate>Wed, 03 Dec 2025 17:30:42 +0000</pubDate>
      <link>https://dev.to/moni121189/-deploying-twilio-apps-on-the-cloud-python-flaskfastapi-25bi</link>
      <guid>https://dev.to/moni121189/-deploying-twilio-apps-on-the-cloud-python-flaskfastapi-25bi</guid>
      <description>&lt;p&gt;Twilio applications need public HTTPS webhook URLs for SMS, WhatsApp, and Voice interactions. This guide explains how to deploy your Twilio-powered Python applications on Cloud Run, AWS Lambda, Azure, Railway, Render, and Docker-based platforms.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Google Cloud Run (Fast, Serverless, Recommended)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Dockerfile
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; python:3.11-slim&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; requirements.txt .&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--no-cache-dir&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . .&lt;/span&gt;

&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["gunicorn", "-b", ":8080", "app:app"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Deployment
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud builds submit &lt;span class="nt"&gt;--tag&lt;/span&gt; gcr.io/PROJECT_ID/twilio-ai-agent
gcloud run deploy twilio-ai-agent     &lt;span class="nt"&gt;--image&lt;/span&gt; gcr.io/PROJECT_ID/twilio-ai-agent     &lt;span class="nt"&gt;--platform&lt;/span&gt; managed     &lt;span class="nt"&gt;--region&lt;/span&gt; us-central1     &lt;span class="nt"&gt;--allow-unauthenticated&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use the Cloud Run URL in Twilio:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://your-service.run.app/sms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  2. AWS Lambda + API Gateway (Low Cost)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Convert FastAPI to Lambda
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mangum&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Mangum&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Mangum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Deploy with AWS SAM:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sam build
sam deploy &lt;span class="nt"&gt;--guided&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Webhook example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://abc123.execute-api.us-east-1.amazonaws.com/sms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  3. Azure App Service
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;az webapp up &lt;span class="nt"&gt;--name&lt;/span&gt; twilio-ai-app &lt;span class="nt"&gt;--runtime&lt;/span&gt; &lt;span class="s2"&gt;"PYTHON:3.10"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Twilio webhook:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://twilio-ai-app.azurewebsites.net/sms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  4. Railway Deployment (Easiest)
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Connect GitHub repo
&lt;/li&gt;
&lt;li&gt;Add environment variables
&lt;/li&gt;
&lt;li&gt;Railway assigns URL like:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://twilio-agent-production.up.railway.app/sms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  5. Render Deployment
&lt;/h2&gt;

&lt;p&gt;Start command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gunicorn app:app --bind 0.0.0.0:$PORT
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Render URL becomes your webhook endpoint.&lt;/p&gt;




&lt;h2&gt;
  
  
  6. Docker Deployments (Fly.io, EC2, DigitalOcean)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Fly.io Example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;fly launch
fly deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Webhook:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://twilio-bot.fly.dev/sms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  7. Local Ngrok Testing
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ngrok http 5000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Webhook example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://1234abcd.ngrok-free.app/sms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Production Checklist
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Security
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Store Twilio credentials in environment variables
&lt;/li&gt;
&lt;li&gt;Use request validation
&lt;/li&gt;
&lt;li&gt;Rotate API keys
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Performance
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Use Gunicorn workers
&lt;/li&gt;
&lt;li&gt;Prefer serverless platforms for scaling
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Reliability
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Twilio automatically retries failed webhook calls
&lt;/li&gt;
&lt;li&gt;Add logging and monitoring
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Twilio apps deploy easily across modern cloud platforms. Choose Cloud Run for scalability, Lambda for low cost, Railway for speed, or Docker for flexibility.&lt;/p&gt;

</description>
      <category>python</category>
      <category>aws</category>
      <category>twilio</category>
      <category>ai</category>
    </item>
    <item>
      <title>Build AI Agents with Twilio: SMS, Voice &amp; WhatsApp Automation</title>
      <dc:creator>Chandrani Mukherjee</dc:creator>
      <pubDate>Wed, 03 Dec 2025 17:27:56 +0000</pubDate>
      <link>https://dev.to/moni121189/build-ai-agents-with-twilio-sms-voice-whatsapp-automation-ack</link>
      <guid>https://dev.to/moni121189/build-ai-agents-with-twilio-sms-voice-whatsapp-automation-ack</guid>
      <description>&lt;p&gt;AI agents are reshaping how applications interact with the world—performing tasks, scheduling actions, retrieving information, and responding intelligently to users. Pairing AI agents with &lt;strong&gt;Twilio&lt;/strong&gt; unlocks real-time communication capabilities across SMS, Voice, and WhatsApp. In this article, we’ll build a Twilio-powered Python AI agent that can reason, plan, and act.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why AI Agents + Twilio?
&lt;/h2&gt;

&lt;p&gt;An AI agent becomes far more useful when it can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Receive instructions from users by SMS/WhatsApp
&lt;/li&gt;
&lt;li&gt;Take actions (search, fetch data, schedule reminders)
&lt;/li&gt;
&lt;li&gt;Trigger workflows or APIs
&lt;/li&gt;
&lt;li&gt;Provide reasoning back to the user
&lt;/li&gt;
&lt;li&gt;Handle voice calls and respond dynamically
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Twilio acts as the communication gateway, while the AI model provides intelligence and decision-making.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.10+&lt;/li&gt;
&lt;li&gt;Twilio account + SMS-enabled phone number&lt;/li&gt;
&lt;li&gt;AI model API (OpenAI, Groq, Anthropic, or local LLM)&lt;/li&gt;
&lt;li&gt;Libraries:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install twilio flask openai requests
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;User sends SMS/WhatsApp → Twilio Webhook
&lt;/li&gt;
&lt;li&gt;Flask endpoint receives message
&lt;/li&gt;
&lt;li&gt;Python AI Agent interprets task
&lt;/li&gt;
&lt;li&gt;Agent executes tools (APIs, searches, actions)
&lt;/li&gt;
&lt;li&gt;Sends response back via Twilio
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Example: Python AI Agent
&lt;/h2&gt;

&lt;p&gt;Below is a minimal agent that can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Search the web
&lt;/li&gt;
&lt;li&gt;Look up weather
&lt;/li&gt;
&lt;li&gt;Set reminders
&lt;/li&gt;
&lt;li&gt;Respond conversationally
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;agent.py&lt;/code&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timedelta&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AIAgent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_web&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Dummy search
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Search results for: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The weather in &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; is sunny and 72°F.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;user_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search_web&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weather&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weather&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;remind&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Reminder set! (demo version)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I can help with search, weather, reminders, or questions!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Twilio + Flask AI Agent Endpoint
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;app.py&lt;/code&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;flask&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Flask&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;twilio.twiml.messaging_response&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MessagingResponse&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AIAgent&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Flask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AIAgent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nd"&gt;@app.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/sms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;methods&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;sms_reply&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;user_message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;form&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MessagingResponse&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;debug&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Connecting Twilio Webhook
&lt;/h2&gt;

&lt;p&gt;In Twilio Console → Phone Numbers → Messaging&lt;/p&gt;

&lt;p&gt;Set the webhook:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://your-server.ngrok.io/sms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now your number behaves like an AI agent!&lt;/p&gt;

&lt;h2&gt;
  
  
  Extending the Agent
&lt;/h2&gt;

&lt;p&gt;You can add:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Calendar and task automation
&lt;/li&gt;
&lt;li&gt;Database lookups
&lt;/li&gt;
&lt;li&gt;Document RAG
&lt;/li&gt;
&lt;li&gt;LLM-based reasoning
&lt;/li&gt;
&lt;li&gt;Multi-step planning &amp;amp; tool execution
&lt;/li&gt;
&lt;li&gt;WhatsApp support
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Twilio gives AI agents the ability to interact with users in real time across SMS, Voice, and WhatsApp. With just a few lines of Python, you can build intelligent assistants that perform tasks, answer questions, and automate workflows—all from a phone.&lt;/p&gt;

</description>
      <category>python</category>
      <category>aws</category>
      <category>twilio</category>
    </item>
    <item>
      <title>Build AI-Powered SMS &amp; Voice Apps with Twilio and Python</title>
      <dc:creator>Chandrani Mukherjee</dc:creator>
      <pubDate>Wed, 03 Dec 2025 17:22:01 +0000</pubDate>
      <link>https://dev.to/moni121189/build-ai-powered-sms-voice-apps-with-twilio-and-python-gb1</link>
      <guid>https://dev.to/moni121189/build-ai-powered-sms-voice-apps-with-twilio-and-python-gb1</guid>
      <description>&lt;p&gt;Artificial intelligence is transforming how applications interact with users—but without seamless communication channels, even the smartest models fall short. Twilio bridges that gap by giving your AI apps the ability to send messages, respond to users, handle voice, and automate conversations. In this article, we’ll build a simple—but powerful—AI-driven SMS assistant using Twilio + Python.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Twilio + AI + Python?
&lt;/h2&gt;

&lt;p&gt;Python is the go-to language for AI because of its rich ecosystem (OpenAI, LangChain, HuggingFace, FastAPI, etc.). Twilio adds real-time reachability:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Send AI-generated responses via SMS
&lt;/li&gt;
&lt;li&gt;Build voice apps powered by LLM reasoning
&lt;/li&gt;
&lt;li&gt;Connect AI chatbots to WhatsApp
&lt;/li&gt;
&lt;li&gt;Trigger LLM workflows from inbound user messages
&lt;/li&gt;
&lt;li&gt;Integrate with retrieval (RAG), analytics, workflows, or IoT events
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.9+&lt;/li&gt;
&lt;li&gt;Twilio account + phone number enabled for SMS&lt;/li&gt;
&lt;li&gt;An AI model/API (OpenAI, Groq, Anthropic)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;pip install twilio flask&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;pip install openai&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Build an AI SMS Assistant (Flask + Twilio + OpenAI)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Environment
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;TWILIO_AUTH_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your_token"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;TWILIO_SID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your_sid"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your_key"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. app.py
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;flask&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Flask&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;twilio.twiml.messaging_response&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MessagingResponse&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Flask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="nd"&gt;@app.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/sms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;methods&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;sms_reply&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;user_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;form&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Body&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;completion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful AI assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_text&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;ai_reply&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;completion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MessagingResponse&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ai_reply&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;debug&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Configure Twilio Webhook
&lt;/h3&gt;

&lt;p&gt;Twilio Console → Phone Numbers → Messaging → Webhook URL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://your-server.ngrok.io/sms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  AI Voice Bonus
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;twilio.twiml.voice_response&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;VoiceResponse&lt;/span&gt;

&lt;span class="nd"&gt;@app.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/voice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;methods&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;voice&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;VoiceResponse&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;say&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello! Ask me anything.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;voice&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;alice&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/process_voice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What You Can Build
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;AI customer support
&lt;/li&gt;
&lt;li&gt;WhatsApp travel planner
&lt;/li&gt;
&lt;li&gt;Voice LLM receptionist
&lt;/li&gt;
&lt;li&gt;Real-time IoT → SMS AI alerts
&lt;/li&gt;
&lt;li&gt;RAG chatbot via SMS
&lt;/li&gt;
&lt;li&gt;Study tutor bot
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Deployment
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Docker + Gunicorn
&lt;/li&gt;
&lt;li&gt;AWS Lambda
&lt;/li&gt;
&lt;li&gt;GCP Cloud Run
&lt;/li&gt;
&lt;li&gt;Fly.io
&lt;/li&gt;
&lt;li&gt;Railway
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Twilio transforms AI models from passive generators into interactive, real-time communication agents. With a few lines of Python, you can build SMS/voice/WhatsApp AI assistants and deploy them anywhere.&lt;/p&gt;

</description>
      <category>python</category>
      <category>aws</category>
      <category>twilio</category>
      <category>docker</category>
    </item>
    <item>
      <title>Build AI-Powered SMS &amp; Voice Apps with Twilio and Python</title>
      <dc:creator>Chandrani Mukherjee</dc:creator>
      <pubDate>Wed, 03 Dec 2025 17:22:01 +0000</pubDate>
      <link>https://dev.to/moni121189/build-ai-powered-sms-voice-apps-with-twilio-and-python-1ka6</link>
      <guid>https://dev.to/moni121189/build-ai-powered-sms-voice-apps-with-twilio-and-python-1ka6</guid>
      <description>&lt;p&gt;Artificial intelligence is transforming how applications interact with users—but without seamless communication channels, even the smartest models fall short. Twilio bridges that gap by giving your AI apps the ability to send messages, respond to users, handle voice, and automate conversations. In this article, we’ll build a simple—but powerful—AI-driven SMS assistant using Twilio + Python.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Twilio + AI + Python?
&lt;/h2&gt;

&lt;p&gt;Python is the go-to language for AI because of its rich ecosystem (OpenAI, LangChain, HuggingFace, FastAPI, etc.). Twilio adds real-time reachability:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Send AI-generated responses via SMS
&lt;/li&gt;
&lt;li&gt;Build voice apps powered by LLM reasoning
&lt;/li&gt;
&lt;li&gt;Connect AI chatbots to WhatsApp
&lt;/li&gt;
&lt;li&gt;Trigger LLM workflows from inbound user messages
&lt;/li&gt;
&lt;li&gt;Integrate with retrieval (RAG), analytics, workflows, or IoT events
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.9+&lt;/li&gt;
&lt;li&gt;Twilio account + phone number enabled for SMS&lt;/li&gt;
&lt;li&gt;An AI model/API (OpenAI, Groq, Anthropic)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;pip install twilio flask&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;pip install openai&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Build an AI SMS Assistant (Flask + Twilio + OpenAI)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Environment
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;TWILIO_AUTH_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your_token"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;TWILIO_SID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your_sid"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your_key"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. app.py
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;flask&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Flask&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;twilio.twiml.messaging_response&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MessagingResponse&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Flask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="nd"&gt;@app.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/sms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;methods&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;sms_reply&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;user_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;form&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Body&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;completion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful AI assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_text&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;ai_reply&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;completion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MessagingResponse&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ai_reply&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;debug&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Configure Twilio Webhook
&lt;/h3&gt;

&lt;p&gt;Twilio Console → Phone Numbers → Messaging → Webhook URL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://your-server.ngrok.io/sms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  AI Voice Bonus
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;twilio.twiml.voice_response&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;VoiceResponse&lt;/span&gt;

&lt;span class="nd"&gt;@app.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/voice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;methods&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;voice&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;VoiceResponse&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;say&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello! Ask me anything.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;voice&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;alice&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/process_voice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What You Can Build
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;AI customer support
&lt;/li&gt;
&lt;li&gt;WhatsApp travel planner
&lt;/li&gt;
&lt;li&gt;Voice LLM receptionist
&lt;/li&gt;
&lt;li&gt;Real-time IoT → SMS AI alerts
&lt;/li&gt;
&lt;li&gt;RAG chatbot via SMS
&lt;/li&gt;
&lt;li&gt;Study tutor bot
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Deployment
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Docker + Gunicorn
&lt;/li&gt;
&lt;li&gt;AWS Lambda
&lt;/li&gt;
&lt;li&gt;GCP Cloud Run
&lt;/li&gt;
&lt;li&gt;Fly.io
&lt;/li&gt;
&lt;li&gt;Railway
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Twilio transforms AI models from passive generators into interactive, real-time communication agents. With a few lines of Python, you can build SMS/voice/WhatsApp AI assistants and deploy them anywhere.&lt;/p&gt;

</description>
      <category>python</category>
      <category>aws</category>
      <category>twilio</category>
      <category>docker</category>
    </item>
    <item>
      <title>Teach your RAG to learn from its mistakes — the smart way</title>
      <dc:creator>Chandrani Mukherjee</dc:creator>
      <pubDate>Mon, 03 Nov 2025 05:42:50 +0000</pubDate>
      <link>https://dev.to/moni121189/teach-your-rag-to-learn-from-its-mistakes-the-smart-way-32lp</link>
      <guid>https://dev.to/moni121189/teach-your-rag-to-learn-from-its-mistakes-the-smart-way-32lp</guid>
      <description>&lt;h1&gt;
  
  
  🔁 Building a Feedback Loop for RAG with LangChain and Docker
&lt;/h1&gt;

&lt;p&gt;Retrieval-Augmented Generation (RAG) is great — until your LLM starts hallucinating or retrieving outdated context. That’s where a &lt;strong&gt;feedback loop&lt;/strong&gt; comes in.  &lt;/p&gt;

&lt;p&gt;In this post, we’ll build a simple RAG pipeline with &lt;strong&gt;LangChain&lt;/strong&gt;, containerize it using &lt;strong&gt;Docker&lt;/strong&gt;, and add a &lt;strong&gt;feedback mechanism&lt;/strong&gt; to make it smarter over time.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧠 Why Feedback Matters in RAG
&lt;/h2&gt;

&lt;p&gt;A RAG system has two parts:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Retriever&lt;/strong&gt; — fetches relevant documents from a vector store.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generator&lt;/strong&gt; — produces an answer using the retrieved context.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Without feedback, your model never learns from mistakes.&lt;br&gt;&lt;br&gt;
A feedback loop lets you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Re-rank documents that users find more useful.
&lt;/li&gt;
&lt;li&gt;Fine-tune retrievers based on query–document relevance.
&lt;/li&gt;
&lt;li&gt;Measure response quality (faithfulness, groundedness, etc.).
&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  ⚙️ Step 1: Build a Minimal RAG Pipeline
&lt;/h2&gt;

&lt;p&gt;Let’s start with a simple LangChain setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.chains&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RetrievalQA&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.embeddings&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAIEmbeddings&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.vectorstores&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FAISS&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.chat_models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.document_loaders&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TextLoader&lt;/span&gt;

&lt;span class="c1"&gt;# Load documents
&lt;/span&gt;&lt;span class="n"&gt;loader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TextLoader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data/policies.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;loader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Create embeddings and vector store
&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAIEmbeddings&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;FAISS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;retriever&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;as_retriever&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;search_kwargs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;k&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;# Define RAG pipeline
&lt;/span&gt;&lt;span class="n"&gt;qa&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;RetrievalQA&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_chain_type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;retriever&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;retriever&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;return_source_documents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is the latest leave policy?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;qa&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  💬 Step 2: Add a Feedback Collector
&lt;/h2&gt;

&lt;p&gt;After displaying the result, log user feedback (thumbs up/down) into a simple JSON or database.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;log_feedback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rating&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;entry&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rating&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;rating&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;feedback.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can later parse this feedback file to improve your retriever — e.g., re-weighting embeddings or filtering irrelevant sources.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔄 Step 3: Close the Feedback Loop
&lt;/h2&gt;

&lt;p&gt;Use libraries like &lt;strong&gt;TruLens&lt;/strong&gt; or &lt;strong&gt;Ragas&lt;/strong&gt; to automatically evaluate and fine-tune based on feedback:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;trulens_eval&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Feedback&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TruChain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Select&lt;/span&gt;

&lt;span class="n"&gt;tru_qa&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TruChain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chain&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;qa&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;app_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rag-feedback-demo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;feedback_quality&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Feedback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;helpfulness&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;tru_qa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_feedback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;feedback_quality&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;tru_qa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;([{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🐳 Step 4: Containerize with Docker
&lt;/h2&gt;

&lt;p&gt;Create a simple &lt;code&gt;Dockerfile&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; python:3.10-slim&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . .&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;langchain openai faiss-cpu trulens-eval
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; OPENAI_API_KEY=your_api_key&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["python", "rag_feedback.py"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then build and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker build &lt;span class="nt"&gt;-t&lt;/span&gt; rag-feedback &lt;span class="nb"&gt;.&lt;/span&gt;
docker run &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$OPENAI_API_KEY&lt;/span&gt; rag-feedback
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🚀 Step 5: Scale &amp;amp; Iterate
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Deploy your RAG system as a microservice behind an API.
&lt;/li&gt;
&lt;li&gt;Stream feedback data to a shared database (Postgres, MongoDB).
&lt;/li&gt;
&lt;li&gt;Periodically retrain or re-index your vector store based on positive/negative signals.
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🧩 Summary
&lt;/h2&gt;

&lt;p&gt;By integrating &lt;strong&gt;LangChain&lt;/strong&gt;, &lt;strong&gt;Docker&lt;/strong&gt;, and a &lt;strong&gt;feedback loop&lt;/strong&gt;, you get a self-improving RAG system that learns what “good” looks like from real usage.  &lt;/p&gt;

&lt;p&gt;This loop not only boosts retrieval precision but also reduces hallucination and improves trust in your AI answers.&lt;/p&gt;




&lt;h3&gt;
  
  
  💡 Next Steps
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Add automated evaluation with &lt;a href="https://github.com/explodinggradients/ragas" rel="noopener noreferrer"&gt;Ragas&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Serve your feedback endpoint via FastAPI
&lt;/li&gt;
&lt;li&gt;Store embeddings and feedback in a persistent vector DB like Weaviate or Pinecone
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>python</category>
      <category>docker</category>
      <category>devops</category>
      <category>pinecone</category>
    </item>
    <item>
      <title>Securing LangChain APIs with AWS SSO and Active Directory</title>
      <dc:creator>Chandrani Mukherjee</dc:creator>
      <pubDate>Thu, 09 Oct 2025 05:21:11 +0000</pubDate>
      <link>https://dev.to/moni121189/securing-langchain-apis-with-aws-sso-and-active-directory-3lhj</link>
      <guid>https://dev.to/moni121189/securing-langchain-apis-with-aws-sso-and-active-directory-3lhj</guid>
      <description>&lt;h1&gt;
  
  
  🔐 Using AWS Active Directory SSO to Secure AI Models and Protect LangChain APIs
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Author:&lt;/strong&gt; Chandrani Mukherjee&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Tags:&lt;/strong&gt; #AWS #ActiveDirectory #SSO #LangChain #Security #AI #Python  &lt;/p&gt;




&lt;h2&gt;
  
  
  🧭 Overview
&lt;/h2&gt;

&lt;p&gt;When building &lt;strong&gt;AI-powered platforms&lt;/strong&gt; with &lt;strong&gt;LangChain&lt;/strong&gt;, &lt;strong&gt;RAG&lt;/strong&gt;, or &lt;strong&gt;LLMs&lt;/strong&gt;, one of the most overlooked aspects is &lt;strong&gt;access security&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
Unsecured APIs can expose sensitive data, allow unauthorized model invocation, or lead to prompt injection attacks.&lt;/p&gt;

&lt;p&gt;By integrating &lt;strong&gt;AWS Active Directory (AD)&lt;/strong&gt; through &lt;strong&gt;AWS IAM Identity Center (formerly AWS SSO)&lt;/strong&gt;, we can bring &lt;strong&gt;enterprise-grade identity, access control, and auditing&lt;/strong&gt; into AI model deployment pipelines.&lt;/p&gt;

&lt;p&gt;This guide walks through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enabling &lt;strong&gt;SSO authentication with AWS AD&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Applying &lt;strong&gt;fine-grained IAM access policies&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Securing &lt;strong&gt;LangChain APIs&lt;/strong&gt; behind AWS gateways&lt;/li&gt;
&lt;li&gt;Enforcing &lt;strong&gt;responsible AI access controls&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🧩 Architecture Overview
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
[Corporate User] 
   ↓  (AD Credentials)
[ AWS SSO / IAM Identity Center integrated with AWS Managed Microsoft AD ]
   ↓  (SSO token / SAML assertion)
[ API Gateway / ALB w/ JWT Authorizer + WAF ]
   ↓
[ Auth Proxy Service (Python/Flask or FastAPI) ]
   ↓
[ LangChain Server / AI Model Backend ]
   ↓
[ AWS Services: S3 | DynamoDB | Bedrock | SageMaker | KMS ]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Key Security Layers
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Identity:&lt;/strong&gt; Authentication handled via &lt;strong&gt;AWS AD SSO&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Access Control:&lt;/strong&gt; Short-lived credentials through &lt;strong&gt;IAM roles and permissions boundaries&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Network Security:&lt;/strong&gt; Private subnets, &lt;strong&gt;VPC endpoints&lt;/strong&gt;, and &lt;strong&gt;AWS WAF&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Application Security:&lt;/strong&gt; Input/output sanitization, tool whitelisting, prompt validation
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability:&lt;/strong&gt; CloudWatch + GuardDuty + centralized logs
&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  ⚙️ Step 1: Enable SSO with AWS Active Directory
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Set up AWS Managed Microsoft AD&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In the AWS Directory Service console, create or connect your corporate AD.
&lt;/li&gt;
&lt;li&gt;Sync identities using &lt;strong&gt;AWS IAM Identity Center&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Integrate with IAM Identity Center (AWS SSO)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Connect AWS AD to &lt;strong&gt;IAM Identity Center&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;Map user groups (e.g., &lt;code&gt;AI_Architects&lt;/code&gt;, &lt;code&gt;Data_Scientists&lt;/code&gt;) to &lt;strong&gt;permission sets&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Assign access&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Grant your AI services access only through designated AD groups.&lt;/li&gt;
&lt;li&gt;Example:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;AI_Admins&lt;/code&gt;: Can deploy and fine-tune models
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;AI_Users&lt;/code&gt;: Read-only inference access&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This creates a unified login experience — users authenticate with their &lt;strong&gt;corporate AD credentials&lt;/strong&gt; to access AI APIs or consoles.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔐 Step 2: Protect LangChain APIs with AWS Auth Layers
&lt;/h2&gt;

&lt;p&gt;LangChain services often expose REST endpoints — these must sit &lt;strong&gt;behind a secured API Gateway or ALB&lt;/strong&gt; with JWT validation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 1 — API Gateway + JWT Authorizer
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;bash

aws apigatewayv2 create-authorizer   --api-id &amp;lt;api_id&amp;gt;   --authorizer-type JWT   --identity-source '$request.header.Authorization'   --name LangChainAuth   --jwt-configuration Audience=&amp;lt;app_client_id&amp;gt;,Issuer=&amp;lt;sso_issuer_url&amp;gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;Issuer&lt;/strong&gt; points to the &lt;strong&gt;AWS AD / Identity Center&lt;/strong&gt; OIDC endpoint.
&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;Audience&lt;/strong&gt; matches your app's client ID.
&lt;/li&gt;
&lt;li&gt;Add &lt;strong&gt;AWS WAF&lt;/strong&gt; rules to protect from abuse and injection attempts.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Option 2 — ALB + OIDC Authentication
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Use an &lt;strong&gt;Application Load Balancer (ALB)&lt;/strong&gt; to authenticate directly via OIDC before routing to your backend.&lt;/li&gt;
&lt;li&gt;Add group-based routing:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;bash

  condition:
    Field: path-pattern
    Values: /admin/*
    Authenticate: groups = AI_Admins

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🧱 Step 3: Build an Auth Proxy for LangChain
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;Flask/FastAPI proxy&lt;/strong&gt; ensures your AI backend remains isolated and safe.&lt;br&gt;&lt;br&gt;
This layer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Verifies AD-based JWT tokens
&lt;/li&gt;
&lt;li&gt;Performs &lt;strong&gt;rate limiting&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Sanitizes &lt;strong&gt;user prompts&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Logs usage metadata for auditing
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python

from flask import Flask, request, jsonify
import jwt, requests

app = Flask(__name__)
ISSUER = "https://YOUR_SSO_DOMAIN.awsapps.com/start"
AUDIENCE = "LangChainApp"

def verify_token(token):
    # Validate token with AWS OIDC public keys (jwks)
    return jwt.decode(token, options={"verify_aud": True, "verify_iss": True}, audience=AUDIENCE, issuer=ISSUER)

@app.route("/api/query", methods=["POST"])
def handle_query():
    auth_header = request.headers.get("Authorization", "")
    if not auth_header:
        return jsonify({"error": "Missing Authorization"}), 401

    token = auth_header.split(" ")[1]
    claims = verify_token(token)
    user = claims.get("email")

    # Simple prompt validation
    prompt = request.json.get("prompt", "")
    if "DROP TABLE" in prompt.upper():
        return jsonify({"error": "Invalid input detected"}), 400

    # Forward safely to LangChain backend
    resp = requests.post("http://langchain-service/internal-query", json={"prompt": prompt, "user": user})
    return jsonify(resp.json()), resp.status_code

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🧰 Step 4: Secure AWS Resources via IAM &amp;amp; KMS
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;IAM Roles for Service Accounts (IRSA)&lt;/strong&gt; if deploying LangChain on &lt;strong&gt;EKS&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Store model keys, vector DB credentials, and LLM API tokens in &lt;strong&gt;AWS Secrets Manager&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Encrypt all sensitive data and embeddings with &lt;strong&gt;AWS KMS&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🧠 Step 5: Enforce Responsible AI Practices
&lt;/h2&gt;

&lt;p&gt;Security isn't just about access — it's about usage integrity.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Log all model invocations with user identity (but mask sensitive input)&lt;/li&gt;
&lt;li&gt;✅ Detect abnormal query patterns with &lt;strong&gt;CloudWatch metrics&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;✅ Quarantine or sandbox untrusted user prompts&lt;/li&gt;
&lt;li&gt;✅ Integrate &lt;strong&gt;GuardDuty + Security Hub&lt;/strong&gt; for continuous compliance&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🧩 Step 6: Continuous Monitoring &amp;amp; Auditing
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Enable &lt;strong&gt;AWS CloudTrail&lt;/strong&gt; for every API and role assumption.
&lt;/li&gt;
&lt;li&gt;Store all model interaction logs in &lt;strong&gt;S3 with object-level encryption&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;Automate review dashboards using &lt;strong&gt;QuickSight&lt;/strong&gt; or &lt;strong&gt;Grafana on CloudWatch logs&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  ✅ Summary Checklist
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Control Area&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SSO Identity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Integrated AWS AD with IAM Identity Center&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;API Security&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;API Gateway / ALB JWT authorizer enabled&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Secrets&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Stored in Secrets Manager + KMS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Runtime&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;IRSA-enabled pods with least-privilege IAM roles&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Validation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Input sanitization, rate-limiting, and proxy layer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Monitoring&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GuardDuty, CloudTrail, and CloudWatch integration&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  🚀 Conclusion
&lt;/h2&gt;

&lt;p&gt;By combining &lt;strong&gt;AWS Active Directory SSO&lt;/strong&gt;, &lt;strong&gt;IAM&lt;/strong&gt;, and &lt;strong&gt;LangChain architectural hardening&lt;/strong&gt;, you achieve a &lt;strong&gt;zero-trust AI deployment&lt;/strong&gt; — where &lt;strong&gt;authentication, authorization, encryption, and accountability&lt;/strong&gt; are baked into every step of model access.&lt;/p&gt;

&lt;p&gt;This design keeps your AI APIs secure, your credentials protected, and your compliance auditors happy.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Written by &lt;a href="https://www.linkedin.com/in/chandrani-mukherjee-usa-nj/" rel="noopener noreferrer"&gt;Chandrani Mukherjee&lt;/a&gt;,&lt;br&gt;&lt;br&gt;
Senior Solution Enterprise Architect | AI/ML Specialist&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>security</category>
      <category>langchain</category>
      <category>python</category>
    </item>
    <item>
      <title>Securing LangChain APIs with AWS SSO and Active Directory</title>
      <dc:creator>Chandrani Mukherjee</dc:creator>
      <pubDate>Thu, 09 Oct 2025 05:21:11 +0000</pubDate>
      <link>https://dev.to/moni121189/securing-langchain-apis-with-aws-sso-and-active-directory-39pg</link>
      <guid>https://dev.to/moni121189/securing-langchain-apis-with-aws-sso-and-active-directory-39pg</guid>
      <description>&lt;h1&gt;
  
  
  🔐 Using AWS Active Directory SSO to Secure AI Models and Protect LangChain APIs
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Author:&lt;/strong&gt; Chandrani Mukherjee&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Tags:&lt;/strong&gt; #AWS #ActiveDirectory #SSO #LangChain #Security #AI #Python  &lt;/p&gt;




&lt;h2&gt;
  
  
  🧭 Overview
&lt;/h2&gt;

&lt;p&gt;When building &lt;strong&gt;AI-powered platforms&lt;/strong&gt; with &lt;strong&gt;LangChain&lt;/strong&gt;, &lt;strong&gt;RAG&lt;/strong&gt;, or &lt;strong&gt;LLMs&lt;/strong&gt;, one of the most overlooked aspects is &lt;strong&gt;access security&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
Unsecured APIs can expose sensitive data, allow unauthorized model invocation, or lead to prompt injection attacks.&lt;/p&gt;

&lt;p&gt;By integrating &lt;strong&gt;AWS Active Directory (AD)&lt;/strong&gt; through &lt;strong&gt;AWS IAM Identity Center (formerly AWS SSO)&lt;/strong&gt;, we can bring &lt;strong&gt;enterprise-grade identity, access control, and auditing&lt;/strong&gt; into AI model deployment pipelines.&lt;/p&gt;

&lt;p&gt;This guide walks through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enabling &lt;strong&gt;SSO authentication with AWS AD&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Applying &lt;strong&gt;fine-grained IAM access policies&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Securing &lt;strong&gt;LangChain APIs&lt;/strong&gt; behind AWS gateways&lt;/li&gt;
&lt;li&gt;Enforcing &lt;strong&gt;responsible AI access controls&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🧩 Architecture Overview
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
[Corporate User] 
   ↓  (AD Credentials)
[ AWS SSO / IAM Identity Center integrated with AWS Managed Microsoft AD ]
   ↓  (SSO token / SAML assertion)
[ API Gateway / ALB w/ JWT Authorizer + WAF ]
   ↓
[ Auth Proxy Service (Python/Flask or FastAPI) ]
   ↓
[ LangChain Server / AI Model Backend ]
   ↓
[ AWS Services: S3 | DynamoDB | Bedrock | SageMaker | KMS ]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Key Security Layers
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Identity:&lt;/strong&gt; Authentication handled via &lt;strong&gt;AWS AD SSO&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Access Control:&lt;/strong&gt; Short-lived credentials through &lt;strong&gt;IAM roles and permissions boundaries&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Network Security:&lt;/strong&gt; Private subnets, &lt;strong&gt;VPC endpoints&lt;/strong&gt;, and &lt;strong&gt;AWS WAF&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Application Security:&lt;/strong&gt; Input/output sanitization, tool whitelisting, prompt validation
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability:&lt;/strong&gt; CloudWatch + GuardDuty + centralized logs
&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  ⚙️ Step 1: Enable SSO with AWS Active Directory
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Set up AWS Managed Microsoft AD&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In the AWS Directory Service console, create or connect your corporate AD.
&lt;/li&gt;
&lt;li&gt;Sync identities using &lt;strong&gt;AWS IAM Identity Center&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Integrate with IAM Identity Center (AWS SSO)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Connect AWS AD to &lt;strong&gt;IAM Identity Center&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;Map user groups (e.g., &lt;code&gt;AI_Architects&lt;/code&gt;, &lt;code&gt;Data_Scientists&lt;/code&gt;) to &lt;strong&gt;permission sets&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Assign access&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Grant your AI services access only through designated AD groups.&lt;/li&gt;
&lt;li&gt;Example:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;AI_Admins&lt;/code&gt;: Can deploy and fine-tune models
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;AI_Users&lt;/code&gt;: Read-only inference access&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This creates a unified login experience — users authenticate with their &lt;strong&gt;corporate AD credentials&lt;/strong&gt; to access AI APIs or consoles.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔐 Step 2: Protect LangChain APIs with AWS Auth Layers
&lt;/h2&gt;

&lt;p&gt;LangChain services often expose REST endpoints — these must sit &lt;strong&gt;behind a secured API Gateway or ALB&lt;/strong&gt; with JWT validation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 1 — API Gateway + JWT Authorizer
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;bash

aws apigatewayv2 create-authorizer   --api-id &amp;lt;api_id&amp;gt;   --authorizer-type JWT   --identity-source '$request.header.Authorization'   --name LangChainAuth   --jwt-configuration Audience=&amp;lt;app_client_id&amp;gt;,Issuer=&amp;lt;sso_issuer_url&amp;gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;Issuer&lt;/strong&gt; points to the &lt;strong&gt;AWS AD / Identity Center&lt;/strong&gt; OIDC endpoint.
&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;Audience&lt;/strong&gt; matches your app's client ID.
&lt;/li&gt;
&lt;li&gt;Add &lt;strong&gt;AWS WAF&lt;/strong&gt; rules to protect from abuse and injection attempts.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Option 2 — ALB + OIDC Authentication
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Use an &lt;strong&gt;Application Load Balancer (ALB)&lt;/strong&gt; to authenticate directly via OIDC before routing to your backend.&lt;/li&gt;
&lt;li&gt;Add group-based routing:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;bash

  condition:
    Field: path-pattern
    Values: /admin/*
    Authenticate: groups = AI_Admins

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🧱 Step 3: Build an Auth Proxy for LangChain
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;Flask/FastAPI proxy&lt;/strong&gt; ensures your AI backend remains isolated and safe.&lt;br&gt;&lt;br&gt;
This layer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Verifies AD-based JWT tokens
&lt;/li&gt;
&lt;li&gt;Performs &lt;strong&gt;rate limiting&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Sanitizes &lt;strong&gt;user prompts&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Logs usage metadata for auditing
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python

from flask import Flask, request, jsonify
import jwt, requests

app = Flask(__name__)
ISSUER = "https://YOUR_SSO_DOMAIN.awsapps.com/start"
AUDIENCE = "LangChainApp"

def verify_token(token):
    # Validate token with AWS OIDC public keys (jwks)
    return jwt.decode(token, options={"verify_aud": True, "verify_iss": True}, audience=AUDIENCE, issuer=ISSUER)

@app.route("/api/query", methods=["POST"])
def handle_query():
    auth_header = request.headers.get("Authorization", "")
    if not auth_header:
        return jsonify({"error": "Missing Authorization"}), 401

    token = auth_header.split(" ")[1]
    claims = verify_token(token)
    user = claims.get("email")

    # Simple prompt validation
    prompt = request.json.get("prompt", "")
    if "DROP TABLE" in prompt.upper():
        return jsonify({"error": "Invalid input detected"}), 400

    # Forward safely to LangChain backend
    resp = requests.post("http://langchain-service/internal-query", json={"prompt": prompt, "user": user})
    return jsonify(resp.json()), resp.status_code

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🧰 Step 4: Secure AWS Resources via IAM &amp;amp; KMS
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;IAM Roles for Service Accounts (IRSA)&lt;/strong&gt; if deploying LangChain on &lt;strong&gt;EKS&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Store model keys, vector DB credentials, and LLM API tokens in &lt;strong&gt;AWS Secrets Manager&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Encrypt all sensitive data and embeddings with &lt;strong&gt;AWS KMS&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🧠 Step 5: Enforce Responsible AI Practices
&lt;/h2&gt;

&lt;p&gt;Security isn't just about access — it's about usage integrity.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Log all model invocations with user identity (but mask sensitive input)&lt;/li&gt;
&lt;li&gt;✅ Detect abnormal query patterns with &lt;strong&gt;CloudWatch metrics&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;✅ Quarantine or sandbox untrusted user prompts&lt;/li&gt;
&lt;li&gt;✅ Integrate &lt;strong&gt;GuardDuty + Security Hub&lt;/strong&gt; for continuous compliance&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🧩 Step 6: Continuous Monitoring &amp;amp; Auditing
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Enable &lt;strong&gt;AWS CloudTrail&lt;/strong&gt; for every API and role assumption.
&lt;/li&gt;
&lt;li&gt;Store all model interaction logs in &lt;strong&gt;S3 with object-level encryption&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;Automate review dashboards using &lt;strong&gt;QuickSight&lt;/strong&gt; or &lt;strong&gt;Grafana on CloudWatch logs&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  ✅ Summary Checklist
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Control Area&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SSO Identity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Integrated AWS AD with IAM Identity Center&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;API Security&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;API Gateway / ALB JWT authorizer enabled&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Secrets&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Stored in Secrets Manager + KMS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Runtime&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;IRSA-enabled pods with least-privilege IAM roles&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Validation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Input sanitization, rate-limiting, and proxy layer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Monitoring&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GuardDuty, CloudTrail, and CloudWatch integration&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  🚀 Conclusion
&lt;/h2&gt;

&lt;p&gt;By combining &lt;strong&gt;AWS Active Directory SSO&lt;/strong&gt;, &lt;strong&gt;IAM&lt;/strong&gt;, and &lt;strong&gt;LangChain architectural hardening&lt;/strong&gt;, you achieve a &lt;strong&gt;zero-trust AI deployment&lt;/strong&gt; — where &lt;strong&gt;authentication, authorization, encryption, and accountability&lt;/strong&gt; are baked into every step of model access.&lt;/p&gt;

&lt;p&gt;This design keeps your AI APIs secure, your credentials protected, and your compliance auditors happy.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Written by &lt;a href="https://www.linkedin.com/in/chandrani-mukherjee-usa-nj/" rel="noopener noreferrer"&gt;Chandrani Mukherjee&lt;/a&gt;,&lt;br&gt;&lt;br&gt;
Senior Solution Enterprise Architect | AI/ML Specialist&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>security</category>
      <category>langchain</category>
      <category>python</category>
    </item>
    <item>
      <title>Securing LangChain APIs with AWS SSO and Active Directory</title>
      <dc:creator>Chandrani Mukherjee</dc:creator>
      <pubDate>Thu, 09 Oct 2025 05:21:11 +0000</pubDate>
      <link>https://dev.to/moni121189/securing-langchain-apis-with-aws-sso-and-active-directory-2245</link>
      <guid>https://dev.to/moni121189/securing-langchain-apis-with-aws-sso-and-active-directory-2245</guid>
      <description>&lt;h1&gt;
  
  
  🔐 Using AWS Active Directory SSO to Secure AI Models and Protect LangChain APIs
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Author:&lt;/strong&gt; Chandrani Mukherjee&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Tags:&lt;/strong&gt; #AWS #ActiveDirectory #SSO #LangChain #Security #AI #Python  &lt;/p&gt;




&lt;h2&gt;
  
  
  🧭 Overview
&lt;/h2&gt;

&lt;p&gt;When building &lt;strong&gt;AI-powered platforms&lt;/strong&gt; with &lt;strong&gt;LangChain&lt;/strong&gt;, &lt;strong&gt;RAG&lt;/strong&gt;, or &lt;strong&gt;LLMs&lt;/strong&gt;, one of the most overlooked aspects is &lt;strong&gt;access security&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
Unsecured APIs can expose sensitive data, allow unauthorized model invocation, or lead to prompt injection attacks.&lt;/p&gt;

&lt;p&gt;By integrating &lt;strong&gt;AWS Active Directory (AD)&lt;/strong&gt; through &lt;strong&gt;AWS IAM Identity Center (formerly AWS SSO)&lt;/strong&gt;, we can bring &lt;strong&gt;enterprise-grade identity, access control, and auditing&lt;/strong&gt; into AI model deployment pipelines.&lt;/p&gt;

&lt;p&gt;This guide walks through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enabling &lt;strong&gt;SSO authentication with AWS AD&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Applying &lt;strong&gt;fine-grained IAM access policies&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Securing &lt;strong&gt;LangChain APIs&lt;/strong&gt; behind AWS gateways&lt;/li&gt;
&lt;li&gt;Enforcing &lt;strong&gt;responsible AI access controls&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🧩 Architecture Overview
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
[Corporate User] 
   ↓  (AD Credentials)
[ AWS SSO / IAM Identity Center integrated with AWS Managed Microsoft AD ]
   ↓  (SSO token / SAML assertion)
[ API Gateway / ALB w/ JWT Authorizer + WAF ]
   ↓
[ Auth Proxy Service (Python/Flask or FastAPI) ]
   ↓
[ LangChain Server / AI Model Backend ]
   ↓
[ AWS Services: S3 | DynamoDB | Bedrock | SageMaker | KMS ]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Key Security Layers
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Identity:&lt;/strong&gt; Authentication handled via &lt;strong&gt;AWS AD SSO&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Access Control:&lt;/strong&gt; Short-lived credentials through &lt;strong&gt;IAM roles and permissions boundaries&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Network Security:&lt;/strong&gt; Private subnets, &lt;strong&gt;VPC endpoints&lt;/strong&gt;, and &lt;strong&gt;AWS WAF&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Application Security:&lt;/strong&gt; Input/output sanitization, tool whitelisting, prompt validation
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability:&lt;/strong&gt; CloudWatch + GuardDuty + centralized logs
&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  ⚙️ Step 1: Enable SSO with AWS Active Directory
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Set up AWS Managed Microsoft AD&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In the AWS Directory Service console, create or connect your corporate AD.
&lt;/li&gt;
&lt;li&gt;Sync identities using &lt;strong&gt;AWS IAM Identity Center&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Integrate with IAM Identity Center (AWS SSO)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Connect AWS AD to &lt;strong&gt;IAM Identity Center&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;Map user groups (e.g., &lt;code&gt;AI_Architects&lt;/code&gt;, &lt;code&gt;Data_Scientists&lt;/code&gt;) to &lt;strong&gt;permission sets&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Assign access&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Grant your AI services access only through designated AD groups.&lt;/li&gt;
&lt;li&gt;Example:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;AI_Admins&lt;/code&gt;: Can deploy and fine-tune models
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;AI_Users&lt;/code&gt;: Read-only inference access&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This creates a unified login experience — users authenticate with their &lt;strong&gt;corporate AD credentials&lt;/strong&gt; to access AI APIs or consoles.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔐 Step 2: Protect LangChain APIs with AWS Auth Layers
&lt;/h2&gt;

&lt;p&gt;LangChain services often expose REST endpoints — these must sit &lt;strong&gt;behind a secured API Gateway or ALB&lt;/strong&gt; with JWT validation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 1 — API Gateway + JWT Authorizer
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;bash

aws apigatewayv2 create-authorizer   --api-id &amp;lt;api_id&amp;gt;   --authorizer-type JWT   --identity-source '$request.header.Authorization'   --name LangChainAuth   --jwt-configuration Audience=&amp;lt;app_client_id&amp;gt;,Issuer=&amp;lt;sso_issuer_url&amp;gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;Issuer&lt;/strong&gt; points to the &lt;strong&gt;AWS AD / Identity Center&lt;/strong&gt; OIDC endpoint.
&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;Audience&lt;/strong&gt; matches your app's client ID.
&lt;/li&gt;
&lt;li&gt;Add &lt;strong&gt;AWS WAF&lt;/strong&gt; rules to protect from abuse and injection attempts.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Option 2 — ALB + OIDC Authentication
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Use an &lt;strong&gt;Application Load Balancer (ALB)&lt;/strong&gt; to authenticate directly via OIDC before routing to your backend.&lt;/li&gt;
&lt;li&gt;Add group-based routing:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;bash

  condition:
    Field: path-pattern
    Values: /admin/*
    Authenticate: groups = AI_Admins

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🧱 Step 3: Build an Auth Proxy for LangChain
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;Flask/FastAPI proxy&lt;/strong&gt; ensures your AI backend remains isolated and safe.&lt;br&gt;&lt;br&gt;
This layer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Verifies AD-based JWT tokens
&lt;/li&gt;
&lt;li&gt;Performs &lt;strong&gt;rate limiting&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Sanitizes &lt;strong&gt;user prompts&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Logs usage metadata for auditing
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python

from flask import Flask, request, jsonify
import jwt, requests

app = Flask(__name__)
ISSUER = "https://YOUR_SSO_DOMAIN.awsapps.com/start"
AUDIENCE = "LangChainApp"

def verify_token(token):
    # Validate token with AWS OIDC public keys (jwks)
    return jwt.decode(token, options={"verify_aud": True, "verify_iss": True}, audience=AUDIENCE, issuer=ISSUER)

@app.route("/api/query", methods=["POST"])
def handle_query():
    auth_header = request.headers.get("Authorization", "")
    if not auth_header:
        return jsonify({"error": "Missing Authorization"}), 401

    token = auth_header.split(" ")[1]
    claims = verify_token(token)
    user = claims.get("email")

    # Simple prompt validation
    prompt = request.json.get("prompt", "")
    if "DROP TABLE" in prompt.upper():
        return jsonify({"error": "Invalid input detected"}), 400

    # Forward safely to LangChain backend
    resp = requests.post("http://langchain-service/internal-query", json={"prompt": prompt, "user": user})
    return jsonify(resp.json()), resp.status_code

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🧰 Step 4: Secure AWS Resources via IAM &amp;amp; KMS
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;IAM Roles for Service Accounts (IRSA)&lt;/strong&gt; if deploying LangChain on &lt;strong&gt;EKS&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Store model keys, vector DB credentials, and LLM API tokens in &lt;strong&gt;AWS Secrets Manager&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Encrypt all sensitive data and embeddings with &lt;strong&gt;AWS KMS&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🧠 Step 5: Enforce Responsible AI Practices
&lt;/h2&gt;

&lt;p&gt;Security isn't just about access — it's about usage integrity.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Log all model invocations with user identity (but mask sensitive input)&lt;/li&gt;
&lt;li&gt;✅ Detect abnormal query patterns with &lt;strong&gt;CloudWatch metrics&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;✅ Quarantine or sandbox untrusted user prompts&lt;/li&gt;
&lt;li&gt;✅ Integrate &lt;strong&gt;GuardDuty + Security Hub&lt;/strong&gt; for continuous compliance&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🧩 Step 6: Continuous Monitoring &amp;amp; Auditing
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Enable &lt;strong&gt;AWS CloudTrail&lt;/strong&gt; for every API and role assumption.
&lt;/li&gt;
&lt;li&gt;Store all model interaction logs in &lt;strong&gt;S3 with object-level encryption&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;Automate review dashboards using &lt;strong&gt;QuickSight&lt;/strong&gt; or &lt;strong&gt;Grafana on CloudWatch logs&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  ✅ Summary Checklist
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Control Area&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SSO Identity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Integrated AWS AD with IAM Identity Center&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;API Security&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;API Gateway / ALB JWT authorizer enabled&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Secrets&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Stored in Secrets Manager + KMS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Runtime&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;IRSA-enabled pods with least-privilege IAM roles&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Validation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Input sanitization, rate-limiting, and proxy layer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Monitoring&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GuardDuty, CloudTrail, and CloudWatch integration&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  🚀 Conclusion
&lt;/h2&gt;

&lt;p&gt;By combining &lt;strong&gt;AWS Active Directory SSO&lt;/strong&gt;, &lt;strong&gt;IAM&lt;/strong&gt;, and &lt;strong&gt;LangChain architectural hardening&lt;/strong&gt;, you achieve a &lt;strong&gt;zero-trust AI deployment&lt;/strong&gt; — where &lt;strong&gt;authentication, authorization, encryption, and accountability&lt;/strong&gt; are baked into every step of model access.&lt;/p&gt;

&lt;p&gt;This design keeps your AI APIs secure, your credentials protected, and your compliance auditors happy.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Written by &lt;a href="https://www.linkedin.com/in/chandrani-mukherjee-usa-nj/" rel="noopener noreferrer"&gt;Chandrani Mukherjee&lt;/a&gt;,&lt;br&gt;&lt;br&gt;
Senior Solution Enterprise Architect | AI/ML Specialist&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>security</category>
      <category>langchain</category>
      <category>python</category>
    </item>
    <item>
      <title>Streamlining Qwen: Containerized AI with Docker &amp; Kubernetes</title>
      <dc:creator>Chandrani Mukherjee</dc:creator>
      <pubDate>Tue, 23 Sep 2025 04:23:15 +0000</pubDate>
      <link>https://dev.to/moni121189/streamlining-qwen-containerized-ai-with-docker-kubernetes-41e1</link>
      <guid>https://dev.to/moni121189/streamlining-qwen-containerized-ai-with-docker-kubernetes-41e1</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Deploying large language models like &lt;strong&gt;Qwen&lt;/strong&gt; can be resource-intensive and environment-dependent. By using &lt;strong&gt;Docker&lt;/strong&gt;, we can containerize the Qwen model for consistent, reproducible, and scalable deployments across different systems.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Dockerize Qwen?
&lt;/h2&gt;

&lt;p&gt;Docker provides several advantages when running AI models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reproducibility&lt;/strong&gt;: Ensures the same environment everywhere.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Portability&lt;/strong&gt;: Deploy on any system with Docker installed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability&lt;/strong&gt;: Easier integration with orchestration tools like Kubernetes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Isolation&lt;/strong&gt;: Keeps dependencies separated from the host system.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Steps to Dockerize Qwen
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Create a Dockerfile
&lt;/h3&gt;

&lt;p&gt;A sample Dockerfile for Qwen might look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# Use an official PyTorch image as a base&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; pytorch/pytorch:2.1.0-cuda11.8-cudnn8-runtime&lt;/span&gt;

&lt;span class="c"&gt;# Set working directory&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="c"&gt;# Install system dependencies&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; git

&lt;span class="c"&gt;# Copy project files&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . .&lt;/span&gt;

&lt;span class="c"&gt;# Install Python dependencies&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--upgrade&lt;/span&gt; pip &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;     pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt

&lt;span class="c"&gt;# Expose the API port&lt;/span&gt;
&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; 8000&lt;/span&gt;

&lt;span class="c"&gt;# Start the model service&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["python", "serve_qwen.py"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  2. Build the Docker Image
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker build &lt;span class="nt"&gt;-t&lt;/span&gt; qwen-model:latest &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  3. Run the Container
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; 8000:8000 qwen-model:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will start the Qwen model server inside a container, accessible on port 8000.&lt;/p&gt;




&lt;h3&gt;
  
  
  4. Using Docker Compose (Optional)
&lt;/h3&gt;

&lt;p&gt;For more complex setups, you can use &lt;strong&gt;docker-compose.yml&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3.9"&lt;/span&gt;
&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;qwen&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8000:8000"&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./data:/app/data&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;always&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker-compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Best Practices
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;GPU-enabled Docker images&lt;/strong&gt; for better performance.&lt;/li&gt;
&lt;li&gt;Keep model weights in &lt;strong&gt;mounted volumes&lt;/strong&gt; for easier updates.&lt;/li&gt;
&lt;li&gt;Add a &lt;strong&gt;healthcheck&lt;/strong&gt; in Docker to monitor container status.&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;environment variables&lt;/strong&gt; for configuration.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;By dockerizing the &lt;strong&gt;Qwen model&lt;/strong&gt;, you can simplify deployment, ensure reproducibility, and scale more effectively across cloud or on-premise environments. This approach makes it easier for teams to share, deploy, and manage AI workloads.&lt;/p&gt;

</description>
      <category>python</category>
      <category>docker</category>
      <category>kubernetes</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
