<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ahmet Özel</title>
    <description>The latest articles on DEV Community by Ahmet Özel (@ahmetozel).</description>
    <link>https://dev.to/ahmetozel</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3974146%2F2dac28ed-ab5f-446a-b9aa-5b4065b83498.jpeg</url>
      <title>DEV Community: Ahmet Özel</title>
      <link>https://dev.to/ahmetozel</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ahmetozel"/>
    <language>en</language>
    <item>
      <title>Classical RAG vs Agentic RAG: a practical decision guide</title>
      <dc:creator>Ahmet Özel</dc:creator>
      <pubDate>Mon, 08 Jun 2026 14:07:52 +0000</pubDate>
      <link>https://dev.to/ahmetozel/classical-rag-vs-agentic-rag-a-practical-decision-guide-6g</link>
      <guid>https://dev.to/ahmetozel/classical-rag-vs-agentic-rag-a-practical-decision-guide-6g</guid>
      <description>&lt;p&gt;"Should I use RAG or an agent?" comes up in almost every LLM project I work on. The honest answer is that they are not competing choices. Classical RAG and agentic RAG sit on a spectrum, and picking the wrong end of it either wastes money or gives you weak answers. This post is a practical way to decide, based on a guide and demo I put together.&lt;/p&gt;

&lt;p&gt;Repo with runnable code: &lt;a href="https://github.com/ahmet-ozel/rag-architecture-guide" rel="noopener noreferrer"&gt;https://github.com/ahmet-ozel/rag-architecture-guide&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Classical RAG in one paragraph
&lt;/h2&gt;

&lt;p&gt;Classical RAG is a fixed pipeline: embed the query, retrieve the top-k chunks from a vector store, stuff them into the prompt, and generate an answer. One retrieval, one generation. It is cheap, fast, and predictable. For a knowledge base where the answer lives in one or two documents, this is usually all you need, and adding anything more just increases latency and cost.&lt;/p&gt;
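
&lt;p&gt;The fixed pipeline can be sketched in a few lines. This is a minimal illustration, not the code from the repo: the corpus, the bag-of-words &lt;code&gt;embed&lt;/code&gt; function, and the &lt;code&gt;llm&lt;/code&gt; callable are all stand-ins I am assuming here, where a real system would use an embedding model, a vector store, and a hosted or local LLM.&lt;/p&gt;

```python
import math
from collections import Counter

# Toy corpus standing in for a chunked knowledge base (hypothetical data).
CHUNKS = [
    "Refunds are issued within 14 days of a return request.",
    "Shipping to Europe takes 3 to 5 business days.",
    "Support is available by email on weekdays.",
]


def embed(text):
    """Bag-of-words stand-in; a real system would call an embedding model."""
    return Counter(text.lower().replace(".", "").replace("?", "").split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, k=2):
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    ranked = sorted(CHUNKS, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, chunks):
    """Stuff the retrieved chunks into a single prompt."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


def answer(query, llm):
    """One retrieval, one generation."""
    return llm(build_prompt(query, retrieve(query)))
```

&lt;p&gt;Calling &lt;code&gt;answer("How long do refunds take?", llm=my_model)&lt;/code&gt; makes exactly one retrieval and one model call, which is why cost and latency stay predictable.&lt;/p&gt;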

&lt;h2&gt;
  
  
  Agentic RAG in one paragraph
&lt;/h2&gt;

&lt;p&gt;Agentic RAG hands control to the model. Instead of a fixed pipeline, the LLM decides what to do: reformulate the query, retrieve, check whether the result is good enough, retrieve again from a different source, call a tool, and only then answer. It can loop. This is far more powerful for hard questions, but it is slower, costs more tokens, and is harder to make deterministic.&lt;/p&gt;
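
&lt;p&gt;A rough sketch of that control loop, under stated assumptions: &lt;code&gt;decide&lt;/code&gt; stands in for the LLM call that picks the next action, the tool names are hypothetical, and &lt;code&gt;max_steps&lt;/code&gt; is a budget I am adding so the loop cannot run forever.&lt;/p&gt;

```python
def run_agent(question, decide, tools, max_steps=5):
    """Let `decide` pick the next action until it answers or the budget runs out.

    `decide(question, observations)` stands in for an LLM call and returns
    (action_name, argument). The action "answer" ends the loop.
    """
    observations = []
    for _ in range(max_steps):
        action, arg = decide(question, observations)
        if action == "answer":
            return arg, observations
        # Any other action is a tool call; its result feeds the next decision.
        observations.append((action, tools[action](arg)))
    # Step budget exhausted: stop instead of looping forever.
    return None, observations


def scripted_decide(question, observations):
    """Hypothetical policy: search docs, fall back to SQL, then answer."""
    if not observations:
        return "search_docs", question
    last_tool, last_result = observations[-1]
    if last_tool == "search_docs" and not last_result:
        return "query_sql", question
    return "answer", f"Based on {last_tool}: {last_result}"


tools = {
    "search_docs": lambda q: [],                # pretend the vector store had nothing
    "query_sql": lambda q: ["42 open orders"],  # pretend a SQL lookup succeeded
}

final, trace = run_agent("How many orders are open?", scripted_decide, tools)
```

&lt;p&gt;Here &lt;code&gt;trace&lt;/code&gt; records two tool calls before the answer. Each of those steps would be an extra model call in a real agent, which is where the added latency and token cost come from.&lt;/p&gt;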

&lt;h2&gt;
  
  
  A decision tree that works in practice
&lt;/h2&gt;

&lt;p&gt;Start simple and only add complexity when the data forces you to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Is the answer usually contained in a single chunk or document? Use classical RAG.&lt;/li&gt;
&lt;li&gt;Does answering require combining information from several documents or several steps of reasoning? Lean agentic.&lt;/li&gt;
&lt;li&gt;Do you need to query multiple sources (a vector DB, a SQL table, an external API) to answer? Agentic, because the model needs to choose tools.&lt;/li&gt;
&lt;li&gt;Are latency and cost tight constraints (high traffic, user-facing)? Bias toward classical, and only escalate to an agent for the queries that actually need it.&lt;/li&gt;
&lt;li&gt;Can you tolerate non-deterministic behavior? If not, classical RAG with strong retrieval beats an agent that occasionally loops in unexpected ways.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A pattern I like: run classical RAG first, and if a confidence or self-check step says the retrieved context is weak, escalate that single query to the agentic path. Most queries stay cheap; only the hard ones pay the agent tax.&lt;/p&gt;
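
&lt;p&gt;One way to express that escalation, as a sketch only: the top retrieval score is used here as the confidence signal, and both the &lt;code&gt;min_score&lt;/code&gt; threshold and the function names are assumptions that would need tuning against an eval set.&lt;/p&gt;

```python
def answer_with_escalation(query, retrieve, classical, agentic, min_score=0.35):
    """Try the cheap path first; escalate only when retrieval looks weak.

    `retrieve(query)` returns a list of (chunk, score) pairs, best first.
    `min_score` is an assumed threshold, not a recommended value.
    """
    hits = retrieve(query)
    best = hits[0][1] if hits else 0.0
    if best >= min_score:
        return "classical", classical(query, [chunk for chunk, _ in hits])
    return "agentic", agentic(query)


# Stubs standing in for the two real pipelines.
def classical_stub(query, chunks):
    return f"answer from {len(chunks)} chunk(s)"


def agentic_stub(query):
    return "answer from agent loop"


route_easy = answer_with_escalation(
    "refund window?", lambda q: [("Refunds take 14 days.", 0.82)], classical_stub, agentic_stub
)
route_hard = answer_with_escalation(
    "compare Q1 and Q2 churn", lambda q: [("Unrelated text.", 0.11)], classical_stub, agentic_stub
)
```

&lt;p&gt;Logging the returned route label per query also gives you the escalation rate, which is the number that decides whether the agent path is affordable.&lt;/p&gt;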

&lt;h2&gt;
  
  
  The part everyone skips: evaluation
&lt;/h2&gt;

&lt;p&gt;Neither approach means anything without measurement. Before you argue about architecture, build an eval set of real questions with known good answers. Then track:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Retrieval quality: are the right chunks being retrieved at all? (recall@k, hit rate)&lt;/li&gt;
&lt;li&gt;Answer quality: faithfulness (is the answer grounded in the retrieved context?) and relevance.&lt;/li&gt;
&lt;li&gt;Cost and latency per query, so you can see what agentic behavior actually costs you.&lt;/li&gt;
&lt;/ul&gt;
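
&lt;p&gt;The retrieval metrics above are small enough to write by hand. A minimal sketch, assuming each eval row pairs the retrieved chunk ids (in rank order) with the ids labeled relevant; the ids and rows below are made up for illustration.&lt;/p&gt;

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant chunks that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    top_k = set(retrieved_ids[:k])
    return len(top_k.intersection(relevant_ids)) / len(relevant_ids)


def hit_rate(eval_rows, k):
    """Share of questions with at least one relevant chunk in the top k."""
    hits = sum(
        1 for retrieved, relevant in eval_rows
        if set(retrieved[:k]).intersection(relevant)
    )
    return hits / len(eval_rows) if eval_rows else 0.0


# Hypothetical eval set: (retrieved chunk ids in rank order, ids labeled relevant).
eval_rows = [
    (["c1", "c7", "c3"], {"c1", "c3"}),
    (["c9", "c2", "c4"], {"c5"}),
]

recall_first = recall_at_k(eval_rows[0][0], eval_rows[0][1], k=2)
overall_hit_rate = hit_rate(eval_rows, k=3)
```

&lt;p&gt;Running the same rows through both architectures gives a like-for-like comparison before any answer-quality scoring is involved.&lt;/p&gt;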

&lt;p&gt;Most "RAG is bad" complaints I see are actually retrieval problems: bad chunking, wrong embedding model, or no reranking. Fixing retrieval often beats switching to an agent.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the demo covers
&lt;/h2&gt;

&lt;p&gt;The repo walks through both architectures end to end, using ChromaDB for vector search. It works across OpenAI, Gemini, Claude, Ollama, and vLLM, so you can run it fully local or against a hosted model. It includes the chunking and retrieval steps, the agentic tool-selection loop, and the evaluation metrics, so you can compare the two on your own data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Takeaway
&lt;/h2&gt;

&lt;p&gt;Default to classical RAG. Add agentic behavior when your questions genuinely need multi-step reasoning or multiple sources, and measure the cost when you do. Architecture is a dial, not a switch.&lt;/p&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/ahmet-ozel/rag-architecture-guide" rel="noopener noreferrer"&gt;https://github.com/ahmet-ozel/rag-architecture-guide&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;How are you deciding between fixed pipelines and agentic retrieval in production? I am especially curious where people draw the line on cost.&lt;/p&gt;

</description>
      <category>rag</category>
      <category>ai</category>
      <category>llm</category>
      <category>python</category>
    </item>
    <item>
      <title>Building an agentic Jira automation platform with MCP and Temporal</title>
      <dc:creator>Ahmet Özel</dc:creator>
      <pubDate>Mon, 08 Jun 2026 12:28:21 +0000</pubDate>
      <link>https://dev.to/ahmetozel/building-an-agentic-jira-automation-platform-with-mcp-and-temporal-1521</link>
      <guid>https://dev.to/ahmetozel/building-an-agentic-jira-automation-platform-with-mcp-and-temporal-1521</guid>
      <description>&lt;p&gt;Most "AI automation" demos fall apart the moment a workflow needs to run longer than a single request. An agent makes a few tool calls, the process crashes or times out, and you lose all state. I wanted something that could drive real, multi-step work inside Atlassian (Jira and Confluence) and survive restarts, retries, and failures. So I built an open-source platform around two ideas: MCP for tool access and Temporal for durable execution.&lt;/p&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/ahmet-ozel/atlassian-ai-workflow-platform" rel="noopener noreferrer"&gt;https://github.com/ahmet-ozel/atlassian-ai-workflow-platform&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem with one-shot agents
&lt;/h2&gt;

&lt;p&gt;A typical agent loop looks like this: read a ticket, decide on an action, call a tool, repeat. This is fine for short tasks. It breaks down when a workflow spans minutes or hours, depends on external systems that fail intermittently, or needs to be resumed after a deploy. If your orchestration lives in a single Python process, any crash means you start over. For business workflows that touch real Jira issues, that is not acceptable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why MCP for tools
&lt;/h2&gt;

&lt;p&gt;The Model Context Protocol (MCP) standardizes how an agent discovers and calls tools. Instead of hard-coding Jira API calls into the agent, I expose Jira and Confluence as MCP tools. The agent sees a clean, typed tool surface (create issue, transition status, search, comment, fetch a Confluence page) and the protocol handles the wiring.&lt;/p&gt;

&lt;p&gt;The practical benefit is decoupling. I can add or change tools without touching the agent logic, and the same tools work with any MCP-compatible client. It also keeps the agent prompt focused on intent rather than API mechanics.&lt;/p&gt;
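
&lt;p&gt;To make the decoupling concrete, here is a plain-Python sketch of a discoverable, typed tool surface. It is not the MCP SDK or the MCP wire format, and the tool names and return values are hypothetical; it only illustrates the idea that the agent sees names, descriptions, and parameters while the implementation stays behind the boundary.&lt;/p&gt;

```python
import inspect

TOOLS = {}


def tool(description):
    """Register a function as a named tool with a parameter listing."""
    def register(fn):
        params = {
            name: getattr(p.annotation, "__name__", str(p.annotation))
            for name, p in inspect.signature(fn).parameters.items()
        }
        TOOLS[fn.__name__] = {"description": description, "params": params, "fn": fn}
        return fn
    return register


@tool("Create a Jira issue and return its key.")
def create_issue(project: str, summary: str):
    # Stand-in for a real Jira API call.
    return f"{project}-1"


@tool("Add a comment to an existing issue.")
def add_comment(issue_key: str, body: str):
    # Stand-in for a real Jira API call.
    return {"issue": issue_key, "comment": body}


def list_tools():
    """What the agent sees: names, descriptions, typed parameters. No API details."""
    return {
        name: {"description": t["description"], "params": t["params"]}
        for name, t in TOOLS.items()
    }


def call_tool(name, **arguments):
    """Dispatch a tool call by name, as a client would after discovery."""
    return TOOLS[name]["fn"](**arguments)
```

&lt;p&gt;Adding a new tool is one more decorated function; nothing in the agent logic changes because it only ever reads &lt;code&gt;list_tools()&lt;/code&gt; and calls &lt;code&gt;call_tool&lt;/code&gt;.&lt;/p&gt;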

&lt;h2&gt;
  
  
  Why Temporal for orchestration
&lt;/h2&gt;

&lt;p&gt;Temporal gives you durable workflows. The workflow code looks like ordinary Python, but the result of every completed step is recorded in the workflow's event history. If a worker dies, another worker replays that history and resumes from the last completed step. Retries, timeouts, and backoff are declared as policy rather than hand-written.&lt;/p&gt;

&lt;p&gt;This maps well onto agent workflows. Each LLM call and each tool call becomes a Temporal activity. If an LLM provider rate-limits you or a Jira call fails, Temporal retries that single activity instead of replaying the whole reasoning chain. Long-running approvals (wait for a human to review before transitioning a ticket) become a normal part of the workflow instead of a hack.&lt;/p&gt;
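
&lt;p&gt;The per-activity retry idea can be shown without any infrastructure. The sketch below is plain Python, not Temporal's API: &lt;code&gt;run_activity&lt;/code&gt;, its parameters, and the two step functions are hypothetical, and in Temporal the equivalent policy is attached declaratively to each activity call rather than written as a loop.&lt;/p&gt;

```python
import time


def run_activity(fn, *args, max_attempts=3, initial_delay=0.01, backoff=2.0):
    """Retry one step on failure, multiplying the wait between attempts."""
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(*args)
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure
            time.sleep(delay)
            delay *= backoff


calls = {"llm": 0, "jira": 0}


def plan_with_llm(ticket):
    """Stand-in for an LLM call that decides what to do with a ticket."""
    calls["llm"] += 1
    return f"transition {ticket} to In Review"


def update_jira(plan):
    """Stand-in for a Jira call that fails once, then succeeds."""
    calls["jira"] += 1
    if calls["jira"] == 1:
        raise ConnectionError("Jira timed out")  # simulated transient failure
    return f"done: {plan}"


plan = run_activity(plan_with_llm, "OPS-7")
result = run_activity(update_jira, plan)
```

&lt;p&gt;After this runs, the LLM step has been called once and the Jira step twice: only the failing step was repeated, and the earlier result was reused as-is.&lt;/p&gt;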

&lt;p&gt;The tradeoff is added infrastructure. Temporal is one more service to run, and you have to think in terms of deterministic workflow code versus side-effecting activities. For short, stateless tasks it is overkill. For anything that has to be reliable, it pays for itself quickly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;The stack ties together a few pieces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An MCP integration layer that exposes Atlassian tools to the agent&lt;/li&gt;
&lt;li&gt;Temporal workers that run the durable workflows and activities&lt;/li&gt;
&lt;li&gt;A webhook gateway that turns Jira events into workflow triggers&lt;/li&gt;
&lt;li&gt;An admin dashboard plus a Streamlit UI for running and inspecting workflows&lt;/li&gt;
&lt;li&gt;Multi-provider LLM support (OpenAI, Anthropic, Gemini, and self-hosted vLLM)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything runs in a single Docker Compose stack, so you can bring the whole system up locally and see the moving parts together. Provider choice is config-driven, which makes it easy to swap a hosted model for a local one during development.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I learned
&lt;/h2&gt;

&lt;p&gt;Separating "what to do" from "how to survive doing it" was the key insight. The agent reasons about intent and picks tools. Temporal owns reliability. MCP owns the tool boundary. Keeping those three responsibilities apart made each one much simpler to reason about and test.&lt;/p&gt;

&lt;p&gt;The other lesson: deterministic workflow code is a discipline. Anything non-deterministic (network calls, timestamps, random values) has to live in an activity, not the workflow body. Once that clicked, debugging got a lot easier because the workflow history is a precise, replayable log of what happened.&lt;/p&gt;
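
&lt;p&gt;A toy replay engine shows why that rule matters. This is an illustration of the concept only, not how Temporal is implemented: the &lt;code&gt;Workflow&lt;/code&gt; class and the &lt;code&gt;triage&lt;/code&gt; function are hypothetical. Activity results are recorded once, and a restarted workflow reads them back from history instead of executing the side effect again.&lt;/p&gt;

```python
import random


class Workflow:
    """Toy replay engine: activity results come from history when available."""

    def __init__(self, history=None):
        self.history = list(history or [])
        self.cursor = 0
        self.executed = []  # names of activities that actually ran in this instance

    def activity(self, name, fn):
        if self.cursor == len(self.history):
            # No recorded result yet: run the side effect and record it.
            self.history.append((name, fn()))
            self.executed.append(name)
        recorded_name, result = self.history[self.cursor]
        # Replay only works if the body asks for the same steps in the same order.
        if recorded_name != name:
            raise RuntimeError("non-deterministic workflow code")
        self.cursor += 1
        return result


def triage(wf):
    # Deterministic body: randomness lives inside an activity, never here.
    ticket_id = wf.activity("pick_ticket", lambda: random.randint(1, 1000))
    label = wf.activity("classify", lambda: "bug" if ticket_id % 2 else "task")
    return ticket_id, label


first = Workflow()
outcome = triage(first)

# Simulate a worker restart: a new instance replays the saved history.
resumed = Workflow(history=first.history)
replayed = triage(resumed)
```

&lt;p&gt;The replayed run returns the same outcome without executing anything. If &lt;code&gt;random.randint&lt;/code&gt; were called directly in &lt;code&gt;triage&lt;/code&gt; instead of inside an activity, the second run could take a different branch and no longer match the recorded history.&lt;/p&gt;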

&lt;p&gt;It currently targets Atlassian, but the tool layer is designed to extend to other platforms.&lt;/p&gt;

&lt;h2&gt;
  
  
  Feedback welcome
&lt;/h2&gt;

&lt;p&gt;I would like to hear how others handle long-running agent workflows. Are you using Temporal, a queue plus your own state machine, or a custom orchestration loop? And for MCP users: how are you structuring tools when one agent needs access to several systems at once?&lt;/p&gt;

&lt;p&gt;Repo and setup instructions: &lt;a href="https://github.com/ahmet-ozel/atlassian-ai-workflow-platform" rel="noopener noreferrer"&gt;https://github.com/ahmet-ozel/atlassian-ai-workflow-platform&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>python</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
