<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Anton Resnick</title>
    <description>The latest articles on DEV Community by Anton Resnick (@softwarebuilding).</description>
    <link>https://dev.to/softwarebuilding</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3936739%2F532338f0-3c1b-4237-8ff2-78c07a85ae8d.png</url>
      <title>DEV Community: Anton Resnick</title>
      <link>https://dev.to/softwarebuilding</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/softwarebuilding"/>
    <language>en</language>
    <item>
      <title>What Is Retrieval-Augmented Generation? A Buyer's Guide to RAG in Production</title>
      <dc:creator>Anton Resnick</dc:creator>
      <pubDate>Sun, 17 May 2026 00:00:00 +0000</pubDate>
      <link>https://dev.to/softwarebuilding/what-is-retrieval-augmented-generation-a-buyers-guide-to-rag-in-production-2eop</link>
      <guid>https://dev.to/softwarebuilding/what-is-retrieval-augmented-generation-a-buyers-guide-to-rag-in-production-2eop</guid>
      <description>&lt;p&gt;Every AI application that needs to answer questions about your specific business — your documentation, your contracts, your customer history, your internal wiki, your product catalog — eventually arrives at the same wall. The base language model does not know any of that. Asked about your refund policy, it confidently invents one. Asked about a customer's account history, it confidently invents that too. The hallucinations are not a bug in the model; they are the model behaving exactly as designed against a context that does not contain the answer.&lt;/p&gt;

&lt;p&gt;The standard solution to this problem is called retrieval-augmented generation, or RAG. The original paper proposing the technique was published in 2020 by a team at Facebook AI Research, and the architecture has become the default shape for most production AI applications that need to ground their answers in private or proprietary data. RAG is not the only solution and it is not always the right one — fine-tuning, long-context prompting, and agentic retrieval are all real alternatives — but it is the most-used and best-understood pattern in 2026, and the one most AI buyers will end up procuring at some point.&lt;/p&gt;

&lt;p&gt;This post is the plain-English version. We will cover what RAG actually is, what problem it solves, how a production RAG system is structured, when it is the right call, when it is not, the four failure modes that kill RAG projects before they ship, and a production checklist you can take into procurement. The goal is to give a non-technical buyer enough vocabulary to ask the right questions of any AI agency claiming to build with RAG, and enough framework to recognize a good answer when they hear one.&lt;/p&gt;

&lt;h2&gt;
  
  
  RAG in one paragraph for a CEO
&lt;/h2&gt;

&lt;p&gt;Retrieval-augmented generation is a two-step pattern. Step one: before the language model answers a question, a separate retrieval system looks through your private data and pulls back the few most relevant chunks. Step two: those chunks get inserted into the model's prompt alongside the original question, and the model generates an answer grounded in what it just saw. The model is not trained on your data; the model is given your data fresh at every turn. This solves the hallucination problem (the answer cites real text from your sources), the freshness problem (today's data is in the answer because today's retrieval found it), and most of the cost problem (you do not have to retrain the model when your data changes). The trade-off is that the quality of the answer depends on the quality of the retrieval — if the retrieval misses, the model has nothing real to work with and falls back on its priors. Most RAG project failures are retrieval failures, not model failures.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem RAG actually solves
&lt;/h2&gt;

&lt;p&gt;Three problems, really, and you should understand which one matters for your situation because the answer changes whether RAG is the right architectural choice or whether one of the alternatives wins.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The hallucination problem
&lt;/h3&gt;

&lt;p&gt;Base language models generate plausible-sounding text. When asked about a topic the model genuinely knows from its training data, the output is usually accurate. When asked about a topic the model does not know — anything specific to your business, anything written after the model's training cutoff, anything proprietary — the model still generates plausible-sounding text, and that text is often confidently wrong. The model has no internal flag for "I don't know." RAG addresses this by inserting real source text into the prompt, so the model's answer is grounded in something verifiable. The hallucination rate does not drop to zero, but it drops by a meaningful order of magnitude, and the answers become citable — the user can click through to the source paragraph that supported a given claim.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. The freshness problem
&lt;/h3&gt;

&lt;p&gt;Frontier language models have training cutoffs measured in months. Anything that happened after the cutoff is invisible to the base model. For a customer support assistant, that means yesterday's product update is invisible. For a sales research agent, that means this morning's earnings call is invisible. Fine-tuning the model on fresh data is expensive, slow, and has to be repeated every time the data changes. RAG solves this by separating the retrieval from the model: the model stays the same, but the retrieval system pulls from data updated as recently as the last sync, often minutes-old. The retrieval index is cheap to update; the model never needs to change.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. The cost-and-scale problem
&lt;/h3&gt;

&lt;p&gt;Modern frontier models support very long context windows — hundreds of thousands of tokens, sometimes millions. In theory you could paste your entire knowledge base into the prompt at every request. In practice this is expensive (you pay per token on every call) and slow (long contexts increase latency). RAG retrieves only the few chunks actually relevant to the current question, which keeps each model call short, fast, and cheap. The retrieval side does cost something — you maintain a vector database or search index — but it is a one-time cost per document, not per query, which is the right side of the cost curve to be on.&lt;/p&gt;

&lt;h2&gt;
  
  
  How a production RAG system is structured
&lt;/h2&gt;

&lt;p&gt;A working RAG system has five distinct layers. Each one is a real engineering decision; getting any of them wrong is a common cause of failure.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The ingestion pipeline
&lt;/h3&gt;

&lt;p&gt;Your raw data — PDFs, web pages, database rows, Notion pages, Confluence wikis, customer support transcripts, internal Slack channels — has to be normalized, cleaned, and broken into chunks. Each chunk gets converted into a numeric representation called an embedding by a separate small model (OpenAI's text-embedding-3, Cohere Embed, the Voyage AI family, or open-weight options like nomic-embed-text). The embeddings are stored in a vector database — Pinecone, Weaviate, pgvector, Qdrant, Chroma — alongside the original chunk text and metadata. The ingestion pipeline runs once per document; it runs again whenever the document changes. Chunking strategy (how large each chunk is, where the boundaries fall, what context overlaps between chunks) is one of the highest-leverage decisions in the system.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. The retriever
&lt;/h3&gt;

&lt;p&gt;When a user asks a question, the retriever's job is to find the few chunks most relevant to the question. The standard approach: convert the user's question into an embedding using the same model used during ingestion, then find the closest stored embeddings using a vector similarity search. The top 5-20 chunks come back as candidates. Pure vector search works surprisingly well as a baseline, but most production systems supplement it with traditional keyword search (BM25, the algorithm under classical search engines) and combine the two — a pattern called hybrid retrieval that consistently beats either approach alone.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. The reranker
&lt;/h3&gt;

&lt;p&gt;The top 20 chunks from the retriever are candidates, not winners. A separate reranker model — usually a small cross-encoder model that can compare each candidate chunk against the query in detail — scores them more carefully and picks the top 3-5 to actually feed to the language model. Skipping the reranker is one of the most common reasons RAG systems give mediocre answers in early prototypes: the retriever's top result is often less relevant than the third or fifth result, and without a reranker you never know.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. The generator
&lt;/h3&gt;

&lt;p&gt;The final chunks plus the original question get formatted into a prompt and sent to a language model (Claude, GPT, Gemini, or an open-weight model). The model generates an answer grounded in the chunks. Prompt design matters a lot here — instructing the model to cite specific chunks, to refuse to answer if the chunks do not contain the relevant information, and to indicate confidence levels are all common and useful patterns. The model also returns citations, which the application surfaces to the user as links to the underlying source documents.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. The evaluation and guardrails layer
&lt;/h3&gt;

&lt;p&gt;Production RAG systems run continuously, on data that drifts over time, and they need a way to catch quality regressions. The eval layer holds a curated set of test questions with known good answers, runs the full RAG pipeline against them on every deployment, and scores the answers on relevance, factual grounding, and citation quality. Guardrails — content filters, PII detection, off-topic refusals — sit alongside the eval layer and prevent the model from saying things it should not. Skipping this layer is the surest way to end up with a system that worked great in the demo and is silently wrong in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  RAG vs the alternatives
&lt;/h2&gt;

&lt;p&gt;Three other approaches solve overlapping problems and a buyer should know what each one does well. Picking RAG when one of these other patterns is the right answer is a common and expensive mistake.&lt;/p&gt;

RAG vs fine-tuning vs long-context vs agentic retrieval — when each one wins.| Approach | How it works | Best for | Where it breaks |&lt;br&gt;
| --- | --- | --- | --- |&lt;br&gt;
| RAG | Retrieve relevant chunks at query time, insert into the prompt, generate an answer. | Question-answering over private/proprietary data that changes frequently. Citable answers. Lowest cost-per-query at scale. | When the answer requires reasoning across many disparate chunks, when retrieval misses, when chunks are too coarse-grained. |&lt;br&gt;
| Fine-tuning | Retrain the model on your data so it knows your domain natively. | Style, tone, format, and domain-specific reasoning patterns that no prompt can teach. Specialized vocabulary. | Knowledge that changes — every refresh requires retraining. Cost and latency of training. Hard to update. |&lt;br&gt;
| Long-context prompting | Paste the full document into the model context, ask the question, let the model handle retrieval implicitly. | One-off analysis of long documents (contracts, research papers, transcripts). Cases where the entire context fits cheaply. | Cost-per-query at scale. Latency on long contexts. Models still drop or hallucinate mid-context for very long inputs. |&lt;br&gt;
| Agentic retrieval | A planning agent decides what to search for, runs multiple retrieval steps, and synthesizes the answer. | Multi-hop questions where the answer requires combining facts found across multiple separate documents. | Latency (multiple retrieval rounds), cost (multiple model calls per question), debugging complexity. |

&lt;p&gt;Most production AI applications end up using a mix. A typical pattern: fine-tune a small model for style and format, layer RAG on top for grounding, and reach for agentic retrieval only when the question genuinely cannot be answered from a single retrieval pass. Long-context prompting is the right call for one-off analysis but a poor default for continuous question answering at scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  When RAG is the right call
&lt;/h2&gt;

&lt;p&gt;Five situations where RAG is almost always the right architectural choice.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Customer support assistants that answer over a knowledge base, product documentation, or ticket history.&lt;/li&gt;
&lt;li&gt;Internal search-and-summarize tools across a company wiki, Slack archive, or document store.&lt;/li&gt;
&lt;li&gt;Sales and research agents that need to ground claims in source material the user can verify.&lt;/li&gt;
&lt;li&gt;Compliance and legal assistants that must cite the specific clause or regulation they are quoting.&lt;/li&gt;
&lt;li&gt;Any application where the underlying data changes frequently enough that retraining a fine-tuned model would be prohibitively expensive.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  When RAG is the wrong call
&lt;/h2&gt;

&lt;p&gt;Three situations where reaching for RAG by reflex is the wrong move.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;When the answer requires reasoning across the entire corpus, not just a few chunks. A summary of "every contract we signed in 2025" is not a RAG problem; it is a batch analysis problem. Long-context or map-reduce patterns win.&lt;/li&gt;
&lt;li&gt;When the data fits in the model's context window cheaply. If your entire knowledge base is 30 pages and you handle 100 queries a day, the cost of pasting the whole thing into every prompt is negligible and the operational complexity of a RAG pipeline is not worth it. Long-context prompting wins until the volume or document set grows.&lt;/li&gt;
&lt;li&gt;When the user does not need citations and the cost of being wrong is low. For some internal-tool use cases, a fine-tuned small model with no retrieval is faster, cheaper, and adequately accurate. The retrieval layer earns its complexity only when grounding actually matters to the user or to the regulator.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The four failure modes that kill RAG projects
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Bad chunking strategy
&lt;/h3&gt;

&lt;p&gt;The single most common cause of mediocre RAG quality is chunks that are too large, too small, or split across logical boundaries. Chunks that are too large dilute the retrieval signal — the right chunk gets buried in noise. Chunks that are too small lose context — the model retrieves the right paragraph but cannot tell what document or section it came from. Chunks split in the middle of a logical unit (a contract clause, a code function, a procedure step) confuse both the retriever and the model. Production-quality RAG systems use chunking strategies tuned to the document type: semantic chunking for prose, structural chunking for code or contracts, and overlapping chunks to preserve context at boundaries. This is a frequent source of "why does the AI give different answers depending on how I phrase the question" complaints from users.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Embedding model mismatch
&lt;/h3&gt;

&lt;p&gt;The embedding model used during ingestion has to match the embedding model used during retrieval — they have to be the exact same model and the same version. Otherwise the numeric representations are not comparable and the retriever returns nonsense. This sounds obvious; in practice we have seen production deployments where someone swapped the embedding model and the system silently degraded for months. Pinning the model version, monitoring it, and rebuilding the index on any deliberate swap is a non-negotiable.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. No evaluation set
&lt;/h3&gt;

&lt;p&gt;Without a curated set of test questions and expected answers, the team has no way to tell whether a tweak to the chunking strategy, the reranker, or the prompt template made the system better or worse. RAG quality changes are non-obvious; an improvement on one type of query often regresses another. Production-grade RAG systems have an eval set of at least 50-200 hand-curated question-answer pairs, run automatically on every deployment, with regressions blocking the merge. Teams that skip this layer ship a system that quietly drifts.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. No reranker
&lt;/h3&gt;

&lt;p&gt;Skipping the reranker is the most common shortcut in early RAG implementations and the most common reason for mediocre answers. The retriever's top 1-2 results are often less relevant than results 3-5. A small cross-encoder reranker — Cohere Rerank, the open-source bge-reranker, or Voyage's reranker — costs a fraction of a cent per query and produces a meaningfully better top-3. Skipping it is a false economy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Production checklist
&lt;/h2&gt;

&lt;p&gt;Use this list when evaluating a vendor's RAG architecture or auditing your own.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Is there a documented chunking strategy with a rationale for the chunk size and boundary rules, and is it tuned to the document types in the corpus?&lt;/li&gt;
&lt;li&gt;Are embedding model versions pinned, monitored, and tied to the index build pipeline so a swap forces a reindex?&lt;/li&gt;
&lt;li&gt;Is hybrid retrieval (vector + BM25) in place, or is the system relying on pure vector search alone?&lt;/li&gt;
&lt;li&gt;Is there a reranker between the retriever and the generator?&lt;/li&gt;
&lt;li&gt;Is there a curated eval set of at least 50 question-answer pairs that runs automatically on every deployment, with regression thresholds enforced?&lt;/li&gt;
&lt;li&gt;Are answers returned with citations to the source chunks, and is that surfaced to end users?&lt;/li&gt;
&lt;li&gt;Are guardrails in place for PII, off-topic refusals, and known content sensitivity issues?&lt;/li&gt;
&lt;li&gt;Is the cost-per-query and latency-per-query monitored, with thresholds that page someone when they exceed budget?&lt;/li&gt;
&lt;li&gt;Is the system designed to swap the language model, the embedding model, or the vector database as a configuration change rather than a rewrite?&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  RAG quick answers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What does RAG stand for?
&lt;/h3&gt;

&lt;p&gt;RAG stands for retrieval-augmented generation. The term was introduced in a 2020 paper by Lewis et al. at Facebook AI Research. "Retrieval-augmented" means the language model's input is augmented (extended) by a retrieval system that pulls relevant context from a separate data source at query time. "Generation" refers to the language model producing the final answer using both the retrieved context and the original question.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do I need a vector database for RAG?
&lt;/h3&gt;

&lt;p&gt;Almost always yes for any RAG system with a non-trivial corpus, but "vector database" is a flexible category. Dedicated vector databases (Pinecone, Weaviate, Qdrant, Chroma) are purpose-built for the workload and scale well. Postgres with the pgvector extension is often enough for small-to-mid-size corpora and avoids running a separate piece of infrastructure. For very small corpora, you can keep embeddings in memory and skip the database entirely. The choice should follow the corpus size and the integration constraints of your existing stack, not the popularity of the tool.&lt;/p&gt;

&lt;h3&gt;
  
  
  How much does it cost to build a RAG system?
&lt;/h3&gt;

&lt;p&gt;We deliberately avoid quoting numbers on this page because the real cost depends on the corpus size, the integration depth, the freshness requirements, and the evaluation rigor. The cost drivers to think about: ingestion pipeline complexity (PDFs and scanned documents add real work; clean structured data is cheap), embedding model choice (frontier-quality embeddings cost more per token), vector database scale, reranker pricing, and the language model behind the generator. A focused proof-of-concept can ship in 2-4 weeks; a production-grade RAG system with eval, guardrails, observability, and admin tooling is a 4-8 week engagement for a focused first version, with ongoing iteration after that. We give a written proposal at the end of a free strategy call.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is RAG going to be obsolete because of long-context models?
&lt;/h3&gt;

&lt;p&gt;No, despite the recurring claim. Long-context models are useful and they shrink the set of applications where RAG is strictly necessary, but the cost-per-query and latency on long contexts at scale still make RAG the right architecture for high-volume question-answering. Even at one-million-token context windows, paying for a million tokens on every query is uneconomic for any system handling more than a few hundred queries a day. The two patterns will coexist for the foreseeable future, with RAG dominant for scaled question answering and long-context prompting dominant for one-off deep analysis.&lt;/p&gt;

&lt;h3&gt;
  
  
  Should I use an off-the-shelf RAG platform or build custom?
&lt;/h3&gt;

&lt;p&gt;Depends on the maturity of your data and the depth of integration required. Off-the-shelf RAG platforms (Pinecone Assistant, Vectara, Glean, Mendable, several large vendors' RAG-as-a-service offerings) get you to a working prototype in days, and for some use cases that is the end of the project. Custom RAG earns its complexity when the data ingestion is non-trivial, when the corpus is large enough that per-query pricing becomes the largest line item, when you need to swap models freely, or when the integration with your existing systems goes beyond what the platform exposes. We tell prospects to start with an off-the-shelf platform for the prototype and migrate to custom only when the prototype proves enough value to justify the investment.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is the difference between RAG and agentic AI?
&lt;/h3&gt;

&lt;p&gt;Different products. RAG is a retrieval pattern: pull relevant context, generate an answer. An AI agent is a system that decides what to do next, calls tools, and observes the result in a loop. They overlap in practice — most modern agents include retrieval as one of their tools — but they are not the same thing. A pure RAG system does not decide anything; it retrieves and generates. An agentic system may use RAG as one capability among many (calling APIs, writing to systems, branching based on intermediate results). For question answering, RAG alone is usually enough. For workflows that require multi-step action, the agent shape is necessary.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to read next
&lt;/h2&gt;

&lt;p&gt;If you want to go deeper than this post does, the linked resources below are the authoritative sources we hand to clients. The original RAG paper is short and readable. Anthropic's evaluation and prompting docs are the best practical guidance we have seen. The LangChain RAG cookbook is the most-cited implementation reference. And if you are evaluating a specific RAG system or weighing a build vs platform decision for your own project, our strategy calls are free.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keep going&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/contact" rel="noopener noreferrer"&gt;Book a free AI strategy call&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/services/custom-ai-software-development" rel="noopener noreferrer"&gt;Our custom AI software development service&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/services/ai-integration-services" rel="noopener noreferrer"&gt;AI integration services — wire AI into your stack&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/blog/ai-agents-vs-automation" rel="noopener noreferrer"&gt;AI agents vs. traditional automation: a decision guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2005.11401" rel="noopener noreferrer"&gt;Lewis et al., 2020 — original RAG paper on arXiv&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.anthropic.com/research" rel="noopener noreferrer"&gt;Anthropic — Building effective agents and RAG eval guidance&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://platform.openai.com/docs/guides/embeddings" rel="noopener noreferrer"&gt;OpenAI — Embeddings guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://python.langchain.com/docs/tutorials/rag/" rel="noopener noreferrer"&gt;LangChain — RAG cookbook and tutorials&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://softwarebuilding.ai/blog/what-is-retrieval-augmented-generation" rel="noopener noreferrer"&gt;https://softwarebuilding.ai/blog/what-is-retrieval-augmented-generation&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>llm</category>
      <category>programming</category>
    </item>
    <item>
      <title>AI Automation Agency vs AI Development Agency: What's the Difference?</title>
      <dc:creator>Anton Resnick</dc:creator>
      <pubDate>Sun, 17 May 2026 00:00:00 +0000</pubDate>
      <link>https://dev.to/softwarebuilding/ai-automation-agency-vs-ai-development-agency-whats-the-difference-26a8</link>
      <guid>https://dev.to/softwarebuilding/ai-automation-agency-vs-ai-development-agency-whats-the-difference-26a8</guid>
      <description>&lt;p&gt;AI automation agencies and AI development agencies look like the same product from the outside. Both promise to bring AI into your business. Both pitch faster operations and lower costs. Both have founders with confident YouTube channels. Underneath, they are two genuinely different businesses serving two different buyers — and confusing them is one of the more expensive mistakes a non-technical founder can make in 2026.&lt;/p&gt;

&lt;p&gt;We are an AI development agency, so the obvious caveat: we have a view here. But the goal of this post is not to convince you that you need a development agency. It is to help you tell the categories apart honestly, including the cases where an automation agency is the right call and a development agency is the expensive over-correction.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 60-second version
&lt;/h2&gt;

&lt;p&gt;An AI automation agency stitches together existing tools — usually no-code or low-code platforms like Make, n8n, Zapier, Airtable, Voiceflow, Lindy, Relevance AI — and adds AI steps inside the workflow. The deliverable is a working automation graph running on a third-party runtime. The team is typically 1-10 people, often non-technical or self-taught, and the price point is low-to-mid five figures for a starter engagement.&lt;/p&gt;

&lt;p&gt;An AI development agency writes code. The deliverable is a custom application running on infrastructure the client owns, integrated with the client's real systems, with model selection, evaluation, observability, and ongoing iteration as first-class engineering concerns. The team is typically senior engineers, the engagement is mid-five to mid-six figures per pilot, and the timeline is weeks-to-quarters rather than days-to-weeks.&lt;/p&gt;

&lt;p&gt;Neither is the right answer for every project. The honest decision turns on five dimensions — covered in the comparison table below — and a four-step decision framework after that.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where each category really comes from
&lt;/h2&gt;

&lt;p&gt;The AI automation agency category exploded in 2024-2025 mostly because of YouTube. A wave of creators — many genuinely good at the work — popularized the "AAA" playbook: pick a niche, learn Make or n8n with AI nodes, sell five-figure retainers to small businesses, scale to a portfolio. The category sits at the intersection of the no-code movement and the LLM commodity wave. Real value gets shipped; some of the agencies are excellent; the category as a whole has a high variance because the barrier to entry is low.&lt;/p&gt;

&lt;p&gt;The AI development agency category is older — most of these firms existed before generative AI as classical software development shops, AI/ML consultancies, or boutique product studios — and pivoted hard into LLM-era work over the last two years. The barrier to entry is higher because the work requires senior engineering, real DevOps experience, and the operational muscle to run AI systems in production. Variance inside the category is lower than in AAA-land, but the per-engagement cost is also meaningfully higher.&lt;/p&gt;

5-row side-by-side comparison.| Dimension | AI automation agency | AI development agency |&lt;br&gt;
| --- | --- | --- |&lt;br&gt;
| Deliverable | A workflow running on a third-party runtime (Make, n8n, Zapier, Airtable, Voiceflow, Lindy, Relevance AI). You own the workflow definitions; the vendor owns the runtime. | A custom application running on your infrastructure with code in your GitHub. You own everything. |&lt;br&gt;
| Team shape | 1-10 people, often non-technical or self-taught, with strong tooling-fluency in the chosen iPaaS platform. | Senior engineers with production AI experience, plus product / strategy capacity. Smaller team, deeper bench per person. |&lt;br&gt;
| Where it breaks | When the integration is too custom for the no-code platform, when per-task pricing scales above the cost of a real build, or when something fails and there is no observability layer to debug it. | When the scope is small enough that an iPaaS workflow could have done the same job — a $50k engineering build for what a $5k automation graph would have shipped. |&lt;br&gt;
| What happens when something fails | You log into the iPaaS dashboard and look at the failed run. Debugging surface is whatever the vendor exposes. Fixing it usually means tweaking the workflow graph. | You open your observability tool, replay the exact decision chain, and patch the specific failure mode in code. The fix is durable and version-controlled. |&lt;br&gt;
| Best fit | Small businesses, internal-tool automations, marketing operations, lightweight customer support flows, low-risk experiments, and anything where speed-to-prototype matters more than long-run cost or control. | Customer-facing systems, regulated industries, systems handling money or sensitive data, deep integrations with proprietary backends, anything that needs to scale past the iPaaS cost curve, and any AI feature that is core to the product. |
&lt;h2&gt;
  
  
  Where automation agencies legitimately win
&lt;/h2&gt;

&lt;p&gt;Three honest cases where an automation agency is the right product for the project.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Small-business operations where the workflow is well-defined and the volume is moderate. A solo founder running an e-commerce store who wants AI-tagged inventory + auto-routed support tickets + drafted reply emails is exactly the AAA sweet spot. Pay 1-2 months of an agency retainer, ship the graph, run it for a year, pay the iPaaS bill, get value.&lt;/li&gt;
&lt;li&gt;Internal tools at companies of any size, when the team's bottleneck is operations rather than product. An AAA can ship a working internal copilot for the sales team in three weeks; a development agency would propose a six-month build. The AAA wins this race on every dimension that matters for the internal tool.&lt;/li&gt;
&lt;li&gt;Prototypes for ideas you have not validated yet. Before committing engineering budget to a custom AI build, having an AAA-style automation graph that gets the idea in front of real users for $5-15k is one of the best buys in 2026.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Where the wheels come off
&lt;/h2&gt;

&lt;p&gt;And three honest cases where an AAA approach predictably stalls and a development agency is the right call from day one.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Customer-facing AI that touches money, contracts, medical records, or anything legally meaningful. The cost of being wrong is too high for a setup whose observability layer is whatever Make or n8n happened to expose. You need code-level eval, structured logging, and a rollback story. AAA platforms do not provide that natively.&lt;/li&gt;
&lt;li&gt;Integrations with proprietary internal systems. iPaaS connectors handle common SaaS APIs well and on-prem services badly. The moment your AI needs to read from a custom database, write through a legacy ERP, or authenticate against an internal SSO that has not heard of OAuth, you are gluing duct tape to a tool that wants to be the duct tape.&lt;/li&gt;
&lt;li&gt;Anything where you expect production volume to grow 10x in 12 months. Per-task iPaaS pricing scales fine at 1,000 runs a day and becomes the largest line item on your operating budget at 100,000. A custom build amortizes; an iPaaS bill compounds.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The 4-step decision framework
&lt;/h2&gt;

&lt;p&gt;Run your specific project against these four questions in order. Whichever side gets two or more votes is the side you want.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What is the cost of being wrong? Low (an internal Slack notification did not fire) — automation agency. High (a customer got the wrong refund) — development agency.&lt;/li&gt;
&lt;li&gt;How custom is the integration? Common SaaS targets only — automation agency. Proprietary internal systems, regulated data flows, or anything an iPaaS connector does not cover — development agency.&lt;/li&gt;
&lt;li&gt;What is the projected volume in 12 months? Modest, predictable, comfortably inside iPaaS pricing tiers — automation agency. 10x or unpredictable growth — development agency.&lt;/li&gt;
&lt;li&gt;Who owns the operational risk after launch? An ops team or a single founder using the iPaaS dashboard — automation agency. A product team that needs the AI feature to behave like the rest of the product — development agency.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Honest examples from our own pipeline
&lt;/h2&gt;

&lt;p&gt;We turn away projects that are better-fit for an automation agency on a regular basis, and we point clients at specific AAA firms when we do. A few recent examples, sanitized:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A solopreneur running a $400k/year coaching business wanted an AI assistant to draft follow-up emails from session notes. We told them to hire an AAA — the right shape is a Make graph and a Lindy assistant, not a $40k engineering engagement. They shipped it in two weeks for a four-figure price.&lt;/li&gt;
&lt;li&gt;A mid-market SaaS company wanted to embed an AI copilot into their existing product. The copilot needed to read from their primary Postgres database, share auth with their existing app, and ship inside their iOS/web/Android surface. We took the engagement because no iPaaS could have done it — the integration depth was the whole project.&lt;/li&gt;
&lt;li&gt;A regional dental group wanted AI receptionist coverage across 12 locations. We honestly debated this one with the buyer and concluded a hybrid: an AAA-built voice assistant on Voiceflow for the receptionist front-line, plus a custom integration layer (us) that connected it to their proprietary practice-management system. Best of both categories, fewer dollars total than either alone.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The "we do both" agencies
&lt;/h2&gt;

&lt;p&gt;Some firms position as both — automation agency for small projects, development agency for larger ones. In our experience this is a real product when the firm is large enough to staff both shapes well (rare), and a marketing claim more than a product when the firm is small (common). If a prospective vendor pitches both, ask which shape they have shipped most often in the last six months. The answer will tell you which one is their actual business.&lt;/p&gt;

&lt;h2&gt;
  
  
  AI automation vs development — quick answers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is an AI automation agency?
&lt;/h3&gt;

&lt;p&gt;An AI automation agency builds workflows on top of no-code or low-code platforms — Make, n8n, Zapier, Airtable, Voiceflow, Lindy, Relevance AI — with AI capabilities embedded as steps inside the graph. The deliverable is a running workflow on the platform's runtime, not a custom application. Engagements are typically days-to-weeks and price-point five figures. Best fit: small business operations, internal tools, marketing operations, lightweight customer support, and prototypes.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is an AI development agency?
&lt;/h3&gt;

&lt;p&gt;An AI development agency writes custom application code that integrates AI capabilities (language models, agents, embeddings, classifiers) into systems the client owns and operates. The deliverable is a working application on the client's infrastructure with code in the client's GitHub. Engagements are weeks-to-quarters and price-point mid-five to mid-six figures per pilot. Best fit: customer-facing AI, regulated workloads, deep integration with proprietary systems, and anything core to the product or business.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is an AI automation agency the same as the "AAA" model on YouTube?
&lt;/h3&gt;

&lt;p&gt;Largely yes — the YouTube AAA movement is the same category. The variance inside it is real, though. Some AAA practitioners are excellent and ship genuine value; others are reselling templates from a course. The barrier to entry is low, which produces both. Pick by case studies and willingness to show real workflows, not by the founder's YouTube subscriber count.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can an AI automation agency build the same things as a development agency?
&lt;/h3&gt;

&lt;p&gt;Up to the iPaaS ceiling, yes — and then no. Within the bounds of what no-code platforms support natively, AAA-built workflows can do impressive work in days. Beyond that ceiling — custom integrations, regulated data flows, production-scale volume, observability and eval needs, multi-tenant deployments — the AAA approach stops working and a real engineering build is the only way through. The boundary is real but not always obvious from the outside of a project, which is why discovery work matters so much in this category.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I tell which one I actually need?
&lt;/h3&gt;

&lt;p&gt;Run the 4-question decision framework above. If the answers point clearly to one side, that is the answer. If they split, you are in the legitimate hybrid zone — and the right play is usually to start with the smaller commitment (AAA prototype) and graduate to a custom build only when the prototype proves enough value to justify it. We tell prospects to do exactly this on a regular basis, including when it means we do not get the engagement.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to do next
&lt;/h2&gt;

&lt;p&gt;If you have a specific project in mind and you are not sure which category it wants, the cheapest next step is a free 30-minute strategy call. We will run the framework with you against your specific situation, and we will tell you honestly which shape fits — including telling you to hire an automation agency if that is the right answer. We give referral introductions to specific AAA firms we trust when the fit is wrong for us.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keep going&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/contact" rel="noopener noreferrer"&gt;Book a free AI strategy call&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/blog/ai-agents-vs-automation" rel="noopener noreferrer"&gt;AI agents vs. traditional automation: a decision guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/services/ai-agent-development" rel="noopener noreferrer"&gt;Our AI agent development service&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/services/conversational-ai-solutions" rel="noopener noreferrer"&gt;Conversational AI solutions — voice and chat done right&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/services/ai-integration-services" rel="noopener noreferrer"&gt;AI integration services — wire AI into your real stack&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/blog/cost-to-build-an-ai-agent" rel="noopener noreferrer"&gt;What an AI agent build actually costs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://softwarebuilding.ai/blog/ai-automation-agency-vs-development-agency" rel="noopener noreferrer"&gt;https://softwarebuilding.ai/blog/ai-automation-agency-vs-development-agency&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>startup</category>
      <category>business</category>
    </item>
    <item>
      <title>Best Generative AI Consulting Companies (2026)</title>
      <dc:creator>Anton Resnick</dc:creator>
      <pubDate>Sat, 16 May 2026 00:00:00 +0000</pubDate>
      <link>https://dev.to/softwarebuilding/best-generative-ai-consulting-companies-2026-30l3</link>
      <guid>https://dev.to/softwarebuilding/best-generative-ai-consulting-companies-2026-30l3</guid>
      <description>&lt;p&gt;Every list of the best generative AI consulting companies on the internet has the same problem: the company writing the list is on the list. Sometimes first. Sometimes with three paragraphs of self-praise and one terse line for each competitor. We are no exception — we are on this list too. The difference is that we are going to tell you which firm to actually pick for your situation, including telling you to pick a different one when it is the right answer.&lt;/p&gt;

&lt;p&gt;Generative AI consulting in 2026 splits cleanly into three categories, and the right answer for your project depends almost entirely on which category fits the work. Boutique practitioner firms (us, LeewayHertz, Master of Code) are senior teams of 5-50 people, all-in on AI, who scope and ship in the same engagement. Mid-sized specialists (Neurons Lab, The Hackett Group, ITRex) are 100-1,000-person AI-focused shops with deeper bench but more process between the senior people and your project. Big 4 and strategy giants (Accenture, Deloitte, BCG, IBM) are 100,000-person firms where AI is one practice among many — the firm name buys you political cover and slide decks; the build is a separate procurement.&lt;/p&gt;

&lt;p&gt;Below is the list, organized by category, with the honest version of who each firm is best for. At the end is a summary table you can take into your procurement conversation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Category 1 — Boutique practitioner firms (5-50 people)
&lt;/h2&gt;

&lt;p&gt;Senior teams, all-in on AI, where the person scoping the work is the person doing it. Best for companies that want a working system in production this quarter and have a champion internally who can make decisions quickly.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. softwarebuilding.ai
&lt;/h3&gt;

&lt;p&gt;Founded 2018, US-based (Miami, FL). Boutique AI development agency focused on agent systems, conversational AI, and AI-native custom software. Strategy and build under one team — no handoffs between consultants who scope and engineers who build. Weekly demos on real software, not slide decks. You own the code, the prompts, the eval set, and the model accounts from day one.&lt;/p&gt;

&lt;p&gt;Best for: founders, ops leaders, and mid-market companies who want production AI inside a single quarter with a clear path from strategy to shipped system. Skip us if you need a Big 4 name on the document for board-level cover, or if your project is genuinely a body-shop staffing problem rather than an architecture problem.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. LeewayHertz
&lt;/h3&gt;

&lt;p&gt;Founded 2007, US/India hybrid. One of the most-cited AI agencies in Google's AI Overview for generative AI consulting queries — and earned the placement. Broad capability across LLM applications, computer vision, blockchain-adjacent AI, and enterprise integrations. Larger than a true boutique (~250+ engineers) but still markedly more specialized than the Big 4.&lt;/p&gt;

&lt;p&gt;Best for: enterprise clients who want a single vendor across multiple AI workstreams, comfortable with offshore-blended staffing. Less ideal if you specifically want all-senior, all-US-based delivery throughout the engagement.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Master of Code Global
&lt;/h3&gt;

&lt;p&gt;Founded 2004, Ukraine/US hybrid. Strong reputation in conversational AI, chatbot platforms, and customer experience automation. They build production conversational AI systems on top of platform layers (Cognigy, Kore, custom LLM stacks) and consult on the messy integration work behind a good conversational rollout.&lt;/p&gt;

&lt;p&gt;Best for: enterprise CX teams looking specifically for conversational AI expertise rather than general AI consulting. Their case studies are weighted heavily toward customer support and contact-center automation, which is either exactly what you want or not what you want.&lt;/p&gt;

&lt;h2&gt;
  
  
  Category 2 — Mid-sized AI specialists (100-1,000 people)
&lt;/h2&gt;

&lt;p&gt;Deeper bench than a boutique, more institutional process, more capacity to handle multi-year programs. The trade-off is that senior architects are sold during procurement but rotate off after kickoff in favor of mid-level execution teams.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Neurons Lab
&lt;/h3&gt;

&lt;p&gt;Founded 2018, UK-based with global delivery. Specialized in regulated industries — financial services, banking, wealth management — where AI deployment has to clear compliance and audit requirements. Their own listicle of top AI firms (which they regularly update) is a useful tertiary signal that they think hard about the comparative landscape.&lt;/p&gt;

&lt;p&gt;Best for: FSIs, banks, and asset managers who need an AI partner that already understands SOC 2, GLBA, and the realities of financial data residency. Skip if your project has no regulatory overlay — you will pay for compliance overhead you do not need.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. The Hackett Group
&lt;/h3&gt;

&lt;p&gt;Founded 1991, publicly traded (NASDAQ: HCKT). A research-and-advisory firm that pivoted into AI implementation services in the last few years. Their advantage is the proprietary benchmarking data — Digital World Class metrics — which gives strategy engagements a quantitative spine that pure dev shops cannot match.&lt;/p&gt;

&lt;p&gt;Best for: large enterprises (Fortune 500) who want benchmark-driven strategy combined with implementation capacity. Their pricing reflects the public-company cost structure, which makes them a poor fit for mid-market budgets.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. ITRex Group
&lt;/h3&gt;

&lt;p&gt;Founded 2010, US-based with global delivery. Solid generalist AI consulting and implementation shop. Less specialized than LeewayHertz or Master of Code, more accessible than the Big 4. Active in healthcare, retail, manufacturing, and enterprise integration projects.&lt;/p&gt;

&lt;p&gt;Best for: mid-market enterprises with a clear AI use case and a need for execution capacity. Their generalist positioning is a feature if you want a pragmatic AI partner; less of a feature if you want a firm with deep expertise in your specific industry.&lt;/p&gt;

&lt;h2&gt;
  
  
  Category 3 — Big 4 and strategy giants (100,000+ people)
&lt;/h2&gt;

&lt;p&gt;AI is one of dozens of practices inside firms whose primary business is something else (consulting at Accenture, audit and consulting at Deloitte, strategy at BCG, technology services at IBM). You pay for the firm name, the institutional process, the political cover, and a strategy deck. The build is a separate engagement, often with a different team.&lt;/p&gt;

&lt;h3&gt;
  
  
  7. Accenture
&lt;/h3&gt;

&lt;p&gt;Founded 1989 (as Andersen Consulting), publicly traded (NYSE: ACN), 800,000+ employees. The largest AI consulting practice in the world by headcount and revenue. Deep partnerships with Microsoft, Anthropic, OpenAI, AWS, and Google Cloud — which is either an advantage (broad capability) or a recommendation bias (partnerships influence advice) depending on your read.&lt;/p&gt;

&lt;p&gt;Best for: Fortune 100 multi-year AI transformation programs where the firm name on the document matters for board-level decisions and shareholder communications. Categorically the wrong choice for a mid-market company with a specific use case and a single-quarter timeline.&lt;/p&gt;

&lt;h3&gt;
  
  
  8. Deloitte (AI &amp;amp; Data Strategy practice)
&lt;/h3&gt;

&lt;p&gt;Founded 1845, private partnership, 460,000+ employees. Strong emphasis on AI governance, ethics, and risk frameworks via their Trustworthy AI program. Their AI strategy work is genuinely thoughtful on the policy and risk dimensions — areas where many smaller firms are weaker. Implementation capacity exists but is usually subcontracted to alliance partners.&lt;/p&gt;

&lt;p&gt;Best for: regulated industries and public-sector engagements where AI governance, audit, and risk frameworks are the bottleneck rather than the technology itself. Less ideal when you actually need the system shipped — that work usually flows to other firms.&lt;/p&gt;

&lt;h3&gt;
  
  
  9. BCG (X / QuantumBlack lineage)
&lt;/h3&gt;

&lt;p&gt;Boston Consulting Group's AI practice operates partly through BCG X (their tech-build arm) and partly via standard strategy consulting engagements. McKinsey's QuantumBlack is the closest analog from the other big strategy firm. Either is excellent at the strategy layer — use-case prioritization, ROI modeling, transformation roadmaps — and serviceable at implementation when paired with build partners.&lt;/p&gt;

&lt;p&gt;Best for: C-suite strategic decisions about AI portfolio, capability investment, or transformation pacing. Hire them for the framing, not for the system. The implementation is rarely where they earn their fees.&lt;/p&gt;

&lt;h3&gt;
  
  
  10. IBM Consulting (with watsonx)
&lt;/h3&gt;

&lt;p&gt;Founded 1911 (the company; the consulting arm is newer), publicly traded (NYSE: IBM), 280,000+ employees. Heavily steers projects toward their own watsonx platform, which is a real product but rarely the optimal choice for greenfield generative AI work in 2026. Strong existing presence in enterprise IT, which makes them an easy default for organizations already deep in the IBM stack.&lt;/p&gt;

&lt;p&gt;Best for: existing IBM customers extending into AI on top of an established watsonx footprint. Categorically not the firm to call if you have no prior IBM commitment — the platform alignment will pull recommendations in a direction that is rarely the cheapest or most flexible outcome.&lt;/p&gt;

At-a-glance summary of the 10 firms above.| Firm | Category | Best fit |&lt;br&gt;
| --- | --- | --- |&lt;br&gt;
| softwarebuilding.ai | Boutique practitioner | Production AI in a quarter; strategy + build under one team |&lt;br&gt;
| LeewayHertz | Boutique practitioner | Multi-workstream enterprise AI with offshore-blended delivery |&lt;br&gt;
| Master of Code Global | Boutique practitioner | Conversational AI and contact-center automation specifically |&lt;br&gt;
| Neurons Lab | Mid-sized specialist | AI for financial services and other regulated industries |&lt;br&gt;
| The Hackett Group | Mid-sized specialist | Benchmark-driven strategy + implementation for Fortune 500 |&lt;br&gt;
| ITRex Group | Mid-sized specialist | Generalist execution capacity for mid-market enterprises |&lt;br&gt;
| Accenture | Big 4 / giant | Fortune 100 multi-year AI transformation programs |&lt;br&gt;
| Deloitte | Big 4 / giant | AI governance, risk, and policy frameworks in regulated sectors |&lt;br&gt;
| BCG / QuantumBlack | Strategy giant | C-suite strategy framing and portfolio decisions |&lt;br&gt;
| IBM Consulting | Enterprise giant | Existing IBM customers extending into watsonx-aligned AI |
&lt;h2&gt;
  
  
  How to actually pick (a short framework)
&lt;/h2&gt;

&lt;p&gt;Most procurement teams pick a generative AI consulting firm by sending an RFP to five firms and comparing the responses. The responses all sound similar because consulting firms are good at writing RFP responses. A better framework, in four questions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What is the deliverable I actually need — a strategy document, a working system, or both? Strategy-only deliverables are Category 3 (Big 4) by default. Working systems are Category 1 (boutique). Both-in-one is Category 1 or 2, never Category 3.&lt;/li&gt;
&lt;li&gt;Who is in my organization to consume the deliverable? A 60-page strategy deck is useless without an engineering team to execute it. If you have no internal engineering capacity, do not buy a strategy-only engagement — buy an implementation engagement and let the strategy emerge from the build.&lt;/li&gt;
&lt;li&gt;What is my timeline tolerance — months or quarters? Boutique firms ship pilots in 4-8 weeks. Mid-sized specialists run 3-6 month programs. Big 4 transformation engagements are 12-36 months. Pick the category that matches your patience, not the one that matches your aspirations.&lt;/li&gt;
&lt;li&gt;Do I need the firm name for political cover or for the work? Be honest with yourself. There are legitimate reasons to hire Accenture or Deloitte that have nothing to do with the technical work — board pressure, shareholder optics, regulatory framing. If that is the real driver, hire the Big 4 and stop pretending it is about execution quality. If it is genuinely about execution, look at Categories 1 and 2.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  A note on cost
&lt;/h2&gt;

&lt;p&gt;We deliberately do not publish hourly rates or pilot costs on this page because doing so would be dishonest — the real cost is driven by scope, data readiness, integration count, and ongoing iteration shape, not by a per-hour rate. Industry benchmarks: boutique practitioner pilots typically run mid-five to mid-six figures total; mid-sized specialist programs run high-six to low-seven figures across phase one; Big 4 strategy engagements run seven figures for strategy alone, with implementation as a separate procurement at a higher multiple. Every firm in the list will give you a written proposal after a discovery call.&lt;/p&gt;

&lt;h2&gt;
  
  
  Generative AI consulting quick answers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is generative AI consulting?
&lt;/h3&gt;

&lt;p&gt;Generative AI consulting is the subset of AI consulting focused on systems that produce new content — text, images, audio, code, structured data — using large language models or related generative models. It overlaps heavily with general AI consulting but has a different center of gravity in 2026: most engagements involve LLM-based agents, retrieval-augmented systems, conversational AI, or generative content workflows. The skills required are also different from classical AI consulting — prompt design, evaluation harnesses, RAG architecture, and model selection across the frontier-vs-open-weight axis matter more than the deep ML modeling work that dominated AI consulting before 2022.&lt;/p&gt;

&lt;h3&gt;
  
  
  How is generative AI consulting different from regular AI consulting?
&lt;/h3&gt;

&lt;p&gt;In day-to-day practice the line is blurry, but the skill mix is different. Classical AI consulting was heavy on data engineering, feature design, model selection from the scikit-learn / TensorFlow / PyTorch family, and statistical evaluation. Generative AI consulting still uses all of that as background but adds prompt engineering, agent architecture, RAG design, evaluation methods specific to language model output, and model-swap discipline across hosted and open-weight LLMs. Most modern AI consulting firms now do both; the distinction matters mostly for buyers trying to confirm the firm has done generative work specifically rather than just classical ML.&lt;/p&gt;

&lt;h3&gt;
  
  
  Are boutique AI consulting firms better than Big 4?
&lt;/h3&gt;

&lt;p&gt;Better at different things. Boutique practitioner firms are better at shipping working systems quickly with senior staffing throughout. Big 4 firms are better at producing strategy documents that carry weight in regulated, high-political-cover environments. If your need is a system in production, boutique is the right call. If your need is a deck the board will sign off on, Big 4 is the right call. Most companies need one of these, not both. The mistake is hiring a Big 4 to ship a system or a boutique to produce political cover — both work poorly out of category.&lt;/p&gt;

&lt;h3&gt;
  
  
  Should I hire multiple AI consulting firms in parallel?
&lt;/h3&gt;

&lt;p&gt;Almost never. Multi-firm AI engagements add coordination overhead that usually exceeds the diversity benefit. The exception is a Big 4 + boutique pairing where the Big 4 handles strategy/governance and the boutique handles implementation — this is a legitimate pattern for regulated Fortune 500 engagements. For everyone else, pick one firm, scope the engagement clearly, and let them ship.&lt;/p&gt;

&lt;h3&gt;
  
  
  How long does an AI consulting engagement usually take?
&lt;/h3&gt;

&lt;p&gt;Boutique practitioner: 1-2 weeks for strategy, 4-8 weeks for a production pilot, 4-6 weeks for hardening. End-to-end inside a single quarter. Mid-sized specialist: 4-8 weeks for strategy, 3-6 months for the implementation program, with possible multi-year retainers attached. Big 4: 3-6 months for strategy alone; multi-year for transformation programs. Match your timeline expectation to the firm category, not to your hopes.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to do next
&lt;/h2&gt;

&lt;p&gt;If you are evaluating generative AI consulting firms for a specific project, the cheapest next step is a free 30-minute call with a few of them and a directly comparable written proposal at the end. We do this; most firms in the list above do something similar. The proposal will tell you more about fit than any list ever can.&lt;/p&gt;

&lt;p&gt;If you want our read on your specific situation — including a candid recommendation about which firm category fits the work, even when it is not us — that is what the strategy call is for. We will tell you to hire a different firm when the fit is wrong, because handing back a misfit project costs us less than failing in delivery.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keep going&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/contact" rel="noopener noreferrer"&gt;Book a free AI strategy call&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/services/ai-consulting-services" rel="noopener noreferrer"&gt;Our AI consulting services&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/blog/ai-agency-vs-in-house" rel="noopener noreferrer"&gt;AI agency vs in-house: which one to hire&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/blog/how-to-choose-an-ai-development-agency" rel="noopener noreferrer"&gt;12 questions to ask any AI agency before hiring&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/blog/langchain-vs-crewai-vs-autogen-for-buyers" rel="noopener noreferrer"&gt;LangChain vs CrewAI vs AutoGen for buyers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.gartner.com/reviews/market/generative-ai-consulting-and-implementation-services" rel="noopener noreferrer"&gt;Gartner — Generative AI Consulting and Implementation Services&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.forrester.com/technology/artificial-intelligence/" rel="noopener noreferrer"&gt;Forrester — AI strategy and services research&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai" rel="noopener noreferrer"&gt;McKinsey — The state of AI&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://softwarebuilding.ai/blog/best-generative-ai-consulting-companies-2026" rel="noopener noreferrer"&gt;https://softwarebuilding.ai/blog/best-generative-ai-consulting-companies-2026&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>business</category>
      <category>startup</category>
      <category>productivity</category>
    </item>
    <item>
      <title>OpenClaw: The Personal AI Agent That Actually Does Things</title>
      <dc:creator>Anton Resnick</dc:creator>
      <pubDate>Sat, 16 May 2026 00:00:00 +0000</pubDate>
      <link>https://dev.to/softwarebuilding/openclaw-the-personal-ai-agent-that-actually-does-things-3ni7</link>
      <guid>https://dev.to/softwarebuilding/openclaw-the-personal-ai-agent-that-actually-does-things-3ni7</guid>
      <description>&lt;p&gt;The first time you watch an AI agent actually do something — clear an inbox, file a pull request, fix a production bug from a Telegram message while you are on a flight — the gap between that and the chatbot you have been using for two years feels like a category change. OpenClaw is one of the clearest examples of that gap shipping today. It is a personal AI agent that runs on your own machine, takes real actions across your real systems, and remembers what it learned from one session to the next.&lt;/p&gt;

&lt;p&gt;This post is a plain-language read of what OpenClaw is, what it does, how it works under the hood, who it is for, and the practical decisions that separate a deployment that earns its keep from one that becomes shelfware in a week. If you have been searching for a credible local AI agent or a serious open-source alternative to hosted assistants, this is the brief we would give a client weighing whether to build on something like OpenClaw or commission a custom system from scratch.&lt;/p&gt;

&lt;h2&gt;
  
  
  What OpenClaw actually is
&lt;/h2&gt;

&lt;p&gt;OpenClaw is an open-source personal AI agent that installs on macOS, Windows, or Linux and runs locally by default. It was started by Peter Steinberger as an independent project and is explicitly not affiliated with Anthropic, even though it can drive Claude as its underlying model. The codebase is Node.js, distributed as an npm package, with a companion macOS menubar app for users who want a native surface instead of the terminal.&lt;/p&gt;

&lt;p&gt;The product positioning is short: "The AI that actually does things." In practice that means OpenClaw is a long-running assistant, not a chat session. It listens on the channels you connect it to (WhatsApp, Telegram, Discord, Slack, Signal, iMessage, and more), it has access to your local file system and shell, and it can run unattended for hours or days at a time. Persistent memory across sessions is built in, so the agent that watered your plants on Tuesday remembers it on Wednesday without you re-briefing it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it does in practice
&lt;/h2&gt;

&lt;p&gt;Capability claims from the product page, lightly grouped:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Chat-platform reach: WhatsApp, Telegram, Discord, Slack, Signal, iMessage and other messaging apps act as the input/output surface, so you talk to the agent from wherever you already are.&lt;/li&gt;
&lt;li&gt;Browser control: full web automation — opening pages, filling forms, scraping data, navigating multi-step flows that an API would not give you.&lt;/li&gt;
&lt;li&gt;File system access: reading and writing files on the host machine, which makes document processing and report generation first-class.&lt;/li&gt;
&lt;li&gt;Shell command execution: the agent can run real CLI commands, which is what enables the more interesting autonomous scenarios (testing code, opening pull requests, running cron jobs).&lt;/li&gt;
&lt;li&gt;Persistent memory: context survives across sessions and across days. Important because the difference between a useful agent and a forgetful one is whether you have to re-explain your project every Monday.&lt;/li&gt;
&lt;li&gt;50+ integrations: Gmail, GitHub, Spotify, Obsidian, Twitter, Hue lights, and more. The integrations cover the messy long tail of personal/business tooling, not just the obvious API-rich SaaS.&lt;/li&gt;
&lt;li&gt;Custom skills: users can write their own skills, and the agent itself can write skills on the fly — a self-modifying loop, with the obvious tradeoffs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Real use cases pulled from public testimonials on the product page include autonomous inbox triage, calendar management, automated flight check-ins, code testing and pull request creation, mass email unsubscription, and even building small websites from a phone. One user said simply, "It's running my company." Another framed it as a replacement for a virtual assistant. Read those testimonials critically — they are testimonials — but the shape of the use cases lines up with the capability list.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it works under the hood
&lt;/h2&gt;

&lt;p&gt;OpenClaw is fundamentally a local-first agent runtime. The default architecture runs on your machine, calling out to whichever LLM you have configured: Anthropic Claude, OpenAI GPT, or a local open-weight model. That model choice matters more than people think. A Claude-driven OpenClaw and a local-model OpenClaw are genuinely different products in terms of cost, latency, capability ceiling, and privacy posture. We will come back to that.&lt;/p&gt;

&lt;p&gt;Around the model is an agent loop: the model receives a goal, decides what tool to call, observes the result, and decides the next step. The tools include browser control, the local shell, the file system, and the integration adapters. Memory is layered on top of the loop so that each new session inherits relevant context from prior sessions without burning the entire token budget on it. Skills are user-defined or model-generated routines that the agent can re-use, which is the mechanism that turns a one-off action into a reliable, repeatable one.&lt;/p&gt;

&lt;p&gt;Installation options range from a single-command curl script to an npm global install to a full source build. For a developer, getting from zero to a working agent is genuinely minutes — that is the part of the install story that has been getting attention. The menubar app on macOS is a thoughtful detail; it turns the agent from a process you start in a terminal into something that lives where the rest of your operating system already lives.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who actually benefits, and who should pass
&lt;/h2&gt;

&lt;p&gt;OpenClaw maps cleanly to three audiences. Technical operators who already work in a terminal benefit the most — they can extend the system, write skills, debug failures, and feel comfortable running an autonomous process on their machine. Builders of multi-agent systems get a useful prior-art runtime to learn from, since it ships several patterns (memory, skills, sandbox shell access) that any production agent eventually needs. And privacy-sensitive users get something rare in 2026: an agent that does not ship every keystroke to a hosted SaaS, because the runtime is local and the model can be local too.&lt;/p&gt;

&lt;p&gt;The audiences who should pass, or at least wait, are organizations with strict change-control or compliance requirements that cannot tolerate a self-modifying agent on production infrastructure, and individual users who want a chatbot rather than an autonomous process running on their machine. Self-modifying skills are exciting and risky in the same breath. The risk is real enough that we would not recommend an unattended OpenClaw deployment with shell access on a regulated workload without serious guardrails.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the value really shows up (when deployed correctly)
&lt;/h2&gt;

&lt;p&gt;The deployments we have seen pay off the fastest share four traits.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The right skills, not the most skills. A small library of well-chosen, well-named skills tied to specific business outcomes beats fifty half-built ones. We treat skills the way we treat APIs — versioned, tested, documented — even though the runtime does not force you to.&lt;/li&gt;
&lt;li&gt;Sensible memory hygiene. Long-running memory is the feature that makes the agent useful and the feature that quietly breaks deployments six weeks in when the context gets polluted. A discipline around what gets remembered, what gets summarized, and what gets dropped is non-negotiable.&lt;/li&gt;
&lt;li&gt;Model choice fit to the workload. A coding-heavy agent on Claude or GPT-5 will outperform a local model. A privacy-bound personal agent on a local model will outperform a hosted one for users who would never accept their data leaving the machine. Pick the model after you know the job, not before.&lt;/li&gt;
&lt;li&gt;A real test loop. Agents drift. Skills break when an upstream API changes a schema. The teams that win run a small set of regression scenarios against the agent on a schedule, the same way you would for any production software.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Training, in the OpenClaw sense, is mostly skill design and prompt engineering — the underlying model already knows how to be a general assistant. The work is teaching it about your specific tools, your specific data shapes, and your specific definitions of done. That is closer to onboarding a new contractor than to training a model in the ML sense. It is also where almost all the leverage is.&lt;/p&gt;

&lt;h2&gt;
  
  
  What OpenClaw is not
&lt;/h2&gt;

&lt;p&gt;OpenClaw is not a replacement for a managed agent platform with enterprise SSO, audit logs, role-based access, and SLA-backed uptime. The local-first design that makes it interesting for personal use is the same design that makes it the wrong default for a 500-employee company. If that is your context, OpenClaw is a research signal about where the category is going, not a production answer for next quarter.&lt;/p&gt;

&lt;p&gt;It is also not a turnkey solution for non-technical users despite the testimonials. The single-command install is real, but the first valuable behavior usually requires writing or commissioning custom skills tied to the user's actual tools. The gap between "installed" and "earning its keep" is mostly skill design work.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenClaw quick answers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is OpenClaw free?
&lt;/h3&gt;

&lt;p&gt;OpenClaw itself is open source under a permissive license. The runtime does not charge you to run it. The cost shows up in whichever LLM you point it at — Claude, GPT, or whatever provider you choose — and in any paid integrations you connect. Running it on a local open-weight model can take the model cost to near zero at the price of capability.&lt;/p&gt;

&lt;h3&gt;
  
  
  Which LLM should I use with OpenClaw?
&lt;/h3&gt;

&lt;p&gt;It depends on the workload. Long-context reasoning, code generation, and complex tool use favor frontier hosted models (Claude or GPT-class). Privacy-bound tasks and offline work favor local open-weight models. For most production deployments we have seen, the right answer is a hybrid: a frontier model for the hard reasoning steps and a smaller local model for high-volume cheap tasks. Pick the model after you know the job.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is it safe to give an autonomous agent shell access to my machine?
&lt;/h3&gt;

&lt;p&gt;It is safe to the extent that you trust the skills you let it run, the prompts you let it accept, and the supervision you put around it. The same caveats apply to any other automated process with shell access. We strongly recommend running first deployments in a separate user account or container, with a tight allowlist of commands, and with logs that a human reviews until the system has earned trust. Self-modifying skills require an extra layer of review.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can OpenClaw replace a virtual assistant?
&lt;/h3&gt;

&lt;p&gt;For some users, for some tasks, in 2026 — yes, in part. For inbox triage, calendar wrangling, recurring reports, and well-bounded research it is already useful. For the parts of a virtual assistant's job that require judgment, relationship management, or accountability for outcomes, no. Treat it as one more capable hire on the team, not a one-for-one swap.&lt;/p&gt;

&lt;h3&gt;
  
  
  Should I build my company's AI agent on OpenClaw?
&lt;/h3&gt;

&lt;p&gt;Possibly, if you want a transparent runtime you can extend, you have engineers who can own it, and your workload tolerates a local-first architecture. We would still spend the first week mapping your specific workflows to specific skills before committing. The runtime is the easy part. The hard part is the skills, memory hygiene, and integration design — and that work is the same whether you start from OpenClaw or from a blank repo.&lt;/p&gt;

&lt;h2&gt;
  
  
  How we think about OpenClaw on client projects
&lt;/h2&gt;

&lt;p&gt;When a client asks us about OpenClaw specifically — and it has started coming up — our answer is shaped by what they actually need. For a founder or operator who wants a personal agent that runs their inbox, calendar, and a handful of recurring tasks, OpenClaw is a credible starting point and we will help set it up properly with a skill set tailored to the work. For a company that needs a multi-user agent with audit, observability, and role-based access, we will usually recommend building on a different stack and using OpenClaw as a reference architecture rather than as the production runtime.&lt;/p&gt;

&lt;p&gt;Either way the leverage is the same: skills, memory hygiene, model fit, and a test loop. The runtime is the smaller decision than people expect. If you are weighing a deployment and want a second pair of eyes on whether OpenClaw is the right base for your specific situation, our strategy calls are free and short. We will tell you whether to use it, what to use it for, and what to use instead — even if the answer is "this isn't the right tool for your problem."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keep going&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/contact" rel="noopener noreferrer"&gt;Book a free AI strategy call&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/blog/hermes-agent-nous-research-explained" rel="noopener noreferrer"&gt;Hermes Agent by Nous Research — explained&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/blog/ai-agents-vs-automation" rel="noopener noreferrer"&gt;AI agents vs. traditional automation: a decision guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/blog/langchain-vs-crewai-vs-autogen-for-buyers" rel="noopener noreferrer"&gt;LangChain vs CrewAI vs AutoGen for buyers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/blog/cost-to-build-an-ai-agent" rel="noopener noreferrer"&gt;What an AI agent build actually costs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/services/ai-agent-development" rel="noopener noreferrer"&gt;Our AI agent development service&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://openclaw.ai/" rel="noopener noreferrer"&gt;OpenClaw official site&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.openclaw.ai/" rel="noopener noreferrer"&gt;OpenClaw documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://softwarebuilding.ai/blog/openclaw-ai-agent-explained" rel="noopener noreferrer"&gt;https://softwarebuilding.ai/blog/openclaw-ai-agent-explained&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>productivity</category>
      <category>programming</category>
    </item>
    <item>
      <title>How to Build an AI Agent (Without an ML Team)</title>
      <dc:creator>Anton Resnick</dc:creator>
      <pubDate>Sat, 16 May 2026 00:00:00 +0000</pubDate>
      <link>https://dev.to/softwarebuilding/how-to-build-an-ai-agent-without-an-ml-team-1eo0</link>
      <guid>https://dev.to/softwarebuilding/how-to-build-an-ai-agent-without-an-ml-team-1eo0</guid>
      <description>&lt;p&gt;Almost every guide to building an AI agent assumes you can write Python, read a PyTorch traceback, and have an opinion about embedding models. Most people who want to build one cannot do those things — and importantly, do not need to. In 2026 the tools have caught up enough that a non-engineering founder, an ops leader, or a curious product manager can put a real agent into production with the right framing and the right shortcuts. The skill required is not ML; it is system design.&lt;/p&gt;

&lt;p&gt;This guide is the plain-English version. We will explain what an AI agent actually is (and what it is not), walk through the five parts every agent has, show you the realistic build path for a non-technical builder, flag the common mistakes that kill these projects before week three, and tell you the honest line where hiring help becomes the smarter move.&lt;/p&gt;

&lt;h2&gt;
  
  
  What an AI agent actually is (in 60 seconds)
&lt;/h2&gt;

&lt;p&gt;An AI agent is a piece of software that can take a goal, decide what to do next, do it, observe what happened, and decide again — over and over — until the goal is done. That is the entire definition. It is not a chatbot, because a chatbot only talks. It is not an automation, because an automation runs a fixed script. It is a system that has its own loop and can act on the world through tools.&lt;/p&gt;

&lt;p&gt;A concrete example: a sales-research agent. You give it a company name. It searches the web for recent news about the company, pulls the CEO's name from LinkedIn, finds the company's funding history on Crunchbase, drafts a personalized email referencing two specific things it found, and either sends the email or queues it for your approval. That is an agent. The same workflow done as a Zapier sequence with hardcoded steps is automation. The same workflow done as a chat where you have to ask each question manually is a chatbot. The difference is who decides what to do next: in the agent case, the agent does.&lt;/p&gt;

&lt;h2&gt;
  
  
  The five parts every agent has
&lt;/h2&gt;

&lt;p&gt;Every working AI agent — yours included, once you build it — has the same five parts. Knowing them ahead of time saves you weeks of going in circles.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. A model (the brain)
&lt;/h3&gt;

&lt;p&gt;The language model is the part that reasons about what to do next. In 2026 you have three realistic choices: Anthropic's Claude family, OpenAI's GPT family, or an open-weight model you run yourself (Llama, Mistral, Qwen). For a first agent, pick a frontier hosted model — Claude or GPT — and stop thinking about it. Open-weight models are powerful but the operational cost of running them yourself is not where a first project should spend its energy. You will swap models later; the architecture should make that easy. It is a configuration change, not a rewrite, if you design correctly.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Tools (the hands)
&lt;/h3&gt;

&lt;p&gt;Tools are the things the agent can do in the real world: search the web, read a file, call an API, write to a database, send an email, query your CRM. Without tools, the agent is just a model that can talk. With tools, it can actually finish work. The art is in choosing the right tools and writing clear descriptions of what each one does, when to use it, and what it returns. Most first-agent failures are not model failures — they are tool-design failures.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Memory (the recall)
&lt;/h3&gt;

&lt;p&gt;Memory is what lets the agent remember anything that happened earlier — earlier in the same conversation, earlier in the same workflow, or earlier in the agent's life. There are two kinds, and you need both. Short-term memory is the conversation buffer: what was said in the last 10 turns. Long-term memory is the persistent store: facts the agent learned, user preferences, things it should not repeat. For a first agent, a JSON file or a simple Postgres table is enough long-term memory. Vector databases are useful later; you do not need one to start.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. A control loop (the decision-maker)
&lt;/h3&gt;

&lt;p&gt;The control loop is the code that runs in a circle: get the latest state, ask the model what to do next, do it, observe the result, repeat. Most modern agent frameworks (LangGraph, CrewAI, the OpenAI Agents SDK, Anthropic's tool-use loop) give you a sensible default control loop you can use without modification. The loop has to handle three things gracefully: when the model picks a tool that fails (retry or escalate), when the model loops forever without making progress (exit with an apology), and when the model decides the goal is done (return cleanly).&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Observability (the X-ray)
&lt;/h3&gt;

&lt;p&gt;Observability is your ability to look at what the agent did and figure out why. Every model call, every tool call, every decision the model made — logged, replayable, searchable. This is the part most first builders skip and then regret around week three when something goes wrong in production and there is no way to debug it. The minimum is structured logs of every step. The better version is a hosted observability tool (LangSmith, Langfuse, Helicone, Phoenix) that gives you a UI to walk through specific runs. Set this up on day one, not on day twenty.&lt;/p&gt;

&lt;h2&gt;
  
  
  The realistic build path for a non-technical builder
&lt;/h2&gt;

&lt;p&gt;If you can use a spreadsheet, you can build a simple agent in 2026 with the right tools. Here is the sequence that actually works:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Pick one workflow that is small, clear, and valuable. Not "a chatbot for our website." Something specific: "summarize the day's incoming support tickets and post the summary to Slack at 5pm." A non-technical builder ships their first agent on a workflow they can describe in one sentence.&lt;/li&gt;
&lt;li&gt;Pick a model. Anthropic Claude or OpenAI GPT. Sign up, get an API key, put $20 of credit on it. That is enough for hundreds of test runs of a simple agent.&lt;/li&gt;
&lt;li&gt;Pick a no-code or low-code agent builder for the first version. Make.com, n8n, Zapier with AI steps, or a dedicated builder like Lindy, Relevance AI, or Stack AI. None of these will scale to a serious production system, but all of them will let you ship version 0.1 in an afternoon. The point of v0.1 is to learn what your real requirements are.&lt;/li&gt;
&lt;li&gt;Connect the agent to one or two tools. Start tiny: web search and "send Slack message" is plenty for most starter agents. Resist the urge to wire in everything on day one.&lt;/li&gt;
&lt;li&gt;Run it 50 times against real inputs. Look at what it gets wrong. The first 50 runs are where the actual requirements live — they almost always differ from the requirements you wrote down at the start.&lt;/li&gt;
&lt;li&gt;Decide whether the no-code version is good enough or whether you need a real build. For a personal-productivity agent or an internal-tools agent, the no-code version is often good enough forever. For a customer-facing agent, an agent that touches real money, or an agent that needs to integrate with systems the no-code platform does not support, the no-code version is a prototype; the real build is a software project.&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The biggest non-obvious mistake non-technical builders make: starting with the wrong workflow. Pick a workflow where the cost of being wrong is low (internal tooling, personal automation, low-stakes drafts you review before sending). Customer-facing agents that handle money, contracts, or medical information are not first agents.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The five common mistakes that kill first agents
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Wiring in every integration on day one
&lt;/h3&gt;

&lt;p&gt;The temptation is to give the agent access to everything — CRM, email, calendar, Slack, the database, three different APIs. The result is an agent that can do many things badly. Start with one or two integrations, get those reliable, then add more. Reliable beats capable in a v1 system.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. No evaluation loop
&lt;/h3&gt;

&lt;p&gt;You cannot ship an agent and trust it without a way to check whether it is getting better or worse over time. The minimum: a small spreadsheet of 20-50 example inputs and the right answers. Run the agent against the list before you change anything, and after. If the score drops, do not deploy. This is not optional; it is the difference between a working system and a slot machine.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Trusting the model to handle edge cases gracefully
&lt;/h3&gt;

&lt;p&gt;Models hallucinate, models loop, models make up tool names that do not exist, models confidently send the wrong answer. The control loop has to catch and route these failures — into a retry, into a human review, into an apologetic fallback. Designing for the unhappy path is most of the engineering work in a real agent. The model itself is almost never the bottleneck.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Skipping observability
&lt;/h3&gt;

&lt;p&gt;When (not if) the agent does something wrong in production, you need to be able to replay the exact sequence of decisions that led to the bad outcome. Without logs of every tool call and every model response, you are guessing. With them, you can usually fix the specific failure mode in an afternoon. Pick an observability tool on day one and turn it on before the first deployment.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Building the wrong shape entirely
&lt;/h3&gt;

&lt;p&gt;Some workflows look like agent problems but are actually automation problems. If the workflow is a fixed sequence of steps with predictable inputs and outputs, an agent is overkill — automation is the right shape. We wrote a whole post on this trade-off (linked below). Building an agent for a workflow that should have been automation costs you money, latency, and reliability without earning anything in return.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to hire someone (honestly)
&lt;/h2&gt;

&lt;p&gt;The DIY path is real, and we have seen non-technical founders ship genuinely useful agents in a weekend. But there is a line, and it is worth being honest about where it sits. Hire someone when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The agent is customer-facing and a wrong answer has real consequences — money moved, contracts signed, medical information given, legal advice implied. The cost of being wrong is the budget you are working against, and DIY tools do not give you the controls to manage it well.&lt;/li&gt;
&lt;li&gt;The integration depth is beyond what no-code platforms support — custom auth flows, on-prem systems, complex data transformations, anything that lives behind an enterprise API gateway.&lt;/li&gt;
&lt;li&gt;You need to swap models cheaply (e.g., move from a hosted frontier model to a self-hosted open-weight model as volume scales). No-code platforms lock you into their model choices.&lt;/li&gt;
&lt;li&gt;You need real observability, real evaluation harnesses, and real version control on prompts and tool definitions. No-code platforms vary wildly here, and most fall short.&lt;/li&gt;
&lt;li&gt;The agent is core to the business — not a side experiment. Core systems should be built by someone who will still be reachable when they break, on infrastructure you own.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When you do decide to hire someone, the kind of help matters. A solo freelancer is fine for a focused single-workflow agent. A boutique AI development agency (us, others) is the right call when the build is one of many systems you will deploy, when you want strategy and build under one team, or when the integration surface is non-trivial. A Big 4 firm is rarely the right call for a first agent — that is a different product entirely.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to build an AI agent — quick answers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Can I build an AI agent without writing code?
&lt;/h3&gt;

&lt;p&gt;Yes, for a first version. In 2026 the no-code agent builders (Lindy, Relevance AI, Stack AI, n8n with AI nodes, Make.com) will let you ship a working agent in an afternoon without writing code. The trade-off is that no-code platforms have ceilings — usually around custom integrations, observability, and the ability to swap models. They are great for prototypes and internal tools, often inadequate for serious production systems. Use them to learn what your real requirements are, then decide whether to graduate to a real build.&lt;/p&gt;

&lt;h3&gt;
  
  
  Which AI model is best for building an agent?
&lt;/h3&gt;

&lt;p&gt;For a first agent, pick a frontier hosted model and move on: Anthropic Claude (any of the current models) or OpenAI GPT (any of the current models). Both handle tool use and multi-step reasoning well. The right answer changes as your project matures — high-volume cheap steps benefit from smaller models, privacy-bound workloads benefit from open-weight models you self-host — but optimizing the model choice on day one is premature. Architect for swap-ability and pick the best model later.&lt;/p&gt;

&lt;h3&gt;
  
  
  How long does it take to build an AI agent?
&lt;/h3&gt;

&lt;p&gt;A no-code first version: hours to days. A no-code production-ready version with one or two workflows: 1-2 weeks. A code-built production agent with real integration, evaluation, and observability: 4-8 weeks for the first one, less for subsequent ones because the foundation is reusable. The number that matters is not the build time — it is the iteration time after launch. A good agent gets noticeably better in the first three months as you grow the evaluation set and tune the tool definitions based on real failures.&lt;/p&gt;

&lt;h3&gt;
  
  
  How much does it cost to build an AI agent?
&lt;/h3&gt;

&lt;p&gt;We deliberately avoid quoting numbers on this page because the real cost depends on scope, integration count, data readiness, and the cost of being wrong. The dimensions to think about: how many integrations does the agent need, how clean is your data today, how high are the stakes of a wrong answer (which sets your evaluation and observability budget), and whether you want one-time delivery or ongoing iteration. A no-code DIY agent costs you time and a few hundred dollars in model API credits. A professional build is a different category. We give a written proposal at the end of a free strategy call.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do I need a vector database to build an AI agent?
&lt;/h3&gt;

&lt;p&gt;Almost certainly not for your first agent. Vector databases (Pinecone, Weaviate, pgvector, Chroma) are useful for retrieval-augmented generation systems where the agent needs to search over a corpus of documents. If your agent's job is something else — calling APIs, drafting content, running multi-step workflows — you can skip the vector store entirely on v1. Many production agents never need one. Add it when you have a concrete need; do not add it because the tutorials all use one.&lt;/p&gt;

&lt;h3&gt;
  
  
  Should I use LangChain, LangGraph, CrewAI, or no framework at all?
&lt;/h3&gt;

&lt;p&gt;For a first agent, use no framework or use whatever framework your no-code platform uses behind the scenes. For a serious production build, LangGraph is the safe default in 2026, CrewAI is the right call for multi-agent role-based designs where the abstraction earns its keep, and pure-code orchestration is correct when a framework would add latency or debugging overhead without earning it back. We have a full post on this comparison written specifically for non-technical buyers (linked below).&lt;/p&gt;

&lt;h2&gt;
  
  
  What to read next
&lt;/h2&gt;

&lt;p&gt;If you got value from this guide, the related posts below dig deeper into the decisions you will face once you start. The AI-agent-vs-automation framework helps you decide whether the workflow you have in mind actually wants an agent. The framework comparison breaks down LangChain vs CrewAI vs AutoGen in business terms. The cost-driver post explains how the bill actually adds up. And our AI agent development service page is the version of this for people who decide the build needs an engineering partner.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keep going&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/contact" rel="noopener noreferrer"&gt;Book a free AI strategy call&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/blog/ai-agents-vs-automation" rel="noopener noreferrer"&gt;AI agents vs. traditional automation: a decision guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/blog/langchain-vs-crewai-vs-autogen-for-buyers" rel="noopener noreferrer"&gt;LangChain vs CrewAI vs AutoGen for buyers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/blog/cost-to-build-an-ai-agent" rel="noopener noreferrer"&gt;What drives the cost of an AI agent build&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/services/ai-agent-development" rel="noopener noreferrer"&gt;Our AI agent development service&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/blog/openclaw-ai-agent-explained" rel="noopener noreferrer"&gt;OpenClaw: an open-source personal AI agent&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.anthropic.com/engineering/building-effective-agents" rel="noopener noreferrer"&gt;Anthropic — Building effective agents (engineering guidance)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://openai.com/index/practices-for-governing-agentic-ai-systems/" rel="noopener noreferrer"&gt;OpenAI — Practices for governing agentic AI systems&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://softwarebuilding.ai/blog/how-to-build-an-ai-agent" rel="noopener noreferrer"&gt;https://softwarebuilding.ai/blog/how-to-build-an-ai-agent&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>programming</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Hermes Agent by Nous Research: The Agent That Grows With Your Server</title>
      <dc:creator>Anton Resnick</dc:creator>
      <pubDate>Sat, 16 May 2026 00:00:00 +0000</pubDate>
      <link>https://dev.to/softwarebuilding/hermes-agent-by-nous-research-the-agent-that-grows-with-your-server-plf</link>
      <guid>https://dev.to/softwarebuilding/hermes-agent-by-nous-research-the-agent-that-grows-with-your-server-plf</guid>
      <description>&lt;p&gt;Most agent products in 2026 are either hosted SaaS (you rent the agent, the vendor owns the runtime) or thin wrappers around a chat model (handy, limited). Hermes Agent, the open-source release from Nous Research, sits in a less crowded category: an autonomous agent designed to live on your own server, build up its own library of learned skills over time, and orchestrate isolated subagents under one parent. The pitch on the product page is short — "The Agent That Grows With You" — and the technical detail underneath it is more interesting than the tagline.&lt;/p&gt;

&lt;p&gt;This post is a plain-language read of what Hermes Agent is, what it does, how it works, who it is for, and the practical decisions that separate a thoughtful deployment from a shelfware one. If you have been hunting for a credible self-hosted AI agent or a serious open-source autonomous AI agent you can actually run on your own infrastructure, this is the brief we would hand a client weighing Hermes against a custom build or a hosted alternative.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Hermes Agent actually is
&lt;/h2&gt;

&lt;p&gt;Hermes Agent is an MIT-licensed, open-source autonomous agent from Nous Research, currently at version 0.14.0 at the time of writing. The headline framing from the project itself: it is "an autonomous agent that lives on your server, remembers what it learns, and gets more capable the longer it runs." In other words it is intended to be installed once on infrastructure you control and improved over time, rather than spun up for a single task and discarded.&lt;/p&gt;

&lt;p&gt;Nous Research is a known name in the open-weight AI world — they are best known for the Hermes line of fine-tuned open models. Hermes Agent is the company's move from "better open models" to "a runtime that makes open agents useful in real environments." That lineage matters, because most open-source agent projects come from labs that do not also ship models, and the design choices in Hermes Agent reflect both sides of that experience.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it does in practice
&lt;/h2&gt;

&lt;p&gt;The capabilities Nous calls out, lightly grouped:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-platform interface: Telegram, Discord, Slack, WhatsApp, Signal, Email, and a CLI. The agent meets users where they already are rather than forcing a dedicated UI.&lt;/li&gt;
&lt;li&gt;Auto-generated skills with persistent memory: the agent "learns your projects and never forgets how it solved a problem." Skills accumulate, so a problem the agent has seen before becomes a known recipe.&lt;/li&gt;
&lt;li&gt;Natural-language cron scheduling: "Natural language cron scheduling for reports, backups, and briefings." You tell the agent in English when and what; the schedule is the agent's problem.&lt;/li&gt;
&lt;li&gt;Subagent delegation: "Isolated subagents with their own conversations, terminals, and Python RPC scripts." The parent agent can spin off scoped workers, give them their own environment, and collect results.&lt;/li&gt;
&lt;li&gt;Five sandbox backends: local, Docker, SSH, Singularity, and Modal. You pick the isolation model that fits your security and infrastructure posture — the parent and each subagent can use a different one.&lt;/li&gt;
&lt;li&gt;Rich tool set: web search, browser automation, vision, image generation, text-to-speech, and multi-model reasoning are all first-class.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What the homepage does not specify is the underlying LLM. Hermes Agent is model-agnostic by design, and Nous Research has a clear stake in the open-weight side of that decision; expect a Hermes-line model or other open-weight model to be the most idiomatic choice, with hosted frontier models supported for tasks that need the extra capability.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it works under the hood
&lt;/h2&gt;

&lt;p&gt;The architecture splits cleanly into three layers. At the top is the agent runtime — the long-running parent process that owns conversations, schedules, and the skill library. Below it is the sandbox layer, which is where Hermes is genuinely interesting: any task the agent runs can be isolated in a sandbox you chose for that workload. A local sandbox for quick personal work, Docker for repeatable internal jobs, SSH for tasks that need to run on a specific machine, Singularity for HPC environments, Modal for ephemeral cloud compute. Picking sandboxes per task is unusual and unlocks deployments that would be hard to justify on a single-runtime design.&lt;/p&gt;

&lt;p&gt;The third layer is the skill memory. Each problem the agent solves becomes a candidate skill — a recipe with a name and a callable shape. The next time a similar problem appears, the agent reaches for the skill rather than re-deriving the solution. Over weeks and months the skill library is supposed to become the most valuable artifact in the system, much more so than the model weights or the runtime code. That is also where most of the work of a thoughtful deployment lives.&lt;/p&gt;

&lt;p&gt;Subagent delegation is the fourth pillar in practice. The parent agent does not have to do everything itself; it can delegate scoped work to subagents with their own context, their own terminals, and their own RPC channels. That is the pattern most production multi-agent systems eventually need, and shipping it in the runtime saves the implementing team from rolling their own.&lt;/p&gt;

&lt;p&gt;Installation is a single curl command followed by a &lt;code&gt;hermes setup&lt;/code&gt; step that walks the operator through credentials, sandbox choice, and integrations. Nothing exotic — the project clearly wants the first useful behavior to be reachable in under an hour.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who actually benefits, and who should pass
&lt;/h2&gt;

&lt;p&gt;Hermes Agent is most valuable to three audiences. First, engineering teams who want a self-hosted agent for an internal workflow and who already have the infrastructure muscle to run something with shell access on their own servers. Second, organizations with data-residency or compliance constraints that rule out hosted SaaS agents — Hermes runs on your hardware and the sandbox choices give you control over what touches what. Third, builders of multi-agent platforms who want a reference runtime that already ships sandbox isolation and subagent delegation, two patterns that are expensive to rebuild from scratch.&lt;/p&gt;

&lt;p&gt;It is the wrong default for individuals who want a personal agent on their laptop — that is closer to OpenClaw territory — and for companies that want a turnkey hosted product with an SLA and a vendor on the other end of a support email. Self-hosted open source is not a free lunch. Someone is operating the runtime, owning the upgrades, and watching the logs. If that someone does not exist in your organization, factor the cost of building or hiring that someone into the decision.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the value really shows up (when deployed correctly)
&lt;/h2&gt;

&lt;p&gt;Four traits tend to separate Hermes deployments that pay off from ones that stall.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Sandbox discipline. The five-backend design is a gift if you use it deliberately. Map each kind of work to the right sandbox up front — Docker for repeatable internal jobs, SSH for machine-specific work, Modal for cloud bursts — and you avoid the mess where everything runs in the local sandbox and the security review goes badly six months in.&lt;/li&gt;
&lt;li&gt;Skill curation, not skill accumulation. The agent will happily generate skills forever. The teams that get value treat the skill library the way they would treat a shared codebase: named well, reviewed, tested, deprecated when stale. The teams that do not end up with a corrupted memory that drags performance down.&lt;/li&gt;
&lt;li&gt;Subagent boundaries that match real responsibilities. Subagent delegation is powerful only if the subagents have clear scopes. A subagent that does "research" with no defined output shape is worse than no subagent. A subagent that does "return a JSON list of five vetted leads matching this brief" is exactly the right unit.&lt;/li&gt;
&lt;li&gt;Model choice tied to the task. Nous's lineage points toward open-weight models, and many tasks are well-served by a Hermes-class model running on your own GPU. Other tasks — long-context reasoning, complex coding, ambiguous multi-step planning — benefit from a frontier hosted model. Mix them. The runtime supports it.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Training the agent, in the practical sense, is a combination of three things: the model behind it, the prompt and tool design at the parent level, and the curated skill library that accumulates over time. The skill library is where most of the long-run value lives. A six-month-old Hermes deployment with a well-curated 40-skill library will outperform a freshly installed one with a more capable model behind it. The compounding is real.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Hermes Agent is not
&lt;/h2&gt;

&lt;p&gt;Hermes Agent is not a managed SaaS product. There is no hosted dashboard with billing tiers, no vendor SLA, no support contract attached. The MIT license and the GitHub repo are the relationship. Some organizations treat that as a feature; others find it disqualifying. Both reactions are reasonable. Plan accordingly.&lt;/p&gt;

&lt;p&gt;It is also not a finished product. The 0.14 version signals where Nous Research is on the curve — production-credible, actively developing, with API churn still likely. If you build on Hermes today, budget for a small ongoing upgrade tax until the project hits 1.0.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hermes Agent quick answers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is Hermes Agent free?
&lt;/h3&gt;

&lt;p&gt;The Hermes Agent runtime is MIT-licensed open source — free to use, modify, and deploy. Operating cost depends on three things: the LLM behind it (open-weight models you run yourself can be near zero per inference, frontier hosted models cost what they cost), the compute you give it (a server, GPU, or cloud quota), and the integrations you connect (some of which have their own subscription fees).&lt;/p&gt;

&lt;h3&gt;
  
  
  Which model does Hermes Agent run on?
&lt;/h3&gt;

&lt;p&gt;It is model-agnostic. Given Nous Research's history shipping the Hermes line of fine-tuned open-weight models, the most idiomatic choice is a Hermes-class model running on your own hardware. The runtime also supports hosted frontier models for tasks that demand the extra capability. For most production deployments the right answer is a hybrid: an open-weight model handling high-volume tasks, a frontier model for the hardest reasoning steps.&lt;/p&gt;

&lt;h3&gt;
  
  
  How is Hermes Agent different from OpenClaw?
&lt;/h3&gt;

&lt;p&gt;Both are open-source autonomous agents, but they are designed for different shapes of work. OpenClaw is a personal agent that runs on your laptop and treats chat platforms as the primary surface. Hermes Agent is a server-resident agent with sandboxed subagent delegation and five deployment backends, aimed at teams that want a self-hosted agent on shared infrastructure. If the deployment unit is one user, OpenClaw is the closer fit. If it is a server that multiple users or workflows call into, Hermes is the closer fit.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is it safe to run an autonomous agent on a server with shell access?
&lt;/h3&gt;

&lt;p&gt;The five sandbox backends are exactly the answer to that question, and they are the most important part of the architecture for any serious deployment. Map each kind of work to the right sandbox, lock down credentials per sandbox, log everything, and review logs until the system has earned trust. Self-hosted does not mean less secure than hosted; in many cases it means more, because the sandbox boundary is yours to enforce rather than someone else's.&lt;/p&gt;

&lt;h3&gt;
  
  
  Should my company build its agent on Hermes Agent?
&lt;/h3&gt;

&lt;p&gt;It is a strong candidate if you want a self-hosted runtime with subagent isolation, you have engineers who can own the deployment, and you are comfortable with a 0.x open-source project as the foundation. It is the wrong choice if you need a hosted SaaS with a vendor SLA, or if the workload is small enough that a managed agent service would be cheaper end-to-end than self-hosting. We map the decision case-by-case on strategy calls.&lt;/p&gt;

&lt;h2&gt;
  
  
  How we think about Hermes Agent on client projects
&lt;/h2&gt;

&lt;p&gt;When a client asks us whether Hermes Agent is the right base for their build, the honest answer is: it depends on whether they want the runtime to be theirs. Companies that want to own the agent the same way they own their database — running on their infrastructure, with their security boundary — find a lot to like in Hermes. Companies that want a hosted product with a support contract are better served elsewhere. The capabilities are not the deciding factor; the operating posture is.&lt;/p&gt;

&lt;p&gt;Where we add value on a Hermes deployment is in the parts the runtime does not solve for you: mapping real workflows to subagents with clear scopes, designing the initial skill set so it compounds rather than accumulates, choosing sandboxes per task, and wiring observability so failures are visible before they become outages. If you are weighing Hermes against alternatives and want a second pair of eyes, our strategy calls are free. We will tell you whether to use it, what to use it for, and what to use instead — even if the answer is "this isn't the right tool for your problem."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keep going&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/contact" rel="noopener noreferrer"&gt;Book a free AI strategy call&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/blog/openclaw-ai-agent-explained" rel="noopener noreferrer"&gt;OpenClaw: the personal AI agent explained&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/blog/ai-agents-vs-automation" rel="noopener noreferrer"&gt;AI agents vs. traditional automation: a decision guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/blog/langchain-vs-crewai-vs-autogen-for-buyers" rel="noopener noreferrer"&gt;LangChain vs CrewAI vs AutoGen for buyers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/blog/cost-to-build-an-ai-agent" rel="noopener noreferrer"&gt;What an AI agent build actually costs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/services/ai-agent-development" rel="noopener noreferrer"&gt;Our AI agent development service&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://hermes-agent.nousresearch.com/" rel="noopener noreferrer"&gt;Hermes Agent official site&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/NousResearch/hermes-agent" rel="noopener noreferrer"&gt;Hermes Agent on GitHub&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://softwarebuilding.ai/blog/hermes-agent-nous-research-explained" rel="noopener noreferrer"&gt;https://softwarebuilding.ai/blog/hermes-agent-nous-research-explained&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>programming</category>
      <category>llm</category>
    </item>
    <item>
      <title>AI Agents vs. Traditional Automation: When to Use Each</title>
      <dc:creator>Anton Resnick</dc:creator>
      <pubDate>Tue, 12 May 2026 00:00:00 +0000</pubDate>
      <link>https://dev.to/softwarebuilding/ai-agents-vs-traditional-automation-when-to-use-each-1ema</link>
      <guid>https://dev.to/softwarebuilding/ai-agents-vs-traditional-automation-when-to-use-each-1ema</guid>
      <description>&lt;p&gt;Every business with a backlog of internal workflows is asking the same question right now: do we need an AI agent for this, or is plain automation enough? It looks like a tooling question. It is really a problem-shape question, and most teams pattern-match on the tool instead of on the work. They see a Zapier flow and assume anything more complicated needs an agent. Or they see a chatbot demo and conclude that every workflow is suddenly an agent workflow. Neither shortcut survives contact with production.&lt;/p&gt;

&lt;p&gt;The honest answer is that AI agents and traditional automation are good at different things, and the cost of picking the wrong one is high in both directions. Pick automation for a workflow that needed judgment and you cap the value at the ceiling of your rules. Pick an agent for a workflow that is actually deterministic and you have just paid agent prices for what a low-cost iPaaS subscription could have done at higher reliability. By the end of this post you will have a clear decision framework you can run against any specific workflow you are weighing, plus the patterns we see most often go right and most often go wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  What automation actually is (and what it is great at)
&lt;/h2&gt;

&lt;p&gt;Traditional automation is a fixed, deterministic sequence of steps. The same input produces the same output, every time, because the rules are codified up front by a human. The tools that do this well are familiar: Zapier, Make, n8n, Workato, native iPaaS connectors, and custom scripts. Plus the embedded automation inside CRMs, marketing platforms, and project management systems. If you can write the workflow as a flowchart and the diamonds (decision points) all check a known field for a known value, you are looking at an automation problem.&lt;/p&gt;

&lt;p&gt;Three concrete examples make this concrete. A lead-form submission triggers a Slack notification, creates a HubSpot record, and starts a welcome email sequence. A daily Stripe webhook ingestion tags churned customers in Postgres based on a payment-failure threshold. A marketing campaign clone-and-modify operation copies the same launch playbook across five ad networks with platform-specific tweaks. None of those need judgment. They need reliability, observability, and someone to fix them when an upstream API changes its schema.&lt;/p&gt;

&lt;p&gt;Automation is great at high-volume, low-variability, business-rule-codified work. It is predictable, cheap to run, and easy to debug because every branch is visible in the flowchart. What it cannot do is handle ambiguity, exercise judgment, recover from unexpected inputs, or plan a multi-step action where step three depends on what happened in step two. The moment you find yourself writing 'if the message contains X or Y or Z, but not Q unless also W,' you have left automation territory and started building a brittle approximation of an agent inside a rules engine. That is usually a signal to step back.&lt;/p&gt;

&lt;h2&gt;
  
  
  What AI agents actually are (and where they earn their keep)
&lt;/h2&gt;

&lt;p&gt;An AI agent is a system with a large language model at its core that observes a situation, decides what to do next, uses tools to read your data and write to your systems, and recovers from failure. It has memory of prior steps so it can chain decisions over time. It asks for help when uncertain instead of guessing. Anthropic's engineering team frames agents as LLM systems that dynamically direct their own processes and tool usage — that framing is worth holding in your head because most things sold as agents in 2026 do not pass that bar.&lt;/p&gt;

&lt;p&gt;Three concrete examples again. A customer support agent reads an incoming ticket, looks up the customer's account history in the CRM, checks whether they are inside SLA, decides whether the request can be resolved within policy or needs to escalate to a human, drafts the response, posts it for review or sends it directly depending on confidence, and logs the outcome with reasoning so the next time the same pattern appears the team can audit what happened. A sales research agent takes a target account name, pulls public signals from five different sources, scores fit against an ideal-customer-profile rubric, drafts a custom outreach email referencing specific things that account is doing, and hands it to the SDR with notes on what was found. An operations agent monitors a queue of project tickets, identifies the ones at risk of slipping based on status, owner load, and historical patterns, drafts a status update for the PM with a recommended action, and schedules follow-ups on the items where the next step is clear.&lt;/p&gt;

&lt;p&gt;Critical clarifier, because this is where most projects go sideways: a while-loop calling an LLM is not an agent. A real agent has memory, tools, control flow, and observability. The first one means it carries context across steps. The second means it can read and write your real systems, not just generate text. The third means it can branch, retry, and decide when it is done. The fourth means you can debug it when it gets something wrong, which it will. Most of what gets pitched as an agent today is automation with an LLM call inside one of the steps. Sometimes that is fine and exactly what the workflow needs — but call it what it is.&lt;/p&gt;

Automation vs. AI agent — at a glance| Dimension | Traditional Automation | AI Agent |&lt;br&gt;
| --- | --- | --- |&lt;br&gt;
| Input shape | Structured fields, predictable schema | Unstructured text, ambiguity, mixed formats |&lt;br&gt;
| Decision-making | Fixed if-this-then-that branches | Selects next step based on context and memory |&lt;br&gt;
| Failure mode | Fails loudly when inputs deviate | Degrades gracefully, retries, or asks for help |&lt;br&gt;
| Cost profile | Cheap per execution, costly to maintain at scale | Higher per execution, lower maintenance at scale |&lt;br&gt;
| Best fit | High volume × stable rules | Judgment, exceptions, multi-step reasoning |
&lt;h2&gt;
  
  
  The decision framework: five questions to run against your workflow
&lt;/h2&gt;

&lt;p&gt;Pick a specific workflow you are considering. Not the category, the actual workflow. Then run it through these five questions. They are not hypothetical — they are the same questions we run inside scoping calls before we will quote a build.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Does the workflow involve judgment that varies case-by-case?
&lt;/h3&gt;

&lt;p&gt;If two reasonable humans doing the same workflow would sometimes pick different next steps based on context, you have a judgment workflow. A refund-policy enforcement agent has to weigh customer tenure, prior incident count, the specific reason given, and policy edge cases. Two support agents might rule differently on the same ticket, and both would be defensible. That is an agent problem. By contrast, 'when a payment fails three times, suspend the account' is a rule. There is no variance — the workflow is automation.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Does the input format vary or arrive in natural language?
&lt;/h3&gt;

&lt;p&gt;Structured inputs (form fields, webhook JSON, database rows) are automation-friendly. Variable inputs (emails written by customers, uploaded PDFs that come from different vendors, voice transcripts, support tickets in five languages) are agent-friendly because parsing meaning out of unstructured text is exactly what LLMs are good at. The dividing line is not whether the input is text. It is whether the format is predictable. A daily CSV export from your warehouse is structured even though it is text. A customer email is not, even if it follows a vague pattern.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Are there five or more decision points where the next step depends on what just happened?
&lt;/h3&gt;

&lt;p&gt;Count the branches in your workflow. If there are one or two, automation handles that gracefully. If there are five or more, and the path through them depends on intermediate results (what we found in step two changes what we look up in step three), you are looking at agent control flow. Trying to encode that as a rules engine produces a maze of nested conditionals that nobody can maintain six months later. Agents handle this naturally because the decision logic lives in the LLM and the prompt, not in a flowchart that grows exponentially with edge cases.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Do failures need recovery logic, not just retry?
&lt;/h3&gt;

&lt;p&gt;Automation retries the same step. If the API returned a 500, try again in 30 seconds. That is enough for most failures. But some workflows fail in ways where the right response is not retry — it is fall back, ask a question, escalate, or pick a different approach entirely. If the CRM lookup returns no match, do you give up, try a fuzzy match, ask the user to confirm, or proceed without that data? That kind of recovery is what agents are built for. Encoding it as automation rules works until it does not, and then it fails silently.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Is the cost of a wrong action high enough that you would want a human to check in some cases?
&lt;/h3&gt;

&lt;p&gt;If sending the wrong email is fine, ship automation and move on. If sending the wrong refund is a problem, you want an agent with human-in-the-loop checkpoints — a system that knows when to act, when to ask, and when to defer. Pure automation does not have that capacity built in. You can add manual review gates, but at that point you have a queue with a human pretending to be a control loop. Agents formalize this: confidence above threshold acts, confidence below threshold pauses for review, every decision gets logged.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; If you answered yes to three or more of these, you have an agent problem. If you answered no to four or more, you have an automation problem. The hardest case is two or three yes answers — that is where most failed AI projects live, and that is where a strategy call earns its keep.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;em&gt;[Diagram available in the original article — &lt;a href="https://softwarebuilding.ai/blog/ai-agents-vs-automation" rel="noopener noreferrer"&gt;view on softwarebuilding.ai&lt;/a&gt;]&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What this looks like in three common workflows
&lt;/h2&gt;

&lt;p&gt;Categories blur when you stay abstract. Three side-by-side patterns make the distinction concrete enough to apply to your own situation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Customer support
&lt;/h3&gt;

&lt;p&gt;Automation in support looks like ticket routing based on tags, canned-response macros triggered by keyword, and auto-closing tickets that have been idle for 14 days. All useful, all rule-driven, all easy to maintain. Agent territory in support looks like full ticket resolution: reading the message, checking account status and prior ticket history, applying refund policy with judgment about edge cases, drafting a response that references the specific customer's situation, and escalating the cases where the answer is not in the playbook. Automation routes the work; an agent does the work.&lt;/p&gt;

&lt;h3&gt;
  
  
  Sales operations
&lt;/h3&gt;

&lt;p&gt;Automation in sales ops enriches leads from a known data source (drop in a domain, pull back firmographics, write to HubSpot). Highly reliable, totally rule-based. Agent territory is ICP-based account research where the inputs are unstructured: the agent takes a target account, pulls signals from press releases, hiring pages, product changes, and recent funding events, scores fit against your specific ICP rubric, and drafts personalized outreach that references the actual signals it found. The same workflow framed as automation collapses to a templated mail merge — useful, but a lower ceiling on response rate by a factor of three or four. Different tool for a different question.&lt;/p&gt;

&lt;h3&gt;
  
  
  Internal ops and project management
&lt;/h3&gt;

&lt;p&gt;Automation here is status sync between Jira and Slack, daily standup digests, and SLA reminders when a ticket is approaching breach. Critical infrastructure, fully deterministic. Agent territory is at-risk-project identification: an agent that watches the queue of projects, identifies which ones are slipping based on owner load, recent comment density, blocker patterns, and historical slip behavior, drafts a stakeholder communication explaining the slip and the recovery plan, and surfaces a recommended action for the PM to approve. The pattern that should stand out: automation handles signals, an agent handles synthesis.&lt;/p&gt;

&lt;h2&gt;
  
  
  The trap: when an 'agent' is really automation with an LLM call
&lt;/h2&gt;

&lt;p&gt;This is where most failed AI projects we audit live. Someone read a Medium post about agents, decided their workflow needs one, and shipped what is actually an automation script with an OpenAI call wedged into one of the steps. It works for the demo. It falls over in production. We wrote a longer post on why most AI projects fail and what to do about it — the patterns repeat across industries and team sizes.&lt;/p&gt;

&lt;p&gt;The specific anti-patterns to watch for in your own builds, or in proposals you are evaluating from agencies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LLM-call-in-a-script with no memory or recovery — every invocation starts from scratch, and if something fails halfway through, the whole workflow restarts blindly.&lt;/li&gt;
&lt;li&gt;An 'agent' that has no actual tool use — it generates text but cannot read your CRM or write to your database, so a human ends up doing the actions the agent suggested.&lt;/li&gt;
&lt;li&gt;No evaluation harness — there is no way to measure whether the agent is getting better or worse over time, so improvements get shipped on vibes and regressions get caught by customer complaints.&lt;/li&gt;
&lt;li&gt;No observability — when the agent does something weird, nobody can reconstruct what it was thinking, so debugging becomes a guessing game.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If any of those describe a workflow you are about to ship, you are not shipping an agent. You are shipping a fragile automation with extra latency and a less predictable failure mode. The fix is either to add the missing pieces or to simplify back down to honest automation. Both are valid choices. Pretending the system is something it is not is the failure path.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to decide for your own workflow in 30 minutes
&lt;/h2&gt;

&lt;p&gt;Here is a practical exercise. Take the top three workflows you are considering for AI investment this quarter. For each one, write a single sentence describing what it does. Then run the five questions from the framework against it and write yes or no for each. You will end up with three rows of five answers. The pattern will be obvious in most cases.&lt;/p&gt;

&lt;p&gt;The cases where it is not obvious are the interesting ones. Two yes and three no, or three yes and two no, are the workflows where the right answer often depends on a sixth question that is specific to your business: how much variance does your team tolerate in this workflow today, what does the data look like, and how high-stakes is the downside of getting it wrong. That sixth question is exactly the conversation we have on a strategy call. We scope your specific workflow in 30 minutes and tell you honestly whether you have an agent problem, an automation problem, or a 'you do not actually need AI here, you need a process fix' problem. We have killed enough of our own pitched projects on the third diagnosis that we mean it.&lt;/p&gt;

&lt;p&gt;If you want to walk through your specific workflow with someone who has shipped both kinds of systems, book a free 30-minute strategy call. We move fast — most of our clients go from this call to a working first agent live in weeks, not quarters. You leave with a clearer view of which tool fits, whether or not we ever work together.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common follow-up questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is RPA the same as AI agents?
&lt;/h3&gt;

&lt;p&gt;No, RPA (robotic process automation) and AI agents are different tools for different problems. RPA replays human keystrokes and mouse clicks against UIs that lack APIs — it is a screen-scraping shortcut for systems that cannot be integrated cleanly. AI agents have memory, structured tool access, and decision-making — they reason about the task instead of replaying recorded steps. RPA breaks the moment a button moves, a field is renamed, or a popup interrupts the flow. An agent adapts to those changes because it understands what it is trying to accomplish, not which pixels to click. RPA is a tactical workaround for legacy software with no integration path; AI agents are how you build workflows where judgment is part of the work. Many real systems use both: an agent decides what should happen, then triggers RPA to actually click through a legacy vendor portal where no API exists.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can AI agents replace Zapier or Make.com?
&lt;/h3&gt;

&lt;p&gt;For most use cases, the answer is no, and that is the right answer. Zapier and Make.com are designed for what they do well: connecting two SaaS APIs through a fixed trigger-action sequence. If your workflow is 'when a Typeform submission arrives, create a Salesforce lead', a Zap is faster to build, easier to maintain, and cheaper to run than an agent. Agents become the better choice when the workflow involves reading unstructured input (a paragraph of customer text rather than a clean form field), making a judgment call (which of these five categories does this fit?), or recovering from a partial failure (the lead exists but the email field is malformed). The pattern most teams settle into: Zapier or Make for the simple connector work, an AI agent for the workflows where the rules used to live in a human's head.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is the difference between an AI agent and a chatbot?
&lt;/h3&gt;

&lt;p&gt;A chatbot answers questions in a conversation thread; an AI agent takes actions on systems. The simplest test is asking what happens when the human stops typing. A chatbot waits for the next message. An agent goes off and does the work — checking three systems, drafting a reply, routing it to a queue, escalating to a human only when its confidence is below threshold — and reports back what it did. Chatbots are mostly a thin interface layer on top of an LLM with retrieval. Agents are an interface layer plus tool access plus memory plus a control loop plus an evaluation harness. The skills overlap enough that many production agents include a chat interface as one of the surfaces a user can interact with — but the work that defines an agent happens outside the chat, while the work that defines a chatbot happens inside it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do AI agents need to use LLMs?
&lt;/h3&gt;

&lt;p&gt;In practice, yes — and that is what makes them different from the classical AI agents in academic literature. The decision-making layer that lets an agent reason about novel situations is supplied by a large language model: GPT-class, Claude-class, or an open-source equivalent. Without that reasoning capability, what you have is an automation system or a rules engine. That said, an agent rarely uses only an LLM. Production agents use the LLM for reasoning, dedicated retrieval systems for relevant context, structured tools (API calls, database queries, function executions) for actions, and often smaller classification models for cheap intent routing before the expensive reasoning step. The LLM is the brain, but the brain is wired to a body of tools and memory. Builds that pretend an LLM alone is an agent skip the parts that make agents survive in production.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I migrate from traditional automation to an AI agent later?
&lt;/h3&gt;

&lt;p&gt;Yes, and the cleanest migration path is usually incremental rather than rip-and-replace. Most teams that successfully migrate start by leaving the automation in place, then add an agent in front of the workflow to handle the cases the automation could not — exception routing, unstructured inputs, judgment calls. As confidence in the agent grows, more of the workflow shifts to the agent and the original automation either becomes a deterministic tool the agent calls into for specific actions, or it gets retired. Trying to migrate everything at once tends to fail because you lose the operational stability of a workflow that already works while you debug an unproven system. The intermediate state where automation handles the predictable 80 percent and the agent handles the judgment-heavy 20 percent is often the right long-term architecture, not a transitional one.&lt;/p&gt;

&lt;h2&gt;
  
  
  The shortest possible summary
&lt;/h2&gt;

&lt;p&gt;Automation runs your business rules at scale. Agents make decisions where the rule is not fixed yet. Pick the right tool for the workflow, not the cool tool for the demo. The five-question framework gets you most of the way to the right answer for any specific workflow, and the cases where it does not are the ones worth a real conversation. Most workflows in most businesses are still automation problems and always will be — that is fine, automation is undervalued. The smaller set that actually needs judgment is where AI agents earn their keep, and getting that distinction right is the single most valuable scoping move you can make before spending engineering budget on the wrong tool.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Related reading + go deeper&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/services/ai-agent-development" rel="noopener noreferrer"&gt;AI Agent Development service overview&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/services/ai-strategy-consulting" rel="noopener noreferrer"&gt;AI Strategy &amp;amp; Architecture engagement&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/blog/cost-to-build-an-ai-agent" rel="noopener noreferrer"&gt;What drives the cost of building an AI agent&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/blog/why-most-ai-projects-fail" rel="noopener noreferrer"&gt;Why most AI projects fail (and it is not the models)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/contact" rel="noopener noreferrer"&gt;Book a free AI strategy call&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.anthropic.com/engineering/building-effective-agents" rel="noopener noreferrer"&gt;Anthropic on building effective agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://platform.openai.com/docs/guides/agents" rel="noopener noreferrer"&gt;OpenAI platform docs on agents&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://softwarebuilding.ai/blog/ai-agents-vs-automation" rel="noopener noreferrer"&gt;https://softwarebuilding.ai/blog/ai-agents-vs-automation&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>llm</category>
      <category>productivity</category>
    </item>
    <item>
      <title>What Drives the Cost of Building an AI Agent? A 2026 Honest Breakdown</title>
      <dc:creator>Anton Resnick</dc:creator>
      <pubDate>Thu, 07 May 2026 00:00:00 +0000</pubDate>
      <link>https://dev.to/softwarebuilding/what-drives-the-cost-of-building-an-ai-agent-a-2026-honest-breakdown-1n2k</link>
      <guid>https://dev.to/softwarebuilding/what-drives-the-cost-of-building-an-ai-agent-a-2026-honest-breakdown-1n2k</guid>
      <description>&lt;p&gt;Almost every cost guide on this topic was written to rank for the keyword, not to help you scope a real build. We are not going to pretend we can quote your project off a generic list of brackets, because no honest agency can. What we can do — and what is actually useful — is explain the variables that move the number more than anything else, where the engineering time goes once a build kicks off, and how to scope so you do not overspend on the wrong things.&lt;/p&gt;

&lt;p&gt;We are a US-based agency. We have shipped agents that handle real customer support, real document processing, and real internal operations for mid-market companies. The cost of those builds has moved by 3x across projects that looked superficially similar on paper. The reasons it moves are always the same four variables.&lt;/p&gt;

&lt;h2&gt;
  
  
  The four variables that move the price
&lt;/h2&gt;

&lt;p&gt;When two agencies quote the same project at very different prices, four variables almost always explain the gap. None of them are about the model.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Data readiness
&lt;/h3&gt;

&lt;p&gt;If your data is in one place, well-structured, and someone owns it, your agent build is cheap. If your data is in fifteen tools, six spreadsheets, and four people's heads, your agent build now silently includes a data engineering project. That data work is real and unglamorous, and most failed AI projects we audit failed here — not at the model layer. Cleaning up the data before the build often pays for itself many times over during the build.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Integration count
&lt;/h3&gt;

&lt;p&gt;Each system the agent has to read from or write to is its own small build. A clean SaaS API with good docs is a few days of work. An on-prem system with no public docs is two to four weeks of reverse engineering. Some clients have us integrate with five tools and the integration work ends up being well over half the build. Integration count is the single best predictor of timeline and cost once data is in shape.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. The cost of being wrong
&lt;/h3&gt;

&lt;p&gt;An agent that drafts marketing copy is fine to ship at 90 percent accuracy. An agent that approves wire transfers is not. The higher the cost of a wrong action, the more we have to invest in evaluation, structured output validation, human-review checkpoints, retry logic, and rollback paths. That is engineering work, not prompting, and on high-stakes workflows it can roughly double the build relative to a low-stakes equivalent. The accuracy bar your workflow tolerates is one of the biggest cost levers — and one most buyers do not realize they are choosing.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Whether you want to own it
&lt;/h3&gt;

&lt;p&gt;Some clients want the build, the docs, and a clean handoff to their own engineering team. Some want us to keep running it on retainer. The build itself costs roughly the same; the shape of the after-cost diverges sharply. Handoff engagements include extra documentation, runbooks, and knowledge transfer baked into the build. Retainer engagements skip that and stay leaner up front, but you carry a recurring line item afterward. Neither model is universally cheaper — they reflect different operating preferences, not different quality tiers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the money actually goes
&lt;/h2&gt;

&lt;p&gt;The thing most cost guides hide is that the LLM bill is almost never the expensive part. We have shipped agents whose monthly inference cost is genuinely small. The build cost is engineering time, and engineering time goes here, in roughly this order.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Discovery and scoping (10 to 20 percent). Mapping the workflow, defining success metrics, deciding what the agent should NOT do. Skipping this is how projects die in month three.&lt;/li&gt;
&lt;li&gt;Data and integrations (20 to 40 percent). The unglamorous middle. Reading from your CRM, writing to your ERP, parsing the PDFs, getting the access tokens. Where junior teams burn the most time.&lt;/li&gt;
&lt;li&gt;Agent architecture and prompting (15 to 25 percent). Control flow, tool definitions, prompt design, structured output schemas. The visible AI work, but rarely the largest line item.&lt;/li&gt;
&lt;li&gt;Evaluation and hardening (15 to 25 percent). Building the test set, the eval harness, the retry logic, the observability. This is where most "demo agents" never get; production agents need it.&lt;/li&gt;
&lt;li&gt;Deployment, monitoring, and handoff (5 to 15 percent). CI/CD, alerting, runbooks. Cheap if you have engineers. Expensive if you do not.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Three things worth memorizing
&lt;/h2&gt;

&lt;p&gt;If you remember nothing else from this post, remember these. They have held up across dozens of projects we have scoped.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;There is a real floor on a production agent built by a US team that knows what they are doing. Below that floor, you are paying for a prototype, not a system. Treat anything quoted dramatically below market as a research project, not a deliverable.&lt;/li&gt;
&lt;li&gt;A focused first agent typically ships in weeks, not months. Anything quoted at days is almost always going to ship as a fragile demo and then need a rebuild later. Multi-step or multi-agent systems run a few months because integration count is the bottleneck.&lt;/li&gt;
&lt;li&gt;6 to 12 months is a realistic payback window if the agent replaces or accelerates a real role. If your math says payback in 2 months, you are probably overcounting; if it says 24 months, the use case is probably wrong.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How to avoid overspending
&lt;/h2&gt;

&lt;p&gt;The cheapest agent project is the one you scope honestly. Three habits keep that scope honest.&lt;/p&gt;

&lt;p&gt;First, separate strategy from build. A 1- to 2-week strategy sprint that produces a written architecture and a scoped build proposal almost always pays for itself by killing the wrong project before it gets to engineering. We have killed three of our own pitched projects this way and the clients thanked us.&lt;/p&gt;

&lt;p&gt;Second, ship a focused single-workflow agent first. Multi-agent platforms are interesting but expensive and brittle. A single-workflow agent in production beats a multi-agent system in a slide deck every time. Once one agent works in your environment, the second one is cheaper because the surrounding infrastructure already exists.&lt;/p&gt;

&lt;p&gt;Third, demand weekly demos on real data. If your agency runs a 6-week project with no demo until week 5, the budget could be on track or wildly off and you would not know which until it was too late. Weekly demos make scope creep visible while you can still fix it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The shortest possible answer
&lt;/h2&gt;

&lt;p&gt;The cost of an AI agent build moves with four things: data readiness, integration count, the cost of being wrong, and whether you want a clean handoff or an ongoing retainer. The LLM bill is rarely the issue. Engineering time is. Anyone quoting your specific project off a generic bracket without understanding those four variables is either guessing or pricing for the average client, not for you.&lt;/p&gt;

&lt;p&gt;If you want a real number for your specific use case, the fastest path is a free 30-minute strategy call. We scope tightly enough on the call to give you a written proposal within a week — built around your real workflow, not a generic range.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Related reading + deeper scope&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/services/ai-agent-development" rel="noopener noreferrer"&gt;AI Agent Development service overview&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/services/ai-strategy-consulting" rel="noopener noreferrer"&gt;AI Strategy &amp;amp; Architecture engagement&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/blog/ai-agency-vs-in-house" rel="noopener noreferrer"&gt;AI Agency vs In-House: which makes sense for you&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/blog/how-to-scope-an-ai-agent-project" rel="noopener noreferrer"&gt;How to scope an AI agent project before pricing it&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/contact" rel="noopener noreferrer"&gt;Book a free AI strategy call&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.anthropic.com/pricing" rel="noopener noreferrer"&gt;Anthropic API pricing reference&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://openai.com/api/pricing/" rel="noopener noreferrer"&gt;OpenAI API pricing reference&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://softwarebuilding.ai/blog/cost-to-build-an-ai-agent" rel="noopener noreferrer"&gt;https://softwarebuilding.ai/blog/cost-to-build-an-ai-agent&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>business</category>
      <category>startup</category>
      <category>programming</category>
    </item>
    <item>
      <title>How to Choose an AI Development Agency: 12 Questions That Separate Real Builders from Hype Shops</title>
      <dc:creator>Anton Resnick</dc:creator>
      <pubDate>Thu, 07 May 2026 00:00:00 +0000</pubDate>
      <link>https://dev.to/softwarebuilding/how-to-choose-an-ai-development-agency-12-questions-that-separate-real-builders-from-hype-shops-1416</link>
      <guid>https://dev.to/softwarebuilding/how-to-choose-an-ai-development-agency-12-questions-that-separate-real-builders-from-hype-shops-1416</guid>
      <description>&lt;p&gt;Vetting an AI agency is harder than it should be. The category is two years old. Most of the people selling AI services last year were selling something else the year before. Demos are easy to fake. Pitch decks are easier. The signal is mostly in the questions you ask and the texture of the answers you get back.&lt;/p&gt;

&lt;p&gt;We are an AI agency. We have been on both sides of these calls. Below are the twelve questions that, in our experience, do the most to separate teams that have shipped production AI from teams that have shipped a demo and a deck. We have included what a strong answer sounds like and what should make you walk.&lt;/p&gt;

&lt;h2&gt;
  
  
  Before you ask anything: decide if you should hire an agency at all
&lt;/h2&gt;

&lt;p&gt;If you have a senior AI engineer in-house with capacity, you do not need an agency for most projects. The math changes around build speed, breadth of tooling experience, and whether you can hire that engineer this quarter. We wrote a separate post on the agency-vs-in-house decision; read that first if you have not made it yet. The questions below assume you have decided agency is the right call.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 12 questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Show me a production AI agent you shipped that is still running.
&lt;/h3&gt;

&lt;p&gt;Strong answer: a screen-share of the live system, written by their engineers, with at least 6 months in production. They will tell you what broke in the first month. If they hedge — "under NDA," "in private repos," "in a research environment" — they probably do not have one. "Demo videos" are not production.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. What was the failure mode of your last project, and what did you learn?
&lt;/h3&gt;

&lt;p&gt;Every team that has shipped AI has had something fail in production. Hallucination on an edge case, a runaway tool call, a permission leak, a model deprecation that broke an integration. If they cannot name a specific failure, they have not been there long enough.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Walk me through how you handle hallucinations on a high-stakes workflow.
&lt;/h3&gt;

&lt;p&gt;Strong answer mentions at least three of: structured output validation with schema enforcement, an evaluation harness that runs on a held-out test set every deploy, human-in-the-loop checkpoints on high-cost actions, retry/fallback logic, and explicit denial paths (the agent saying "I do not know" instead of guessing). Weak answer is "better prompts" or "we use GPT-5."&lt;/p&gt;

&lt;h3&gt;
  
  
  4. What is your evaluation harness? Show me a recent eval run.
&lt;/h3&gt;

&lt;p&gt;Production AI teams have a test set the agent runs against on every deploy. Regressions block merges. If they do not have one, every release is roulette. This is the single best filter for engineering maturity in 2026.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. How will the system handle the eventual model migration?
&lt;/h3&gt;

&lt;p&gt;Models get deprecated. Pricing changes. New models behave differently. Strong answer: "Our agent code is decoupled from the specific model. We have a thin provider layer; swapping from Claude 4.6 to 4.7 is a config change plus an eval run." Weak answer: "We have not had to do that yet."&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Who owns the code, the data, and the model accounts?
&lt;/h3&gt;

&lt;p&gt;The right answer is: you do. Code in your GitHub. Data in your accounts. LLM API keys in your name. If they keep the keys, the data lives in their infra, or the code is in a private repo they control, you are renting your AI from them. That is sometimes fine, but you should be choosing it intentionally, not having it sneak in.&lt;/p&gt;

&lt;h3&gt;
  
  
  7. What does the handoff look like if we want to take this in-house in 12 months?
&lt;/h3&gt;

&lt;p&gt;Even if you intend to keep the agency forever, ask. The answer reveals the shape of the work. Strong: "Documented architecture, runbooks, eval harness, and a 2-week handoff sprint to your new engineer." Weak: "That has never come up" or "Why would you want to do that?"&lt;/p&gt;

&lt;h3&gt;
  
  
  8. Who specifically will be doing the engineering on my project?
&lt;/h3&gt;

&lt;p&gt;Get names. Get GitHub profiles. Get LinkedIn profiles. Some agencies sell senior talent in the demo and assign juniors to the build. The technical lead on the demo should be the technical lead on your project, or there should be an explicit, named handoff to a peer-level engineer.&lt;/p&gt;

&lt;h3&gt;
  
  
  9. What is your pricing model? Why?
&lt;/h3&gt;

&lt;p&gt;Hourly is fine for discovery and unknowns. Fixed-price is fine after a discovery sprint produces a tight scope. Time-and-materials with no discovery first is a red flag — it means they have not bothered to scope. The why matters more than the model.&lt;/p&gt;

&lt;h3&gt;
  
  
  10. What is the fastest you can ship a working v1?
&lt;/h3&gt;

&lt;p&gt;Strong answer is concrete and grounded: "Three weeks for a single-workflow agent if your data is in good shape; six to eight if we have to clean it up first." Weak answer is generic: "It depends, every project is different." Of course it depends. The question is whether they have shipped enough to estimate.&lt;/p&gt;

&lt;h3&gt;
  
  
  11. What part of this would you tell me NOT to build?
&lt;/h3&gt;

&lt;p&gt;Best signal of all. A team that has shipped real AI will know which parts of your idea are bad ideas. They will tell you. A team that says "we can build all of it" is selling a deliverable, not a system. The strongest agencies we know have killed at least one of their own pitched projects by talking the client out of it.&lt;/p&gt;

&lt;h3&gt;
  
  
  12. If we are unhappy in week three, what happens?
&lt;/h3&gt;

&lt;p&gt;Strong answer: "We have an off-ramp clause — you can stop the project at the end of any sprint. We hand over what we have built and the prepaid balance is credited or refunded." Weak answer: a contract you cannot exit without paying for unfinished work. The off-ramp answer alone tells you a lot about the kind of team you are dealing with.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hidden red flags inside good-looking answers
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;They have an "AI Center of Excellence" but cannot name the specific engineer who would lead your build. Often a reseller of someone else's work.&lt;/li&gt;
&lt;li&gt;Every answer references LangChain by name regardless of the question. The framework is fine; the over-reliance is not.&lt;/li&gt;
&lt;li&gt;"We are partners with [OpenAI / Anthropic / AWS]." Almost everyone is. Tells you nothing about engineering.&lt;/li&gt;
&lt;li&gt;They cannot explain why they would NOT use a vector database for your use case. If everything is RAG, they have a hammer.&lt;/li&gt;
&lt;li&gt;No public engineering writing. Not a deal-breaker, but the strongest teams almost always have something — a blog, a GitHub, a conference talk.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What strong answers actually sound like
&lt;/h2&gt;

&lt;p&gt;We did not write this post to make you ask us those questions. We wrote it because we have watched too many AI projects start with the wrong agency and stall by month four. The right agency for you might be us. It might not. The questions above will help you tell.&lt;/p&gt;

&lt;p&gt;If you want to pressure-test these questions on a live call before you take them to other agencies, book a free strategy call. We will answer them on the record and tell you which ones you should weight most heavily for your specific project.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Talk to us, or read more&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/contact" rel="noopener noreferrer"&gt;Book a free AI strategy call&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/blog/ai-agency-vs-in-house" rel="noopener noreferrer"&gt;AI Agency vs In-House: how to decide before you hire either&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/blog/cost-to-build-an-ai-agent" rel="noopener noreferrer"&gt;What an AI agent build actually costs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/services/ai-agent-development" rel="noopener noreferrer"&gt;Our AI agent development service&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/services/ai-strategy-consulting" rel="noopener noreferrer"&gt;Our AI strategy consulting engagement&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.anthropic.com/engineering/building-effective-agents" rel="noopener noreferrer"&gt;Anthropic — Building Effective Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://langchain-ai.github.io/langgraph/" rel="noopener noreferrer"&gt;LangGraph official documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://softwarebuilding.ai/blog/how-to-choose-an-ai-development-agency" rel="noopener noreferrer"&gt;https://softwarebuilding.ai/blog/how-to-choose-an-ai-development-agency&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>business</category>
      <category>startup</category>
      <category>productivity</category>
    </item>
    <item>
      <title>LangChain vs CrewAI vs AutoGen: What Business Buyers Need to Know Before Hiring an AI Agency</title>
      <dc:creator>Anton Resnick</dc:creator>
      <pubDate>Thu, 07 May 2026 00:00:00 +0000</pubDate>
      <link>https://dev.to/softwarebuilding/langchain-vs-crewai-vs-autogen-what-business-buyers-need-to-know-before-hiring-an-ai-agency-4h7i</link>
      <guid>https://dev.to/softwarebuilding/langchain-vs-crewai-vs-autogen-what-business-buyers-need-to-know-before-hiring-an-ai-agency-4h7i</guid>
      <description>&lt;p&gt;If you are a non-technical buyer hiring an AI agency, the framework conversation feels like it is in a different language. The agency mentions LangGraph or CrewAI; you nod; you go look it up; the comparison posts are full of decorators, message buses, and graph-based orchestration patterns. None of them tell you what you actually need to know — which framework choice means a faster build, a lower bill, less lock-in, or a deeper talent pool when it is time to extend the system.&lt;/p&gt;

&lt;p&gt;This post is the version we wish more buyers had read before signing with the wrong agency. We will not turn you into an AI engineer. We will give you enough framework literacy to ask the right questions and recognize good answers when you hear them.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 30-second context: why this matters at all
&lt;/h2&gt;

&lt;p&gt;AI agent frameworks are the scaffolding agencies use to build agents on top of LLMs. The framework choice affects build speed, monthly running cost, how easy the system is for someone else to take over, and how exposed you are if a vendor changes course. A bad framework choice can mean a 30 percent budget overrun and a system no one else wants to inherit.&lt;/p&gt;

&lt;p&gt;Three names dominate the conversation in 2026: LangChain (and its newer sibling LangGraph), CrewAI, and AutoGen. The state of each has shifted significantly in the last twelve months — anything written in 2024 about these frameworks is at least partially out of date.&lt;/p&gt;

&lt;h2&gt;
  
  
  LangChain and LangGraph (the safe-default ecosystem)
&lt;/h2&gt;

&lt;p&gt;LangChain is the framework most engineers learned AI development on. LangGraph is its newer, lower-level orchestration layer designed specifically for agent workflows; it hit 1.0 in 2025 and has become the production choice in the LangChain ecosystem.&lt;/p&gt;

&lt;p&gt;What you should know as a buyer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Largest talent pool. If you ever need to swap teams, find a contractor, or hire in-house, more engineers know LangChain/LangGraph than the other two combined.&lt;/li&gt;
&lt;li&gt;Most extensions. Pre-built integrations with hundreds of databases, model providers, and tools. For build speed on common patterns, hard to beat.&lt;/li&gt;
&lt;li&gt;Heaviest dependency footprint. Comes with a reputation for breaking changes and version churn. Production teams typically use LangGraph rather than the full LangChain framework to limit exposure.&lt;/li&gt;
&lt;li&gt;Vendor adjacency: LangChain Inc sells LangSmith (observability) and LangGraph Cloud (hosting). The free framework is real, but commercial pressure exists. Ask whether your agency uses the paid SaaS and what it would cost you per month.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  CrewAI (the role-based newcomer)
&lt;/h2&gt;

&lt;p&gt;CrewAI hit production maturity in 2025. Its core idea is multi-agent collaboration with explicit roles — a "researcher" agent, an "editor" agent, a "reviewer" agent — coordinated like a team. The mental model maps well to non-technical descriptions of work, which is why it shows up in agency pitches.&lt;/p&gt;

&lt;p&gt;What you should know as a buyer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Intuitive abstractions. The role/task model is the easiest to understand if your agency is walking you through architecture diagrams. That is genuinely useful when you have to sign off on the design.&lt;/li&gt;
&lt;li&gt;Smaller talent pool than LangChain. Engineers who have shipped production CrewAI exist, but the pool is meaningfully thinner. Plan for a slightly smaller hiring market if you want to take it in-house.&lt;/li&gt;
&lt;li&gt;Newer, faster-moving. APIs are still evolving. Anything built on CrewAI today may need updates within 12 months that purely LangGraph builds would not.&lt;/li&gt;
&lt;li&gt;Multi-agent by default. If your use case is a single-workflow agent, CrewAI is overkill — like buying a fleet management system to track one car. Ask the agency why they picked it.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  AutoGen (the framework you should think twice about)
&lt;/h2&gt;

&lt;p&gt;AutoGen, originally a Microsoft Research project, was the most-discussed multi-agent framework of 2024. In 2025 the project shifted into maintenance mode as the team moved to Microsoft's newer agent platform offerings. It still works. The momentum is gone.&lt;/p&gt;

&lt;p&gt;What you should know as a buyer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Active production use exists. Code does not stop working when momentum shifts.&lt;/li&gt;
&lt;li&gt;Future-proofing risk is real. If the framework gets fewer commits and fewer updates over the next 18 months, you may pay for a migration in 2027. Ask the agency how they would handle that.&lt;/li&gt;
&lt;li&gt;Microsoft-leaning ecosystem. If your stack is heavily on Azure and you want first-class Microsoft integration, the AutoGen lineage may still be the best fit. Outside the Microsoft ecosystem, the case is weaker.&lt;/li&gt;
&lt;li&gt;If an agency in 2026 pitches AutoGen as their default for a new project, ask why they did not choose LangGraph or CrewAI. The answer will tell you something about how current their thinking is.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  When "none of the above" is the right answer
&lt;/h2&gt;

&lt;p&gt;An honest agency will sometimes tell you the right framework for your project is no framework. For genuinely simple agents — a single LLM call with a couple of tools — pure code (TypeScript, Python, the model SDK directly) is faster to ship, easier to debug, and immune to framework breaking changes. We have shipped production agents in 200 lines of code with no framework. Those agents will outlive most of the framework choices on this page.&lt;/p&gt;

&lt;p&gt;Conversely, for genuinely complex multi-agent platforms with shared memory, structured tool access, and observability, a framework is almost always correct because the alternative is reinventing the framework yourself. The judgment call is which side of that line your project sits on. A team that has shipped both will know.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to ask your agency about framework choice
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;"Which framework would you use for this project, and why this one over the other two?" The answer should reference your specific use case, not their general preference.&lt;/li&gt;
&lt;li&gt;"What would have to change about my project for you to recommend a different framework?" If they cannot answer, they have one hammer.&lt;/li&gt;
&lt;li&gt;"What is the migration cost if we needed to swap frameworks in 18 months?" Strong agencies design for this. Weak ones tell you it will not happen.&lt;/li&gt;
&lt;li&gt;"Are we using any paid SaaS tied to this framework?" Watch for LangSmith, hosted LangGraph, or other monthly bills you would inherit.&lt;/li&gt;
&lt;li&gt;"How easy will it be to hire someone to take this over from you in 12 months?" If they say "very hard," they are honest. If they say "very easy" without qualification, they are oversimplifying.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  How we choose, for what it is worth
&lt;/h2&gt;

&lt;p&gt;Our default in 2026 is LangGraph for any agent that needs more than a couple of tool calls or any state across turns, and pure-code Python or TypeScript for anything simpler. We use CrewAI when the client benefits from the role-based mental model in design conversations and the multi-agent shape genuinely fits the work. We pick AutoGen rarely, mostly for clients with deep Azure integration needs.&lt;/p&gt;

&lt;p&gt;Almost more important than the framework choice is the layer of decoupling we always build between agent code and the framework — so that swapping out CrewAI for LangGraph two years from now is a manageable refactor, not a rewrite. That decoupling pattern is itself a question worth asking on any vetting call.&lt;/p&gt;

&lt;h2&gt;
  
  
  The summary you can take to your next agency call
&lt;/h2&gt;

&lt;p&gt;LangChain/LangGraph is the safe default and the easiest to staff. CrewAI is the right call for genuinely multi-agent workflows where role-based design helps the conversation. AutoGen has lost momentum; pick it intentionally or not at all. None-of-the-above is the right answer more often than agencies admit. The agency you hire should be able to explain their choice in plain language and walk through what would change their mind.&lt;/p&gt;

&lt;p&gt;If you want to pressure-test a specific agency's framework recommendation before signing, the fastest way is a 30-minute strategy call with a second opinion. We will tell you whether the choice fits your project and what we would do differently — even if the answer is "the agency you are talking to is correct, hire them."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keep going&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/contact" rel="noopener noreferrer"&gt;Book a free AI strategy call&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/blog/how-to-choose-an-ai-development-agency" rel="noopener noreferrer"&gt;12 questions to ask any AI agency before hiring&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/blog/cost-to-build-an-ai-agent" rel="noopener noreferrer"&gt;What an AI agent build actually costs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/blog/why-most-ai-projects-fail" rel="noopener noreferrer"&gt;Why most AI projects fail (and it is not the models)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://softwarebuilding.ai/services/ai-agent-development" rel="noopener noreferrer"&gt;Our AI agent development service&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://langchain-ai.github.io/langgraph/" rel="noopener noreferrer"&gt;LangGraph official documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.crewai.com" rel="noopener noreferrer"&gt;CrewAI documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://microsoft.github.io/autogen/" rel="noopener noreferrer"&gt;Microsoft AutoGen&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://softwarebuilding.ai/blog/langchain-vs-crewai-vs-autogen-for-buyers" rel="noopener noreferrer"&gt;https://softwarebuilding.ai/blog/langchain-vs-crewai-vs-autogen-for-buyers&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>llm</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
