vinz

The state of AI agents in March 2026, and how to build a topic-specific one

A year ago, a lot of "agent" talk was just prompt theater wearing a trench coat.

A loop called a model, maybe hit one tool, maybe dumped some text into memory, and people called it autonomous. The demos were shiny. The reliability was not.

By March 2026, the interesting change is not that models suddenly became magical. The interesting change is that the surrounding infrastructure matured enough that agents are now useful in narrow, well-instrumented slices of real work.

That distinction matters.

An agent is not just "an LLM with a task." In practice, an agent is a system that can:

  • decide when to use tools
  • operate in a loop
  • retrieve context from external systems
  • keep state across steps
  • hand work to specialized components
  • expose traces so humans can inspect what happened
  • stay inside safety and policy boundaries

That is a very different beast from a chatbot with a longer prompt.
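To make the distinction concrete, here is a minimal sketch of that loop in JavaScript. `callModel` and `executeTool` are hypothetical stand-ins for your model client and tool runtime; the bounded step count, the state carried across steps, and the inspectable trace are the point, not the names.

```javascript
// Minimal agent loop sketch. callModel and executeTool are hypothetical
// stand-ins; the control flow is what separates this from a chatbot.
async function agentLoop(task, { callModel, executeTool, maxSteps = 8 }) {
  const state = { task, history: [], done: false, answer: null };

  for (let step = 0; step < maxSteps && !state.done; step++) {
    // The model decides: answer now, or call a tool first.
    const decision = await callModel(state);

    if (decision.type === "tool_call") {
      // Run the tool and feed the observation back into state.
      const observation = await executeTool(decision.name, decision.args);
      state.history.push({ step, tool: decision.name, observation });
    } else {
      // A final answer ends the loop early.
      state.done = true;
      state.answer = decision.text;
    }
  }
  return state; // history doubles as a trace humans can inspect
}
```

Everything in the bullet list above hangs off this skeleton: tools plug into `executeTool`, state lives in `state`, and the returned history is your trace.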

In this article, I want to do two things:

  1. Give a clean snapshot of how the agent landscape actually changed by March 2026.
  2. Show a practical tutorial for building a topic-specific agent instead of a vague "general AI employee" fantasy machine.

The big shift: from prompt wrappers to systems

The early agent wave mostly failed in predictable ways:

  • too much autonomy, not enough verification
  • too many tools, poorly described
  • brittle long-context behavior
  • no observability
  • no clear domain boundaries
  • no evals, only vibes

That produced agents that looked clever in demos and fell apart under repetition, ambiguity, or adversarial input.

The current generation is more grounded. The best teams now treat agents as software systems with probabilistic components, not as mystical employees in the cloud.

That shift shows up in five concrete changes.

1. Tools became first-class, not bolted on

A major shift in 2025 and early 2026 was the standardization of tool use.

Instead of building every agent around custom glue code, platforms started exposing built-in and structured tool interfaces for things like:

  • web search
  • file retrieval
  • code execution
  • browser or computer interaction
  • external APIs
  • remote tool servers

This matters because raw model intelligence is rarely enough. Useful work usually depends on external state.

Without tools, the model hallucinates.
With tools, it can at least fail against reality.

That does not make it automatically correct. It just means the system now has a way to check reality instead of freehanding nonsense like a sleep-deprived intern.

2. Agent frameworks got more opinionated

By March 2026, the ecosystem is much less "just write a while loop and pray."

The winning direction is not maximum flexibility. It is constrained orchestration:

  • explicit handoffs between specialized agents
  • typed tool interfaces
  • tracing and replay
  • guardrails and policy checks
  • state management
  • evaluation hooks

This is healthy.

The field had to learn the same lesson distributed systems learned long ago: once a workflow spans multiple steps, hidden state and silent failure become the real monster under the bed.
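"Typed tool interfaces" and "tracing" sound abstract, so here is a framework-free sketch of both. The tool name and shapes are illustrative, not from any particular SDK: each tool declares a validator, the orchestrator refuses calls with bad arguments, and every successful call lands in a replayable trace.

```javascript
// Framework-free sketch of typed tool interfaces plus tracing.
// The count_words tool is purely illustrative.
const tools = {
  count_words: {
    description: "Count the words in a piece of text",
    validate: (args) => typeof args?.text === "string",
    run: ({ text }) => text.split(/\s+/).filter(Boolean).length,
  },
};

const trace = [];

function callTool(name, args) {
  const tool = tools[name];
  if (!tool) throw new Error(`Unknown tool: ${name}`);
  if (!tool.validate(args)) throw new Error(`Invalid args for ${name}`);
  const result = tool.run(args);
  trace.push({ name, args, result }); // silent failure is the enemy
  return result;
}
```

Real frameworks do this with schemas and spans instead of a regex and an array, but the contract is the same: reject bad inputs loudly, record everything.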

3. Protocols matter now

One of the most important structural changes is the rise of shared protocols for tool and context access, especially MCP, the Model Context Protocol.

That sounds boring. It is not. Boring infrastructure is where ecosystems become real.

A standard protocol means agents do not need bespoke integration logic for every tool source. It also means tool ecosystems can compound instead of fragmenting into provider-specific fiefdoms.

In plain English: the future is less "one giant assistant that owns everything" and more "many tools and data sources connected through common interfaces."
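What "common interface" means in practice: a tool is described by a plain, provider-neutral document (name, description, input schema) that any protocol-aware client can discover and call without bespoke glue. The shape below mirrors the JSON-Schema-based style MCP uses; treat it as an illustration, not the exact wire format.

```javascript
// Illustrative protocol-style tool descriptor: everything a client
// needs to discover and call the tool, with no provider-specific glue.
const toolDescriptor = {
  name: "search_accessibility_docs",
  description: "Full-text search over internal accessibility guidelines",
  inputSchema: {
    type: "object",
    properties: {
      query: { type: "string", description: "Search terms" },
      limit: { type: "number", description: "Max results to return" },
    },
    required: ["query"],
  },
};
```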

4. The best agents are vertical, not universal

This is the most useful practical lesson.

General-purpose agents remain fragile. Topic-specific agents are where the real value is.

Why?

Because narrow scope lets you control:

  • the tool set
  • the retrieval corpus
  • the failure modes
  • the success criteria
  • the review process
  • the output schema

That drastically improves reliability.

A research agent for accessibility guidance, a support triage agent for a known product surface, or a CI assistant for one codebase can be genuinely useful.

A fully autonomous do-anything agent is still mostly a very expensive way to generate surprise.

5. Observability and evals are finally part of the conversation

This is the least glamorous change and probably the most important.

In 2024, people asked, "Can the agent do the task?"

In 2026, the sharper question is, "Under which conditions does it fail, how often, and can we detect the failure before it hurts something?"

That is a better question because it treats the agent as an engineering system.

Serious teams now care about:

  • traces
  • tool call logs
  • refusal behavior
  • hallucination rates
  • routing accuracy
  • retry policy
  • cost per successful task
  • human escalation thresholds

That is how the field grows up.
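A few of those metrics fall out of simple bookkeeping once you log task runs. This is a tiny sketch with illustrative field names; the one worth noticing is cost per *successful* task, because failed runs still cost money.

```javascript
// Metrics sketch over logged task runs. Field names are illustrative.
function summarizeRuns(runs) {
  const successes = runs.filter((r) => r.success);
  const totalCost = runs.reduce((sum, r) => sum + r.costUsd, 0);
  return {
    successRate: successes.length / runs.length,
    // Total spend divided by successes: failed runs still cost money.
    costPerSuccessfulTask:
      successes.length > 0 ? totalCost / successes.length : Infinity,
    avgToolCalls: runs.reduce((s, r) => s + r.toolCalls, 0) / runs.length,
  };
}
```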

What changed across the major ecosystems

Here is the short version, stripped of marketing perfume.

OpenAI

OpenAI pushed the ecosystem toward a more unified agent stack around the Responses API, built-in tools, the Agents SDK, and support for remote MCP-style tool access. The main pattern is clear: one API surface for multi-step, tool-using applications, plus orchestration primitives for handoffs, tracing, and stateful workflows.

Anthropic

Anthropic stayed very influential in the practical design philosophy around agents. Their materials strongly emphasize the distinction between workflows and agents, and they have continued investing in computer use, context engineering, long-running agent harnesses, and MCP-related tooling. That has shaped how many teams think about reliability.

Google

Google pushed heavily on research-style and multimodal agent workflows, including Deep Research and agent-oriented interfaces in the Gemini ecosystem. Their direction has been especially strong in search-heavy, synthesis-heavy, multi-step work.

Microsoft

Microsoft consolidated its story by positioning Microsoft Agent Framework as the successor direction that combines ideas from AutoGen and Semantic Kernel. That is a sign of ecosystem convergence: experiments are giving way to more production-oriented frameworks.

What agents are still bad at

March 2026 is not the dawn of artificial coworkers replacing half your org chart before lunch.

Agents are still weak or unreliable at:

  • open-ended tasks with fuzzy success criteria
  • long chains of action without verification
  • high-risk workflows involving money, privacy, or irreversible actions
  • ambiguous environments with poor tool descriptions
  • tasks that require hidden business context not present in retrieval or tools

The deepest recurring problem is simple:

agents amplify ambiguity.

If your task definition is sloppy, your tool design is vague, your retrieval corpus is noisy, or your success criteria are mush, the agent does not rescue the system. It magnifies the mess.

So the modern design rule is not "make the model smarter."
It is "make the problem legible."


Tutorial: build a topic-specific frontend accessibility research agent

Let us build something real and bounded.

Not a fake AGI office worker.
Not a twenty-agent cathedral of confusion.

We will build a frontend accessibility research agent that can:

  • answer questions about a specific accessibility topic
  • search the web for current guidance
  • retrieve from your internal notes or docs
  • return structured output with sources, recommendations, and caveats

This is useful because accessibility guidance changes, browser support changes, framework behavior changes, and internal design system constraints matter.

A generic assistant will often blur those layers together. A topic-specific agent gives you tighter control.

What we are building

Our agent will focus on one domain:

Accessible form validation for web apps

That means it should reason within a constrained surface:

  • labels and descriptions
  • error messaging
  • ARIA usage
  • keyboard flow
  • focus management
  • screen reader announcements
  • browser and framework caveats

The agent should not pretend to know everything about all accessibility topics. That restraint is a feature, not a bug.

Architecture

We will use a simple architecture:

  1. A single specialist agent with a narrow system prompt.
  2. Web search for current public guidance.
  3. File search for your internal standards or design system docs.
  4. A strict output schema.
  5. Human review before any change is shipped.

That is already enough to be useful.

Why this works better than a general agent

Because we are constraining all the important dimensions:

  • domain: accessibility for forms
  • sources: current web references plus your internal docs
  • format: structured answer
  • tooling: only the tools needed for research
  • action space: analysis and recommendation, not autonomous deployment

That dramatically reduces chaos.

Step 1: install dependencies

npm install openai zod dotenv

We are using JavaScript here because dev.to and frontend people tend to enjoy staying in one runtime instead of spawning seven languages for sport.

Step 2: set up environment variables

Create a .env file:

OPENAI_API_KEY=your_api_key_here

Step 3: define the agent contract

Create accessibility-agent.js:

import OpenAI from "openai";
import { z } from "zod";
import "dotenv/config";

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

const AccessibilityResponseSchema = z.object({
  topic: z.string(),
  summary: z.string(),
  recommendations: z.array(
    z.object({
      title: z.string(),
      rationale: z.string(),
      priority: z.enum(["high", "medium", "low"]),
    })
  ),
  risks: z.array(z.string()),
  open_questions: z.array(z.string()),
  sources: z.array(
    z.object({
      title: z.string(),
      url: z.string(),
      source_type: z.enum(["web", "internal"]),
    })
  ),
});

const SYSTEM_PROMPT = `
You are a topic-specific frontend accessibility research agent.

Scope:
- Only answer questions about accessible form validation in web applications.
- Prefer current standards and implementation guidance.
- Use tools when needed instead of guessing.
- Separate standards, implementation advice, and assumptions.
- If evidence is weak or conflicting, say so explicitly.

Output rules:
- Return concise, structured analysis.
- Include actionable recommendations.
- Include risks and unresolved questions.
- Cite the sources you relied on.
- Do not invent standards, browser support, or assistive technology behavior.
`;

async function run(question) {
  const response = await client.responses.create({
    model: "gpt-5.4",
    input: [
      {
        role: "system",
        content: SYSTEM_PROMPT,
      },
      {
        role: "user",
        content: `Question: ${question}`,
      },
    ],
    tools: [
      { type: "web_search" },
      {
        type: "file_search",
        vector_store_ids: ["YOUR_VECTOR_STORE_ID"],
      },
    ],
    text: {
      format: {
        type: "json_schema",
        name: "accessibility_research_result",
        schema: {
          type: "object",
          properties: {
            topic: { type: "string" },
            summary: { type: "string" },
            recommendations: {
              type: "array",
              items: {
                type: "object",
                properties: {
                  title: { type: "string" },
                  rationale: { type: "string" },
                  priority: {
                    type: "string",
                    enum: ["high", "medium", "low"],
                  },
                },
                required: ["title", "rationale", "priority"],
                additionalProperties: false,
              },
            },
            risks: {
              type: "array",
              items: { type: "string" },
            },
            open_questions: {
              type: "array",
              items: { type: "string" },
            },
            sources: {
              type: "array",
              items: {
                type: "object",
                properties: {
                  title: { type: "string" },
                  url: { type: "string" },
                  source_type: {
                    type: "string",
                    enum: ["web", "internal"],
                  },
                },
                required: ["title", "url", "source_type"],
                additionalProperties: false,
              },
            },
          },
          required: [
            "topic",
            "summary",
            "recommendations",
            "risks",
            "open_questions",
            "sources",
          ],
          additionalProperties: false,
        },
      },
    },
  });

  const parsed = JSON.parse(response.output_text);
  const validated = AccessibilityResponseSchema.parse(parsed);

  console.dir(validated, { depth: null });
}

run(
  "What is the correct pattern for showing inline form errors accessibly in a React checkout flow, including aria-invalid, aria-describedby, focus handling, and live region usage?"
).catch((error) => {
  console.error(error);
  process.exit(1);
});

Step 4: add your internal docs

If you have internal accessibility notes, design system guidelines, QA checklists, or previous audit findings, put them in a vector store and connect that store to file_search.

The goal is not just to know public best practice.
The goal is to know your constraints.

For example, your internal docs might say:

  • your design system always renders helper text below fields
  • your error summary component already exists
  • your mobile checkout flow cannot steal focus aggressively
  • a specific screen reader bug has already been documented internally

That kind of context is where topic-specific agents genuinely become useful.

Step 5: keep the output narrow and inspectable

Do not let the agent free-write essays forever.

Force an answer structure like this:

  • summary
  • recommendations
  • risks
  • open questions
  • sources

That gives you three benefits:

  1. Easier downstream rendering.
  2. Easier human review.
  3. Easier evals.

Free-form text feels smart. Structured text is easier to trust.
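Benefits 1 and 2 follow directly from the schema: because the answer is structured, turning it into a reviewable document is a pure function, not a parsing job. A minimal sketch, assuming the schema from Step 3:

```javascript
// Render a validated answer object as markdown for human review.
// Assumes the AccessibilityResponseSchema shape from Step 3.
function renderForReview(answer) {
  const lines = [`## ${answer.topic}`, "", answer.summary, "", "### Recommendations"];
  for (const rec of answer.recommendations) {
    lines.push(`- **${rec.title}** (${rec.priority}): ${rec.rationale}`);
  }
  lines.push("", "### Risks", ...answer.risks.map((r) => `- ${r}`));
  lines.push("", "### Sources", ...answer.sources.map((s) => `- [${s.title}](${s.url})`));
  return lines.join("\n");
}
```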

Step 6: test with adversarial prompts

Now test questions like:

  • "Should I use aria-live on every field error?"
  • "Can placeholder text replace labels if the form is simple?"
  • "Should focus always jump to the first invalid field?"
  • "Is aria-invalid enough on its own?"

These are useful because they expose overgeneralization.

A bad agent will answer with fake certainty.
A better one will distinguish:

  • what is required by standard guidance
  • what is implementation-dependent
  • what depends on the UX flow
  • what still needs manual validation with assistive tech

Step 7: add a lightweight evaluator

Even a tiny evaluator helps.

For example, create a checklist that scores whether the answer:

  • cited at least two sources
  • included at least one risk
  • separated evidence from assumption
  • stayed inside topic scope
  • avoided recommending placeholder-only labeling

Pseudo-code:

function evaluateAnswer(answer) {
  const failures = [];

  if (answer.sources.length < 2) {
    failures.push("Too few sources");
  }

  if (answer.risks.length === 0) {
    failures.push("No risks listed");
  }

  const textBlob = JSON.stringify(answer).toLowerCase();
  if (textBlob.includes("placeholder can replace label")) {
    failures.push("Unsafe labeling advice");
  }

  return {
    passed: failures.length === 0,
    failures,
  };
}

This is not glamorous. It is also how you stop your agent from becoming a chaos generator in a nice jacket.

Step 8: know when not to automate

This agent should not automatically:

  • patch production code
  • approve accessibility compliance
  • file legal conformance claims
  • override manual QA
  • claim screen reader compatibility without testing

Research support is a good fit.
Compliance authority is not.

That line matters.

How to make this stronger

Once the basic version works, improve it in this order:

1. Shrink the domain further

Instead of "frontend accessibility," focus on:

  • form validation
  • modal dialogs
  • table navigation
  • autocomplete widgets
  • date pickers

Narrower scope usually means better performance.

2. Improve source quality

Weight sources by trust level:

  • standards and specs
  • major accessibility references
  • browser or framework docs
  • internal audit reports
  • team conventions

Do not let random SEO soup outrank authoritative references.
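One cheap way to enforce that ordering is to re-rank retrieved sources by trust tier before the agent sees them. The tier list below is an example policy, not a standard, and the `internal:` prefix is a hypothetical convention for your own docs.

```javascript
// Illustrative trust tiers: lower index means more trusted.
const TRUST_TIERS = [
  /w3\.org|whatwg\.org/,      // standards and specs
  /developer\.mozilla\.org/,  // major references
  /^internal:/,               // hypothetical prefix for your own audits
];

function trustScore(url) {
  const tier = TRUST_TIERS.findIndex((pattern) => pattern.test(url));
  return tier === -1 ? TRUST_TIERS.length : tier; // unknown sources rank last
}

function rankSources(sources) {
  return [...sources].sort((a, b) => trustScore(a.url) - trustScore(b.url));
}
```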

3. Add source annotations

Ask the agent to label each claim as one of:

  • standard guidance
  • implementation recommendation
  • internal convention
  • hypothesis needing validation

That is a huge upgrade in clarity.
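You can enforce those labels mechanically before accepting an answer. A minimal sketch, assuming each claim carries a `label` field with one of the four values above:

```javascript
// Reject answers whose claims are not labeled with one of the four
// agreed categories. The label strings are an assumed convention.
const CLAIM_LABELS = new Set([
  "standard_guidance",
  "implementation_recommendation",
  "internal_convention",
  "hypothesis",
]);

function checkClaimLabels(claims) {
  const unlabeled = claims.filter((c) => !CLAIM_LABELS.has(c.label));
  return { passed: unlabeled.length === 0, unlabeled };
}
```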

4. Add retrieval filters

Only search files tagged with things like:

  • accessibility
  • forms
  • design-system
  • validation
  • checkout

Less retrieval noise, fewer weird answers.
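If your platform supports attribute filtering on file search (OpenAI's file_search tool has supported attribute filters; check the current docs for the exact syntax), the restriction can live in the tool config itself instead of in prompt text. The shape below is illustrative only:

```javascript
// Illustrative file_search config restricting retrieval to files
// tagged for this domain. Verify the filter syntax against your
// platform's current documentation before relying on it.
const fileSearchTool = {
  type: "file_search",
  vector_store_ids: ["YOUR_VECTOR_STORE_ID"],
  filters: {
    type: "and",
    filters: [
      { type: "eq", key: "topic", value: "accessibility" },
      { type: "eq", key: "area", value: "forms" },
    ],
  },
};
```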

5. Add a second pass verifier

Use a second model pass to check:

  • unsupported claims
  • missing caveats
  • contradictory recommendations
  • source-less assertions

Multi-step verification is often more useful than adding more autonomy.
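A model-based second pass costs another API call, so it is worth running a cheap structural pre-check first. This sketch only catches the failures that are detectable without a model; anything it misses still goes to the verifier pass.

```javascript
// Cheap structural pre-check before a model-based verifier pass.
// Duplicate-title detection is a crude stand-in for contradiction
// checking; a model pass handles the rest.
function preVerify(answer) {
  const flags = [];
  if (answer.sources.length === 0) flags.push("source-less assertions");
  if (answer.risks.length === 0) flags.push("missing caveats");
  const titles = answer.recommendations.map((r) => r.title.toLowerCase());
  if (new Set(titles).size !== titles.length) {
    flags.push("duplicated or conflicting recommendations");
  }
  return flags;
}
```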

The deeper lesson

The future of agents is probably not one giant omniscient assistant doing everything.

It is more likely a messy ecosystem of:

  • narrow specialists
  • shared tool protocols
  • retrieval layers
  • policy gates
  • eval harnesses
  • human review loops

That sounds less cinematic.
It also sounds a lot more real.

The practical path in 2026 is not:

build an agent that can do anything

It is:

build an agent that can do one thing clearly, with bounded tools, inspectable outputs, and known failure modes

That is how you get something useful before the hype goblin eats your roadmap.

Final thought

Agents did evolve.

But the evolution was not from "dumb" to "intelligent employee."
It was from clever demo objects to tool-using software systems that can be reliable inside narrow boundaries.

That is progress.
It is also a much less magical story.

Which is fine.
Real engineering is usually less magical and more effective.

And honestly, that is the better trade.
