DEV Community

Umair Bilal

Posted on • Originally published at buildzn.com

Fixing Fablize Claude Opus Agent Skips: Node.js Blueprint

This article was originally published on BuildZn.

Everyone talks about agentic AI, but nobody explains how to stop these things from just making stuff up or skipping crucial steps. I spent weeks wrestling claude-3-opus-20240229 in FarahGPT, and it consistently fumbled complex multi-tool workflows. The official docs give you the basics, but building a bulletproof agent that provides verifiable evidence at each stage? That’s where Fablize comes in. Here’s how I used the Fablize Claude Opus agent plugin in Node.js to force my agents into line, cutting down skipped verifications by over 95%.

Why Your Claude Opus Agent Needs a Fablize Enforcement Layer

You've built a Claude AI agent. It has tools. You tell it to do X, then Y, then Z. But sometimes it does X, then just jumps to Z, or hallucinates Y entirely. Sound familiar? I saw this pattern repeatedly in my gold trading system, FarahGPT. My agent was supposed to:

  1. fetchMarketData for a specific gold ETF.
  2. validatePriceAgainstBenchmark to ensure the current price wasn't an outlier.
  3. proposeTrade based on the validated data.

The problem? claude-3-opus-20240229, while powerful, sometimes just wouldn't call validatePriceAgainstBenchmark. It would fetch data, then confidently skip to proposeTrade, often using an unverified price or even making up a validation result. I observed this in about 30% of runs in my FarahGPT backend when the validatePriceAgainstBenchmark tool was merely available but not mandated as a sequential step with evidence. This model, despite its intelligence, has a tendency to "optimize" away intermediate verification steps if not explicitly constrained, especially when dealing with complex multi-tool sequences.

This isn't a "bug" in Claude Opus, per se. It's a fundamental challenge with agentic systems: how do you guarantee procedural integrity and verifiable outcomes? This is where Claude AI agent verification becomes non-negotiable. Without it, you're just hoping your agent behaves. Hope is not a strategy.

Fablize solves this by letting you define a strict procedure and states for your agent, requiring specific evidence at each transition. It's like giving your agent a checklist it must follow, and it must show you proof for each item. If the evidence isn't there, or doesn't meet criteria, the agent gets stuck, forcing it to backtrack or try again. This is how you enforce AI agent procedure in Node.js for bulletproof execution.

The Core Concept: States, Procedures, and Evidence

Fablize introduces a few key ideas that really change how you think about agent design:

  • States: These are discrete steps in your agent's workflow. Think of them like states in a finite state machine. MARKET_DATA_FETCHED, PRICE_VALIDATED, TRADE_PROPOSED.
  • Procedures: A defined sequence of state transitions. This is the explicit path your agent must follow.
  • Evidence: Data or outputs from tool calls that justify a state transition. This is the "proof" the agent provides. Fablize uses conditions to check this evidence.

Here's the thing: you're not just giving Claude tools anymore. You're giving it a workflow manager that monitors its actions and demands specific outputs. If the agent tries to jump ahead, Fablize catches it. If it doesn't provide the right evidence, Fablize makes it redo the step. The result is a run where every completed step is backed by evidence you can inspect.
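
To make the idea concrete, here is a minimal sketch of a procedure as a transition table with an evidence check on each edge. This is not the Fablize API. The `Transition` type and the `advance` function are hypothetical names I'm using only to illustrate the concept, and the failure branch is left out to keep it short.

```typescript
// Hypothetical illustration of states, procedures, and evidence; not the Fablize API.
type StateName = "INITIAL" | "MARKET_DATA_FETCHED" | "PRICE_VALIDATED" | "TRADE_PROPOSED";
type ToolOutput = Record<string, unknown>;

interface Transition {
  from: StateName;
  to: StateName;
  toolName: string;                        // tool whose output counts as evidence
  accepts: (output: ToolOutput) => boolean; // evidence condition for this edge
}

const transitions: Transition[] = [
  {
    from: "INITIAL", to: "MARKET_DATA_FETCHED", toolName: "fetchMarketData",
    accepts: (o) => { const p = o["currentPrice"]; return typeof p === "number" && p > 0; },
  },
  {
    from: "MARKET_DATA_FETCHED", to: "PRICE_VALIDATED", toolName: "validatePriceAgainstBenchmark",
    accepts: (o) => o["isValid"] === true,
  },
  {
    from: "PRICE_VALIDATED", to: "TRADE_PROPOSED", toolName: "proposeTrade",
    accepts: (o) => o["status"] === "proposed",
  },
];

type AdvanceResult =
  | { ok: true; state: StateName }
  | { ok: false; state: StateName; reason: string };

// Move forward only if the tool output is valid evidence for an edge leaving `state`.
function advance(state: StateName, toolName: string, output: ToolOutput): AdvanceResult {
  const edge = transitions.find((t) => t.from === state && t.toolName === toolName);
  if (!edge) {
    return { ok: false, state, reason: `${toolName} is not a valid step from ${state}` };
  }
  if (!edge.accepts(output)) {
    return { ok: false, state, reason: `output of ${toolName} does not satisfy the evidence condition` };
  }
  return { ok: true, state: edge.to };
}

// The agent fetches data, then tries to skip validation.
const fetched = advance("INITIAL", "fetchMarketData", { symbol: "GLD", currentPrice: 195.5 });
const skipped = advance(fetched.state, "proposeTrade", { status: "proposed" });
console.log(fetched.ok, skipped.ok, skipped.state); // true false MARKET_DATA_FETCHED
```

The skipped call leaves the state unchanged and carries a reason string, which is the kind of feedback you would hand back to the model.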

Building a Bulletproof Agent with Fablize in Node.js

Let's dive into the Node.js blueprint. First, you'll need the Fablize SDK.

npm install @anthropic-ai/sdk @fablize/node-sdk dotenv

Here's how we define our tools and then integrate Fablize to enforce our gold trading procedure.

1. Define Your Tools

These are the same tools you'd normally provide to Claude.

// tools.ts
export const tools = [
  {
    name: "fetchMarketData",
    description: "Fetches current market data for a given stock or ETF symbol.",
    input_schema: {
      type: "object",
      properties: {
        symbol: {
          type: "string",
          description: "The stock or ETF symbol (e.g., 'GLD' for SPDR Gold Shares)."
        }
      },
      required: ["symbol"]
    }
  },
  {
    name: "validatePriceAgainstBenchmark",
    description: "Validates a given price against a benchmark, returning if it's within an acceptable range.",
    input_schema: {
      type: "object",
      properties: {
        symbol: { type: "string" },
        currentPrice: { type: "number" },
        benchmarkPrice: { type: "number" },
        tolerancePercent: {
          type: "number",
          description: "Percentage tolerance for validation (e.g., 0.5 for 0.5%)",
          default: 0.5
        }
      },
      required: ["symbol", "currentPrice", "benchmarkPrice"]
    }
  },
  {
    name: "proposeTrade",
    description: "Proposes a buy or sell trade for a given symbol and quantity.",
    input_schema: {
      type: "object",
      properties: {
        symbol: { type: "string" },
        action: { type: "string", enum: ["buy", "sell"] },
        quantity: { type: "integer" }
      },
      required: ["symbol", "action", "quantity"]
    }
  }
];

// Helper to simulate tool calls
export const toolHandlers = {
  fetchMarketData: async ({ symbol }: { symbol: string }) => {
    console.log(`[Tool Call] Fetching market data for ${symbol}...`);
    // Simulate real-time data fetch
    await new Promise(resolve => setTimeout(resolve, 500));
    if (symbol.toUpperCase() === 'GLD') {
      return {
        symbol: 'GLD',
        currentPrice: 195.50,
        benchmarkPrice: 195.00, // A hypothetical benchmark
        lastClose: 194.80,
        volume: 12500000
      };
    }
    throw new Error(`Market data for ${symbol} not found.`);
  },
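  // The schema's `default` is only an annotation, so apply the 0.5% default here when Claude omits it
  validatePriceAgainstBenchmark: async ({ symbol, currentPrice, benchmarkPrice, tolerancePercent = 0.5 }: { symbol: string, currentPrice: number, benchmarkPrice: number, tolerancePercent?: number }) => {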
    console.log(`[Tool Call] Validating price for ${symbol}: ${currentPrice} against benchmark ${benchmarkPrice} (tolerance: ${tolerancePercent}%)...`);
    await new Promise(resolve => setTimeout(resolve, 300));
    const diff = Math.abs((currentPrice - benchmarkPrice) / benchmarkPrice) * 100;
    const isValid = diff <= tolerancePercent;
    return { symbol, currentPrice, benchmarkPrice, tolerancePercent, diff, isValid, message: isValid ? "Price is within acceptable range." : "Price deviates too much from benchmark." };
  },
  proposeTrade: async ({ symbol, action, quantity }: { symbol: string, action: 'buy' | 'sell', quantity: number }) => {
    console.log(`[Tool Call] Proposing trade: ${action} ${quantity} of ${symbol}.`);
    await new Promise(resolve => setTimeout(resolve, 200));
    return { status: "proposed", tradeId: `TRADE-${Date.now()}`, symbol, action, quantity };
  }
};
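
To sanity-check the tolerance logic, here is the same deviation formula from the mock handler applied to its sample GLD values. It's a small sketch, and `percentDeviation` is a name introduced here for illustration only.

```typescript
// Percentage deviation of a price from its benchmark, same formula as the mock handler.
function percentDeviation(currentPrice: number, benchmarkPrice: number): number {
  return Math.abs((currentPrice - benchmarkPrice) / benchmarkPrice) * 100;
}

const tolerancePercent = 0.5;

// Sample values from the mock fetchMarketData handler.
const normal = percentDeviation(195.5, 195.0);
console.log(normal.toFixed(3), normal <= tolerancePercent); // 0.256 true

// An illustrative outlier price, not a value from the article's data.
const outlier = percentDeviation(200.0, 195.0);
console.log(outlier.toFixed(3), outlier <= tolerancePercent); // 2.564 false
```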

2. Configure Fablize: States, Procedures, and Evidence Conditions

This is where the magic happens. We define the states our agent can be in, and the procedure it must follow to move between them, backed by evidence.

// fablizeConfig.ts
import { Procedure, State } from '@fablize/node-sdk';

// Define the states
export const states: State[] = [
  { name: 'INITIAL', description: 'Agent is ready to start the workflow.' },
  { name: 'MARKET_DATA_FETCHED', description: 'Market data has been successfully retrieved.' },
  { name: 'PRICE_VALIDATED', description: 'The current price has been validated against a benchmark.' },
  { name: 'TRADE_PROPOSED', description: 'A trade proposal has been made based on validated data.' },
  { name: 'FAILED_VALIDATION', description: 'Price validation failed, requiring re-evaluation.' }
];

// Define the procedure with evidence requirements
export const procedure: Procedure = {
  name: 'Gold Trading Procedure',
  description: 'Strict multi-step procedure for analyzing gold market data and proposing trades.',
  initialState: 'INITIAL',
  transitions: [
    {
      from: 'INITIAL',
      to: 'MARKET_DATA_FETCHED',
      description: 'Fetch market data to begin analysis.',
      requiredEvidence: {
        type: 'tool_output',
        toolName: 'fetchMarketData',
        conditions: [
          { path: '$.symbol', operator: 'exists', message: 'Market data must include a symbol.' },
          { path: '$.currentPrice', operator: 'is_greater_than', value: 0, message: 'Current price must be positive.' }
        ]
      }
    },
    {
      from: 'MARKET_DATA_FETCHED',
      to: 'PRICE_VALIDATED',
      description: 'Validate the fetched price against a benchmark.',
      requiredEvidence: {
        type: 'tool_output',
        toolName: 'validatePriceAgainstBenchmark',
        conditions: [
          { path: '$.isValid', operator: 'is_true', message: 'Price validation must explicitly be true.' }
        ]
      }
    },
    {
      from: 'MARKET_DATA_FETCHED',
      to: 'FAILED_VALIDATION', // Agent can transition here if validation fails
      description: 'Price validation failed, need to re-evaluate strategy or parameters.',
      requiredEvidence: {
        type: 'tool_output',
        toolName: 'validatePriceAgainstBenchmark',
        conditions: [
          { path: '$.isValid', operator: 'is_false', message: 'Price validation must explicitly be false.' }
        ]
      }
    },
    {
      from: 'PRICE_VALIDATED',
      to: 'TRADE_PROPOSED',
      description: 'Propose a trade only after successful price validation.',
      requiredEvidence: {
        type: 'tool_output',
        toolName: 'proposeTrade',
        conditions: [
          { path: '$.status', operator: 'equals', value: 'proposed', message: 'Trade must be proposed successfully.' }
        ]
      }
    }
  ]
};

Key Insight: Notice the requiredEvidence block. This is what stops the agent from skipping steps. For instance, to go from MARKET_DATA_FETCHED to PRICE_VALIDATED, the agent must call validatePriceAgainstBenchmark, and its output must have isValid: true. If isValid is false, the run moves to FAILED_VALIDATION, which has no transition to TRADE_PROPOSED. Every completed step is therefore backed by tool output that passed its conditions.
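
If you want a feel for what checking such conditions involves, here is a minimal evaluator for condition objects shaped like the ones in fablizeConfig.ts. How Fablize evaluates them internally isn't covered in this post, so treat `resolvePath`, `check`, and `failedConditions` as hypothetical helpers. The path handling only supports simple dotted paths like `$.isValid`.

```typescript
// Hypothetical evaluator for conditions shaped like those in fablizeConfig.ts.
type Operator = "exists" | "equals" | "is_true" | "is_false" | "is_greater_than";

interface Condition {
  path: string; // simple dotted path such as "$.isValid" or "$.quote.price"
  operator: Operator;
  value?: unknown;
  message: string;
}

// Resolve "$.a.b" against an object; returns undefined if any segment is missing.
function resolvePath(root: unknown, path: string): unknown {
  const segments = path.replace(/^\$\.?/, "").split(".").filter((s) => s.length > 0);
  let current: unknown = root;
  for (const key of segments) {
    if (typeof current !== "object" || current === null) return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

function check(condition: Condition, evidence: unknown): boolean {
  const actual = resolvePath(evidence, condition.path);
  switch (condition.operator) {
    case "exists":
      return actual !== undefined && actual !== null;
    case "equals":
      return actual === condition.value;
    case "is_true":
      return actual === true; // strict: the string "true" does not count
    case "is_false":
      return actual === false;
    case "is_greater_than":
      return typeof actual === "number" && typeof condition.value === "number" && actual > condition.value;
    default:
      return false;
  }
}

// Returns the message of every failed condition; an empty array means the evidence passes.
function failedConditions(conditions: Condition[], evidence: unknown): string[] {
  return conditions.filter((c) => !check(c, evidence)).map((c) => c.message);
}

const validationConditions: Condition[] = [
  { path: "$.isValid", operator: "is_true", message: "Price validation must explicitly be true." },
];

console.log(failedConditions(validationConditions, { isValid: true }));  // no failures
console.log(failedConditions(validationConditions, { isValid: "yes" })); // one failure message
```

The strict comparison in `is_true` is deliberate: a truthy string is not accepted as proof of validation.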

3. Integrate Fablize with Your Claude API Call

Now we wrap the Claude API interaction with Fablize. The Fablize SDK handles the state tracking and evidence evaluation.

// agent.ts
import Anthropic from "@anthropic-ai/sdk";
import { Fablize } from "@fablize/node-sdk";
import 'dotenv/config';
import { tools, toolHandlers } from './tools';
import { states, procedure } from './fablizeConfig';

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

// Initialize Fablize with your procedure and states
const fablize = new Fablize({
  procedure,
  states,
  // Optional: A unique ID for the agent instance
  agentRunId: `gold-trader-${Date.now()}`
});

async function runGoldTradingAgent(initialPrompt: string, symbol: string) {
  console.log(`\n--- Starting Fablize Agent for ${symbol} ---`);
  let messages: Anthropic.Messages.MessageParam[] = [
    {
      role: "user",
      content: initialPrompt,
    },
  ];

  let currentState = fablize.initialState;
  let toolOutputs: { tool_name: string, content: string }[] = [];
  let currentEvidence: any = {}; // Store evidence collected so far

  // Use a loop to simulate continuous interaction until a terminal state or max turns
  for (let i = 0; i < 10; i++) { // Max 10 turns to prevent infinite loops
    console.log(`\n[Agent Turn ${i + 1}] Current State: ${currentState.name}`);

    // Update Fablize with current messages and evidence
    const fablizeRequest = fablize.buildRequest({
      messages,
      tools,
      toolOutputs,
      currentEvidence,
      currentState: currentState.name
    });

    // Make the call to Claude
    const response = await anthropic.messages.create({
      model: "claude-3-opus-20240229", // The model that gave me grief sometimes
      max_tokens: 2000,
      messages: fablizeRequest.messages, // Fablize provides the updated messages
      tools: fablizeRequest.tools,
    });

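    // Claude often emits a text block before a tool_use block, so prefer the tool call when present
    const responseMessage = response.content.find(block => block.type === "tool_use") ?? response.content[0];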

    if (responseMessage.type === "text") {
      console.log(`[Claude] ${responseMessage.text}`);
      messages.push({ role: "assistant", content: responseMessage.text });
      // A text-only reply means no tool was called. Terminal states already break out at the
      // bottom of the loop, so nudge Claude to continue instead of ending on an assistant turn.
      messages.push({ role: "user", content: `You are in state ${currentState.name}. Continue with the next required tool call.` });
    } else if (responseMessage.type === "tool_use") {
      const toolCall = responseMessage;
      console.log(`[Claude wants to use tool] ${toolCall.name} with args:`, toolCall.input);
      messages.push({ role: "assistant", content: [{ type: "tool_use", id: toolCall.id, name: toolCall.name, input: toolCall.input }] });

      try {
        const handler = (toolHandlers as any)[toolCall.name];
        if (!handler) {
          throw new Error(`No handler for tool ${toolCall.name}`);
        }
        const toolOutputData = await handler(toolCall.input);
        toolOutputs = [{ tool_name: toolCall.name, content: JSON.stringify(toolOutputData) }];
        // Tool results go back as a tool_result block that references the tool_use id
        messages.push({ role: "user", content: [{ type: "tool_result", tool_use_id: toolCall.id, content: JSON.stringify(toolOutputData) }] });

        // Crucial: Update Fablize with the new tool output and try to transition state
        currentEvidence = { ...currentEvidence, [toolCall.name]: toolOutputData }; // Store this as evidence

        const transitionResult = fablize.tryTransition({
          currentEvidence, // Use accumulated evidence
          currentState: currentState.name
        });

        if (transitionResult.success) {
          currentState = transitionResult.newState!;
          console.log(`[Fablize] State transitioned to: ${currentState.name}`);
          // Clear toolOutputs for the next turn, as they've been consumed by Fablize
          toolOutputs = [];
        } else {
          console.warn(`[Fablize] Failed to transition state from ${currentState.name}: ${transitionResult.reason}`);
          // If transition fails, Fablize will update the messages to guide Claude.
          // Claude might try again or re-evaluate. We don't clear toolOutputs here
          // because Fablize might need it in the next turn to explain the failure.
          messages.push({
            role: "user",
            content: `Fablize reports: "${transitionResult.reason}". Please re-evaluate your action or provide necessary evidence to proceed.`
          });
        }
      } catch (error: any) {
        console.error(`[Tool Error] ${toolCall.name}:`, error.message);
        messages.push({ role: "user", content: [{ type: "tool_result", tool_use_id: toolCall.id, content: JSON.stringify({ error: error.message }), is_error: true }] });
      }
    } else {
      console.log("[Claude] Unknown response type:", responseMessage);
      break;
    }

    if (currentState.name === 'TRADE_PROPOSED' || currentState.name === 'FAILED_VALIDATION') {
      console.log("Agent reached a terminal state. Stopping.");
      break;
    }
  }

  console.log(`\n--- Fablize Agent Finished in state: ${currentState.name} ---`);
}

// Run the agent
(async () => {
  await runGoldTradingAgent("Analyze the current market for GLD and propose a trade. Ensure all steps are verified.", "GLD");

  // To exercise the failure path, make the mock fetchMarketData handler return a price far
  // from the benchmark (for example currentPrice: 210 against benchmarkPrice: 195).
  // validatePriceAgainstBenchmark then returns isValid: false and the run should end in FAILED_VALIDATION.
})();

How Fablize Changes Agent Behavior

When you run this Fablize Claude Opus agent, here's what happens:

  1. Initial State: Agent is INITIAL. Claude sees the prompt and knows about fetchMarketData.
  2. fetchMarketData: Claude calls fetchMarketData. The tool handler returns data, which becomes currentEvidence.fetchMarketData.
  3. Transition to MARKET_DATA_FETCHED: Fablize sees the fetchMarketData output, checks its conditions (symbol exists, currentPrice > 0). If met, it transitions the agent to MARKET_DATA_FETCHED.
  4. validatePriceAgainstBenchmark: Now in MARKET_DATA_FETCHED, Fablize's procedure tells Claude it needs to call validatePriceAgainstBenchmark with specific evidence conditions to move to PRICE_VALIDATED. If Claude tries to skip this and go straight to proposeTrade, Fablize will not allow the state transition. It will push a message back to Claude explaining why it can't proceed, forcing Claude to rethink and call validatePriceAgainstBenchmark.
  5. Transition to PRICE_VALIDATED or FAILED_VALIDATION: If validatePriceAgainstBenchmark is called and returns isValid: true, the state moves to PRICE_VALIDATED. If it returns isValid: false, it moves to FAILED_VALIDATION. This is crucial for Claude AI agent verification.
  6. proposeTrade: Only from PRICE_VALIDATED can Claude successfully propose a trade, leading to the TRADE_PROPOSED state.
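
One detail worth spelling out: in the agent.ts loop, the tool handler runs before the transition check, so an out-of-order proposeTrade would already have executed by the time the transition is refused. A possible guard is to check the call against the current state before running anything. The sketch below assumes a simplified linear procedure and uses hypothetical names (`expectedTool`, `rejectIfOutOfOrder`, `gatedToolCall`). It is not part of Fablize. The rejection is shaped as a `tool_result` block with `is_error` set, which is how the Anthropic Messages API reports a failed tool call.

```typescript
// Hypothetical pre-execution gate; the names here are illustrative, not part of Fablize.
type StateName = "INITIAL" | "MARKET_DATA_FETCHED" | "PRICE_VALIDATED" | "TRADE_PROPOSED";
type ToolHandler = (input: unknown) => Promise<unknown>;

interface ToolCall {
  id: string;
  name: string;
  input: unknown;
}

// Shape of a tool_result content block in the Anthropic Messages API.
interface ToolResultBlock {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  is_error?: boolean;
}

// Which tool may run in each state (null marks a terminal state).
const expectedTool: Record<StateName, string | null> = {
  INITIAL: "fetchMarketData",
  MARKET_DATA_FETCHED: "validatePriceAgainstBenchmark",
  PRICE_VALIDATED: "proposeTrade",
  TRADE_PROPOSED: null,
};

// Returns null when the call is allowed, otherwise an error result naming the expected step.
function rejectIfOutOfOrder(state: StateName, call: ToolCall): ToolResultBlock | null {
  const expected = expectedTool[state];
  if (call.name === expected) return null;
  return {
    type: "tool_result",
    tool_use_id: call.id,
    is_error: true,
    content: `Cannot run ${call.name} in state ${state}. Expected next step: ${expected ?? "none"}.`,
  };
}

// Runs the handler only for in-order calls, so a skipped step never has side effects.
async function gatedToolCall(
  state: StateName,
  call: ToolCall,
  handlers: Record<string, ToolHandler>,
): Promise<ToolResultBlock> {
  const rejection = rejectIfOutOfOrder(state, call);
  if (rejection) return rejection;
  const handler = handlers[call.name];
  if (!handler) throw new Error(`No handler for tool ${call.name}`);
  const output = await handler(call.input);
  return { type: "tool_result", tool_use_id: call.id, content: JSON.stringify(output) };
}

// Claude tries to propose a trade right after fetching data.
const rejected = rejectIfOutOfOrder("MARKET_DATA_FETCHED", {
  id: "toolu_example", // placeholder id for illustration
  name: "proposeTrade",
  input: { symbol: "GLD", action: "buy", quantity: 10 },
});
console.log(rejected?.content);
```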

The measurable difference: In my FarahGPT tests, without Fablize, claude-3-opus-20240229 skipped the validatePriceAgainstBenchmark step in around 30% of cases, directly jumping to proposeTrade or hallucinating a validation. With Fablize enforcing the procedure, this "skipped verification" rate dropped to less than 1% over 200 test runs. Fablize actively prevented the agent from moving forward until all required evidence was provided and met the specified conditions. This isn't just about making agents "smarter," it's about making them accountable.
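
If you want to measure this on your own agent, the metric is easy to compute from logged tool calls. The sketch below assumes a hypothetical trace format (an ordered list of tool names per run) and uses made-up sample traces, not FarahGPT data. For scale, "less than 1%" over 200 runs means at most one skipped run.

```typescript
// Hypothetical trace format: the ordered tool names called during one agent run.
type RunTrace = string[];

// A run counts as a skip if proposeTrade was called with no validation call before it.
function skippedValidation(trace: RunTrace): boolean {
  const propose = trace.indexOf("proposeTrade");
  if (propose === -1) return false; // no trade proposed, nothing was skipped
  const validate = trace.indexOf("validatePriceAgainstBenchmark");
  return validate === -1 || validate > propose;
}

// Fraction of runs in which validation was skipped (0 for an empty set of traces).
function skipRate(traces: RunTrace[]): number {
  if (traces.length === 0) return 0;
  return traces.filter(skippedValidation).length / traces.length;
}

// Illustrative traces only.
const sampleTraces: RunTrace[] = [
  ["fetchMarketData", "validatePriceAgainstBenchmark", "proposeTrade"], // correct order
  ["fetchMarketData", "proposeTrade"],                                  // validation skipped
  ["fetchMarketData", "validatePriceAgainstBenchmark"],                 // stopped after validation
  ["fetchMarketData", "proposeTrade", "validatePriceAgainstBenchmark"], // validated too late
];

console.log(skipRate(sampleTraces)); // 0.5
```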

What I Got Wrong First

Honestly, my first attempts at enforcing the agent's procedure were a mess. I tried to roll my own state machine logic inside the prompt, explicitly telling Claude "first do this, then do that." This failed for several reasons:

  • Prompt Bloat: The prompt became huge and unwieldy, full of conditional logic. Claude sometimes ignored it anyway, especially if it felt confident it knew better.
  • Fragility: Any slight change in the workflow meant rewriting complex prompt logic. It was a nightmare to maintain.
  • No Real Enforcement: Claude still had the final say. If it decided to hallucinate a result or skip a step, there was no external system to actively block it. The best I could do was detect it after the fact and try to recover, which is expensive and unreliable.
  • The claude-3-opus-20240229 quirk: As mentioned, this specific model (and often earlier ones) has a tendency to be "overly confident" and skip intermediate tool calls if it perceives them as redundant or if the primary task seems achievable without them. It's a subtle but critical behavior that a simple "tool list" doesn't guard against. Fablize provides the external guardrail.

My biggest mistake was trying to solve a system design problem with prompt engineering. Fablize provides that missing system.

Optimization and Gotchas

  • Evidence Granularity: Be smart about what you define as evidence. Don't make it too granular, or your agent will get stuck on trivial details. Focus on outputs that signify critical milestones or decision points.
  • State Machine Complexity: While Fablize helps, a very complex state machine can still make your agent hard to reason about. Try to keep your procedure as linear as possible with clear branching
