DEV Community

Vasili

Stop Asking AI for Answers. Start Asking If the Evidence Is Ready.

Most AI agents are optimized to produce an answer.

But in serious workflows, the answer is not the hard part.

The hard part is knowing whether that answer is supported well enough for a human to trust it, act on it, or escalate it.

That is the problem I am working on with Agenda Intelligence MD:

An evidence-readiness and trust-routing runtime for high-stakes AI-assisted decisions.

GitHub: vassiliylakhonin/agenda-intelligence-md

The problem: AI can summarize before it can be trusted

Summarization is useful.

But many real-world decisions are not blocked by the lack of a summary. They are blocked by uncertainty:

  • Which claims are actually supported?
  • Which claims are weak?
  • Which source categories are missing?
  • Who needs to act next?
  • Is this file ready for review?
  • Should this be escalated before a decision is made?

This matters in workflows like:

  • vendor evidence review;
  • RFP and procurement analysis;
  • AI vendor due diligence;
  • strategic infrastructure project rooms;
  • market-entry readiness;
  • sanctions-adjacent exposure triage;
  • corridor, maritime, and counterparty risk files.

In those settings, a polished AI-generated memo can be dangerous if it hides evidence gaps.

Agenda Intelligence MD is built around a different idea:

The next layer of agent infrastructure is not better summarization. It is knowing when an AI-generated brief is not ready to be trusted.

What Agenda Intelligence MD does

Agenda Intelligence MD turns messy input packs into structured human-review packets.

The inputs can be things like:

  • RFP responses;
  • vendor claims;
  • source packs;
  • risk files;
  • model cards;
  • project notes;
  • weekly status updates;
  • public documentation;
  • analyst-style briefs.

The output is not just a summary.

It is a structured review layer that surfaces:

  • supported claims;
  • weak or under-sourced claims;
  • missing evidence categories;
  • source coverage diagnostics;
  • owner actions;
  • decision-readiness routing;
  • escalation signals;
  • heuristic scoring.
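
To make that list concrete, here is a minimal sketch of what such a review packet could look like as a data structure. The class names, field names, and the rule that a claim with any linked source counts as "supported" are assumptions for illustration only, not the project's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    text: str
    evidence_ids: list[str] = field(default_factory=list)  # IDs of linked sources


@dataclass
class ReviewPacket:
    # Hypothetical fields mirroring the review surface described above.
    supported_claims: list[Claim] = field(default_factory=list)
    weak_claims: list[Claim] = field(default_factory=list)
    missing_evidence_categories: list[str] = field(default_factory=list)
    owner_actions: list[str] = field(default_factory=list)
    escalate: bool = False


def build_packet(claims: list[Claim], required: set[str], present: set[str]) -> ReviewPacket:
    """Sort claims into buckets and record which evidence categories are absent."""
    packet = ReviewPacket()
    for claim in claims:
        # Simplifying assumption: one linked source is enough to count as supported.
        bucket = packet.supported_claims if claim.evidence_ids else packet.weak_claims
        bucket.append(claim)
    packet.missing_evidence_categories = sorted(required - present)
    packet.owner_actions = [
        f"Request evidence: {category}" for category in packet.missing_evidence_categories
    ]
    # Any weak claim or missing category flags the packet for human attention.
    packet.escalate = bool(packet.weak_claims or packet.missing_evidence_categories)
    return packet
```

The point of the shape is that gaps and weak claims are fields in their own right, so a reviewer sees them without reading the whole brief.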

The goal is not to replace human judgment.

The goal is to make the review surface clearer before a human makes a decision.

What makes it different from a normal AI summarizer?

A normal summarizer asks:

“What does this document say?”

Agenda Intelligence MD asks:

“Is this document ready to support a decision?”

That distinction changes the architecture.

Instead of treating the AI output as the final deliverable, the project treats it as something that must pass through a readiness layer.

For example, a vendor might claim that their AI product is safe for regulated enterprise use.

A summarizer can compress that claim into a nice paragraph.

Agenda Intelligence MD is designed to ask a more useful set of questions:

  • Is the claim linked to evidence?
  • Is the evidence first-party, third-party, stale, missing, or incomplete?
  • Are there standards, audit artifacts, security documents, or governance materials missing?
  • Does this need a procurement owner, legal reviewer, technical reviewer, or compliance escalation?
  • Is the brief ready for a decision, or only ready for more questions?

That is the difference between generating text and routing trust.
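
As a rough illustration of the second question, the sketch below labels a single piece of evidence as missing, stale, first-party, or third-party. The `Evidence` type, the publisher comparison, and the one-year staleness window are hypothetical choices made for this example; the project may draw these lines differently.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum
from typing import Optional


class EvidenceStatus(Enum):
    MISSING = "missing"
    STALE = "stale"
    FIRST_PARTY = "first_party"
    THIRD_PARTY = "third_party"


@dataclass
class Evidence:
    publisher: str   # who produced the document
    published: date  # when it was produced


def classify(
    evidence: Optional[Evidence],
    vendor: str,
    today: date,
    max_age: timedelta = timedelta(days=365),  # assumed staleness window
) -> EvidenceStatus:
    """Label the evidence behind one claim; checks run from worst case to best."""
    if evidence is None:
        return EvidenceStatus.MISSING
    if today - evidence.published > max_age:
        return EvidenceStatus.STALE
    if evidence.publisher == vendor:
        # The vendor vouching for itself is weaker than an outside source.
        return EvidenceStatus.FIRST_PARTY
    return EvidenceStatus.THIRD_PARTY
```

Passing `today` explicitly keeps the function deterministic, which matters when the same packet is re-reviewed later.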

Architecture

The project is implemented as a Python package with multiple delivery surfaces around one core service layer.

It includes:

  • a CLI;
  • an MCP stdio server;
  • an HTTP API shell;
  • an A2A adapter;
  • JSON schemas;
  • validators;
  • evidence audit;
  • source coverage diagnostics;
  • heuristic scoring;
  • vertical worker profiles.

This makes it usable in several different modes.

You can inspect it locally through the CLI.

You can integrate it into an agent workflow through MCP.

You can expose structured behavior over HTTP.

You can experiment with A2A-style agent routing.

The interesting part is not just that these interfaces exist. It is that they point toward the same product idea: evidence-readiness should be a reusable layer, not a one-off prompt.
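
The "one core, several surfaces" idea can be sketched in a few lines. Everything here is hypothetical (the function names, the brief layout with `claims` and `sources` keys, and the readiness rule); it only shows the pattern of thin adapters delegating to a single service function, not the package's real code.

```python
import argparse
import json


def assess_readiness(brief: dict) -> dict:
    """Core service: the only place the readiness logic lives."""
    claims = brief.get("claims", [])
    unsupported = [c["text"] for c in claims if not c.get("sources")]
    return {"ready": bool(claims) and not unsupported, "unsupported_claims": unsupported}


def cli_main(argv: list[str]) -> str:
    """CLI adapter: parse arguments, load the file, delegate to the core."""
    parser = argparse.ArgumentParser(prog="readiness-demo")
    parser.add_argument("brief_path")
    args = parser.parse_args(argv)
    with open(args.brief_path, encoding="utf-8") as handle:
        return json.dumps(assess_readiness(json.load(handle)))


def http_handler(request_body: str) -> tuple[int, str]:
    """HTTP-style adapter: decode the body, delegate, return (status, body)."""
    try:
        brief = json.loads(request_body)
    except json.JSONDecodeError:
        return 400, json.dumps({"error": "invalid JSON"})
    return 200, json.dumps(assess_readiness(brief))
```

Because both adapters call `assess_readiness`, a change to the readiness rule shows up identically on every surface.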

Quick start

After installing the package, the basic local flow looks like this:

pip install agenda-intelligence-md

agenda-intelligence doctor
agenda-intelligence validate-brief examples/agenda-brief.json
agenda-intelligence score examples/agenda-brief.json --evidence examples/source/evidence-pack.json
agenda-intelligence weekly-delta examples/strategic-infrastructure-bankability/status.synthetic.md

The commands are designed to answer practical questions:

  • Is the package installed correctly?
  • Does this brief match the schema?
  • How strong are the brief's structure, evidence, and decision-readiness?
  • What changed in a weekly status update?
  • Which claims are unsafe to repeat?
  • What evidence is still missing?

That last question is the most important one.

Because in real decision workflows, “what is missing?” is often more valuable than “what is the answer?”

Example: AI vendor evidence-readiness

One of the current discovery wedges for the project is AI vendor evidence-readiness for regulated procurement.

Imagine a buyer reviewing an AI vendor for an enterprise or regulated environment.

The buyer has:

  • an RFP;
  • vendor claims;
  • public documentation;
  • security pages;
  • model cards;
  • standards references;
  • possibly with some materials missing or vague.

A normal AI assistant can summarize the vendor.

But a buyer does not only need a summary.

They need a review packet:

  • Which claims are supported?
  • Which claims are marketing language?
  • Which security or governance documents are missing?
  • Which buyer questions remain unanswered?
  • What should be escalated before approval?
  • What can be reviewed now, and what cannot?

That is the kind of workflow Agenda Intelligence MD is designed to support.

It is not trying to be the decision-maker.

It is trying to prepare the decision surface.
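
The last question in that packet, what can be reviewed now and what cannot, lends itself to a small sketch. The item names, document categories, and function below are invented for illustration and assume each review item declares the documents it depends on.

```python
def partition_review_items(
    items: dict[str, set[str]], available: set[str]
) -> tuple[list[str], dict[str, list[str]]]:
    """Split review items into those reviewable now and those blocked."""
    reviewable: list[str] = []
    blocked: dict[str, list[str]] = {}
    for name, needed in items.items():
        missing = needed - available
        if missing:
            # Record exactly which documents are holding this item up.
            blocked[name] = sorted(missing)
        else:
            reviewable.append(name)
    return reviewable, blocked


# Hypothetical review items and the documents each one depends on.
items = {
    "data handling": {"security page", "model card"},
    "audit posture": {"audit report"},
}
reviewable, blocked = partition_review_items(items, {"security page", "model card"})
print(reviewable)  # ['data handling']
print(blocked)     # {'audit posture': ['audit report']}
```

The blocked items double as a request list for the vendor, which is often the most useful part of the packet.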

Vertical profiles

The repository also includes vertical profiles and demo surfaces for several high-stakes workflows, including:

  • Middle Corridor Deal Risk Gate;
  • CIS Secondary-Sanctions Exposure;
  • Agentic Interaction Trust Gate;
  • Gulf Maritime Exposure Gate;
  • Kazakhstan Market-Entry Readiness Gate.

These are not generic chatbot personalities.

They are structured reasoning surfaces for evidence-heavy review workflows.

The pattern is:

input pack -> structured review packet -> evidence gaps -> owner actions -> decision-readiness route

That pattern is useful because many high-stakes workflows fail in the handoff between AI output and human responsibility.

Agenda Intelligence MD focuses on that handoff.
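
The last two steps of the pattern, owner actions and the decision-readiness route, might look something like the sketch below. The route names, the owner mapping, and the rule that a sanctions gap always escalates are assumptions made up for this example, not behavior taken from the repository.

```python
from enum import Enum


class Route(Enum):
    READY_FOR_REVIEW = "ready_for_review"
    NEEDS_EVIDENCE = "needs_evidence"
    ESCALATE = "escalate"


# Hypothetical mapping from an evidence category to the role that owns the gap.
OWNERS = {
    "security": "technical reviewer",
    "contract": "legal reviewer",
    "sanctions": "compliance",
}
# Hypothetical categories where a gap must be escalated rather than queued.
ESCALATION_CATEGORIES = {"sanctions"}


def route_packet(gaps: list[str]) -> tuple[Route, dict[str, str]]:
    """Assign each evidence gap to an owner and pick a route for the packet."""
    actions = {gap: OWNERS.get(gap, "procurement owner") for gap in gaps}
    if any(gap in ESCALATION_CATEGORIES for gap in gaps):
        return Route.ESCALATE, actions
    if gaps:
        return Route.NEEDS_EVIDENCE, actions
    return Route.READY_FOR_REVIEW, actions
```

Note that "ready for review" is the best outcome the router can produce. Approval is deliberately not one of the routes.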

What this is not

This project is intentionally bounded.

It is not:

  • a factuality verifier;
  • a legal advisor;
  • a compliance approval engine;
  • a sanctions determination tool;
  • a financial or investment advisor;
  • an autonomous decision-maker;
  • a replacement for analyst review.

The scoring is heuristic.

It evaluates structure, source coverage, evidence labeling, and decision-readiness signals.
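
A heuristic score over those four signals could be as simple as a weighted average. The weights and the function below are placeholders chosen for illustration; the project's real scoring rules may differ.

```python
# Hypothetical weights for the four signal groups; they sum to 1.0.
WEIGHTS = {
    "structure": 0.2,
    "source_coverage": 0.3,
    "evidence_labeling": 0.3,
    "decision_readiness": 0.2,
}


def heuristic_score(signals: dict[str, float]) -> float:
    """Combine per-signal values in [0, 1] into one weighted score in [0, 1]."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = signals.get(name, 0.0)  # a signal that was not measured counts as zero
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
        total += weight * value
    return round(total, 3)
```

A score like this only summarizes signals about how the packet is built and sourced.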

It does not prove that a claim is true.

That boundary matters.

The point is not to say:

“The AI is right.”

The point is to say:

“Here is what the AI-assisted packet can support, here is what it cannot support, and here is where a human needs to review.”

Why MCP and A2A matter here

MCP and A2A are interesting because they push agent systems toward composable infrastructure.

But composability also increases risk.

If agents can call tools, route tasks, and generate structured outputs, then they also need a way to communicate uncertainty, missing evidence, and escalation requirements.

Otherwise, agent systems simply become very good at moving unsupported claims through a workflow quickly.

Agenda Intelligence MD is an experiment in making the trust layer explicit.

Not hidden in a prompt.

Not buried in a paragraph.

Not left to the final reviewer to reconstruct manually.

Instead, the runtime exposes readiness, gaps, and routing as structured outputs.
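
One way to picture "structured outputs" is a small JSON payload that a downstream agent can check before acting on it. The field names (`readiness`, `gaps`, `route`) and the validator below are hypothetical and use only the standard library; they are not the project's published schemas.

```python
import json

# Hypothetical required fields and their expected Python types.
REQUIRED_FIELDS = {"readiness": str, "gaps": list, "route": str}


def validate_payload(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the payload is well-formed."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(payload, dict):
        return ["payload must be a JSON object"]
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in payload:
            problems.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected):
            problems.append(f"wrong type for {name}")
    return problems
```

A receiving agent that refuses payloads without a `gaps` field cannot silently drop the uncertainty on the way through.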

Why I built it

I started from a simple observation:

A lot of AI work focuses on making outputs more fluent.

But in serious workflows, fluency is not the bottleneck.

The bottleneck is whether the output is usable for a decision.

A beautiful memo with missing evidence is still a weak memo.

A confident recommendation with unclear source coverage is still risky.

A summary that does not show what it cannot support is not enough.

I wanted a system that treats evidence gaps as first-class objects.

Who should look at this?

You may find the project interesting if you are working on:

  • AI agents;
  • MCP servers;
  • A2A experiments;
  • procurement technology;
  • AI governance;
  • risk intelligence;
  • analyst workflows;
  • structured evaluation;
  • human-in-the-loop review;
  • decision-support systems.

The repo is especially relevant if you are asking:

How do we make AI-assisted workflows more reviewable before they become more autonomous?

What to inspect first

If you open the repository, I would suggest looking at four areas:

  1. The CLI flow
    Start with the examples and validation commands.

  2. The schemas
    The schemas show what the project treats as structured review output.

  3. The MCP integration
    This is useful if you are thinking about agent-tool interoperability.

  4. The vertical profiles
    These show how the same evidence-readiness pattern can be adapted to different domains.

The bigger idea

I do not think every AI agent needs to make more decisions.

I think many AI agents need to become better at saying:

  • this is supported;
  • this is weak;
  • this is missing;
  • this needs review;
  • this is not ready yet.

That is less flashy than autonomous decision-making.

But it is much closer to what many real organizations need.

The future of AI infrastructure will not only be about agents that can act.

It will also be about systems that know when not to act yet.

That is the layer Agenda Intelligence MD is exploring.

GitHub: vassiliylakhonin/agenda-intelligence-md

If this direction is interesting to you, I would appreciate your reactions, issues, critiques, or architecture reviews.
