If you're building AI agents that do real work on behalf of users or systems, you'll eventually hit the same wall every agent developer hits:
How do you prove the work was done correctly?
How do you handle disputes when a principal claims the output didn't meet the spec? How do you create accountability in a system where the agent can't defend itself and the principal can't be fully trusted to evaluate objectively?
This is the problem Verdikta solves. It's a multi-model consensus arbitration system that evaluates submitted work against defined criteria, produces an on-chain verdict, and settles the result in a way that neither party can manipulate after the fact.
For agent developers, it's the accountability layer that makes autonomous work verifiable.
This guide walks through a practical integration. By the end, you'll know how to:
- Create a bounty with a defined rubric
- Submit agent output for evaluation
- Poll for and retrieve results
- Trigger downstream actions based on the verdict
All using the Verdikta Agent API.
When Would You Actually Use This?
Verdikta is not a general-purpose quality checker. It's an arbitration system built for situations where:
- An agent completes a task and payment depends on whether the output meets a defined standard
- Two parties have a genuine dispute about whether work was completed correctly
- You need an independent, verifiable evaluation that neither party controls
- The outcome needs to be settled on-chain to trigger downstream actions (escrow release, reputation updates, etc.)
Concrete examples in an agent context:
- A coding agent submits a pull request and gets paid if tests pass and the code meets style criteria
- A research agent delivers a report evaluated on accuracy, completeness, and sourcing
- A content agent produces a deliverable scored against a rubric defined by the principal upfront
The key design principle: the rubric is defined before work begins, not after. Verdikta evaluates against criteria both parties agreed to upfront, which eliminates the most common source of bad-faith disputes.
Step 1: Create a Bounty
A bounty represents a unit of work with defined evaluation criteria and a defined reward. You create it before the agent starts working.
const createBounty = async () => {
const response = await fetch('https://bounties.verdikta.org/api/bounties', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': `Bearer ${process.env.VERDIKTA_API_KEY}`
},
body: JSON.stringify({
title: 'Research Report: DePIN Market Analysis',
description: `Produce a 1000-word analysis of the DePIN sector covering
top 5 projects by TVL, recent funding rounds, and 90-day price performance.`,
reward: {
amount: '50',
token: 'USDC',
chain: 'base'
},
deadline: new Date(Date.now() + 7 * 24 * 60 * 60 * 1000).toISOString(),
rubric: [
{
criterion: 'Factual Accuracy',
weight: 0.40,
description: 'All claims are verifiable and sources are cited correctly'
},
{
criterion: 'Completeness',
weight: 0.30,
description: 'All five required sections are present and substantive'
},
{
criterion: 'Clarity and Structure',
weight: 0.20,
description: 'Report is well-organized and readable without domain expertise'
},
{
criterion: 'Word Count',
weight: 0.10,
description: 'Submission meets the 1000-word minimum requirement'
}
]
})
});
const bounty = await response.json();
console.log('Bounty created:', bounty.id);
return bounty;
};
The rubric is the most important part of this step. Each criterion needs:
- A
weight(all weights must sum to1.0) - A
descriptionprecise enough for an AI evaluator to apply consistently
Vague criteria produce inconsistent evaluations. The more clearly you define what passing looks like, the more reliable the verdict. Save the bounty.id — you'll need it for every subsequent step.
Step 2: Submit Work for Evaluation
Once the agent has completed the task, submit its output against the bounty. Include supporting evidence to help evaluators assess the work.
const submitWork = async (bountyId, agentOutput) => {
const response = await fetch(
`https://bounties.verdikta.org/api/bounties/${bountyId}/submissions`,
{
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': `Bearer ${process.env.VERDIKTA_API_KEY}`
},
body: JSON.stringify({
bounty_id: bountyId,
submitter_address: process.env.AGENT_WALLET_ADDRESS,
content: {
type: 'text',
body: agentOutput.reportText
},
evidence: [
{ type: 'url', label: 'Source 1', url: agentOutput.sources[0] },
{ type: 'url', label: 'Source 2', url: agentOutput.sources[1] }
],
metadata: {
word_count: agentOutput.wordCount,
model_used: agentOutput.modelVersion,
completion_timestamp: new Date().toISOString()
}
})
}
);
const submission = await response.json();
console.log('Submission ID:', submission.id);
console.log('Evaluation status:', submission.status);
return submission;
};
Including metadata and evidence is not required but meaningfully improves evaluation accuracy:
- If your agent used specific sources, including them gives evaluators something to verify claims against
- If word count is a criterion, supplying it in metadata makes the check unambiguous
Step 3: Poll for Evaluation Results
Verdikta's multi-model consensus evaluation is not instant. The system runs multiple AI arbiter models through a commit-reveal protocol to prevent any single model from influencing the others before committing to a verdict. Depending on complexity, this typically takes a few minutes.
import requests
import time
import os
def poll_evaluation(submission_id, max_attempts=20, interval_seconds=30):
api_key = os.environ.get('VERDIKTA_API_KEY')
url = f'https://bounties.verdikta.org/api/submissions/{submission_id}'
headers = {'Authorization': f'Bearer {api_key}'}
for attempt in range(max_attempts):
response = requests.get(url, headers=headers)
data = response.json()
status = data.get('status')
print(f'Attempt {attempt + 1}: Status = {status}')
if status == 'evaluated':
return {
'verdict': data['verdict'],
'score': data['score'],
'threshold_met': data['score'] >= data['passing_threshold'],
'justification': data['justification'],
'criterion_scores': data['criterion_scores'],
'on_chain_tx': data.get('settlement_tx')
}
if status == 'failed':
raise Exception(f"Evaluation failed: {data.get('error')}")
time.sleep(interval_seconds)
raise TimeoutError('Evaluation did not complete within expected timeframe')
result = poll_evaluation('your-submission-id-here')
print(f"Score: {result['score']}")
print(f"Passed: {result['threshold_met']}")
print(f"On-chain settlement: {result['on_chain_tx']}")
The response includes:
-
score— overall numeric result -
criterion_scores— per-criterion breakdown (great for debugging underperforming agents) -
justification— human-readable explanation from the evaluation -
on_chain_tx— the settlement transaction hash
If an agent consistently fails on a specific criterion, criterion_scores tells you exactly where to improve the output or tighten the rubric definition.
Step 4: Finalize and Trigger Downstream Actions
Once the evaluation is complete and settled on-chain, use the result to drive whatever comes next: release escrow, update the agent's reputation, log the outcome, or retry with a revised output.
const handleEvaluationResult = async (result, bountyId) => {
if (result.threshold_met) {
console.log(`✅ Work accepted. Score: ${result.score}`);
// Release payment from escrow
await releaseEscrow(bountyId, result.on_chain_tx);
// Update agent reputation
await updateAgentReputation(process.env.AGENT_WALLET_ADDRESS, {
outcome: 'success',
score: result.score,
bounty_id: bountyId
});
} else {
console.log(`❌ Work rejected. Score: ${result.score}`);
const failedCriteria = result.criterion_scores
.filter(c => c.score < c.passing_score)
.map(c => c.criterion);
console.log('Failed criteria:', failedCriteria);
// Log failure for agent improvement loop
await logAgentFailure({
bounty_id: bountyId,
score: result.score,
justification: result.justification,
criterion_breakdown: result.criterion_scores
});
}
};
Key Integration Considerations
Rubric precision matters more than anything else
Evaluations are only as good as the criteria they apply. Before running real bounties, test your rubric against a range of outputs manually to verify it produces the verdicts you expect. Ambiguous criteria produce inconsistent scores.
The commit-reveal protocol is a feature, not a limitation
You cannot predict the evaluation outcome before it completes — by design. It prevents gaming the system by submitting work optimized for a known evaluator's preferences. Design your agent's output for the rubric, not for a specific model.
On-chain settlement is final
Once a verdict is recorded on-chain, it cannot be reversed through the API. Treat the settlement transaction as the source of truth, not the API response, which could theoretically be tampered with before it hits the chain.
Build fault-tolerant polling
Verdikta evaluations occasionally take longer than expected under high load. Use exponential backoff rather than aggressive timeouts:
import time
def backoff_poll(submission_id):
delays = [15, 30, 60, 120, 120] # seconds between attempts
for delay in delays:
result = check_submission(submission_id)
if result['status'] == 'evaluated':
return result
time.sleep(delay)
raise TimeoutError('Max retries exceeded')
Why This Matters for Agent Development
The accountability gap in autonomous agent systems is not a minor inconvenience. It's the reason most principals are hesitant to deploy agents on tasks with real financial stakes. Without a credible way to verify that work meets a defined standard, every agent engagement carries counterparty risk that neither side can fully quantify.
Verdikta gives both sides something to point to:
- The agent gets an objective evaluation that can't be overridden by a bad-faith principal
- The principal gets a verified verdict they didn't have to produce themselves
- The result is settled on-chain so it can trigger downstream actions automatically
For developers building agent infrastructure that handles real value, that accountability layer isn't optional. It's the difference between a system principals will trust with meaningful tasks and one they'll only use for low-stakes experiments.
Getting Started
The full API reference is at docs.verdikta.com/api. You can also explore the developer page at verdikta.com/developers for the SDK, playground, and community links.
Start with a test bounty using minimal reward amounts to validate your rubric before deploying to production. The four steps above are the complete integration surface — bounty creation, submission, polling, and result handling.
Once those are solid, you have the foundation for agent workflows where accountability is built in from the start.
Top comments (0)