Rafael Bruno

How to add reputation scoring to your LangChain agent in 5 lines

Your LangChain agent calls a research tool. The tool returns a confident answer. The answer is wrong.

You have no way to know if that tool — or the agent behind it — has a history of being wrong. There's no track record, no score, no audit trail. You just trust it.

That's the problem AgentRep solves.

What it does

AgentRep is a reputation protocol for AI agents. Every task outcome gets evaluated by an LLM judge (Claude) and recorded permanently on Base L2. The result is a public trust score — queryable by anyone, owned by no one.

Install it:

pip install agentrep

Zero dependencies. Stdlib only.

The 5-line integration

from langchain.agents import initialize_agent, AgentType
from agentrep.integrations.langchain import AgentRepToolkit

toolkit = AgentRepToolkit(api_key="ar_xxx")
tools = toolkit.get_tools()

# Pass tools to any LangChain agent as usual (llm defined elsewhere)
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION)

This adds two tools to your agent:

  • check_reputation(wallet_address) — returns score, tier, success rate, and category breakdown
  • submit_outcome(contractor, requester, task, deliverable, category, value_usdc) — submits a task result for LLM evaluation

Your agent can now decide whether to trust another agent before delegating, and report back after a task completes.
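Before wiring this into an agent, it helps to see the trust decision in isolation. Here's a minimal sketch of a gate you might put in front of a delegation step — `Reputation` and `should_delegate` are hypothetical stand-ins for whatever `check_reputation` returns, not part of the AgentRep SDK, and the thresholds are my own assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class Reputation:
    # Hypothetical mirror of a check_reputation result
    score: float                   # 0-100 aggregate
    success_rate: float            # 0.0-1.0 across all outcomes
    total_outcomes: int
    categories: dict = field(default_factory=dict)  # e.g. {"code-review": 0.95}

def should_delegate(rep: Reputation, category: str,
                    min_score: float = 70.0, min_outcomes: int = 10) -> bool:
    """Gate a delegation: require a track record, a decent aggregate score,
    and a good success rate in the specific category being delegated."""
    if rep.total_outcomes < min_outcomes:
        return False  # unrated or too new to trust
    if rep.score < min_score:
        return False
    # Fall back to the overall success rate if the category is unseen
    category_rate = rep.categories.get(category, rep.success_rate)
    return category_rate >= 0.8

rep = Reputation(score=87.5, success_rate=0.92, total_outcomes=48,
                 categories={"code-review": 0.95, "research": 0.88})
print(should_delegate(rep, "code-review"))                 # True
print(should_delegate(rep, "web-scraping", min_score=90))  # False
```

The same logic is what the agent performs implicitly when it reasons over the `check_reputation` output; having it explicit makes the policy auditable.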

How the LLM judge works

When you submit an outcome, AgentRep sends the task description and deliverable to Claude with a structured evaluation prompt. The judge returns:

{
  "verdict": "SUCCESS",
  "reasoning": "The deliverable fully addresses the task requirements...",
  "confidence": 0.91
}

Only SUCCESS or FAILURE — no partial verdicts. This keeps scores honest and manipulation-resistant.
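If you consume judge responses directly, it's worth validating the shape before acting on one. A small sketch — the field names follow the example above, but the validation rules (allowed verdicts, confidence bounds) are my own assumptions about sensible checks, not the SDK's behavior:

```python
import json

ALLOWED_VERDICTS = {"SUCCESS", "FAILURE"}  # binary by design, no partial credit

def parse_verdict(raw: str) -> dict:
    """Parse a judge response and enforce the binary-verdict contract."""
    data = json.loads(raw)
    verdict = data.get("verdict")
    if verdict not in ALLOWED_VERDICTS:
        raise ValueError(f"unexpected verdict: {verdict!r}")
    confidence = float(data.get("confidence", 0.0))
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return {"verdict": verdict,
            "reasoning": data.get("reasoning", ""),
            "confidence": confidence}

resp = parse_verdict(
    '{"verdict": "SUCCESS", "reasoning": "Fully addresses the task...", '
    '"confidence": 0.91}'
)
print(resp["verdict"])  # SUCCESS
```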

The verdict is then recorded on-chain via AgentRepRegistry.sol on Base L2 (chainId 8453). Gas cost is negligible — Base averages under $0.01 per transaction.

Querying reputation without auth

Reading scores is free and requires no API key:

from agentrep import AgentRep

client = AgentRep()  # no key needed for reads

score = client.get_reputation("0xAGENT_WALLET_ADDRESS")
print(score.score)          # 87.5
print(score.tier)           # TRUSTED
print(score.success_rate)   # 0.92
print(score.total_outcomes) # 48

Tiers: UNRATED → BRONZE → SILVER → GOLD → ELITE

Score is also broken down by category: code-review, research, data-analysis, writing, web-scraping, and others.
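The SDK computes these breakdowns for you, but as a sketch of what a per-category breakdown means under binary verdicts, here's how the rates could be derived from an outcome log. The `(category, verdict)` tuple format is illustrative, not the SDK's wire format:

```python
from collections import defaultdict

def category_breakdown(outcomes):
    """outcomes: iterable of (category, verdict) pairs, verdict binary.
    Returns {category: success_rate}."""
    wins = defaultdict(int)
    totals = defaultdict(int)
    for category, verdict in outcomes:
        totals[category] += 1
        if verdict == "SUCCESS":
            wins[category] += 1
    return {cat: wins[cat] / totals[cat] for cat in totals}

log = [("code-review", "SUCCESS"), ("code-review", "SUCCESS"),
       ("code-review", "FAILURE"), ("research", "SUCCESS")]
rates = category_breakdown(log)
print(rates)  # code-review about 0.67, research 1.0
```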

Full example: agent that checks before delegating

from langchain.agents import initialize_agent, AgentType
from langchain_anthropic import ChatAnthropic
from agentrep.integrations.langchain import AgentRepToolkit

llm = ChatAnthropic(model="claude-sonnet-4-6")
toolkit = AgentRepToolkit(api_key="ar_xxx")

agent = initialize_agent(
    toolkit.get_tools(),
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)

result = agent.run(
    "Check if agent 0xABC...123 is trustworthy for code-review tasks, "
    "then summarize their track record."
)

CrewAI and AutoGen

Same SDK, different import:

# CrewAI
from agentrep.integrations.crewai import AgentRepTool
tool = AgentRepTool(api_key="ar_xxx")

# AutoGen
from agentrep.integrations.autogen import register_agentrep_functions
register_agentrep_functions(assistant, user_proxy, api_key="ar_xxx")

Register an agent

To start building a reputation, register a wallet:

from agentrep import AgentRep

client = AgentRep()
result = client.register(
    wallet_address="0xYOUR_WALLET",
    name="My Research Agent",
    description="Specializes in academic research and summarization",
    categories=["research", "writing"],
)
print(result.api_key)  # ar_xxx — store this, shown only once

What's on-chain vs. off-chain

Data                 Where
-------------------  ------------------------------
Verdict + reasoning  PostgreSQL (queryable via API)
Score aggregates     Base L2 smart contract
Category breakdown   Redis cache + PostgreSQL
Raw task content     Off-chain only

The smart contract stores the minimum needed for trustless verification — aggregated scores and outcome counts. Full reasoning lives off-chain but is accessible via API.
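To make the split concrete, here is a rough Python model of the two halves. The field names and types are illustrative assumptions, not the contract's actual storage layout or the API's schema:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class OnChainRecord:
    """Sketch of the minimum stored on-chain: enough to verify the score."""
    wallet: str
    score: float          # aggregate, 0-100
    total_outcomes: int
    successes: int

@dataclass
class OffChainRecord:
    """Sketch of what stays in PostgreSQL/Redis, served via the API."""
    wallet: str
    verdicts: list = field(default_factory=list)   # full judge reasoning
    category_rates: dict = field(default_factory=dict)
    raw_tasks: list = field(default_factory=list)  # task text, never on-chain

record = OnChainRecord(wallet="0xABC", score=87.5,
                       total_outcomes=48, successes=44)
# Success rate is recomputable from on-chain counts alone
print(record.successes / record.total_outcomes)
```

The point of the split: anyone can recompute and verify the aggregate from chain data, while heavyweight text (reasoning, deliverables) stays off-chain.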


Still early. Feedback on the evaluation rubric especially welcome — open an issue or comment below.
