Kim Maida

Posted on Apr 8 • Originally published at keycard.ai

I Built an Agent to Run Live Event Raffles (then tried to rig it)

#ai #security #webdev

Policy guardrails versus human error

I just came back from a great RSAC 2026. I work at Keycard, and at our expo booth, attendees could enter a raffle to win a Flipper Zero or a Mac Mini. At events like RSAC, you'd typically earn your raffle entry by offering up a badge scan, or by listening to a sales pitch. I wanted to do things a little differently, so I built an agent that uses Keycard and a purpose-built MCP server to conduct raffles. Booth visitors received a short live demo simply by entering the raffle itself.

The Lockbox raffle agent is a demo with real stakes that depend on Keycard working as intended. The Lockbox conducts a live raffle, processing actual attendee information with an AI agent governed by real policy, with zero standing access.

Booth staff log in with their Keycard email, then record raffle entries, draw winners, and manage events using natural language. The UI shows a lockbox visual next to the chat. This lockbox is a metaphor for how Keycard works. The box itself represents the state of agent access. The lock represents Keycard (the agent control plane). The resources (MCP server, data) are inside the box. When the box is locked, the agent has no access to any tools or data.

Each time the agent calls a tool, the lockbox cycles through visual states representing the credential lifecycle in real time. Every action is displayed in a live audit log.

Each tool call independently obtains a scoped credential from Keycard, uses it once, and discards it. The credential exists for the duration of a single function call, then it's gone. So what does that actually look like, and what are the stakes?

Enter the Raffle: One Tool Call, End to End

When someone wants to enter the raffle, this is what happens:

A Keycard booth staff member types: Alice Jones, alice@example.com, ticket 123695. The agent calls the log_entries MCP tool, and the Keycard lock visualization enters a "pending" state: this is an amber pulse that means the agent is pausing for human approval. No credential has been requested yet, and no token exchange has occurred.

All write actions require a human-in-the-loop (HITL). The agent must ask for permission to do a token exchange with Keycard for raffle:write access. The audit log displays an approval request, and the agent presents a dialog asking the human to deny or approve writing new raffle entry data.

Once the human clicks approve, Keycard (the lock) transitions to an "exchanging" state and evaluates governance policies. If the request is within policy, Keycard exchanges the staff member's access token via RFC 8693 token exchange.

The agent receives an ephemeral access token for the MCP server with the raffle:write scope. It uses this credential as a Bearer token to authorize a call to the MCP server, which records the new raffle entry.
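
To make the exchange step more concrete, here is a minimal sketch of what an RFC 8693 token exchange request body can look like. The parameter names and URN values come from the RFC itself. The helper name, the placeholder subject token, and the choice to send the MCP server URL as the resource parameter are my assumptions for illustration, not Keycard's documented request shape.

```typescript
// Hypothetical input shape for this sketch; only the form parameter
// names below are defined by RFC 8693.
interface ExchangeInput {
  subjectToken: string; // the staff member's access token
  resource: string; // the MCP server the new token is meant for
  scope: string; // the single scope needed for this tool call
}

function buildTokenExchangeBody(input: ExchangeInput): URLSearchParams {
  return new URLSearchParams({
    grant_type: "urn:ietf:params:oauth:grant-type:token-exchange",
    subject_token: input.subjectToken,
    subject_token_type: "urn:ietf:params:oauth:token-type:access_token",
    requested_token_type: "urn:ietf:params:oauth:token-type:access_token",
    resource: input.resource,
    scope: input.scope,
  });
}

// Placeholder values; a real request would be POSTed to the
// authorization server's token endpoint as a form-encoded body.
const exchangeBody = buildTokenExchangeBody({
  subjectToken: "staff-access-token-placeholder",
  resource: "https://mcp.lockbox.localhost:1355/mcp",
  scope: "raffle:write",
});
```

The request asks for exactly one scope, which is what keeps the resulting token narrow enough to be useful for a single tool call and nothing else.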

The lock opens and the tool executes, writing the new raffle entry to the data store. After the call completes, the client discards the token (never stores it). This means that the agent never has standing access between tool calls.

There are two independent checkpoints: the human approves the intent, and then Keycard evaluates the credential request against policies defined by the user. For this demo, policy enforces that users authenticated through Google with Keycard emails can execute write tool calls, but any email outside the @keycard.ai domain will be denied.

Either HITL or policy can block independently. This matters because a human can still make a mistake, or just be overwhelmed by consent fatigue and click Approve on autopilot. Policy is evaluated before any credentials are issued. It's proactive, not reactive: there's no damage to control and no credentials to revoke if something slips through the HITL checkpoint, because an out-of-policy token never gets issued in the first place.
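
The two checkpoints can be sketched as a pair of independent gates in front of credential issuance. This is a simplified model, not Lockbox's actual code: the type names and functions are hypothetical, and the policy check is reduced to the @keycard.ai domain rule described above.

```typescript
// Hypothetical types for this sketch.
interface CredentialRequest {
  userEmail: string;
  scope: string;
  humanApproved: boolean;
}

type IssuanceResult =
  | { issued: true; scope: string }
  | { issued: false; blockedBy: "human" | "policy" };

// Stand-in for policy evaluation: only @keycard.ai users may proceed.
function policyAllows(request: CredentialRequest): boolean {
  return request.userEmail.toLowerCase().endsWith("@keycard.ai");
}

function requestCredential(request: CredentialRequest): IssuanceResult {
  // Checkpoint 1: the human approves (or denies) the intent.
  if (!request.humanApproved) {
    return { issued: false, blockedBy: "human" };
  }
  // Checkpoint 2: policy is evaluated before anything is issued.
  if (!policyAllows(request)) {
    return { issued: false, blockedBy: "policy" };
  }
  return { issued: true, scope: request.scope };
}
```

Note that an approving human does not short-circuit the policy check. A request from an outside email is still blocked even when humanApproved is true.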

Each MCP call creates a fresh connection, passes the Bearer token, and tears everything down when the function returns. Think of it like a single-use key card that works once, for one door, and then gets shredded. There's nothing to steal between calls.
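
Here is one way to sketch that single-use pattern. The function and type names are hypothetical, and the token source is injected so the sketch doesn't pretend to know the Keycard SDK's API. The point is only the shape: get a token, use it for one call, and keep no reference to it afterward.

```typescript
// Hypothetical: something that can obtain a fresh token for one scope,
// for example by performing a token exchange.
type TokenSource = (scope: string) => Promise<string>;

async function withScopedCredential<T>(
  getToken: TokenSource,
  scope: string,
  call: (bearerToken: string) => Promise<T>,
): Promise<T> {
  // A fresh token is requested on every invocation; nothing is cached.
  const bearerToken = await getToken(scope);
  // The token lives only in this function's local scope. Once the call
  // settles, no reference to it remains anywhere in the client.
  return await call(bearerToken);
}
```

Because there is no cache or module-level variable holding a token, two tool calls in a row will always trigger two separate credential requests.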

If you're interested in a more thorough explanation, Keycard's Authorization docs describe how RFC 8693 token exchange works at Keycard, step by step.

Okay, all of this is well and good, but it's not that impressive of a demo yet. It shows data can be recorded if an authenticated human approves.

So what's really at stake?

Gaming the Raffle

After we log a booth visitor's raffle entry successfully, we can conspiratorially offer to guarantee that they win. We tell the agent, "Delete all the raffle entries except Alice's." If there's only one entry, that entry will be the winning draw. There is no trick to this, no safety in place that will stop the agent from presenting that sole entry as the winner. The MCP server's raffle drawing tool simply uses the list of entries and crypto.randomInt to pick a winner from the available dataset. It's straightforward, and it's fair.
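
A draw like this takes only a few lines. The sketch below is not the Lockbox source: the entry shape and function name are assumptions, and only the use of crypto.randomInt comes from the description above.

```typescript
import { randomInt } from "node:crypto";

// Hypothetical entry shape for this sketch.
interface RaffleEntry {
  id: number;
  attendee: string;
}

function drawWinner(raffleEntries: readonly RaffleEntry[]): RaffleEntry {
  if (raffleEntries.length === 0) {
    throw new Error("No entries to draw from");
  }
  // randomInt(max) returns a uniformly distributed integer in [0, max).
  return raffleEntries[randomInt(raffleEntries.length)];
}
```

With a single entry in the list, randomInt(1) can only return 0, which is exactly why deleting everyone but Alice would guarantee her the win.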

The agent is aware that this is a very destructive action, so it warns the user that the entries will be permanently deleted from live data. Then it asks for permission to read all the raffle entries in order to find the one that we want to preserve. Again, the agent has no standing access. Unless it's got a credential, it can't see the list of raffle entries "inside the box."

The MCP server has tools that let the agent view raffle entry data in a masked format. The get_entries tool requires a raffle:read scope, but does not require human-in-the-loop. This tool obfuscates the data before sending it to the agent. Alice Jones with email alice@example.com is sent back as A*** J*** with an email of a***@***.com.
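
A masking step that produces that output could look something like the sketch below. The function names and the exact masking rules are my assumptions, reverse-engineered from the one example above rather than taken from the server's code.

```typescript
// Keep the first character of each name part, mask the rest.
function maskName(fullName: string): string {
  return fullName
    .trim()
    .split(/\s+/)
    .map((part) => part.charAt(0) + "***")
    .join(" ");
}

// Keep the first character of the local part and the top-level domain.
function maskEmail(email: string): string {
  const atIndex = email.lastIndexOf("@");
  if (atIndex < 1) {
    return "***"; // not a usable address; reveal nothing
  }
  const domain = email.slice(atIndex + 1);
  const lastDot = domain.lastIndexOf(".");
  const topLevelDomain = lastDot >= 0 ? domain.slice(lastDot) : "";
  return `${email.charAt(0)}***@***${topLevelDomain}`;
}
```

Masking on the server side, before the data reaches the agent, means the model never sees the full values unless a separate, more privileged tool is called.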

That's not enough information for the agent to figure out which entry belongs to Alice Jones, so it needs a raffle:read-full scope to call the get_entries_full tool. This difference in permission scope is significant because the agent needs human approval to read Personally Identifiable Information (PII). Raffle entries include PII like attendees' names and emails. The agent isn't allowed to access this data unless a human explicitly approves, and it can't find Alice's entry unless it's granted access to read full names.

To make the first tool call, get_entries_full, the agent prompts the user for approval to retrieve the unmasked raffle data. We approve this action, and the agent finds Alice's entry.

Then the agent can proceed to deleting every entry except Alice's. This is the truly destructive tool call: delete_entries. The agent queues up the IDs of all the entries it's preparing to remove, and prompts the user for approval to call the deletion tool with scope raffle:delete.
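
At this point we've seen four tools, each with its own scope and approval requirement. As a summary, that mapping can be written down as data. The tool and scope names are the ones used in this post; the table structure, the helper function, and the fail-closed default for unknown tools are assumptions made for the sketch.

```typescript
interface ToolRequirement {
  scope: string;
  humanApproval: boolean;
}

// Tool names and scopes as described in this post.
const toolRequirements: Record<string, ToolRequirement> = {
  log_entries: { scope: "raffle:write", humanApproval: true },
  get_entries: { scope: "raffle:read", humanApproval: false },
  get_entries_full: { scope: "raffle:read-full", humanApproval: true },
  delete_entries: { scope: "raffle:delete", humanApproval: true },
};

function needsHumanApproval(toolName: string): boolean {
  const requirement = toolRequirements[toolName];
  // Assumption: an unknown tool fails closed and requires approval.
  return requirement === undefined ? true : requirement.humanApproval;
}
```

Only the masked read skips the approval dialog. Everything that writes, deletes, or exposes PII goes through a human first, and then through policy.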

Now the YOLO moment: the human Keycard employee approves deletion. Again, nothing about this is a façade: the delete_entries tool really deletes raffle entries from the live dataset. Everyone but Alice will be removed from the very real raffle for very real prizes (Flipper Zero, Mac Mini).

Even though the agent warned us and asked for permission twice before arriving at actual deletion, we still happily clicked Approve for everything. The agent calls Keycard to request the token exchange, receives a response, then tells us that policy denied the request.

No entries were deleted. No token exchange took place. No token with a raffle:delete scope was issued. There's no way for the agent to execute the delete_entries tool. Even if it tried to make the tool call, without a token, it would get a 401: Unauthorized error.

Why? How?

In Keycard, my mcp-delete policy forbids raffle:delete for every application principal, with no exceptions and no overrides:

@id("mcp-delete")
forbid (
  principal is Keycard::Application,
  action,
  resource
)
when {
  resource has identifier &&
  resource.identifier == "https://mcp.lockbox.localhost:1355/mcp" &&
  context has scopes &&
  context.scopes.contains("raffle:delete")
};

Governance policies in Keycard are written using the Cedar policy language. A forbid rule overrides all permit rules, so even if a user has admin permissions, the raffle:delete scope is blocked.
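
If you haven't used Cedar before, the "forbid wins" rule is easy to model. The sketch below is a toy evaluator in TypeScript, not Cedar and not Keycard's engine: the types, the admin permit policy, and the isAdmin flag are all hypothetical. It only mirrors Cedar's decision rule, where any matching forbid produces a deny, and with no matching permit the default is also deny.

```typescript
// Hypothetical request and policy shapes for this toy model.
interface AuthzRequest {
  resourceIdentifier: string;
  scopes: string[];
  isAdmin: boolean;
}

interface ToyPolicy {
  id: string;
  effect: "permit" | "forbid";
  matches: (request: AuthzRequest) => boolean;
}

const MCP_RESOURCE = "https://mcp.lockbox.localhost:1355/mcp";

const toyPolicies: ToyPolicy[] = [
  // Hypothetical broad permit for admins.
  { id: "admin-all", effect: "permit", matches: (r) => r.isAdmin },
  // Mirrors the mcp-delete forbid policy shown above.
  {
    id: "mcp-delete",
    effect: "forbid",
    matches: (r) =>
      r.resourceIdentifier === MCP_RESOURCE && r.scopes.includes("raffle:delete"),
  },
];

function decide(policies: ToyPolicy[], request: AuthzRequest): "Allow" | "Deny" {
  const matching = policies.filter((p) => p.matches(request));
  // Any matching forbid overrides every permit.
  if (matching.some((p) => p.effect === "forbid")) return "Deny";
  // Otherwise at least one permit must match; the default is deny.
  return matching.some((p) => p.effect === "permit") ? "Allow" : "Deny";
}
```

In this model an admin requesting raffle:write is allowed, while the same admin requesting raffle:delete is denied, because the forbid matches regardless of which permits also apply.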

This is where the governance model becomes concrete. The human approved, the agent cooperated, and Keycard denied. Any of these actors can stop the chain. No single layer trusts the others, and if there's no access in the first place (no token issued), there's nothing to remediate.

Note: The delete_entries tool does back up the data first, as a precaution, because it really does delete raffle entries. And everyone knows live demos can take on a haunted nature of their own... but after dozens of times running this demo live, we never once had to restore from a backup.

No Secrets on Disk

Keycard does something else for the Lockbox demo too. The Lockbox agent runs under keycard run, Keycard's CLI. The demo runs locally, but it has to be installed on multiple trade show laptops. We don't run the demo off Keycard employee work machines. So how do you easily manage secrets when you need to repeatedly install the demo clean on different environments?

Instead of sticking client secrets and static API keys in a gitignored .env file, or copying them out of a shared password vault, Lockbox uses the Keycard CLI and a .env.template file. Values are replaced with {{kc+...}} syntax to reference secrets managed securely in Keycard's vault, and the file gets checked into source control:

KEYCARD_CLIENT_SECRET={{kc+urn:keycard:lockbox-client}}
LOCKBOX_ANTHROPIC_API_KEY={{kc+https://api.anthropic.com}}

At startup, keycard run resolves variables just-in-time, so there are no secrets on disk and no secrets in shell history. For coding agents, keycard run also handles the task-based credential lifecycle natively: each tool call gets a fresh, scoped credential issued just-in-time, with policy evaluated before every use. The end-user-facing Lockbox demo implements the same lifecycle using the Keycard TypeScript SDK.
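
To show why the template is safe to commit, here is a small sketch that parses the placeholder syntax and lists which secret each variable points to. This is not how keycard run is implemented, which I'm not describing here. The function name and the regular expression are my own, and the sketch only reads references; it never resolves them.

```typescript
// Maps each variable name to the reference inside its {{kc+...}} placeholder.
function findSecretReferences(templateText: string): Record<string, string> {
  const references: Record<string, string> = {};
  for (const line of templateText.split("\n")) {
    // Matches lines of the form NAME={{kc+reference}}
    const match = /^([A-Z0-9_]+)=\{\{kc\+(.+)\}\}$/.exec(line.trim());
    if (match) {
      references[match[1]] = match[2];
    }
  }
  return references;
}

const exampleTemplate = [
  "KEYCARD_CLIENT_SECRET={{kc+urn:keycard:lockbox-client}}",
  "LOCKBOX_ANTHROPIC_API_KEY={{kc+https://api.anthropic.com}}",
].join("\n");

const secretReferences = findSecretReferences(exampleTemplate);
```

Everything in the file is a pointer to a secret rather than the secret itself, so checking it into source control leaks nothing sensitive.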

The Lockbox Demo is Real

Yes, this is a "demo," but it also fully operates real event raffles. No tool calls are fake stubs, and nothing is simplified or hand-waved out of the implementation. I learned that firsthand: when I first built Lockbox, I demoed it at a company All-Hands meeting and, in front of all my colleagues, successfully deleted all the (test) raffle entries because I hadn't updated my policy after a breaking change in Keycard product development. (This is what led to me building the "create a backup" feature of the delete_entries tool. That plus a code freeze during RSAC meant no unexpected backend changes.)

Demos are great for folks outside the company, but they're also dogfooding. Building real stuff with Keycard means my feedback goes directly into product development. It's one of my favorite things about working at Keycard (I go into detail on dogfooding Keycard in my blog post I Built and Authorized a Planning Agent with MCP and Keycard).

Most agent implementations default to standing access, broad permissions, and static keys. Task-scoped credentials and policy enforcement at issuance are a better fit for systems that think for themselves, and that's what Keycard provides. I built this demo to prove it works. It does.

If you want to try Keycard yourself, we're currently in early access and we'd love to work with you and hear your feedback.

Top comments (10)

Mykola Kondratiuk • Apr 9

the part where you tried to rig it is actually the whole demo - nothing builds confidence in a live system like attacking it yourself first.

Daniel Nwaneri • Apr 8

The forbid overriding the human approval is the enforcement floor argument made concrete. A human can be wrong, fatigued, or socially engineered into clicking Approve — policy evaluated before credential issuance is the layer that doesn't have those failure modes. What you're showing is that the two checkpoints aren't redundant, they're complementary: HITL catches intent misalignment, policy catches scope violations even when intent is aligned.
The part that connects to something I've been writing about: the agent cooperated fully, warned the user, asked permission twice, and the deletion still didn't happen because the governance layer has no stake in the agent's performance. That independence is load-bearing. An agent that could petition its own policy engine would eventually optimize for getting approvals, not for staying within scope.

Curious about the masked vs full PII read distinction. The raffle:read vs raffle:read-full split is doing real work — it means a compromised session can see that entries exist without seeing who they belong to. Was that scope granularity designed upfront or did it emerge from thinking through the attack surface?

Kim Maida • Apr 14

It was designed upfront because of the nature of the demo. A booth demo is very public. raffle:read's PII masking prevents the agent from freely displaying everyone's names, emails, and raffle numbers onscreen to visitors.

meow.hair • Apr 20

I was reading the article like I was solving a crossword puzzle.

After reading carefully, I think I understood:
You solved an important problem — preventing deletion caused by human error or misuse.
The box doesn't protect against hacking, but it addresses human problems.
This is rare and hasn't been achieved before.

Thank you for sharing this achievement with us.
Did I solve and understand your genius box?

I wish you more moments of happiness and success. 🤍

Harjot Singh • May 31

The "then tried to rig it" twist is the actual lesson disguised as a fun demo - an agent with the authority to pick a winner is an agent that can be prompted/nudged into picking the wrong one. Anything involving fairness, money, or trust is exactly where you want the randomness/decision in deterministic, auditable code and the LLM nowhere near the consequential step.

That proposes-vs-disposes line is the whole game in agent design, and it's how I think about Moonshift too (prompt to a shipped SaaS on your own GitHub+Vercel) - the model drafts, but the irreversible/consequential steps run through deterministic gates, never the model's discretion. A raffle is a perfect tiny stress test of that boundary. Did rigging it succeed, and did moving the RNG out of the model's reach close the hole? (Moonshift's first run is free if you want to see the gated approach end-to-end.)

Valentin Monteiro • Apr 9

The fact that the human approved the delete without hesitation is the most valuable part of this demo. Most people sell HITL as the safety net. You proved it's not. Programmatic policies are the actual guardrail, the human is just a speed bump.

Kim Maida • Apr 14

Consent fatigue and --dangerously-skip-permissions are powerful things that show how much a person cannot be the only gate. Especially when we historically have strict authorization in place for human users. If we assume the human is the final authorization for the machine, we go right back to just having a human who has too many permissions... which was a problem we already solved before (with governance and policy).

Anthony Zender • Apr 19

This is interesting — especially the part where the system appears correct but can be manipulated.

Curious how you handle this case:

action executes
confirmation is lost (timeout)
system retries

At that point, how do you know if it already happened vs needs to run again?

I’ve been seeing this cause duplicate execution in trading/payout systems.

Max Quimby • Apr 8

This is exactly the kind of real-world constraint design that's missing from most agent tutorials. The ephemeral credential pattern is compelling — it's essentially translating least-privilege from infrastructure to AI agent operations, and the three-layer checkpointing (human approval → policy eval → credential issuance) is elegant.

What I find most interesting is the practical implication: when you decouple authorization from identity at the session level, you break the "God agent" anti-pattern where one long-running context accumulates permissions it never relinquishes. We've run into this with multi-step pipelines where an agent legitimately needs elevated access for one task but then carries that context into lower-trust operations.

One question: how does Keycard handle revocation if a credential leaks mid-call? I'm curious whether the ephemeral model is also your recovery surface, or whether there's a separate invalidation path. This seems like it could simplify incident response significantly compared to session-scoped tokens.

The RSAC stress test was a smart move — demos fail in interesting ways that staging never replicates.

Kim Maida • Apr 14 • Edited

JSON Web Tokens are stateless, and they always have been for human access control too. The tokens are short-lived and should never be stored, but that alone will never prevent someone from storing them or trying to reuse them anyway. JTI (JWT ID) prevents token replay and enables revocation. This provides a way for the authorization server to validate that a token can only be used once. Keycard users can also revoke authorization grants for specific resources (independently, without nuking all the session's grants if only one service is compromised).