Most AI interfaces hide the work. You type something, a spinner appears, you get an answer. That works for chat. For accounts payable, it's wrong.
When Finley — an invoice intelligence agent I helped build — processes an invoice, the user needs to trust the output enough to act on it. Approve a ₹47,500 payment. Reject a suspected duplicate. Override an agent decision with their own judgment. That kind of trust doesn't come from a fast spinner and a confident-sounding verdict. It comes from seeing the work.
Here's how we designed the UI around that principle, and what surprised us about where the UX had the most impact.
The Pipeline Stepper Is the Core UI Primitive
Every invoice in Finley goes through six named stages:
Invoice input → LLM extraction → Memory retrieval → Analysis → Decision → Feedback & learning
These stages are always visible as a horizontal stepper at the top of the interface. As the backend processes the invoice, the active stage advances. Each stage has a label that names the work being done.
This wasn't a cosmetic choice. The stepper does three things:
It sets time expectations. When the spinner just spins, users don't know if it'll be done in 1 second or 10. When a stage label says "Querying Hindsight memory for vendor history...", users understand the agent is doing something specific, not just being slow.
It makes the pipeline legible. Showing users that there's a "Memory retrieval" step, separate from "Analysis," creates a mental model: the agent knows things it learned before. That's crucial for getting users to trust memory-backed decisions.
It surfaces failures at the right stage. When something breaks, the stepper shows where — "Analysis failed" vs. "Memory retrieval failed" tells the user (and us) immediately what to look at.
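The stage model behind a stepper like this can be tiny. Here's a minimal sketch of how we think about stepper state; the stage ids and status names are illustrative, not Finley's actual implementation:

```javascript
// Ordered pipeline stages (names are illustrative).
const STAGES = ["input", "extract", "memory", "analyze", "decision", "feedback"];

// Derive per-stage display status from the active stage and an
// optional failed stage, so the UI can render "where it broke".
function stepperState(activeStage, failedStage = null) {
  return STAGES.map((id) => ({
    id,
    status:
      id === failedStage ? "failed"
      : STAGES.indexOf(id) < STAGES.indexOf(activeStage) ? "done"
      : id === activeStage ? "active"
      : "pending",
  }));
}
```

The key property: "failed" is attached to a specific stage, so the stepper renders "Memory retrieval failed" rather than a generic error.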
The Loading Labels Matter More Than the Animation
The spinner itself is unremarkable. What carries the weight is the label text:
```javascript
function getLoadingLabel(stage) {
  return {
    extract: "Extracting invoice data with LLM...",
    memory: "Querying Hindsight memory for vendor history...",
    analyze: "Running contextual analysis...",
    decision: "Building decision...",
  }[stage] || "Processing...";
}
```
"Querying Hindsight memory for vendor history" is meaningful in a way "Processing..." isn't. It tells the user what agent memory actually does in one sentence. Every time the app says that during a demo, the question comes: "Wait, it actually remembers past invoices?" Yes. That's the point.
The Result Panel Has Three Jobs
After processing, Finley shows the result across a structured panel. The design had to balance three things:
Show what was extracted. The vendor name, invoice number, amount, payment terms, line items. Users need to verify the LLM got it right.
Show why the decision was made. The flags array, the confidence score, the specific patterns the agent detected. If it's flagging as a duplicate, it should say which previous invoice it matched and how confident it is.
Show the memory. How many past interactions the agent recalled. What patterns it detected from that history. This is where Hindsight makes itself visible — not as an implementation detail, but as part of the explanation.
The memory display was the hardest part to get right. "9 memory recalls" is meaningless unless you can see what those memories contain and why they influenced the decision. We ended up with a compact summary of recalled interactions alongside the decision rationale.
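One way to sketch that compact summary, assuming a recall record with an `outcome` field (the schema here is a guess, not Hindsight's actual shape):

```javascript
// Collapse a list of recalled interactions into a one-line summary,
// e.g. "9 memory recalls (7 approved, 2 rejected)".
// The `outcome` field is an assumed schema, not Hindsight's real API.
function summarizeRecalls(recalls) {
  const counts = {};
  for (const r of recalls) counts[r.outcome] = (counts[r.outcome] || 0) + 1;
  const parts = Object.entries(counts).map(([k, v]) => `${v} ${k}`);
  return `${recalls.length} memory recalls (${parts.join(", ")})`;
}
```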
The Feedback Panel Closes the Loop
After reviewing the result, users take one of three actions: approve, reject, or override. The feedback panel captures their choice and an optional correction note.
This isn't just UX polish — it's the mechanism by which the agent learns. When a user says "rejected — confirmed duplicate," that action gets stored in Hindsight as a memory entry for that vendor. The next invoice from the same vendor retrieves this entry, and the agent has concrete evidence of a prior duplicate pattern.
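The payload we store is deliberately small. A sketch of what a feedback-to-memory entry might look like; the field names here are assumptions, not Hindsight's actual API:

```javascript
// Build a memory entry from a user's feedback action.
// Field names are illustrative, not Hindsight's real schema.
function buildMemoryEntry(vendor, action, note, invoiceId) {
  return {
    vendor,
    invoiceId,
    action,              // "approve" | "reject" | "override"
    note: note || null,  // optional correction note
    recordedAt: new Date().toISOString(),
  };
}
```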
The success state after feedback makes this explicit:
"Your feedback has been stored in Hindsight. The agent will use this knowledge when processing the next invoice from Prakash Office Supplies."
That sentence completes the loop for users. It's not just a confirmation toast. It's an explanation of how the system improves.
The Vendor Memory Sidebar
The sidebar panel shows full vendor history — every past invoice interaction, what the agent decided, what the user did. This was added specifically to make the memory layer tangible.
During demos, opening this panel for a vendor with 9 prior interactions is usually the moment people understand what the system is actually doing. Before that, "agent memory" is abstract. Seeing a timeline of invoice decisions, corrections, and accumulated patterns makes it concrete.
We framed it this way in the demo script: "This is institutional knowledge that used to live in a senior accountant's head. Now it's in the agent."
What We'd Do Differently
The pipeline stepper animates between stages with fixed delays (300ms, 600ms, 900ms). That's a placeholder — real stage transitions should fire as the backend actually completes each step. We'd wire that up as server-sent events or WebSocket messages for a production build.
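A sketch of that wiring on the client side, using server-sent events. The endpoint path and event payload shape are assumptions, not part of the current build:

```javascript
// Parse one SSE message payload into a stage update.
// The { stage, status } shape is an assumed contract with the backend.
function parseStageEvent(data) {
  const { stage, status } = JSON.parse(data);
  // The pipeline is terminal once the decision stage completes.
  return { stage, status, terminal: stage === "decision" && status === "done" };
}

// Browser-side wiring (sketch; endpoint path is hypothetical):
//   const src = new EventSource(`/api/invoices/${id}/stages`);
//   src.addEventListener("stage", (e) => {
//     const { stage, status, terminal } = parseStageEvent(e.data);
//     updateStepper(stage, status);
//     if (terminal) src.close();
//   });
```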
The result panel doesn't show confidence intervals or uncertainty on extracted fields. If the LLM is 60% confident about a payment term extraction, users should see that — especially since wrong payment terms are a meaningful failure mode.
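Surfacing that uncertainty could be as simple as filtering extracted fields below a threshold. A sketch, assuming each field carries a `confidence` score from the extraction step (the shape and the 0.8 threshold are both assumptions):

```javascript
// Return extracted fields whose confidence falls below a review threshold,
// so the UI can highlight them for manual verification.
// The { value, confidence } field shape is an assumed schema.
function lowConfidenceFields(fields, threshold = 0.8) {
  return Object.entries(fields)
    .filter(([, f]) => f.confidence < threshold)
    .map(([name, f]) => ({ name, value: f.value, confidence: f.confidence }));
}
```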
The feedback input box for correction notes is freeform. In production, structured categories (duplicate, wrong amount, wrong vendor, etc.) would produce better-structured memory entries downstream.
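A sketch of what that structured input might validate; the category labels are illustrative, not a finalized taxonomy:

```javascript
// Illustrative rejection categories, not a finalized taxonomy.
const REJECTION_CATEGORIES = ["duplicate", "wrong_amount", "wrong_vendor", "other"];

// Validate structured feedback before it becomes a memory entry.
function normalizeFeedback(category, note) {
  if (!REJECTION_CATEGORIES.includes(category)) {
    throw new Error(`Unknown category: ${category}`);
  }
  // "other" still requires a freeform note so the memory entry isn't empty.
  if (category === "other" && !note) {
    throw new Error("Freeform note required for 'other'");
  }
  return { category, note: note || null };
}
```

Structured categories make the downstream memory entries queryable — "how many duplicate rejections for this vendor?" becomes a count instead of a text search.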
The Lesson
Visible AI isn't about showing off the technical pipeline. It's about giving users enough information to make an informed decision about whether to trust the output. In accounts payable, trust is the product. The interface is how you earn it.
Finley is live at finley-rho.vercel.app.