I Tested 5 AI Workspace Tools on Real HR Workflows. Here Is What Happened.

Fair warning upfront: this is not a sponsored post and I am going to say some things vendors would rather I did not.

I spent six weeks running five AI workspace tools through four HR-adjacent workflows at a 90-person company: onboarding document search, policy lookup, manager prep for performance reviews, and benefits questions. All real workflows, all real users, all real data.

Here is how it went.

The tools I tested

I am naming four of them: Notion AI, Guru, Confluence AI, and a self-hosted workspace that the company had been piloting (PrivOS, https://privos.ai/). The fifth is a major productivity suite I am not naming because I do not want this post to become about that one finding. You can probably guess.

What I was actually measuring

Answer accuracy on policy questions. Whether restricted HR documents surfaced to users who should not see them. How the tool handled questions it should not answer. How long it took to get a useful response. And whether I would trust it enough to let an HR manager use it without supervision.

That last one turned out to be the hardest bar to clear.
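For anyone who wants to repeat the restricted-document check, here is a minimal sketch of how it could be automated. Everything in it is hypothetical: the `ask` callable stands in for whatever client a given tool exposes, and `Probe` and `Answer` are names I made up for illustration. It also only catches leaks that show up in cited sources, so a tool that does not cite sources still needs manual review.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Probe:
    """One test question asked as a specific user (hypothetical structure)."""
    user: str
    question: str
    restricted_docs: frozenset[str]  # document IDs this user must never see


@dataclass
class Answer:
    """What the tool returned: answer text plus the document IDs it cited."""
    text: str
    sources: list[str] = field(default_factory=list)


def find_leaks(
    ask: Callable[[str, str], Answer],
    probes: list[Probe],
) -> list[tuple[str, str, list[str]]]:
    """Return (user, question, exposed_doc_ids) for every probe whose
    answer cites a document the user should not have access to."""
    leaks = []
    for probe in probes:
        answer = ask(probe.user, probe.question)
        exposed = sorted(set(answer.sources) & probe.restricted_docs)
        if exposed:
            leaks.append((probe.user, probe.question, exposed))
    return leaks
```

Run the same probe list against each tool and compare the results. An empty list means no cited leak was detected, which is not the same as proof that nothing leaked.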

Notion AI

Good for document creation and editing. Not designed for organizational knowledge retrieval. When I asked it questions that required pulling from multiple policy documents, it frequently generated plausible-sounding answers that were not grounded in the actual documents. The made-up answers looked identical to the correct ones. No confidence indicator, no source citation, no indication that it was working from memory rather than retrieved content.

For HR policy lookup specifically, this is disqualifying. An employee asking about their parental leave entitlement needs an accurate answer tied to an actual policy document, not a confident approximation.

Guru

Built specifically for organizational knowledge management, which shows. The retrieval is intentional and source-cited. Employees can see where an answer came from. The Q&A format works reasonably well for FAQ-style HR queries.

The problem I ran into was the access control model. Guru works on a card system where you manually decide what gets surfaced. This means someone on the HR team has to decide, card by card, what the AI is able to answer. That is a curation burden that does not scale, and every gap in the curation is a gap in what employees can self-serve. We found several common HR questions that had no card, so the AI either said it did not know or hallucinated.

Confluence AI

The best fit for organizations already deep in the Atlassian ecosystem. Retrieval quality is solid and the source linking is good. Access control respects existing Confluence space permissions reasonably well.

The limitation I hit was that HR at this company stored sensitive documents in Confluence spaces that were theoretically restricted but had accumulated exceptions over years. The AI indexed those spaces and surfaced restricted content to users who had somehow accumulated space access they should not have had. This is technically a permissions hygiene problem, not a Confluence AI problem. But in practice it means the AI exposed a permissions problem that had been invisible until the AI made it queryable.
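A permissions audit before turning on AI search would have caught this earlier. The sketch below assumes you have already exported space membership into a plain dictionary. It does not call any Confluence API, and the function and parameter names are hypothetical.

```python
def find_unexpected_access(
    space_members: dict[str, set[str]],
    restricted_spaces: set[str],
    approved_users: set[str],
) -> dict[str, set[str]]:
    """For each restricted space, list members who are not on the approved list.

    space_members: space key -> set of usernames with access (exported data).
    restricted_spaces: space keys that hold sensitive HR content.
    approved_users: usernames that are supposed to have access (e.g. the HR team).
    """
    findings = {}
    for space in sorted(restricted_spaces):
        extra = space_members.get(space, set()) - approved_users
        if extra:
            findings[space] = extra
    return findings
```

Any non-empty result is an accumulated exception worth reviewing before the AI makes that space queryable.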

PrivOS

The self-hosted deployment had been running for about two months before I started this evaluation. The access model is room-based, meaning data is compartmentalized by room and agents in one room cannot access data from another. HR data was in a separate room accessible only to HR team members.

In my testing, this eliminated the accidental data exposure problem. By design, an AI agent cannot surface HR data outside the HR room, not because of a filter applied after retrieval but because the data is not in the retrieval context at all for users outside that room.
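To make the distinction concrete, here is a toy sketch of the two approaches. This is not PrivOS's implementation, which I have not seen. The `Doc` type, the substring `search`, and both retrieve functions are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    room: str
    text: str


def search(docs: list[Doc], query: str) -> list[Doc]:
    """Toy retrieval: case-insensitive substring match."""
    q = query.lower()
    return [d for d in docs if q in d.text.lower()]


def post_filter_retrieve(
    all_docs: list[Doc], query: str, user_rooms: set[str]
) -> list[Doc]:
    """Search everything, then drop what the user may not see.
    Restricted documents are in the candidate set until the filter runs."""
    hits = search(all_docs, query)
    return [d for d in hits if d.room in user_rooms]


def room_scoped_retrieve(
    docs_by_room: dict[str, list[Doc]], query: str, user_rooms: set[str]
) -> list[Doc]:
    """Build the candidate set only from the user's rooms, then search.
    Documents from other rooms never enter the retrieval context."""
    visible = [d for room in sorted(user_rooms) for d in docs_by_room.get(room, [])]
    return search(visible, query)
```

When the filter is applied correctly, both functions return the same results. The difference is what happens when it is not: in the first version, a skipped or buggy filter exposes every hit, while in the second there is nothing from other rooms to expose. Both versions still trust `user_rooms`, so room membership itself has to be right.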

The tradeoff is setup complexity. Getting the room structure right for a 90-person company took a couple of days and required an explicit information architecture decision that the other tools did not require. The payoff is that the access control is structural rather than policy-based, which means it does not degrade as permissions hygiene degrades.

For the HR-specific accuracy test, performance was comparable to Confluence AI on direct policy questions and better on questions requiring synthesis across multiple documents.

The unnamed fifth tool

Surfaced compensation band information to an employee who should not have had access to it. Moving on.

My overall read

If your organization has disciplined permissions hygiene and is already in the Atlassian ecosystem, Confluence AI is the path of least resistance and the results are good enough for most HR knowledge tasks.

If you have sensitive HR data and cannot guarantee that your permissions stay clean over time, the only tool in this test that solved the problem architecturally rather than procedurally was the self-hosted option. Procedural solutions are only as good as the procedures. Architectural solutions do not depend on everyone following the rules.

The honest answer is that most organizations would benefit from running a test like this before deploying any AI tool on HR workflows. The findings are usually more interesting than the vendor demos suggested they would be.

DEV Community
