The Conversation About HR Data and AI That Most Companies Are Avoiding

Nobody wants to be the person who raises this in a planning meeting, so mostly it does not get raised until something goes wrong.

Your AI assistant probably has access to HR data. Not because someone made a deliberate decision to give it that access. Because HR data lives in the same Confluence, the same Google Drive, the same shared folders that everything else lives in, and when you connected your AI tool to the company knowledge base you connected it to all of that too.

I found this out the hard way about fourteen months ago. We were doing a quarterly audit of what our internal AI assistant could surface, and someone asked it about compensation bands. It answered. Accurately. With numbers from a document that three people in the entire company were supposed to have access to.

Nobody had done anything wrong, exactly. The document was in a shared drive because someone had meant to move it and had not gotten around to it. The AI tool had indexed the shared drive. When a query was semantically close enough to the document content, it retrieved it. The access control that should have protected that document existed at the application layer of the drive but not at the retrieval layer of the AI tool.

This is not an unusual scenario. It is actually the default scenario for most AI deployments that connect to internal knowledge bases without a deliberate access control strategy.
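To make the failure mode concrete, here is a toy retriever in Python. It is a minimal sketch, not any real product's code. Bag-of-words cosine similarity stands in for the embedding similarity a real tool would use, and the `Document` and `NaiveIndex` names and the sample documents are hypothetical. The point is what is missing: each document carries an ACL from the storage layer, but `search` never receives the identity of the person asking, so anything that was indexed can be retrieved by anyone whose query is close enough.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical toy model of an AI tool's index. Word-overlap cosine similarity
# stands in for the embedding similarity a real tool would use.

@dataclass
class Document:
    doc_id: str
    text: str
    allowed_users: frozenset  # ACL enforced by the drive (storage layer) only

def vectorize(text):
    """Lowercased bag-of-words term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class NaiveIndex:
    """Indexes every document it can read and never asks who is querying."""

    def __init__(self, docs):
        self.entries = [(doc, vectorize(doc.text)) for doc in docs]

    def search(self, query, k=1):
        q = vectorize(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]

docs = [
    Document("handbook", "Office hours, holiday calendar and expense policy", frozenset({"*"})),
    Document("comp-bands", "Compensation bands: engineer L4 salary range 150k to 180k",
             frozenset({"alice", "bob", "carol"})),
]
index = NaiveIndex(docs)

# Any employee can run this query; allowed_users is never consulted.
hits = index.search("what are the compensation bands for engineers")
print(hits[0].doc_id)  # comp-bands
```

The ACL is stored right next to the text, which is exactly why this pattern feels safe in review: the permission data exists, it just never reaches the code path that answers the question.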

What makes HR data different is that exposure is not just embarrassing; it is legally consequential. Think compensation data, performance review content, disciplinary records, personal health accommodations, immigration status information. If this data is reachable through an AI that any employee can query, you have an access control failure, and in some jurisdictions that failure carries regulatory liability on top of the internal fallout.

The standard fix people reach for is better folder hygiene and more careful document classification. This works partially, and it degrades continuously: organizations grow, people stop following the classification rules, and new documents land in places they should not be.

The structural fix is an AI deployment where access control is enforced at the retrieval layer, not just at the storage layer. This means the AI cannot retrieve a document for a user unless that user has explicit permission to see that document, not just permission to access the folder the document is in. Some of the newer self-hosted workspace platforms are building this in by design rather than as an add-on. PrivOS (https://privos.ai/) handles this through room-scoped isolation, which means HR data literally does not exist in the retrieval context of someone who should not have access to it. That is architecturally different from filtering after retrieval.
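Here is a minimal sketch of the retrieval-layer alternative, modeled loosely on the room-scoped idea. It assumes hypothetical `Room` and `ScopedRetriever` classes and is not PrivOS's implementation or API. Each room keeps its own index, and the asker's identity is an input to `search`, so documents from rooms the user does not belong to are never scored at all, rather than being scored and then filtered out afterward.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical sketch of room-scoped retrieval; not any vendor's actual API.
# Word-overlap cosine similarity stands in for embedding similarity.

def vectorize(text):
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

@dataclass
class Room:
    """An isolation unit: its own documents, its own members, its own index."""
    name: str
    members: set
    docs: dict = field(default_factory=dict)  # doc_id -> text

    def __post_init__(self):
        self.index = {doc_id: vectorize(text) for doc_id, text in self.docs.items()}

class ScopedRetriever:
    def __init__(self, rooms):
        self.rooms = rooms

    def search(self, user, query, k=3):
        q = vectorize(query)
        # Permission is checked BEFORE scoring: documents in rooms the user does
        # not belong to never enter the candidate set, so they cannot be ranked.
        candidates = [
            (doc_id, vec)
            for room in self.rooms
            if user in room.members
            for doc_id, vec in room.index.items()
        ]
        scored = [(cosine(q, vec), doc_id) for doc_id, vec in candidates]
        scored = sorted((s for s in scored if s[0] > 0), reverse=True)
        return [doc_id for _, doc_id in scored[:k]]

rooms = [
    Room("general", {"alice", "dave"},
         {"handbook": "Office hours, holiday calendar and expense policy"}),
    Room("hr-private", {"alice"},
         {"comp-bands": "Compensation bands: engineer L4 salary range 150k to 180k"}),
]
retriever = ScopedRetriever(rooms)

print(retriever.search("dave", "what are the compensation bands"))   # []
print(retriever.search("alice", "what are the compensation bands"))  # ['comp-bands']
```

In this sketch the membership check is per room. The same pre-scoring check could test a per-document ACL instead, which matches the stricter rule of explicit permission to see the document rather than the folder it sits in.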

I am not saying this to endorse any particular tool. I am saying that this specific problem requires an architectural solution, and the market is starting to produce architectural solutions rather than configuration-based ones. If your current AI deployment does not have retrieval-layer access control, it is worth understanding specifically what it does have and whether that is actually sufficient.

The conversation is uncomfortable. It is less uncomfortable than the one you have after someone discovers that your AI assistant knows what everyone is paid.

DEV Community