This tutorial was written by Han Heloir.
At Hogwarts School of Witchcraft and Wizardry, the library is a vast repository of magical knowledge. Among its countless shelves lies the Restricted Section, a place where only students with special permissions may tread. Imagine if every student, no matter their year, house, or purpose, could access any scroll, spell, or prophecy. Chaos would ensue.
As AI assistants become more capable, we are in a similar situation. Retrieval-augmented generation (RAG) systems let us ask questions and get context-aware answers powered by our internal documents. But without proper safeguards, these systems can expose sensitive data to unauthorized users.
This is the tale of how we built a secure RAG system—one that guards access as vigilantly as Madam Pince—with the help of MongoDB, Permit.io, and LangChain.
The enchanted infrastructure
To mirror the protective enchantments of Hogwarts, our magical trio consists of:
- MongoDB Atlas—our library of scrolls, where all knowledge is stored as documents and vector embeddings.
- Permit.io—our permission charm system, ensuring only authorized witches and wizards access the right content.
- LangChain—our spellcaster, orchestrating the retrieval and generation of safe, contextual answers.
Why MongoDB Atlas is the ideal choice for secure RAG
In our magical analogy, MongoDB Atlas serves as the enchanted library of Hogwarts, housing all the scrolls and tomes of knowledge. Here’s why it’s the perfect fit:
1. Unified data platform
MongoDB Atlas allows you to store both your operational data and vector embeddings in a single, unified platform. This eliminates the need for separate systems, reducing complexity and ensuring data consistency.
2. Native vector search capabilities
With Atlas Vector Search, you can perform semantic searches directly within MongoDB. This means you can retrieve documents based on their meaning, not just exact keyword matches, enhancing the relevance of your AI-generated responses.
3. Scalability and performance
MongoDB’s distributed architecture ensures that your RAG applications can scale seamlessly to handle large volumes of data and high query loads, maintaining low latency and high throughput.
4. Flexible schema design
MongoDB’s flexible document model allows you to store complex and varied data structures, accommodating the diverse types of information used in RAG systems.
5. Integration with AI frameworks
MongoDB Atlas integrates smoothly with AI frameworks like LangChain, enabling you to build sophisticated RAG pipelines that combine data retrieval with language generation.
MongoDB Atlas gives our secure RAG system a robust, scalable, and flexible data foundation. On top of it, our AI assistants, much like the students of Hogwarts, can be limited to the knowledge they are permitted to see, preserving the security of our magical library.
Setting up the library: MongoDB Atlas
We start by creating a vector-enabled collection:
- Database: secure_rag
- Collection: libraryDocuments
- Vector index:
{
  "fields": [
    {
      "type": "vector",
      "path": "vector_embedding",
      "numDimensions": 1536,
      "similarity": "cosine"
    },
    {
      "type": "filter",
      "path": "metadata.department"
    },
    {
      "type": "filter",
      "path": "document_id"
    }
  ]
}
This configuration allows MongoDB to return documents semantically close to a query while pre-filtering by department and document ID.
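To make the pre-filtering concrete, here is a minimal sketch of an aggregation pipeline that uses this index. The field paths come from the index definition above; the index name `vector_index`, the helper name, and the `numCandidates` heuristic are assumptions for illustration, not part of the original setup.

```python
from typing import Any

EMBEDDING_DIMENSIONS = 1536  # matches numDimensions in the index definition


def build_vector_search_pipeline(
    query_vector: list[float],
    department: str,
    index_name: str = "vector_index",  # hypothetical index name
    limit: int = 5,
) -> list[dict[str, Any]]:
    """Build a $vectorSearch pipeline that pre-filters by department."""
    if len(query_vector) != EMBEDDING_DIMENSIONS:
        raise ValueError(f"expected {EMBEDDING_DIMENSIONS} dimensions")
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": "vector_embedding",
                "queryVector": query_vector,
                # Consider more candidates than we return (capped at 10000).
                "numCandidates": min(limit * 20, 10000),
                "limit": limit,
                # Only paths indexed with type "filter" can be used here.
                "filter": {"metadata.department": {"$eq": department}},
            }
        },
        {
            "$project": {
                "_id": 0,
                "document_id": 1,
                "metadata.department": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
```

With PyMongo, the returned list would be passed to `collection.aggregate(...)` on the libraryDocuments collection.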
Casting permission spells: Permit.io
Permit.io brings fine-grained control to our magical library:
- Resources: department, document
- Roles: viewer, admin, etc.
- Users: Assigned to roles in departments (e.g., alice is a viewer in engineering)
- ReBAC policy: If a user is a viewer in department X, and a document belongs to that department, they may read it.
Role derivations are defined as:
“If a user has role viewer in department X, they may read any document where department X is the parent.”
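The derivation rule can be modelled in a few lines of plain Python. This is only an in-memory sketch of the logic, not how Permit.io evaluates policies; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RebacGraph:
    """Toy relationship graph: users hold roles on departments,
    and each document has one parent department."""

    # (user, department) -> roles the user holds on that department
    department_roles: dict[tuple[str, str], set[str]] = field(default_factory=dict)
    # document key -> parent department key
    document_parent: dict[str, str] = field(default_factory=dict)

    def assign(self, user: str, role: str, department: str) -> None:
        self.department_roles.setdefault((user, department), set()).add(role)

    def add_document(self, document: str, department: str) -> None:
        self.document_parent[document] = department

    def can_read(self, user: str, document: str) -> bool:
        """Viewer on department X implies read on documents whose parent is X."""
        parent = self.document_parent.get(document)
        if parent is None:
            return False  # unknown documents are denied by default
        return "viewer" in self.department_roles.get((user, parent), set())


graph = RebacGraph()
graph.assign("alice", "viewer", "engineering")
graph.add_document("design_spec", "engineering")
graph.add_document("budget", "finance")
```

Here `graph.can_read("alice", "design_spec")` is true because alice is a viewer on the document's parent department, while `graph.can_read("alice", "budget")` is false.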
The spellcaster: LangChain integration
LangChain connects all the dots:
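from langchain.chains import RetrievalQA
from langchain_openai import OpenAI

# vector_store (the Atlas vector store) and permit (a synchronous Permit client)
# are assumed to be initialized elsewhere.
qa_chain = RetrievalQA.from_chain_type(
    llm=OpenAI(),
    retriever=vector_store.as_retriever()
)

# Permission check before response (with the async Permit client, await this call)
if permit.check("hermione", "read", "document:magic_laws"):
    result = qa_chain.invoke({"query": "What are the magical laws regarding underage spells?"})
    response = result["result"]
else:
    response = "Access Denied: You do not have permission to access this knowledge."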
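The check-then-answer pattern can also be isolated into a small function so it can be exercised without a live Permit.io or OpenAI connection. In this sketch, `check` and `ask` are hypothetical stand-ins for the permission call and the chain call; neither name comes from a library.

```python
from typing import Callable

ACCESS_DENIED = "Access Denied: You do not have permission to access this knowledge."


def answer_if_permitted(
    user: str,
    resource: str,
    question: str,
    check: Callable[[str, str, str], bool],  # (user, action, resource) -> allowed?
    ask: Callable[[str], str],               # question -> answer
) -> str:
    """Run the QA chain only when the read permission check passes."""
    if not check(user, "read", resource):
        return ACCESS_DENIED  # the chain is never invoked for denied users
    return ask(question)
```

Note that a gate like this only decides whether the chain runs at all. It does not narrow what the retriever can see, which is why the document-level filtering described later matters.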
Scenarios from the wizarding world
1. The curious intern: Ginny’s summer project
Scenario: Ginny, an intern in the Department of Mysteries, attempts to access documents restricted to the Auror department.
Permit.io implementation:
- Role assignment:
roles:
  - name: intern
    permissions:
      - action: read
        resource: document
        condition:
          department: "mysteries"
- Resource definition:
resources:
  - name: document
    attributes:
      - name: department
        type: string
      - name: confidentiality
        type: string
- Policy enforcement:
package permit.policy

default allow = false

allow {
    input.user.role == "intern"
    input.resource.department == "mysteries"
    input.resource.confidentiality != "high"
}
This configuration ensures that interns can only access documents within their department and not those marked as high confidentiality.
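As a quick sanity check, the rule can be mirrored in plain Python and run against Ginny's requests. This is a sketch of the rule's logic only; the function name and the sample documents are made up for illustration.

```python
def intern_may_read(user: dict, resource: dict) -> bool:
    """Mirror of the allow rule: every condition must hold, otherwise deny."""
    return (
        user.get("role") == "intern"
        and resource.get("department") == "mysteries"
        # In Rego, comparing a missing attribute is undefined, so the rule fails.
        # Requiring the key to be present keeps the same deny-by-default behavior.
        and "confidentiality" in resource
        and resource["confidentiality"] != "high"
    )


ginny = {"name": "ginny", "role": "intern"}
prophecy_index = {"department": "mysteries", "confidentiality": "low"}
auror_case_file = {"department": "auror", "confidentiality": "low"}
sealed_prophecy = {"department": "mysteries", "confidentiality": "high"}
```

Only `prophecy_index` passes for Ginny: the Auror file fails the department condition and the sealed prophecy fails the confidentiality condition.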
2. Cross-house collaboration: Luna and Neville’s herbology bot
Scenario: Neville, a student, queries restricted Healing Studies documents.
Permit.io implementation:
- Role assignment:
roles:
  - name: student
    permissions:
      - action: read
        resource: document
        condition:
          department: "herbology"
- Policy enforcement:
package permit.policy

default allow = false

allow {
    input.user.role == "student"
    input.resource.department == "herbology"
    input.resource.restricted != true
}
Students cannot read documents marked as restricted, even within their own department, so Neville's query against the restricted Healing Studies documents is denied.
3. A policy update from the Headmistress
Scenario: McGonagall grants Prefects access to student disciplinary archives.
Permit.io implementation:
- Role assignment:
roles:
  - name: prefect
    permissions:
      - action: read
        resource: document
        condition:
          category: "disciplinary"
- Dynamic policy update:
Using Permit.io’s UI or API, the new role and permissions can be added without system downtime, and changes take effect immediately.
This dynamic update allows Prefects to access disciplinary documents as per the new policy.
4. Dumbledore’s army planning
Scenario: Harry, Ron, and Hermione collaborate on a defense strategy, each with different document access rights.
Permit.io implementation:
- Role assignments:
roles:
  - name: gryffindor_member
    permissions:
      - action: read
        resource: document
        condition:
          department: "defense"
  - name: ravenclaw_member
    permissions:
      - action: read
        resource: document
        condition:
          department: "strategy"
- Policy enforcement:
package permit.policy

default allow = false

allow {
    input.user.role == "gryffindor_member"
    input.resource.department == "defense"
}

allow {
    input.user.role == "ravenclaw_member"
    input.resource.department == "strategy"
}
Each member can only access documents pertaining to their assigned departments, ensuring secure collaboration.
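Multiple allow rules act as alternatives: a request is permitted if any one of them matches, and denied otherwise. A small table-driven sketch shows how a shared corpus splits per role under these two rules. The document titles and function names are invented for illustration.

```python
# One entry per allow rule: role -> department that role may read.
ROLE_DEPARTMENTS = {
    "gryffindor_member": "defense",
    "ravenclaw_member": "strategy",
}


def allow(role: str, department: str) -> bool:
    """True if some rule matches; unknown roles fall through to the default deny."""
    return role in ROLE_DEPARTMENTS and ROLE_DEPARTMENTS[role] == department


def visible_documents(role: str, documents: list[dict]) -> list[str]:
    """Return the titles of the documents a role is allowed to read."""
    return [doc["title"] for doc in documents if allow(role, doc["department"])]


corpus = [
    {"title": "Shield Charms", "department": "defense"},
    {"title": "Patrol Schedule", "department": "strategy"},
    {"title": "Potion Stock", "department": "potions"},
]
```

Each role sees only its own slice of the corpus, and a role with no matching rule sees nothing.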
5. The forbidden inquiry: Draco’s attack
Scenario: Draco attempts to access confidential Order of the Phoenix documents.
Permit.io implementation:
- Policy enforcement:
package permit.policy

default allow = false

allow {
    input.user.role == "order_member"
    input.resource.confidentiality != "high"
}
Since Draco is not assigned the ‘order_member’ role, access is denied.
6. The forgotten scroll: Hagrid adds a document
Scenario: Hagrid adds a new document on magical creatures.
Permit.io implementation:
- Resource registration:
resources:
  - name: document
    attributes:
      - name: department
        type: string
      - name: author
        type: string
- Role assignment:
roles:
  - name: creatures_professor
    permissions:
      - action: read
        resource: document
        condition:
          department: "creatures"
Hagrid, as a ‘creatures_professor’, automatically gains read access to documents in the ‘creatures’ department, including the new one once it is registered with that department.
The flow of magic
A user sends a question to the API:
- LangChain checks user_id and queries Permit.io to determine accessible resources.
- Permit.io returns a list of permitted document IDs.
- LangChain performs a semantic vector search in MongoDB Atlas, applying filters based on the permitted document IDs.
- Only authorized documents are retrieved and passed to the LLM for response generation.
- The answer is crafted solely from the content the user is permitted to access.
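The steps above can be sketched as a single function with its dependencies passed in. Here `get_permitted_ids`, `search`, and `generate` are hypothetical callables standing in for Permit.io, the Atlas vector search, and the LLM; the `document_id` field is the filter path from the index definition.

```python
from typing import Callable, Sequence

ACCESS_DENIED = "Access Denied: You do not have permission to access this knowledge."


def build_document_filter(permitted_ids: Sequence[str]) -> dict:
    """MQL filter that restricts vector search to the permitted documents."""
    return {"document_id": {"$in": list(permitted_ids)}}


def answer_question(
    user_id: str,
    question: str,
    get_permitted_ids: Callable[[str], Sequence[str]],  # user -> document IDs
    search: Callable[[str, dict], list[str]],           # (question, filter) -> texts
    generate: Callable[[str, list[str]], str],          # (question, texts) -> answer
) -> str:
    permitted = get_permitted_ids(user_id)
    if not permitted:
        return ACCESS_DENIED  # nothing to search, so skip retrieval entirely
    # The filter is applied inside the search, before any text reaches the LLM.
    documents = search(question, build_document_filter(permitted))
    return generate(question, documents)
```

With the LangChain MongoDB Atlas vector store, a filter like this would typically be supplied through the retriever's `search_kwargs` as a `pre_filter`; treat that parameter name as an assumption to verify against the version you use.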
No unauthorized spells are leaked. No dark magic slips through.
Conclusion: Magic, secured by design
RAG is powerful—but just like magic, it must be used responsibly. By combining:
- MongoDB Atlas for efficient, semantic document retrieval
- Permit.io for precise, dynamic access control
- LangChain for orchestrating natural language understanding
…we’ve built a system that ensures AI is not only smart but also safe.
Just like Hogwarts’ Restricted Section, this system guards knowledge wisely, allowing each witch and wizard to unlock only the scrolls meant for them.