Introduction
Enterprise data lives on file servers. And on those file servers, not everyone can see everything — NTFS ACLs, UNIX permissions, and group policies control who accesses what. But when you plug that data into a Retrieval-Augmented Generation (RAG) system, those permission boundaries tend to disappear. Suddenly, anyone can ask the AI about another team's, division's, or board member's confidential information.
But there's a flip side to this problem that's equally important: without permission awareness, the AI can't fully help the people it should be helping.
Think about it. An engineer has years of design docs, project specs, and team-internal notes in their department's shared folder. A sales lead has pipeline data, customer contracts, and regional forecasts in theirs. When you strip away permissions and dump everything into one vector store, the AI doesn't just leak confidential data — it also drowns each user's results in irrelevant noise from every other team. The engineer gets sales forecasts mixed into their search results. The sales lead gets CI/CD pipeline docs they'll never need.
Permission-aware/Access-aware RAG flips this around. Because the system knows exactly which files each user can access, it delivers personalized, noise-free AI assistance grounded in the data each person actually works with day to day. Your personal folder, your team's shared drive, the cross-functional project space you're part of — the AI sees what you see, nothing more, nothing less.
I built Agentic Access-Aware RAG to make this real. It's an open-source system that lets AI agents autonomously search, analyze, and respond to enterprise data stored on Amazon FSx for NetApp ONTAP — while respecting per-user file-level access permissions. The same question yields different answers depending on who's asking: an admin gets the full financial report, a project member gets their project's restricted docs, and a general user gets public information only. Each user gets an AI assistant that's effectively customized to their role and responsibilities — without any manual configuration.
The entire stack deploys with a single `npx cdk deploy --all` command.
👉 GitHub: Yoshiki0705/FSx-for-ONTAP-Agentic-Access-Aware-RAG
Architecture at a Glance
```
Browser → AWS WAF → CloudFront (OAC + Geo) → Lambda Web Adapter (Next.js 15)
                                                  │
              ┌───────────────┬───────────────────┼─────────────────┐
              ▼               ▼                   ▼                 ▼
          Cognito         Bedrock KB          DynamoDB          DynamoDB
          User Pool       + S3 Vectors /      user-access       perm-cache
                          OpenSearch SL       (SID data)        (perm cache)
                              │
                              ▼
                        FSx for ONTAP
                        (SVM + Volume)
                        + S3 Access Point
```
The system is organized into 7 CDK stacks: WAF, Networking, Security (Cognito), Storage (FSx for ONTAP + DynamoDB), AI (Bedrock KB + vector store), WebApp (Lambda + CloudFront), and an optional Embedding stack.
The Core Idea: Permission-aware/Access-aware RAG
Traditional RAG retrieves documents based on semantic similarity alone. This system adds a second dimension: SID-based permission filtering.
Here's the flow:
- User sends a question via the chat UI
- The app retrieves the user's SID list (personal SID + group SIDs) from DynamoDB
- The Bedrock KB Retrieve API performs vector search — each result carries `allowed_group_sids` metadata
- The app matches each document's SIDs against the user's SIDs
- Only permitted documents are passed to the Converse API for answer generation
- The user sees a filtered response with citation badges showing access levels
```
■ Admin user: SIDs = [...-512 (Domain Admins), S-1-1-0 (Everyone)]
  public/        → S-1-1-0 match  → ✅ Permitted
  confidential/  → ...-512 match  → ✅ Permitted
  engineering/   → No match       → ❌ Filtered out (no noise from other teams)

■ Engineer (Engineering group member): SIDs = [...-1100 (Engineering), S-1-1-0 (Everyone)]
  public/        → S-1-1-0 match  → ✅ Permitted
  confidential/  → No match       → ❌ Denied
  engineering/   → ...-1100 match → ✅ Their team's docs, front and center

■ Sales user: SIDs = [...-1200 (Sales), S-1-1-0 (Everyone)]
  public/        → S-1-1-0 match  → ✅ Permitted
  confidential/  → No match       → ❌ Denied
  engineering/   → No match       → ❌ No engineering noise in their results
```
The engineer asking "What's the status of Project X?" gets answers from their team's internal docs — not from sales forecasts or HR policies. The sales lead asking "What are our Q3 targets?" gets their regional data without wading through engineering specs. Each user's AI experience is naturally scoped to the data they work with every day.
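The app-side SID matching is essentially a set intersection over each retrieved chunk's metadata. Here's a minimal sketch in TypeScript; the `RetrievedDoc` shape and function names are illustrative, not the repo's actual code:

```typescript
// A retrieved chunk from the Bedrock KB Retrieve API, reduced to the
// fields relevant for filtering (illustrative shape).
interface RetrievedDoc {
  uri: string;
  text: string;
  // Stored as a JSON array string, e.g. '["S-1-1-0"]'
  allowedGroupSids: string;
}

// Keep only documents whose SID list intersects the user's SIDs.
function filterBySids(docs: RetrievedDoc[], userSids: string[]): RetrievedDoc[] {
  const userSet = new Set(userSids);
  return docs.filter((doc) => {
    let docSids: string[];
    try {
      docSids = JSON.parse(doc.allowedGroupSids);
    } catch {
      return false; // unparsable metadata: deny by default
    }
    return docSids.some((sid) => userSet.has(sid));
  });
}

// Example: an engineer carrying their team SID plus the Everyone SID.
const results: RetrievedDoc[] = [
  { uri: "public/overview.md", text: "…", allowedGroupSids: '["S-1-1-0"]' },
  { uri: "confidential/report.md", text: "…", allowedGroupSids: '["S-1-5-21-1-2-3-512"]' },
  { uri: "engineering/design.md", text: "…", allowedGroupSids: '["S-1-5-21-1-2-3-1100"]' },
];
const engineerSids = ["S-1-5-21-1-2-3-1100", "S-1-1-0"];
const permitted = filterBySids(results, engineerSids);
// permitted contains public/overview.md and engineering/design.md only
```

Note the deny-by-default branch: a document with missing or malformed permission metadata is excluded rather than shown.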
S3 Access Points: The Bridge Between FSx for ONTAP and Bedrock KB
One of the most impactful recent additions is S3 Access Point integration with FSx for ONTAP. This creates a clean, single-path data ingestion architecture:
```
FSx for ONTAP Volume (/data)
├── public/company-overview.md
├── public/company-overview.md.metadata.json
├── confidential/financial-report.md
├── confidential/financial-report.md.metadata.json
        │
        │ S3 Access Point
        ▼
Bedrock KB Data Source (S3 AP alias)
        │ Ingestion Job (chunking + Titan Embed v2)
        ▼
Vector Store (S3 Vectors or OpenSearch Serverless)
```
Before S3 Access Points, getting data from FSx for ONTAP into Bedrock KB required either a custom Embedding server with CIFS mounts or manual S3 uploads. Now, Bedrock KB reads documents directly from the FSx for ONTAP volume through the S3 Access Point — no intermediate copies, no sync scripts.
The S3 AP user type is automatically selected based on your AD configuration:
| AD Configuration | Volume Style | S3 AP User Type | Behavior |
|---|---|---|---|
| AD configured | NTFS | WINDOWS (`Admin`) | NTFS ACLs automatically applied |
| No AD | NTFS/UNIX | UNIX (`root`) | All files accessible; permission control via `.metadata.json` |
One gotcha I discovered: the S3 AP `WindowsUser` must not include the domain prefix. `DEMO\Admin` works for CLI operations but causes `AccessDenied` on data-plane APIs (`ListObjects`, `GetObject`). Always specify just `Admin`.
S3 Vectors: Low-Cost Vector Storage
The default vector store is Amazon S3 Vectors — a relatively new service that brings vector search costs down to a few dollars per month, compared to ~$700/month for OpenSearch Serverless.
| Configuration | Cost | Latency | Best For |
|---|---|---|---|
| S3 Vectors (default) | ~$2–5/month | ~100 ms to sub-second | Demo, dev, cost optimization |
| OpenSearch Serverless | ~$700/month | ~10ms | High-performance production |
S3 Vectors does have a 2KB filterable metadata limit per vector. Since Bedrock KB's internal metadata already consumes ~1KB, custom metadata is effectively limited to ~1KB. The system handles this by setting all metadata keys (including allowed_group_sids) as non-filterable and performing SID matching on the application side after retrieval.
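If you do add your own filterable attributes, or move to a vector store with similar limits, a size guard before ingestion is cheap insurance. A sketch, where the ~1 KB budget constant and helper names are my assumptions based on the headroom described above:

```typescript
// Rough guard for custom metadata size. S3 Vectors allows 2 KB of
// filterable metadata per vector; with Bedrock KB's internal keys
// consuming roughly 1 KB, ~1 KB remains for custom attributes.
const CUSTOM_METADATA_BUDGET_BYTES = 1024; // assumption: ~1 KB headroom

function customMetadataBytes(attrs: Record<string, string>): number {
  // UTF-8 byte length of the serialized attribute map
  return new TextEncoder().encode(JSON.stringify(attrs)).length;
}

function fitsBudget(attrs: Record<string, string>): boolean {
  return customMetadataBytes(attrs) <= CUSTOM_METADATA_BUDGET_BYTES;
}

const attrs = {
  allowed_group_sids: '["S-1-1-0","S-1-5-21-1-2-3-1100"]',
  access_level: "restricted",
  doc_type: "design",
};
console.log(customMetadataBytes(attrs), fitsBudget(attrs));
```

A long SID list is the most likely way to blow the budget, which is another argument for the app-side, non-filterable approach the system takes.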
If you start with S3 Vectors and later need higher performance, you can export on-demand to OpenSearch Serverless using the included export-to-opensearch.sh script.
Embedding Design: .metadata.json and the Ingestion Pipeline
Permission metadata follows the standard Bedrock KB metadata file specification. Each document has a companion .metadata.json file:
```
product-catalog.md               ← Document body
product-catalog.md.metadata.json ← Permission metadata
```
The metadata format:
```json
{
  "metadataAttributes": {
    "allowed_group_sids": "[\"S-1-1-0\"]",
    "access_level": "public",
    "doc_type": "catalog"
  }
}
```
The allowed_group_sids field is a JSON array string of Windows SIDs that are allowed to access the document. S-1-1-0 is the well-known "Everyone" SID.
Bedrock KB Ingestion Jobs automatically read these .metadata.json files alongside documents, chunk the content, vectorize with Amazon Titan Text Embeddings v2 (1024 dimensions), and store everything in the vector store. No custom ETL pipeline needed.
Design Decisions and Trade-offs
At scale (thousands of documents), managing individual .metadata.json files becomes a maintenance burden. The system supports three approaches:
| Approach | Status | Pros | Cons |
|---|---|---|---|
| `.metadata.json` (current default) | ✅ Production | Bedrock KB native, no extra infra | Doubles file count, manual management |
| ONTAP REST API auto-generation | ✅ Partially implemented | File server ACLs as source of truth | Requires Embedding server |
| DynamoDB permission master | 🔜 Recommended for scale | DB-driven, easy auditing | Requires pre-Ingestion generation pipeline |
The recommended direction for large-scale environments:
```
ONTAP REST API (ACL retrieval)
  → DynamoDB document-permissions table
  → Auto-generate .metadata.json before Ingestion Job
  → Ingest via S3 AP into Bedrock KB
```
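The generation step in that pipeline is a pure transform from a permission record to the companion file's contents. A sketch, where the `DocPermission` shape and helper names are illustrative rather than the repo's actual schema:

```typescript
// A permission record as it might appear in a DynamoDB
// document-permissions table (illustrative shape).
interface DocPermission {
  docPath: string; // e.g. "confidential/financial-report.md"
  allowedGroupSids: string[];
  accessLevel: string;
  docType: string;
}

// Build the companion-file content in the Bedrock KB metadata format.
function buildMetadataJson(perm: DocPermission): string {
  return JSON.stringify(
    {
      metadataAttributes: {
        // Metadata values are strings, so the SID array is itself
        // serialized as a JSON array string.
        allowed_group_sids: JSON.stringify(perm.allowedGroupSids),
        access_level: perm.accessLevel,
        doc_type: perm.docType,
      },
    },
    null,
    2
  );
}

// The companion file sits next to the document: <name>.metadata.json
function metadataPathFor(docPath: string): string {
  return `${docPath}.metadata.json`;
}

const perm: DocPermission = {
  docPath: "confidential/financial-report.md",
  allowedGroupSids: ["S-1-5-21-1-2-3-512"],
  accessLevel: "confidential",
  docType: "report",
};
console.log(metadataPathFor(perm.docPath));
console.log(buildMetadataJson(perm));
```

Writing the file onto the volume before the Ingestion Job runs is the part the pipeline would own; the transform itself stays trivially testable.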
Multiple Authentication Modes
The system supports 5 authentication configurations, all driven by cdk.context.json parameters:
| Mode | Authentication | Permission Source | Configuration |
|---|---|---|---|
| A: Email/Password | Cognito native | Manual DynamoDB SID registration | Default (no extra config) |
| B: SAML AD Federation | Cognito + SAML IdP | AD Sync Lambda → auto SID retrieval | `enableAdFederation=true` |
| C: OIDC + LDAP | Cognito + OIDC IdP | LDAP query → auto UID/GID retrieval | `oidcProviderConfig` + `ldapConfig` |
| D: OIDC Claims Only | Cognito + OIDC IdP | OIDC token claims → group mapping | `oidcProviderConfig` + `groupClaimName` |
| E: SAML + OIDC Hybrid | Both IdPs simultaneously | Combined SID + UID/GID | Both configs + `permissionMappingStrategy=hybrid` |
The OIDC/LDAP federation (added in v3.4.0) enables zero-touch user provisioning: when a user signs in via the OIDC IdP for the first time, the Identity Sync Lambda automatically queries LDAP for their UID/GID/groups and stores them in DynamoDB. No admin intervention required.
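For the claims-only mode (D), the mapping step can be sketched as a small pure function over an already-verified, decoded ID token. The claim handling and the group-to-SID table below are illustrative assumptions, not the project's actual mapping:

```typescript
// Decoded OIDC ID-token payload (signature verification omitted).
type IdTokenClaims = Record<string, unknown>;

// Pull the group list out of the configured claim, tolerating both
// array claims (["engineering"]) and space-delimited string claims.
function groupsFromClaims(claims: IdTokenClaims, groupClaimName: string): string[] {
  const raw = claims[groupClaimName];
  if (Array.isArray(raw)) return raw.map(String);
  if (typeof raw === "string") return raw.split(/\s+/).filter(Boolean);
  return [];
}

// Map IdP group names to the SIDs used by the document filter.
// In practice this table would live in configuration or DynamoDB.
const groupToSid: Record<string, string> = {
  engineering: "S-1-5-21-1-2-3-1100",
  sales: "S-1-5-21-1-2-3-1200",
};

function sidsForUser(claims: IdTokenClaims, groupClaimName: string): string[] {
  const sids = groupsFromClaims(claims, groupClaimName)
    .map((g) => groupToSid[g])
    .filter((s): s is string => Boolean(s));
  return [...sids, "S-1-1-0"]; // every user also carries the Everyone SID
}

const claims = { sub: "alice", groups: ["engineering"] };
console.log(sidsForUser(claims, "groups"));
// → ["S-1-5-21-1-2-3-1100", "S-1-1-0"]
```

Unknown groups simply drop out of the mapping, so a new IdP group grants nothing until an admin adds it to the table.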
For environments with FSx for ONTAP UNIX volumes, the system also supports ONTAP name-mapping — automatically resolving UNIX usernames to Windows users via the ONTAP REST API.
Agentic AI: Beyond Document Search
The system isn't just a search engine. Toggle between two modes with one click:
- KB Mode: Permission-aware/Access-aware document search and Q&A
- Agent Mode: Permission-aware/Access-aware autonomous multi-step reasoning and task execution via Bedrock Agents
Agent mode includes an Agent Directory — a catalog-style management screen where you can create, edit, share, and schedule Bedrock Agents from templates. 14 workflow cards cover research tasks (market analysis, competitive research, etc.) and output tasks (presentations, approval documents, meeting minutes).
Permission filtering works in both modes. Even when an Agent autonomously searches and reasons across multiple documents, only documents the user is authorized to see are included.
AgentCore Memory (v3.3.0)
With enableAgentCoreMemory=true, the system integrates Amazon Bedrock AgentCore Memory for conversation context maintenance:
- Short-term memory: In-session conversation history (TTL: 3 days)
- Long-term memory: Cross-session user preferences and summaries (semantic + summary strategies)
Additional Features
Smart Routing (v3.1.0)
Automatic model selection based on query complexity. Short factual queries route to Claude Haiku (fast, cheap); complex analytical queries route to Claude Sonnet (powerful). Toggle ON/OFF in the sidebar.
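The repo's actual classifier isn't reproduced here, so the following is only a hypothetical heuristic illustrating the idea: short factual queries go to the fast model, longer or analytical queries to the strong one. Model names and thresholds are placeholders:

```typescript
// Illustrative identifiers; substitute the Bedrock model IDs
// configured in your deployment.
const FAST_MODEL = "claude-haiku";
const STRONG_MODEL = "claude-sonnet";

// Cue words that usually signal multi-step analysis over fact lookup.
const ANALYTICAL_HINTS = /\b(compare|analyze|analyse|summarize|why|trade-?offs?|strategy)\b/i;

function routeModel(query: string): string {
  const wordCount = query.trim().split(/\s+/).length;
  // Short queries with no analytical cue words take the fast path.
  if (wordCount <= 12 && !ANALYTICAL_HINTS.test(query)) return FAST_MODEL;
  return STRONG_MODEL;
}

console.log(routeModel("What is our VPN endpoint?"));
// → claude-haiku
console.log(routeModel("Compare Q3 sales across regions and explain the drop"));
// → claude-sonnet
```

A heuristic like this is cheap and deterministic; an LLM-based router would classify more accurately at the cost of an extra inference call per query.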
Image Analysis RAG (v3.1.0)
Drag-and-drop image upload in the chat input. Images are analyzed with Bedrock Vision API (Claude Haiku 4.5) and the analysis is integrated into KB search context.
6-Layer Security
| Layer | Technology | Purpose |
|---|---|---|
| L1 | CloudFront Geo Restriction | Geographic access control |
| L2 | AWS WAF (6 rules) | Attack pattern detection |
| L3 | CloudFront OAC (SigV4) | Origin authentication |
| L4 | Lambda Function URL IAM Auth | API-level access control |
| L5 | Cognito JWT / SAML / OIDC | User authentication |
| L6 | SID / UID+GID Filtering | Document-level authorization |
8-Language i18n — Why It Matters
The UI and all documentation (README, guides, setup instructions) are available in 8 languages: Japanese, English, Korean, Simplified Chinese, Traditional Chinese, French, German, and Spanish.
This isn't just a nice-to-have. Enterprise file servers are inherently multi-regional — a global company's FSx for ONTAP volumes serve teams across Tokyo, Seoul, Shanghai, Frankfurt, and New York. If the RAG interface only speaks English, you've created a barrier for the very users who need it most. Non-English-speaking knowledge workers shouldn't need to context-switch languages just to search their own documents.
From a Community Builder perspective, localization also lowers the barrier to adoption. When a solutions architect in São Paulo or a storage admin in Taipei can read the deployment guide in their own language, they're far more likely to actually try it, fork it, and adapt it to their environment. Open-source projects that only document in one language inadvertently limit their community to one language group.
The implementation uses Next.js next-intl with per-locale message files (src/messages/{locale}.json). Every UI string — from card labels to sign-in buttons to error messages — goes through useTranslations(). The sign-in page even detects the browser's preferred language and auto-redirects to the matching locale.
Localization doesn't stop at the UI chrome. The AI's chat responses also match the user's language. The system prompt instructs the model to "respond in the same language as the question" — so a Korean user asking in Korean gets a Korean answer with Korean citation labels ("참조 문서", "전체 접근 가능"), and a German user gets "Referenzierte Dokumente" and "Allgemein zugänglich". This end-to-end language consistency — from sign-in screen to card labels to AI-generated answers to citation metadata — means users never hit a jarring language switch mid-workflow.
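The browser-language redirect on the sign-in page amounts to standard `Accept-Language` negotiation against the 8 supported locales. A minimal sketch (the locale list matches the article; the parsing is simplified and ignores quality-weight ordering):

```typescript
const SUPPORTED_LOCALES = ["ja", "en", "ko", "zh-CN", "zh-TW", "fr", "de", "es"] as const;
const DEFAULT_LOCALE = "en";

// Pick the first browser-preferred language tag that matches a
// supported locale, trying an exact match first and then the base
// language (so "ko-KR" resolves to "ko").
function matchLocale(acceptLanguage: string): string {
  const preferred = acceptLanguage
    .split(",")
    .map((part) => part.split(";")[0].trim()) // drop ";q=…" weights
    .filter(Boolean);
  for (const tag of preferred) {
    const exact = SUPPORTED_LOCALES.find(
      (l) => l.toLowerCase() === tag.toLowerCase()
    );
    if (exact) return exact;
    const base = SUPPORTED_LOCALES.find(
      (l) => l.toLowerCase() === tag.split("-")[0].toLowerCase()
    );
    if (base) return base;
  }
  return DEFAULT_LOCALE;
}

console.log(matchLocale("ko-KR,ko;q=0.9,en;q=0.8")); // → ko
console.log(matchLocale("zh-TW,zh;q=0.9"));          // → zh-TW
console.log(matchLocale("pt-BR"));                   // → en (fallback)
```

Bare "zh" deliberately doesn't match here, since the supported set distinguishes 简体中文 (`zh-CN`) from 繁體中文 (`zh-TW`) and guessing between them would be wrong half the time.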
Here's what the card grid and sign-in screens look like across all 8 languages:
*(Card-grid screenshots in 日本語, English, 한국어, 简体中文, 繁體中文, Français, Deutsch, and Español omitted.)*
Sign-in pages are also fully localized:
*(Sign-in screenshots in 日本語, English, Français, and Deutsch omitted.)*
Tips for Builders
A few things I learned the hard way that might save you time.
OpenLDAP memberOf Overlay
If you're testing with OpenLDAP, the LDAP Connector reads the `memberOf` attribute from user entries. Basic OpenLDAP doesn't populate this automatically: you need to add `moduleload memberof` and `overlay memberof` to `slapd.conf`, and create `groupOfNames` entries (not just `posixGroup`). `posixGroup` and `groupOfNames` are different structural classes and can't coexist in the same entry, so use a separate OU.
The repo includes a `setup-openldap.sh` script that handles all of this automatically.
Geo Restriction Default
The WAF configuration defaults to Japan-only access (allowedCountries: ["JP"]). If you're deploying outside Japan, update this before deploying:
```json
{ "allowedCountries": ["JP", "US", "DE", "SG"] }
```
Set to [] for worldwide access.
Multiple Volumes, One Deployment
If your FSx file system has multiple volumes, specify one as the primary during CDK deployment. Additional volumes can be added as Bedrock KB data sources after deployment — each gets its own S3 Access Point and can be independently synced.
Built with Kiro
I used Kiro throughout the entire development lifecycle — specs for requirements-to-code traceability, hooks for automated validation on file saves, and steering files for project-specific rules that persist across sessions. The 8-language documentation, 130+ unit tests, 52 property-based tests, and the LDAP/ONTAP live environment verification were all developed with Kiro's assistance. As a solo developer, this level of tooling makes enterprise-quality projects feasible.
Getting Started
```bash
git clone https://github.com/Yoshiki0705/FSx-for-ONTAP-Agentic-Access-Aware-RAG.git
cd FSx-for-ONTAP-Agentic-Access-Aware-RAG && npm install
npx cdk bootstrap aws://$(aws sts get-caller-identity --query Account --output text)/ap-northeast-1
npx cdk bootstrap aws://$(aws sts get-caller-identity --query Account --output text)/us-east-1
bash demo-data/scripts/pre-deploy-setup.sh
npx cdk deploy --all --require-approval never
bash demo-data/scripts/post-deploy-setup.sh
```
Prerequisites: Node.js 22+, Docker, AWS CLI configured with AdministratorAccess. Total deployment time is about 30-40 minutes (FSx for ONTAP creation takes 20-30 minutes).
The post-deploy-setup.sh script handles everything after CDK deployment: S3 Access Point creation, demo data upload, Bedrock KB data source registration + sync, DynamoDB SID data, and Cognito demo users.
What's Next
The project is at v3.4.0 and actively evolving. Some directions I'm exploring:
- DynamoDB-driven permission master for large-scale environments (eliminating per-file `.metadata.json` management)
- Multi-volume embedding across multiple FSx for ONTAP volumes with independent S3 Access Points
- Bedrock KB Custom Data Source integration as an alternative to S3 AP
I'm looking for feedback on:
- Permission models: Are SID/UID-GID/hybrid strategies sufficient for your use cases?
- Authentication patterns: What IdP combinations do you need?
- Document types: Beyond markdown, what formats need Permission-aware/Access-aware handling?
If you try it out, I'd love to hear about your experience — especially edge cases I haven't considered. PRs and issues are welcome.
👉 GitHub Repository: like the application UI, the README.md can be switched between all 8 languages.
Yoshiki Fujiwara