Updated April 2026: v4.0.0 adds 6 new features (Agent Registry, Multimodal RAG, Guardrails, Episodic Memory, Voice Chat, and AgentCore Policy). See the v4.0.0 section below for details.
Introduction
Enterprise data lives on file servers. And on those file servers, not everyone can see everything: NTFS ACLs, UNIX permissions, and group policies control who accesses what. But when you plug that data into a Retrieval-Augmented Generation (RAG) system, those permission boundaries tend to disappear. Suddenly, anyone can ask the AI about another team's, division's, or board member's confidential information.
But there's a flip side to this problem that's equally important: without permission awareness, the AI can't fully help the people it should be helping.
Think about it. An engineer has years of design docs, project specs, and team-internal notes in their department's shared folder. A sales lead has pipeline data, customer contracts, and regional forecasts in theirs. When you strip away permissions and dump everything into one vector store, the AI doesn't just leak confidential data; it also drowns each user's results in irrelevant noise from every other team. The engineer gets sales forecasts mixed into their search results. The sales lead gets CI/CD pipeline docs they'll never need.
Permission-aware RAG flips this around. Because the system knows exactly which files each user can access, it delivers personalized, noise-free AI assistance grounded in the data each person actually works with day to day. Your personal folder, your team's shared drive, the cross-functional project space you're part of: the AI sees what you see, nothing more, nothing less.
I built Agentic Access-Aware RAG to make this real. It's an open-source system that lets AI agents autonomously search, analyze, and respond to enterprise data stored on Amazon FSx for NetApp ONTAP, while respecting per-user file-level access permissions. The same question yields different answers depending on who's asking: an admin gets the full financial report, a project member gets their project's restricted docs, and a general user gets public information only. Each user gets an AI assistant that's effectively customized to their role and responsibilities, without any manual configuration.
The entire stack deploys with a single npx cdk deploy --all command.
GitHub: Yoshiki0705/FSx-for-ONTAP-Agentic-Access-Aware-RAG
Latest Release: v4.0.0 (6 new features added)
Architecture at a Glance
Browser → AWS WAF → CloudFront (OAC+Geo) → Lambda Web Adapter (Next.js 15)
                                                 │
       ┌──────────────┬──────────────────────────┼──────────────────┐
       ▼              ▼                          ▼                  ▼
    Cognito       Bedrock KB                 DynamoDB           DynamoDB
   User Pool    + S3 Vectors /              user-access         perm-cache
                OpenSearch Serverless       (SID Data)         (Perm Cache)
                      │
                      ▼
                FSx for ONTAP
                (SVM + Volume)
                + S3 Access Point
The system is organized into 7 CDK stacks: WAF, Networking, Security (Cognito), Storage (FSx ONTAP + DynamoDB), AI (Bedrock KB + vector store), WebApp (Lambda + CloudFront), and an optional Embedding stack.
The Core Idea: Permission-Aware RAG
Traditional RAG retrieves documents based on semantic similarity alone. This system adds a second dimension: SID-based permission filtering.
Here's the flow:
- User sends a question via the chat UI
- The app retrieves the user's SID list (personal SID + group SIDs) from DynamoDB
- Bedrock KB Retrieve API performs vector search; each result carries allowed_group_sids metadata
- The app matches each document's SIDs against the user's SIDs
- Only permitted documents are passed to the Converse API for answer generation
- The user sees a filtered response with citation badges showing access levels

Admin user: SIDs = [...-512 (Domain Admins), S-1-1-0 (Everyone)]
  public/ → S-1-1-0 match → Permitted
  confidential/ → ...-512 match → Permitted
  engineering/ → No match → Filtered out (no noise from other teams)
Engineer (Engineering group member): SIDs = [...-1100 (Engineering), S-1-1-0 (Everyone)]
  public/ → S-1-1-0 match → Permitted
  confidential/ → No match → Denied
  engineering/ → ...-1100 match → Permitted (their team's docs, front and center)
Sales user: SIDs = [...-1200 (Sales), S-1-1-0 (Everyone)]
  public/ → S-1-1-0 match → Permitted
  confidential/ → No match → Denied
  engineering/ → No match → Denied (no engineering noise in their results)
The engineer asking "What's the status of Project X?" gets answers from their team's internal docs โ not from sales forecasts or HR policies. The sales lead asking "What are our Q3 targets?" gets their regional data without wading through engineering specs. Each user's AI experience is naturally scoped to the data they work with every day.
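The matching step in that flow is small enough to sketch directly. This is a minimal TypeScript sketch, not the repository's code: it assumes each retrieved chunk exposes its metadata as a plain object with allowed_group_sids stored as a JSON array string (the format described under Embedding Design below), and the RetrievedChunk, parseAllowedSids, and filterBySid names, like the SIDs in the example, are hypothetical.

```typescript
// Hypothetical shape of one retrieved chunk after the app has read its metadata.
interface RetrievedChunk {
  text: string;
  metadata: Record<string, unknown>;
}

// allowed_group_sids is stored as a JSON array *string*, e.g. "[\"S-1-1-0\"]".
function parseAllowedSids(raw: unknown): string[] {
  if (typeof raw !== "string") return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed)
      ? parsed.filter((s): s is string => typeof s === "string")
      : [];
  } catch {
    return []; // Malformed metadata: deny by default.
  }
}

// Keep a chunk only if one of its allowed SIDs appears in the user's SID list.
function filterBySid(chunks: RetrievedChunk[], userSids: string[]): RetrievedChunk[] {
  const userSet = new Set(userSids);
  return chunks.filter((chunk) =>
    parseAllowedSids(chunk.metadata["allowed_group_sids"]).some((sid) => userSet.has(sid)),
  );
}

// Illustrative SIDs for an Engineering group member (not real domain SIDs).
const engineerSids = ["S-1-5-21-1000-1100", "S-1-1-0"];
const results = filterBySid(
  [
    { text: "company overview", metadata: { allowed_group_sids: '["S-1-1-0"]' } },
    { text: "financial report", metadata: { allowed_group_sids: '["S-1-5-21-1000-512"]' } },
    { text: "design spec", metadata: { allowed_group_sids: '["S-1-5-21-1000-1100"]' } },
  ],
  engineerSids,
);
console.log(results.map((r) => r.text)); // keeps "company overview" and "design spec"
```

Treating missing or malformed metadata as "no access" is a deliberate deny-by-default choice in this sketch: a document that lost its metadata file drops out of the results instead of becoming visible to everyone.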
S3 Access Points: The Bridge Between FSx ONTAP and Bedrock KB
One of the most impactful recent additions is S3 Access Point integration with FSx for ONTAP. This creates a clean, single-path data ingestion architecture:
FSx ONTAP Volume (/data)
├── public/company-overview.md
├── public/company-overview.md.metadata.json
├── confidential/financial-report.md
└── confidential/financial-report.md.metadata.json
        │
        │ S3 Access Point
        ▼
Bedrock KB Data Source (S3 AP alias)
        │ Ingestion Job (chunking + Titan Embed v2)
        ▼
Vector Store (S3 Vectors or OpenSearch Serverless)
Before S3 Access Points, getting data from FSx ONTAP into Bedrock KB required either a custom Embedding server with CIFS mounts or manual S3 uploads. Now, Bedrock KB reads documents directly from the FSx ONTAP volume through the S3 Access Point โ no intermediate copies, no sync scripts.
The S3 AP user type is automatically selected based on your AD configuration:
| AD Configuration | Volume Style | S3 AP User Type | Behavior |
|---|---|---|---|
| AD configured | NTFS | WINDOWS (Admin) | NTFS ACLs automatically applied |
| No AD | NTFS/UNIX | UNIX (root) | All files accessible; permission control via .metadata.json |
One gotcha I discovered: the S3 AP WindowsUser must not include the domain prefix. DEMO\Admin works for CLI operations but causes AccessDenied on data plane APIs (ListObjects, GetObject). Always specify just Admin.
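A defensive normalization step avoids that gotcha. The normalizeWindowsUser helper below is hypothetical (not from the repository); it keeps only the part after the last backslash before the value is used as the S3 AP WindowsUser.

```typescript
// Hypothetical helper: the S3 AP WindowsUser must be the bare account name.
// "DEMO\\Admin" works for CLI operations but fails on data-plane calls, so strip the prefix.
function normalizeWindowsUser(user: string): string {
  const trimmed = user.trim();
  const slash = trimmed.lastIndexOf("\\");
  return slash >= 0 ? trimmed.slice(slash + 1) : trimmed;
}

console.log(normalizeWindowsUser("DEMO\\Admin")); // Admin
console.log(normalizeWindowsUser("Admin")); // Admin
```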
S3 Vectors: Low-Cost Vector Storage
The default vector store is Amazon S3 Vectors โ a relatively new service that brings vector search costs down to a few dollars per month, compared to ~$700/month for OpenSearch Serverless.
| Configuration | Cost | Latency | Best For |
|---|---|---|---|
| S3 Vectors (default) | ~$2-5/month | Sub-second (as low as ~100ms) | Demo, dev, cost optimization |
| OpenSearch Serverless | ~$700/month | ~10ms | High-performance production |
S3 Vectors does have a 2KB filterable metadata limit per vector. Since Bedrock KB's internal metadata already consumes ~1KB, custom metadata is effectively limited to ~1KB. The system handles this by setting all metadata keys (including allowed_group_sids) as non-filterable and performing SID matching on the application side after retrieval.
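A back-of-the-envelope check shows why the SID list can't simply be a filterable key. The sketch below approximates metadata size as the UTF-8 byte length of the serialized attributes (an assumption; the service's exact accounting isn't described here) and applies it to a document shared with 30 domain groups, using made-up SIDs.

```typescript
// Rough budget from the numbers above: ~2 KB filterable limit minus ~1 KB used by Bedrock KB.
const FILTERABLE_BUDGET_BYTES = 1024;

// Approximate size as the UTF-8 length of the JSON-serialized attributes.
// This is an estimate; S3 Vectors' own size accounting may differ.
function estimateMetadataBytes(attrs: Record<string, string>): number {
  return new TextEncoder().encode(JSON.stringify(attrs)).length;
}

// Illustrative domain-style SIDs for a document shared with 30 groups.
const groupSids = Array.from(
  { length: 30 },
  (_, i) => `S-1-5-21-1111111111-2222222222-3333333333-${1100 + i}`,
);
const attrs: Record<string, string> = {
  allowed_group_sids: JSON.stringify(groupSids),
  access_level: "restricted",
  doc_type: "spec",
};

const bytes = estimateMetadataBytes(attrs);
console.log(bytes, bytes <= FILTERABLE_BUDGET_BYTES); // well over the budget, so false
```

Each SID of this form costs roughly 50 bytes once quoted and escaped, so fewer than twenty groups fit in the remaining ~1 KB even before other attributes are counted, which is why application-side SID matching is the safer design.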
If you start with S3 Vectors and later need higher performance, you can export on-demand to OpenSearch Serverless using the included export-to-opensearch.sh script.
Embedding Design: .metadata.json and the Ingestion Pipeline
Permission metadata follows the standard Bedrock KB metadata file specification. Each document has a companion .metadata.json file:
product-catalog.md ← Document body
product-catalog.md.metadata.json ← Permission metadata
The metadata format:
{
"metadataAttributes": {
"allowed_group_sids": "[\"S-1-1-0\"]",
"access_level": "public",
"doc_type": "catalog"
}
}
The allowed_group_sids field is a JSON array string of Windows SIDs that are allowed to access the document. S-1-1-0 is the well-known "Everyone" SID.
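Because allowed_group_sids is a JSON array encoded inside a JSON string, the escaping is easy to get wrong when generating these files. A sketch of a generator that produces exactly the format above (the buildSidecar name and signature are hypothetical, not the project's tooling):

```typescript
// Shape of a Bedrock KB metadata sidecar file as used in this project.
interface KbMetadataFile {
  metadataAttributes: Record<string, string>;
}

// Build the companion "<document>.metadata.json" path and body for one document.
// allowed_group_sids is double-encoded: a JSON array serialized into a string value.
function buildSidecar(
  documentPath: string,
  allowedSids: string[],
  accessLevel: string,
  docType: string,
): { path: string; body: string } {
  const file: KbMetadataFile = {
    metadataAttributes: {
      allowed_group_sids: JSON.stringify(allowedSids),
      access_level: accessLevel,
      doc_type: docType,
    },
  };
  return { path: `${documentPath}.metadata.json`, body: JSON.stringify(file, null, 2) };
}

const sidecar = buildSidecar("public/product-catalog.md", ["S-1-1-0"], "public", "catalog");
console.log(sidecar.path); // public/product-catalog.md.metadata.json
console.log(sidecar.body); // same structure as the metadata example above
```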
Bedrock KB Ingestion Jobs automatically read these .metadata.json files alongside documents, chunk the content, vectorize with Amazon Titan Text Embeddings v2 (1024 dimensions), and store everything in the vector store. No custom ETL pipeline needed.
Design Decisions and Trade-offs
At scale (thousands of documents), managing individual .metadata.json files becomes a maintenance burden. The system supports three approaches:
| Approach | Status | Pros | Cons |
|---|---|---|---|
| .metadata.json (current default) | Production | Bedrock KB native, no extra infra | Doubles file count, manual management |
| ONTAP REST API auto-generation | Partially implemented | File server ACLs as source of truth | Requires Embedding server |
| DynamoDB permission master | Recommended for scale | DB-driven, easy auditing | Requires pre-Ingestion generation pipeline |
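Both ONTAP-driven approaches start by reducing a file's ACL to the SID list stored in allowed_group_sids. Below is a rough sketch under loud assumptions: the Ace shape is hypothetical rather than the ONTAP REST API response schema, and ACE ordering, inheritance, and nested group membership are ignored.

```typescript
// Hypothetical, simplified ACE record; this is NOT the ONTAP REST API response schema.
interface Ace {
  sid: string;
  type: "allow" | "deny";
  read: boolean; // whether this entry covers read access
}

// Allowed SIDs = SIDs with an allow+read entry, minus SIDs with a deny+read entry.
// Simplification: a user in both an allowed group and a denied group still matches
// through the allowed group, so NTFS deny semantics are only approximated.
function aclToAllowedSids(aces: Ace[]): string[] {
  const denied = new Set(aces.filter((a) => a.type === "deny" && a.read).map((a) => a.sid));
  const allowed = aces
    .filter((a) => a.type === "allow" && a.read && !denied.has(a.sid))
    .map((a) => a.sid);
  return Array.from(new Set(allowed)).sort(); // de-duplicate; sort for stable output
}

const sids = aclToAllowedSids([
  { sid: "S-1-5-21-1000-1100", type: "allow", read: true },
  { sid: "S-1-5-21-1000-1100", type: "allow", read: true }, // duplicate entry
  { sid: "S-1-5-21-1000-1200", type: "allow", read: false }, // grant without read
  { sid: "S-1-1-0", type: "allow", read: true },
  { sid: "S-1-1-0", type: "deny", read: true },
]);
console.log(JSON.stringify(sids)); // ["S-1-5-21-1000-1100"]
```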
The recommended direction for large-scale environments:
ONTAP REST API (ACL retrieval)
→ DynamoDB document-permissions table
→ Auto-generate .metadata.json before Ingestion Job
→ Ingest via S3 AP into Bedrock KB
Multiple Authentication Modes
The system supports 5 authentication configurations, all driven by cdk.context.json parameters:
| Mode | Authentication | Permission Source | Configuration |
|---|---|---|---|
| A: Email/Password | Cognito native | Manual DynamoDB SID registration | Default (no extra config) |
| B: SAML AD Federation | Cognito + SAML IdP | AD Sync Lambda โ auto SID retrieval | enableAdFederation=true |
| C: OIDC + LDAP | Cognito + OIDC IdP | LDAP query → auto UID/GID retrieval | oidcProviderConfig + ldapConfig |
| D: OIDC Claims Only | Cognito + OIDC IdP | OIDC token claims → group mapping | oidcProviderConfig + groupClaimName |
| E: SAML + OIDC Hybrid | Both IdPs simultaneously | Combined SID + UID/GID | Both configs + permissionMappingStrategy=hybrid |
The OIDC/LDAP federation enables zero-touch user provisioning: when a user signs in via the OIDC IdP for the first time, the Identity Sync Lambda automatically queries LDAP for their UID/GID/groups and stores them in DynamoDB. No admin intervention required.
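The first-sign-in flow amounts to a get-or-provision step. In this sketch the UserStore and DirectoryLookup interfaces are hypothetical stand-ins for the DynamoDB user-access table and the LDAP query, with in-memory fakes so it runs on its own; it also assumes the directory is consulted only when no record exists yet, which may differ from how the Identity Sync Lambda handles refreshes.

```typescript
interface UserPermissions {
  userId: string;
  uid: number;
  gid: number;
  groups: string[];
}

// Hypothetical stand-ins for the DynamoDB user-access table and the LDAP directory.
interface UserStore {
  get(userId: string): Promise<UserPermissions | undefined>;
  put(record: UserPermissions): Promise<void>;
}
type DirectoryLookup = (userId: string) => Promise<Omit<UserPermissions, "userId"> | undefined>;

// On sign-in: return the stored record, or query the directory once and persist the result.
async function getOrProvision(
  userId: string,
  store: UserStore,
  lookup: DirectoryLookup,
): Promise<UserPermissions | undefined> {
  const existing = await store.get(userId);
  if (existing) return existing;
  const fromDirectory = await lookup(userId);
  if (!fromDirectory) return undefined; // Unknown user: no record, so nothing is retrievable.
  const record: UserPermissions = { userId, ...fromDirectory };
  await store.put(record);
  return record;
}

// In-memory fakes for a local run.
const table = new Map<string, UserPermissions>();
const memoryStore: UserStore = {
  get: async (id) => table.get(id),
  put: async (r) => {
    table.set(r.userId, r);
  },
};
let ldapCalls = 0;
const fakeLdap: DirectoryLookup = async (id) => {
  ldapCalls++;
  return id === "alice" ? { uid: 1001, gid: 1001, groups: ["engineering"] } : undefined;
};
```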
For environments with FSx ONTAP UNIX volumes, the system also supports ONTAP name-mapping โ automatically resolving UNIX usernames to Windows users via the ONTAP REST API.
Agentic AI: Beyond Document Search
The system isn't just a search engine. Toggle between three modes with one click:
- KB Mode: Permission-aware document search and Q&A
- Single Agent Mode: Permission-aware autonomous multi-step reasoning via a single Bedrock Agent
- Multi Agent Mode: Supervisor + Collaborator pattern for complex multi-agent workflows
Agent mode includes an Agent Directory โ a catalog-style management screen where you can create, edit, share, and schedule Bedrock Agents from templates. The directory now includes a Registry tab for importing agents from AWS Agent Registry, and a Teams tab for creating multi-agent teams.
Permission filtering works in all modes. Even when agents autonomously search and reason across multiple documents, only documents the user is authorized to see are included.
AgentCore Memory (v3.3.0)
With enableAgentCoreMemory=true, the system integrates Amazon Bedrock AgentCore Memory for conversation context maintenance:
- Short-term memory: In-session conversation history (TTL: 3 days)
- Long-term memory: Cross-session user preferences and summaries (semantic + summary strategies)
Episodic Memory (v4.0.0)
Building on AgentCore Memory, enableEpisodicMemory=true adds a new dimension: the agent remembers how it solved problems, not just what it knows.
While semantic memory stores facts and summaries, episodic memory records complete task episodes โ the goal, reasoning steps, actions taken, outcomes, and reflections. When a similar task comes up later, the agent automatically retrieves the top 3 most relevant past episodes and injects them into its reasoning context.
Think of it as giving the agent a "lessons learned" database that grows with every interaction:
- Episode recording: After each conversation, a Background Reflection process automatically extracts episodes
- Similar episode injection: Before executing a task, the agent searches for similar past episodes and uses them to inform its approach
- Episode management UI: Browse, search (semantic, 300ms debounce), and delete episodes from the sidebar
- Graceful degradation: If episodic memory fails, core agent functionality continues uninterrupted
The UI shows a "Referenced past experience (N)" badge on responses that leveraged episodic memory.
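AgentCore Memory performs the episode retrieval itself; the sketch below only illustrates the idea of "top 3 most relevant past episodes" with plain cosine similarity over embedding vectors. The Episode shape, the tiny 2-D vectors, and the prompt formatting are illustrative assumptions, not the service's API.

```typescript
interface Episode {
  id: string;
  goal: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank past episodes by similarity to the current task and keep the top k (3 by default).
function topEpisodes(task: number[], episodes: Episode[], k = 3): Episode[] {
  return episodes
    .map((episode) => ({ episode, score: cosineSimilarity(task, episode.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((ranked) => ranked.episode);
}

// Illustrative 2-D embeddings; real embeddings have far more dimensions.
const history: Episode[] = [
  { id: "a", goal: "Summarize Q3 sales by region", embedding: [1, 0] },
  { id: "b", goal: "Reset a VPN password", embedding: [0, 1] },
  { id: "c", goal: "Summarize Q2 sales by region", embedding: [0.9, 0.1] },
  { id: "d", goal: "Unrelated task", embedding: [-1, 0] },
  { id: "e", goal: "Compare regional pipeline", embedding: [0.5, 0.5] },
];
const episodeContext = topEpisodes([1, 0], history)
  .map((ep, i) => `Past episode ${i + 1}: ${ep.goal}`)
  .join("\n");
console.log(episodeContext); // episodes a, c, e in that order
```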
Additional Features
Smart Routing (v3.1.0)
Automatic model selection based on query complexity. Short factual queries route to Claude Haiku (fast, cheap); complex analytical queries route to Claude Sonnet (powerful). Toggle ON/OFF in the sidebar.
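The post doesn't spell out how complexity is measured, so here is one hypothetical heuristic: a score from word count plus analytical keywords, compared against a threshold. The keywords, weights, threshold, and the "haiku"/"sonnet" labels are all assumptions (the labels are not Bedrock model IDs).

```typescript
type ModelTier = "haiku" | "sonnet";

// Illustrative signals of an analytical request (an assumption, not the project's list).
const ANALYTICAL_HINTS = ["compare", "analyze", "explain", "why", "trade-off", "summarize"];

// Score grows with query length and with each analytical hint present.
function complexityScore(query: string): number {
  const wordCount = query.trim().split(/\s+/).filter((w) => w.length > 0).length;
  const lower = query.toLowerCase();
  const hintCount = ANALYTICAL_HINTS.filter((h) => lower.includes(h)).length;
  return wordCount / 20 + hintCount;
}

// Route to the smaller model below the threshold, the larger one at or above it.
function routeModel(query: string, threshold = 1): ModelTier {
  return complexityScore(query) >= threshold ? "sonnet" : "haiku";
}

console.log(routeModel("What is our office address?")); // haiku
console.log(routeModel("Compare Q3 and Q4 pipeline and explain why targets were missed")); // sonnet
```

A keyword heuristic like this is cheap and predictable; a classifier call would likely be more accurate but adds latency to every request.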
Image Analysis RAG (v3.1.0)
Drag-and-drop image upload in the chat input. Images are analyzed with Bedrock Vision API (Claude Haiku 4.5) and the analysis is integrated into KB search context.
6-Layer Security
| Layer | Technology | Purpose |
|---|---|---|
| L1 | CloudFront Geo Restriction | Geographic access control |
| L2 | AWS WAF (6 rules) | Attack pattern detection |
| L3 | CloudFront OAC (SigV4) | Origin authentication |
| L4 | Lambda Function URL IAM Auth | API-level access control |
| L5 | Cognito JWT / SAML / OIDC | User authentication |
| L6 | SID / UID+GID / OIDC Group Filtering | Document-level authorization |
8-Language i18n โ Why It Matters
The UI and all documentation (README, guides, setup instructions) are available in 8 languages: Japanese, English, Korean, Simplified Chinese, Traditional Chinese, French, German, and Spanish.
This isn't just a nice-to-have. Enterprise file servers are inherently multi-regional โ a global company's FSx ONTAP volumes serve teams across Tokyo, Seoul, Shanghai, Frankfurt, and New York. If the RAG interface only speaks English, you've created a barrier for the very users who need it most.
The implementation uses Next.js next-intl with per-locale message files. Every UI string goes through useTranslations(). The AI's chat responses also match the user's language โ a Korean user asking in Korean gets a Korean answer with Korean citation labels.
[Figure: screenshots of the card grid in all 8 languages: 日本語, English, 한국어, 简体中文, 繁體中文, Français, Deutsch, Español]
v4.0.0: Six New Features (April 2026)
v4.0.0 adds six capabilities that extend the system from document search into a more complete enterprise AI platform. All are opt-in via CDK parameters โ zero additional cost when disabled.
Agent Registry Integration
enableAgentRegistry=true adds a "Registry" tab to the Agent Directory, connecting to AWS Agent Registry (Amazon Bedrock AgentCore). Your organization's shared Agents, Tools, and MCP Servers become searchable and importable directly from the UI.
- Semantic search across registry records
- One-click import from the registry to a local Bedrock Agent (name collisions handled with an _imported_YYYYMMDD suffix)
- Publish local agents to the registry (with approval workflow)
- Resource type filters (Agent / Tool / MCP Server)
- Cross-region access via the agentRegistryRegion parameter
- Fault isolation: registry errors don't affect other Agent Directory tabs
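The collision rule can be expressed as a small pure function. This is a sketch with a hypothetical name; it assumes the date stamp is taken in UTC, which the post doesn't specify.

```typescript
// Hypothetical helper: keep the registry name unless it is already taken locally,
// in which case append _imported_YYYYMMDD (date taken in UTC, an assumption).
function importedAgentName(name: string, existingNames: Set<string>, now: Date = new Date()): string {
  if (!existingNames.has(name)) return name;
  const yyyy = now.getUTCFullYear();
  const mm = String(now.getUTCMonth() + 1).padStart(2, "0");
  const dd = String(now.getUTCDate()).padStart(2, "0");
  return `${name}_imported_${yyyy}${mm}${dd}`;
}

const april15 = new Date(Date.UTC(2026, 3, 15)); // months are 0-based, so 3 = April
console.log(importedAgentName("hr-faq-agent", new Set(["hr-faq-agent"]), april15)); // hr-faq-agent_imported_20260415
```

A second import of the same agent on the same day would still collide; the post doesn't describe that case, so a real implementation would need an extra check.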
Note: Agent Registry is a Preview API as of April 2026. The implementation uses SigV4-signed HTTP with REST path mapping. When the Node.js SDK adds native commands, the client can be swapped with minimal changes.
Multimodal RAG Search
embeddingModel: "nova-multimodal" switches the Knowledge Base from text-only (Titan Text Embeddings v2) to cross-modal search across text, images, video, and audio using Amazon Nova Multimodal Embeddings.
The architecture uses two patterns that make model changes painless:
- Embedding Model Registry: Model definitions are configuration objects in a catalog. Adding a new model = adding one entry
- KB Config Strategy: Dynamically generates KB configuration, IAM policies, and Lambda environment variables from the registry entry
For gradual migration, multimodalKbMode: "dual" runs two KBs in parallel โ text-only (Titan) + multimodal (Nova) โ with a query router that directs text queries to the text KB and image-attached queries to the multimodal KB. Users can toggle between them.
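A sketch of the two ideas together, under assumptions: each registry entry is reduced to a modality list (the real entries also drive KB configuration, IAM policies, and Lambda environment variables), "titan-text-v2" is a made-up key, and the rule that an explicit user toggle overrides automatic routing is assumed.

```typescript
type Modality = "text" | "image" | "video" | "audio";

// Hypothetical registry entries: adding a model means adding one object here.
interface EmbeddingModelEntry {
  modalities: Modality[];
}

const EMBEDDING_MODELS: Record<string, EmbeddingModelEntry> = {
  "titan-text-v2": { modalities: ["text"] }, // hypothetical key for Titan Text Embeddings v2
  "nova-multimodal": { modalities: ["text", "image", "video", "audio"] },
};

type KbTarget = "text-kb" | "multimodal-kb";

interface ChatQuery {
  text: string;
  imageCount: number;
  userSelectedKb?: KbTarget; // explicit user toggle, if any
}

// Dual mode: an explicit toggle wins; otherwise image-attached queries go to the
// multimodal KB and plain text queries go to the text KB.
function routeKb(query: ChatQuery): KbTarget {
  if (query.userSelectedKb) return query.userSelectedKb;
  return query.imageCount > 0 ? "multimodal-kb" : "text-kb";
}

// Registry lookup: can the given embedding model handle this kind of input?
function supportsModality(modelKey: string, modality: Modality): boolean {
  const entry = EMBEDDING_MODELS[modelKey];
  return entry !== undefined && entry.modalities.indexOf(modality) >= 0;
}
```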
Caveat: Nova Multimodal Embeddings is currently available in us-east-1 and us-west-2 only. Changing the embedding model requires KB recreation and full data re-ingestion.
Guardrails Organizational Safeguards
enableGuardrails=true with optional guardrailsConfig gives fine-grained control over Bedrock Guardrails:
- Content filter strength: Per-category (sexual, violence, hate, insults, misconduct, prompt attack) input/output filter levels (NONE/LOW/MEDIUM/HIGH)
- Topic policies: Block specific topics (e.g., competitor information)
- PII detection: Per-entity-type actions (BLOCK or ANONYMIZE for email, phone, credit card, etc.)
- Contextual grounding: Hallucination prevention with configurable thresholds
The UI adds:
- GuardrailsStatusBadge on every chat response: ✅ safe / ⚠️ filtered / ⚠️ check unavailable
- GuardrailsAdminPanel in the sidebar (admin-only, read-only): shows account guardrails config and detects AWS Organizations Organizational Safeguards
- EMF metrics: GuardrailsInputBlocked, GuardrailsOutputFiltered, GuardrailsPassthrough → CloudWatch dashboard + SNS alerts
Error handling follows a Fail-Open strategy: if the Guardrails API times out (5s) or returns 5xx, chat continues normally with an error log. The AI never stops working because of a guardrails hiccup.
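A fail-open wrapper along those lines, as a sketch: the checkGuardrails and withTimeout names are hypothetical, the guardrail call is injected as a function rather than a real Bedrock client, and errors are assumed to carry a numeric status field (AWS SDK v3 errors expose the HTTP status under $metadata.httpStatusCode, so a real adapter would map it). Rethrowing non-5xx errors is also an assumption of this sketch.

```typescript
type GuardrailVerdict = "safe" | "filtered" | "check_unavailable";

const TIMED_OUT = Symbol("timed out");

// Resolve with the call's result, or with TIMED_OUT if it takes longer than ms.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T | typeof TIMED_OUT> {
  return new Promise<T | typeof TIMED_OUT>((resolve, reject) => {
    const timer = setTimeout(() => resolve(TIMED_OUT), ms);
    p.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Assumes errors expose a numeric HTTP status in a "status" field.
function isServerError(e: unknown): boolean {
  const status = typeof e === "object" && e !== null ? (e as { status?: unknown }).status : undefined;
  return typeof status === "number" && status >= 500;
}

// Fail-open: on timeout or 5xx, log and let the chat continue with "check_unavailable".
async function checkGuardrails(
  call: () => Promise<"safe" | "filtered">,
  timeoutMs = 5000,
  log: (msg: string) => void = console.error,
): Promise<GuardrailVerdict> {
  try {
    const result = await withTimeout(call(), timeoutMs);
    if (result === TIMED_OUT) {
      log(`guardrails check timed out after ${timeoutMs}ms; continuing`);
      return "check_unavailable";
    }
    return result;
  } catch (err) {
    if (isServerError(err)) {
      log("guardrails returned a 5xx error; continuing");
      return "check_unavailable";
    }
    throw err; // other failures are surfaced rather than silently ignored
  }
}
```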
Voice Chat (Amazon Nova Sonic)
enableVoiceChat=true adds voice interaction. Click the 🎤 microphone button (or press Ctrl+Shift+V), speak your question, and get a text + audio response, all through the same permission-aware RAG pipeline.
Phase 1 (current) uses REST + Bedrock Converse API:
Browser (mic) → POST /api/voice/stream → Converse API (speech → text)
→ KB/Agent RAG pipeline
→ text + audio response → Browser
- Waveform animation (Canvas-based, input=blue, output=green, respects prefers-reduced-motion)
- 30-second silence timeout with auto-stop
- Auto-reconnect (max 3 attempts), then text fallback
- Works in KB mode, Single Agent mode, and Multi Agent mode
- Permission filtering is input-method-agnostic โ voice queries get the same SID/UID/GID filtering as text
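The reconnect-then-fallback behavior can be sketched as a generic retry loop. The function name, the linear backoff delay, and the reading of "max 3 attempts" as three attempts in total are assumptions; the connect and onFallback callbacks stand in for the real voice session and the switch back to text input.

```typescript
// Try to connect up to maxAttempts times in total; if all fail, fall back to text input.
async function connectWithFallback<T>(
  connect: () => Promise<T>,
  onFallback: () => void,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T | undefined> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await connect();
    } catch {
      if (attempt < maxAttempts) {
        // Wait a little longer after each failure (linear backoff, an assumption).
        await new Promise<void>((resolve) => setTimeout(resolve, baseDelayMs * attempt));
      }
    }
  }
  onFallback(); // e.g. switch the UI back to the text input
  return undefined;
}
```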
Phase 2 (planned) will use API Gateway WebSocket + Nova Sonic InvokeModelWithBidirectionalStream for real-time bidirectional streaming.
Estimated monthly cost: $70 to $100 (input ~$0.0019/min, output ~$0.0076/min).
AgentCore Policy
enableAgentPolicy=true adds agent behavior control. Define boundaries in natural language โ what tools the agent can use, what APIs it can call, what data it can access โ and the system enforces them in real-time.
- 3 policy templates: Security-focused, Cost-focused, Flexibility-focused
- PolicyEvaluationMiddleware: Evaluates every agent action against the policy (3s timeout)
- Fail-open / Fail-closed: policyFailureMode controls behavior when policy evaluation fails
- Violation logging: EMF-format metrics (PolicyViolationCount, PolicyEvaluationLatency) → CloudWatch dashboard
- PolicySection in Agent create/edit forms: optional natural language policy input (max 2000 chars)
- PolicyBadge (🛡️) on agents with active policies
Note: AgentCore Policy reached GA in March 2026 with a Policy Engine + Gateway architecture. Policies are written in Cedar language (with natural language auto-conversion). The implementation uses SigV4-signed HTTP.
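A sketch of a middleware-style check with a 3-second timeout, where the failure mode decides what happens when evaluation errors out or times out. The evaluateAction name, the injected evaluator, and the "fail-open"/"fail-closed" value names are assumptions; the actual policyFailureMode values and the Policy Engine API aren't described in this post.

```typescript
type PolicyFailureMode = "fail-open" | "fail-closed"; // assumed value names
type PolicyDecision = "allow" | "deny";

interface AgentAction {
  tool: string;
  input: unknown;
}

const POLICY_TIMED_OUT = Symbol("policy timed out");

// Evaluate one agent action; on timeout or error, the failure mode decides the outcome.
async function evaluateAction(
  action: AgentAction,
  evaluate: (a: AgentAction) => Promise<PolicyDecision>,
  failureMode: PolicyFailureMode,
  timeoutMs = 3000,
): Promise<PolicyDecision> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<typeof POLICY_TIMED_OUT>((resolve) => {
    timer = setTimeout(() => resolve(POLICY_TIMED_OUT), timeoutMs);
  });
  try {
    const result = await Promise.race([evaluate(action), timeout]);
    if (result !== POLICY_TIMED_OUT) return result;
  } catch {
    // Evaluation failed; fall through to the failure-mode decision below.
  } finally {
    clearTimeout(timer);
  }
  return failureMode === "fail-open" ? "allow" : "deny";
}
```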
Feature Flags Runtime API
A cross-cutting change that affects all v4 features: the UI no longer relies on NEXT_PUBLIC_* build-time environment variables. Instead, a /api/config/features endpoint reads Lambda environment variables at runtime and returns feature flags. The useFeatureFlags hook caches flags in localStorage for instant page loads.
This means you can enable/disable features by changing CDK parameters and redeploying โ without rebuilding the Docker image.
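A minimal sketch of the server side of that endpoint: read flags from the Lambda environment at request time and return plain booleans. The ENABLE_* variable names are hypothetical placeholders, not the names the project actually uses.

```typescript
// Hypothetical mapping from Lambda environment variables to UI feature flags.
const FLAG_ENV_VARS = {
  agentRegistry: "ENABLE_AGENT_REGISTRY",
  guardrails: "ENABLE_GUARDRAILS",
  voiceChat: "ENABLE_VOICE_CHAT",
  agentPolicy: "ENABLE_AGENT_POLICY",
  episodicMemory: "ENABLE_EPISODIC_MEMORY",
} as const;

type FeatureFlags = { [K in keyof typeof FLAG_ENV_VARS]: boolean };

// Read flags at request time (not build time); anything other than "true" counts as off.
function readFeatureFlags(env: Record<string, string | undefined>): FeatureFlags {
  const flags = {} as FeatureFlags;
  for (const key of Object.keys(FLAG_ENV_VARS) as Array<keyof FeatureFlags>) {
    flags[key] = env[FLAG_ENV_VARS[key]]?.toLowerCase() === "true";
  }
  return flags;
}
```

In a Next.js route handler, the endpoint would pass process.env to a function like this and return the result as JSON; since nothing is baked in at build time, a redeploy that changes the Lambda environment is enough to flip a flag.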
Multi-Agent Collaboration: Now Default-On
When enableAgent=true, multi-agent collaboration (enableMultiAgent) is now enabled by default. Bedrock Agents have zero standby cost, so this adds no running cost. Token consumption only increases (3-6x) when users actually chat in Multi Agent mode. Set enableMultiAgent: false explicitly to disable.
Multi-Agent Collaboration: Permission-Aware Agent Teams
The system uses Amazon Bedrock Agents' Supervisor + Collaborator pattern. Instead of a single agent handling everything, specialized agents work together:
- Supervisor Agent: Detects user intent, routes tasks to the right collaborator
- Permission Resolver: Resolves SID/UID/GID from the User Access Table
- Retrieval Agent: Executes KB search with permission metadata filters
- Analysis Agent: Summarizes and reasons over filtered context (no direct KB access)
- Output Agent: Generates reports and documents (no direct KB access)
The key design principle: KB access is restricted to Permission Resolver and Retrieval Agent only. Analysis and Output agents receive "filtered context" โ they never touch the knowledge base directly. This preserves the same SID/UID/GID permission boundaries that exist in single-agent mode.
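In Bedrock Agents this boundary would typically come from which agents have the knowledge base or retrieval tools attached at all; as an extra application-side safeguard, the rule can also be written as an explicit allowlist. The role names and the canQueryKb/assertKbAccess helpers are hypothetical.

```typescript
type AgentRole = "supervisor" | "permissionResolver" | "retrieval" | "analysis" | "output";

// Roles allowed to touch the knowledge base directly (per the design principle above).
const KB_ACCESS_ROLES: ReadonlySet<AgentRole> = new Set<AgentRole>(["permissionResolver", "retrieval"]);

function canQueryKb(role: AgentRole): boolean {
  return KB_ACCESS_ROLES.has(role);
}

// Guard to call before any KB request made on behalf of an agent.
function assertKbAccess(role: AgentRole): void {
  if (!canQueryKb(role)) {
    throw new Error(`${role} agent must not query the KB; pass it permission-filtered context instead`);
  }
}

assertKbAccess("retrieval"); // passes
try {
  assertKbAccess("analysis");
} catch (e) {
  console.log((e as Error).message); // analysis agent must not query the KB; ...
}
```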
Cost Structure
| Scenario | Agent Calls | Est. Cost/Request |
|---|---|---|
| Single Agent (existing) | 1 | ~$0.02 |
| Multi-Agent (simple query) | 2-3 | ~$0.06 |
| Multi-Agent (complex query) | 4-6 | ~$0.17 |
Deployment Lessons Learned
CloudFormation AgentCollaboration values: Only DISABLED, SUPERVISOR, and SUPERVISOR_ROUTER are valid. COLLABORATOR is NOT a valid value. Collaborator Agents should not set this property at all.
2-stage deploy is mandatory: You cannot create a Supervisor Agent with SUPERVISOR_ROUTER and collaborators in a single CloudFormation operation. The solution: create with DISABLED first, then a Custom Resource Lambda changes to SUPERVISOR_ROUTER, associates collaborators, and runs PrepareAgent.
IAM permissions: The Supervisor Agent's IAM role needs bedrock:GetAgentAlias + bedrock:InvokeAgent on agent-alias/*/*. The Custom Resource Lambda needs iam:PassRole for the Supervisor role.
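The stage-2 ordering can be sketched against an injected interface so it's testable without AWS. AgentControlPlane and its method names are hypothetical stand-ins for the real Bedrock Agents control-plane operations, not the SDK's API; only the order (switch to SUPERVISOR_ROUTER, associate collaborators, then PrepareAgent) comes from the lessons above.

```typescript
type CollaborationMode = "DISABLED" | "SUPERVISOR" | "SUPERVISOR_ROUTER";

// Hypothetical wrapper over the calls made by the Custom Resource Lambda.
// Method names are illustrative stand-ins for the corresponding Bedrock Agents operations.
interface AgentControlPlane {
  setCollaborationMode(agentId: string, mode: CollaborationMode): Promise<void>;
  associateCollaborator(agentId: string, aliasArn: string, name: string): Promise<void>;
  prepareAgent(agentId: string): Promise<void>;
}

interface Collaborator {
  name: string;
  aliasArn: string;
}

// Stage 2: CloudFormation has already created the supervisor with DISABLED.
// Switch the mode, attach each collaborator, then run PrepareAgent last.
async function promoteToSupervisor(
  api: AgentControlPlane,
  supervisorId: string,
  collaborators: Collaborator[],
): Promise<void> {
  // This sketch refuses to promote a supervisor with nothing to route to.
  if (collaborators.length === 0) throw new Error("no collaborators to associate");
  await api.setCollaborationMode(supervisorId, "SUPERVISOR_ROUTER");
  for (const c of collaborators) {
    await api.associateCollaborator(supervisorId, c.aliasArn, c.name);
  }
  await api.prepareAgent(supervisorId);
}
```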
Tips for Builders
OpenLDAP memberOf Overlay
If you're testing with OpenLDAP, the LDAP Connector reads the memberOf attribute from user entries. Basic OpenLDAP doesn't populate this automatically โ you need to add moduleload memberof and overlay memberof to slapd.conf, and create groupOfNames entries (not just posixGroup).
The repo includes setup-openldap.sh that handles all of this automatically.
Geo Restriction Default
The WAF configuration defaults to Japan-only access (allowedCountries: ["JP"]). If you're deploying outside Japan, update this before deploying:
{ "allowedCountries": ["JP", "US", "DE", "SG"] }
Set to [] for worldwide access.
Existing FSx ONTAP Reuse
If you already have an FSx for ONTAP file system, specify existingFileSystemId, existingSvmId, and existingVolumeId in cdk.context.json to skip FSx creation entirely. This cuts deployment time from 30-40 minutes to under 10 minutes.
Built with Kiro
I used Kiro throughout the entire development lifecycle โ specs for requirements-to-code traceability, hooks for automated validation on file saves, and steering files for project-specific rules that persist across sessions. The v4.0.0 release involved 195 files changed, 8-language documentation updates, property-based tests with fast-check, and live AWS environment verification across multiple accounts โ all developed with Kiro's assistance. As a solo developer, this level of tooling makes enterprise-quality projects feasible.
Getting Started
git clone https://github.com/Yoshiki0705/FSx-for-ONTAP-Agentic-Access-Aware-RAG.git
cd FSx-for-ONTAP-Agentic-Access-Aware-RAG && npm install
npx cdk bootstrap aws://$(aws sts get-caller-identity --query Account --output text)/ap-northeast-1
npx cdk bootstrap aws://$(aws sts get-caller-identity --query Account --output text)/us-east-1
bash demo-data/scripts/pre-deploy-setup.sh
npx cdk deploy --all --require-approval never
bash demo-data/scripts/post-deploy-setup.sh
Prerequisites: Node.js 22+, Docker, AWS CLI configured with AdministratorAccess. Total deployment time is about 30-40 minutes (FSx ONTAP creation takes 20-30 minutes). Use existingFileSystemId to skip FSx creation if you already have one.
What's Next
The project is at v4.0.0 with 19 implementation aspects and actively evolving. Some directions I'm exploring:
- Voice Chat Phase 2: WebSocket via API Gateway + Nova Sonic InvokeModelWithBidirectionalStream for real-time bidirectional streaming (replacing the current REST-based Phase 1)
- DynamoDB-driven permission master: Eliminating per-file .metadata.json management for large-scale environments
- Multi-volume embedding: Independent S3 Access Points per FSx for ONTAP volume with cross-volume search
- Agent Registry GA SDK migration: When the Node.js SDK adds native Agent Registry commands, swap from SigV4 HTTP to SDK calls
I'm looking for feedback on:
- Permission models: Are SID/UID-GID/OIDC-group/hybrid strategies sufficient for your use cases?
- Voice interaction patterns: What voice-specific workflows would be valuable in enterprise RAG?
- Policy templates: What agent behavior boundaries matter most in your organization?
- Guardrails configurations: What content filtering rules does your compliance team require?
If you try it out, I'd love to hear about your experience, especially edge cases I haven't considered. PRs and issues are welcome.
GitHub Repository: README available in 8 languages, same as the application UI
Yoshiki Fujiwara

















