Yoshiki Fujiwara (藤原善基) @AWS Community Builder for AWS Community Builders

Posted on Apr 5 • Edited on May 14

Building an Agentic Access-Aware RAG System with Amazon FSx for NetApp ONTAP, S3 Vectors, and S3 Access Points: Where AI Respects File Permissions

#aws #amazonfsxfornetappontap #agenticai #rag

🆕 Updated April 2026: v4.0.0 released with 6 new features: Agent Registry, Multimodal RAG, Guardrails, Episodic Memory, Voice Chat, and AgentCore Policy. See "v4.0.0: Six New Features (April 2026)" below.

Introduction

Enterprise data lives on file servers. And on those file servers, not everyone can see everything — NTFS ACLs, UNIX permissions, and group policies control who accesses what. But when you plug that data into a Retrieval-Augmented Generation (RAG) system, those permission boundaries tend to disappear. Suddenly, anyone can ask the AI about another team's, division's, or board member's confidential information.

But there's a flip side to this problem that's equally important: without permission awareness, the AI can't fully help the people it should be helping.

Think about it. An engineer has years of design docs, project specs, and team-internal notes in their department's shared folder. A sales lead has pipeline data, customer contracts, and regional forecasts in theirs. When you strip away permissions and dump everything into one vector store, the AI doesn't just leak confidential data — it also drowns each user's results in irrelevant noise from every other team. The engineer gets sales forecasts mixed into their search results. The sales lead gets CI/CD pipeline docs they'll never need.

Permission-aware RAG flips this around. Because the system knows exactly which files each user can access, it delivers personalized, noise-free AI assistance grounded in the data each person actually works with day to day. Your personal folder, your team's shared drive, the cross-functional project space you're part of — the AI sees what you see, nothing more, nothing less.

I built Agentic Access-Aware RAG to make this real. It's an open-source system that lets AI agents autonomously search, analyze, and respond to enterprise data stored on Amazon FSx for NetApp ONTAP — while respecting per-user file-level access permissions. The same question yields different answers depending on who's asking: an admin gets the full financial report, a project member gets their project's restricted docs, and a general user gets public information only. Each user gets an AI assistant that's effectively customized to their role and responsibilities — without any manual configuration.

The entire stack deploys with a single `npx cdk deploy --all` command, bracketed by short pre- and post-deploy setup scripts.

👉 GitHub: Yoshiki0705/FSx-for-ONTAP-Agentic-Access-Aware-RAG
📦 Latest Release: v4.0.0 — 6 new features added

Architecture at a Glance

Browser → AWS WAF → CloudFront (OAC+Geo) → Lambda Web Adapter (Next.js 15)
                                                    │
              ┌─────────────┬───────────────────────┼──────────────────┐
              ▼             ▼                       ▼                  ▼
        Cognito       Bedrock KB              DynamoDB            DynamoDB
       User Pool    + S3 Vectors /          user-access          perm-cache
                    OpenSearch SL           (SID Data)         (Perm Cache)
                         │
                         ▼
                  FSx for ONTAP
                  (SVM + Volume)
                + S3 Access Point

The system is organized into 7 CDK stacks: WAF, Networking, Security (Cognito), Storage (FSx ONTAP + DynamoDB), AI (Bedrock KB + vector store), WebApp (Lambda + CloudFront), and an optional Embedding stack.

The Core Idea: Permission-Aware RAG

Traditional RAG retrieves documents based on semantic similarity alone. This system adds a second dimension: SID-based permission filtering.

Here's the flow:

User sends a question via the chat UI
The app retrieves the user's SID list (personal SID + group SIDs) from DynamoDB
Bedrock KB Retrieve API performs vector search — each result carries allowed_group_sids metadata
The app matches each document's SIDs against the user's SIDs
Only permitted documents are passed to the Converse API for answer generation
The user sees a filtered response with citation badges showing access levels
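The matching step in this flow can be sketched in a few lines of TypeScript. This is an illustration, not the repository's code: the `RetrievedChunk` shape and the helper names are assumptions, and the only detail taken from the article is that `allowed_group_sids` arrives as a JSON array string. Malformed metadata is treated as "no SIDs", so a broken file hides a document instead of exposing it.

```typescript
// Hypothetical shape of one retrieved chunk after the KB Retrieve call.
interface RetrievedChunk {
  text: string;
  sourceUri: string;
  metadata: Record<string, unknown>;
}

// Parse the JSON-array-string field. Anything malformed yields [],
// so the chunk is denied rather than leaked (fail-closed).
function parseAllowedSids(raw: unknown): string[] {
  if (typeof raw !== "string") return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed)
      ? parsed.filter((s): s is string => typeof s === "string")
      : [];
  } catch {
    return [];
  }
}

// Keep a chunk only if at least one of its allowed SIDs is in the user's SID list.
function filterBySids(chunks: RetrievedChunk[], userSids: string[]): RetrievedChunk[] {
  const userSet = new Set(userSids);
  return chunks.filter((chunk) =>
    parseAllowedSids(chunk.metadata["allowed_group_sids"]).some((sid) => userSet.has(sid)),
  );
}
```

With made-up domain SIDs ending in -512 and -1100, an admin's SID list returns the public and confidential chunks, while an engineer's returns public and engineering.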

■ Admin user: SIDs = [...-512 (Domain Admins), S-1-1-0 (Everyone)]
  public/          → S-1-1-0 match  → ✅ Permitted
  confidential/    → ...-512 match  → ✅ Permitted
  engineering/     → No match       → ❌ Filtered out (no noise from other teams)

■ Engineer (Engineering group member): SIDs = [...-1100 (Engineering), S-1-1-0 (Everyone)]
  public/          → S-1-1-0 match  → ✅ Permitted
  confidential/    → No match       → ❌ Denied
  engineering/     → ...-1100 match → ✅ Their team's docs, front and center

■ Sales user: SIDs = [...-1200 (Sales), S-1-1-0 (Everyone)]
  public/          → S-1-1-0 match  → ✅ Permitted
  confidential/    → No match       → ❌ Denied
  engineering/     → No match       → ❌ No engineering noise in their results

The engineer asking "What's the status of Project X?" gets answers from their team's internal docs — not from sales forecasts or HR policies. The sales lead asking "What are our Q3 targets?" gets their regional data without wading through engineering specs. Each user's AI experience is naturally scoped to the data they work with every day.

S3 Access Points: The Bridge Between FSx ONTAP and Bedrock KB

One of the most impactful recent additions is S3 Access Point integration with FSx for ONTAP. This creates a clean, single-path data ingestion architecture:

FSx ONTAP Volume (/data)
  ├── public/company-overview.md
  ├── public/company-overview.md.metadata.json
  ├── confidential/financial-report.md
  ├── confidential/financial-report.md.metadata.json
      │
      │  S3 Access Point
      ▼
  Bedrock KB Data Source (S3 AP alias)
      │  Ingestion Job (chunking + Titan Embed v2)
      ▼
  Vector Store (S3 Vectors or OpenSearch Serverless)

Before S3 Access Points, getting data from FSx ONTAP into Bedrock KB required either a custom Embedding server with CIFS mounts or manual S3 uploads. Now, Bedrock KB reads documents directly from the FSx ONTAP volume through the S3 Access Point — no intermediate copies, no sync scripts.

The S3 AP user type is automatically selected based on your AD configuration:

AD Configuration	Volume Style	S3 AP User Type	Behavior
AD configured	NTFS	WINDOWS (`Admin`)	NTFS ACLs automatically applied
No AD	NTFS/UNIX	UNIX (`root`)	All files accessible; permission control via `.metadata.json`

One gotcha I discovered: the S3 AP `WindowsUser` must not include the domain prefix. `DEMO\Admin` works for CLI operations but causes `AccessDenied` on data plane APIs (`ListObjects`, `GetObject`). Always specify just `Admin`.
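If the user name might come from configuration that already carries a domain, normalizing it before creating the access point sidesteps this trap. A minimal sketch, assuming names arrive as `DOMAIN\user` or as a bare user name; `toAccessPointWindowsUser` is a hypothetical helper, and other formats such as UPNs are not handled.

```typescript
// Hypothetical helper: strip a "DOMAIN\" prefix so only the bare user name
// is passed as the S3 Access Point WindowsUser (e.g. "DEMO\Admin" -> "Admin").
function toAccessPointWindowsUser(input: string): string {
  const trimmed = input.trim();
  // lastIndexOf returns -1 when there is no backslash, so slice(0) keeps the whole name.
  const user = trimmed.slice(trimmed.lastIndexOf("\\") + 1);
  if (user.length === 0) {
    throw new Error(`no user name in "${input}"`);
  }
  return user;
}
```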

S3 Vectors: Low-Cost Vector Storage

The default vector store is Amazon S3 Vectors — a relatively new service that brings vector search costs down to a few dollars per month, compared to ~$700/month for OpenSearch Serverless.

Configuration	Cost	Latency	Best For
S3 Vectors (default)	~$2-5/month	~100ms to sub-second	Demo, dev, cost optimization
OpenSearch Serverless	~$700/month	~10ms	High-performance production

S3 Vectors does have a 2KB filterable metadata limit per vector. Since Bedrock KB's internal metadata already consumes ~1KB, custom metadata is effectively limited to ~1KB. The system handles this by setting all metadata keys (including allowed_group_sids) as non-filterable and performing SID matching on the application side after retrieval.

If you start with S3 Vectors and later need higher performance, you can export on-demand to OpenSearch Serverless using the included export-to-opensearch.sh script.

Embedding Design: `.metadata.json` and the Ingestion Pipeline

Permission metadata follows the standard Bedrock KB metadata file specification. Each document has a companion .metadata.json file:

product-catalog.md                    ← Document body
product-catalog.md.metadata.json      ← Permission metadata

The metadata format:

{
  "metadataAttributes": {
    "allowed_group_sids": "[\"S-1-1-0\"]",
    "access_level": "public",
    "doc_type": "catalog"
  }
}

The allowed_group_sids field is a JSON array string of Windows SIDs that are allowed to access the document. S-1-1-0 is the well-known "Everyone" SID.
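Generating the companion file programmatically avoids hand-escaping the nested string. A sketch, assuming the three attributes shown above; `buildMetadataJson` and `metadataPathFor` are illustrative names, not functions from the repository. Note that `allowed_group_sids` is serialized twice: the SID array becomes a string, and that string becomes a field of the outer JSON document.

```typescript
// Hypothetical input describing one document's permissions.
interface DocPermission {
  allowedSids: string[];
  accessLevel: string;
  docType: string;
}

// Build the .metadata.json body in the Bedrock KB metadata format shown above.
function buildMetadataJson(perm: DocPermission): string {
  if (perm.allowedSids.length === 0) {
    throw new Error("a document needs at least one allowed SID");
  }
  const body = {
    metadataAttributes: {
      // First serialization: the SID array becomes a JSON string value.
      allowed_group_sids: JSON.stringify(perm.allowedSids),
      access_level: perm.accessLevel,
      doc_type: perm.docType,
    },
  };
  // Second serialization: the whole document, pretty-printed.
  return JSON.stringify(body, null, 2);
}

// The companion file sits next to the document with ".metadata.json" appended.
function metadataPathFor(documentPath: string): string {
  return `${documentPath}.metadata.json`;
}
```

For `{ allowedSids: ["S-1-1-0"], accessLevel: "public", docType: "catalog" }`, the output matches the example file above.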

Bedrock KB Ingestion Jobs automatically read these .metadata.json files alongside documents, chunk the content, vectorize with Amazon Titan Text Embeddings v2 (1024 dimensions), and store everything in the vector store. No custom ETL pipeline needed.

Design Decisions and Trade-offs

At scale (thousands of documents), managing individual .metadata.json files becomes a maintenance burden. The system supports three approaches:

Approach	Status	Pros	Cons
`.metadata.json` (current default)	✅ Production	Bedrock KB native, no extra infra	Doubles file count, manual management
ONTAP REST API auto-generation	✅ Partially implemented	File server ACLs as source of truth	Requires Embedding server
DynamoDB permission master	🔜 Recommended for scale	DB-driven, easy auditing	Requires pre-Ingestion generation pipeline

The recommended direction for large-scale environments:

ONTAP REST API (ACL retrieval)
  → DynamoDB document-permissions table
  → Auto-generate .metadata.json before Ingestion Job
  → Ingest via S3 AP into Bedrock KB
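The middle of that pipeline, turning ACLs into `.metadata.json` files, might look roughly like the following. Everything here is an assumption: `SimpleAce` is a deliberately simplified ACE shape (a real ONTAP REST API response would need mapping into it), deny handling only removes the exact SID named in a deny entry, and documents that nobody can read are skipped rather than ingested.

```typescript
// Hypothetical, simplified access control entry.
interface SimpleAce {
  sid: string;
  type: "allow" | "deny";
  canRead: boolean;
}

interface MetadataFile {
  path: string;
  body: string;
}

// Collapse an ACL into the SIDs that may read the file. Simplification: deny via
// group membership, inheritance, and ACE ordering are not modeled.
function readableSids(acl: SimpleAce[]): string[] {
  const denied = new Set(acl.filter((a) => a.type === "deny" && a.canRead).map((a) => a.sid));
  const allowed = acl
    .filter((a) => a.type === "allow" && a.canRead && !denied.has(a.sid))
    .map((a) => a.sid);
  return Array.from(new Set(allowed)).sort();
}

// Generate one .metadata.json per document from a permission map, e.g. rows
// read from a hypothetical DynamoDB document-permissions table.
function generateMetadataFiles(
  permissions: Map<string, SimpleAce[]>,
  accessLevelOf: (docPath: string) => string,
): MetadataFile[] {
  const files: MetadataFile[] = [];
  permissions.forEach((acl, docPath) => {
    const sids = readableSids(acl);
    if (sids.length === 0) return; // nobody can read it: skip instead of ingesting
    files.push({
      path: `${docPath}.metadata.json`,
      body: JSON.stringify({
        metadataAttributes: {
          allowed_group_sids: JSON.stringify(sids),
          access_level: accessLevelOf(docPath),
        },
      }),
    });
  });
  return files;
}
```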

Multiple Authentication Modes

The system supports 5 authentication configurations, all driven by cdk.context.json parameters:

Mode	Authentication	Permission Source	Configuration
A: Email/Password	Cognito native	Manual DynamoDB SID registration	Default (no extra config)
B: SAML AD Federation	Cognito + SAML IdP	AD Sync Lambda → auto SID retrieval	`enableAdFederation=true`
C: OIDC + LDAP	Cognito + OIDC IdP	LDAP query → auto UID/GID retrieval	`oidcProviderConfig` + `ldapConfig`
D: OIDC Claims Only	Cognito + OIDC IdP	OIDC token claims → group mapping	`oidcProviderConfig` + `groupClaimName`
E: SAML + OIDC Hybrid	Both IdPs simultaneously	Combined SID + UID/GID	Both configs + `permissionMappingStrategy=hybrid`
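One way to read the table is as a precedence rule over the context parameters. The sketch below encodes that reading; the `AuthContext` shape and the order of checks are assumptions, and only the parameter names come from the table. It also chooses to reject a SAML + OIDC setup that lacks `permissionMappingStrategy=hybrid`, which the real stack may handle differently.

```typescript
type AuthMode = "A" | "B" | "C" | "D" | "E";

// Hypothetical subset of cdk.context.json relevant to authentication.
interface AuthContext {
  enableAdFederation?: boolean;
  oidcProviderConfig?: object;
  ldapConfig?: object;
  groupClaimName?: string;
  permissionMappingStrategy?: string;
}

function resolveAuthMode(ctx: AuthContext): AuthMode {
  const saml = ctx.enableAdFederation === true;
  const oidc = ctx.oidcProviderConfig !== undefined;
  if (saml && oidc) {
    // Both IdPs at once only makes sense with the hybrid mapping strategy.
    if (ctx.permissionMappingStrategy !== "hybrid") {
      throw new Error("SAML + OIDC together requires permissionMappingStrategy=hybrid");
    }
    return "E";
  }
  if (saml) return "B";
  if (oidc && ctx.ldapConfig !== undefined) return "C"; // LDAP lookup for UID/GID
  if (oidc && ctx.groupClaimName !== undefined) return "D"; // token claims only
  if (oidc) throw new Error("OIDC needs either ldapConfig or groupClaimName");
  return "A"; // Cognito email/password with manually registered SIDs
}
```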

The OIDC/LDAP federation enables zero-touch user provisioning: when a user signs in via the OIDC IdP for the first time, the Identity Sync Lambda automatically queries LDAP for their UID/GID/groups and stores them in DynamoDB. No admin intervention required.
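When the Identity Sync Lambda builds an LDAP query from a user name taken from an OIDC token, that value should be escaped before it goes into the search filter. A minimal sketch of RFC 4515 escaping; the `posixAccount` object class and `uid` attribute in `userLookupFilter` are assumptions about the directory schema, not details from the article.

```typescript
// Escape a value for an LDAP search filter (RFC 4515): backslash, asterisk,
// parentheses, and NUL are replaced by a backslash plus two hex digits.
function escapeLdapFilterValue(value: string): string {
  return value.replace(/[\\*()\u0000]/g, (ch) =>
    "\\" + ch.charCodeAt(0).toString(16).padStart(2, "0"),
  );
}

// Hypothetical filter for looking up a user entry by its uid attribute.
function userLookupFilter(uid: string): string {
  return `(&(objectClass=posixAccount)(uid=${escapeLdapFilterValue(uid)}))`;
}
```

Without escaping, a crafted name such as `*` would match every account in the directory.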

For environments with FSx ONTAP UNIX volumes, the system also supports ONTAP name-mapping — automatically resolving UNIX usernames to Windows users via the ONTAP REST API.

Agentic AI: Beyond Document Search

The system isn't just a search engine. Toggle between three modes with one click:

KB Mode: Permission-aware document search and Q&A
Single Agent Mode: Permission-aware autonomous multi-step reasoning via a single Bedrock Agent
Multi Agent Mode: Supervisor + Collaborator pattern for complex multi-agent workflows

Agent mode includes an Agent Directory — a catalog-style management screen where you can create, edit, share, and schedule Bedrock Agents from templates. The directory now includes a Registry tab for importing agents from AWS Agent Registry, and a Teams tab for creating multi-agent teams.

Permission filtering works in all modes. Even when agents autonomously search and reason across multiple documents, only documents the user is authorized to see are included.

AgentCore Memory (v3.3.0)

With enableAgentCoreMemory=true, the system integrates Amazon Bedrock AgentCore Memory for conversation context maintenance:

Short-term memory: In-session conversation history (TTL: 3 days)
Long-term memory: Cross-session user preferences and summaries (semantic + summary strategies)

Episodic Memory (v4.0.0)

Building on AgentCore Memory, enableEpisodicMemory=true adds a new dimension: the agent remembers how it solved problems, not just what it knows.

While semantic memory stores facts and summaries, episodic memory records complete task episodes — the goal, reasoning steps, actions taken, outcomes, and reflections. When a similar task comes up later, the agent automatically retrieves the top 3 most relevant past episodes and injects them into its reasoning context.

Think of it as giving the agent a "lessons learned" database that grows with every interaction:

Episode recording: After each conversation, a Background Reflection process automatically extracts episodes
Similar episode injection: Before executing a task, the agent searches for similar past episodes and uses them to inform its approach
Episode management UI: Browse, search (semantic, 300ms debounce), and delete episodes from the sidebar
Graceful degradation: If episodic memory fails, core agent functionality continues uninterrupted

The UI shows an "📚 Referenced past experience (N)" badge on responses that leveraged episodic memory.
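A conceptual sketch of "top 3 similar episodes, but never block the task" follows. AgentCore Memory performs the real retrieval; here, as an assumption, each episode carries a precomputed embedding vector and similarity is plain cosine similarity. `loadEpisodes` is a hypothetical loader, and any failure yields an empty list so the agent proceeds without past experience.

```typescript
// Hypothetical episode record with a precomputed embedding.
interface Episode {
  id: string;
  goal: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank all episodes by similarity to the query and keep the best k (default 3).
function topKEpisodes(query: number[], episodes: Episode[], k = 3): Episode[] {
  return episodes
    .map((ep) => ({ ep, score: cosineSimilarity(query, ep.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((s) => s.ep);
}

// Graceful degradation: if loading or scoring fails, continue with no episodes.
async function episodesForContext(
  query: number[],
  loadEpisodes: () => Promise<Episode[]>,
): Promise<Episode[]> {
  try {
    return topKEpisodes(query, await loadEpisodes());
  } catch (err) {
    console.warn("episodic memory unavailable, continuing without it", err);
    return [];
  }
}
```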

Additional Features

Smart Routing (v3.1.0)

Automatic model selection based on query complexity. Short factual queries route to Claude Haiku (fast, cheap); complex analytical queries route to Claude Sonnet (powerful). Toggle ON/OFF in the sidebar.
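The article doesn't describe how complexity is judged. As a purely hypothetical illustration, a router could combine query length with a few analytical cue words; the 80-character threshold and the cue list below are invented for the example, and the function returns a tier rather than a Bedrock model ID.

```typescript
type ModelTier = "haiku" | "sonnet";

// Invented cue words that suggest an analytical rather than factual query.
const ANALYTICAL_CUES = ["compare", "analyze", "analyse", "why", "trade-off", "summarize", "explain"];

// Short queries without analytical cues go to the fast tier; everything else to the strong tier.
function routeModel(query: string, maxSimpleLength = 80): ModelTier {
  const q = query.toLowerCase();
  const hasCue = ANALYTICAL_CUES.some((cue) => q.includes(cue));
  return q.length <= maxSimpleLength && !hasCue ? "haiku" : "sonnet";
}
```

A real classifier would likely need more signal than this, but the shape of the decision (cheap default, escalate on complexity) stays the same.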

Image Analysis RAG (v3.1.0)

Drag-and-drop image upload in the chat input. Images are analyzed with Bedrock Vision API (Claude Haiku 4.5) and the analysis is integrated into KB search context.
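For reference, a Converse API request that pairs an image with a question puts both in one user message. This sketch assumes the Converse API is the call path (the article only names the model); the local types mirror the Converse image and text content blocks, and with the AWS SDK the returned object would go into the `messages` array of a `ConverseCommand`.

```typescript
type ImageFormat = "png" | "jpeg" | "gif" | "webp";

// Local mirror of the Converse text and image content blocks.
type ContentPart =
  | { text: string }
  | { image: { format: ImageFormat; source: { bytes: Uint8Array } } };

interface UserMessage {
  role: "user";
  content: ContentPart[];
}

// Map a browser-reported MIME type to a Converse image format.
function imageFormatFromMime(mime: string): ImageFormat {
  const formats: Record<string, ImageFormat> = {
    "image/png": "png",
    "image/jpeg": "jpeg",
    "image/gif": "gif",
    "image/webp": "webp",
  };
  const format = formats[mime.toLowerCase()];
  if (!format) throw new Error(`unsupported image type: ${mime}`);
  return format;
}

// One user turn: the uploaded image first, then the question about it.
function buildImageQuestion(bytes: Uint8Array, mime: string, question: string): UserMessage {
  return {
    role: "user",
    content: [
      { image: { format: imageFormatFromMime(mime), source: { bytes } } },
      { text: question },
    ],
  };
}
```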

6-Layer Security

Layer	Technology	Purpose
L1	CloudFront Geo Restriction	Geographic access control
L2	AWS WAF (6 rules)	Attack pattern detection
L3	CloudFront OAC (SigV4)	Origin authentication
L4	Lambda Function URL IAM Auth	API-level access control
L5	Cognito JWT / SAML / OIDC	User authentication
L6	SID / UID+GID / OIDC Group Filtering	Document-level authorization

8-Language i18n — Why It Matters

The UI and all documentation (README, guides, setup instructions) are available in 8 languages: Japanese, English, Korean, Simplified Chinese, Traditional Chinese, French, German, and Spanish.

This isn't just a nice-to-have. Enterprise file servers are inherently multi-regional — a global company's FSx ONTAP volumes serve teams across Tokyo, Seoul, Shanghai, Frankfurt, and New York. If the RAG interface only speaks English, you've created a barrier for the very users who need it most.

The implementation uses next-intl for Next.js with per-locale message files, and every UI string goes through `useTranslations()`. The AI's chat responses also match the user's language: a Korean user asking in Korean gets a Korean answer with Korean citation labels.

Screenshots of the card grid in all 8 languages: 🇯🇵 日本語, 🇺🇸 English, 🇰🇷 한국어, 🇨🇳 简体中文, 🇹🇼 繁體中文, 🇫🇷 Français, 🇩🇪 Deutsch, 🇪🇸 Español.

v4.0.0: Six New Features (April 2026)

v4.0.0 adds six capabilities that extend the system from document search into a more complete enterprise AI platform. All are opt-in via CDK parameters — zero additional cost when disabled.

Agent Registry Integration

enableAgentRegistry=true adds a "Registry" tab to the Agent Directory, connecting to AWS Agent Registry (Amazon Bedrock AgentCore). Your organization's shared Agents, Tools, and MCP Servers become searchable and importable directly from the UI.

Semantic search across registry records
One-click import from registry to local Bedrock Agent (name collision handling with _imported_YYYYMMDD suffix)
Publish local agents to the registry (with approval workflow)
Resource type filters (Agent / Tool / MCP Server)
Cross-region access via agentRegistryRegion parameter
Fault isolation: registry errors don't affect other Agent Directory tabs
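The collision rule can be sketched as a pure function. The article specifies only the `_imported_YYYYMMDD` suffix; using the UTC date and appending a counter when the suffixed name is also taken are assumptions.

```typescript
// Return the desired name if free; otherwise add "_imported_YYYYMMDD",
// then a numeric counter if that is also taken (assumption).
function importedAgentName(desired: string, existing: Set<string>, date: Date): string {
  if (!existing.has(desired)) return desired;
  const y = date.getUTCFullYear();
  const m = String(date.getUTCMonth() + 1).padStart(2, "0");
  const d = String(date.getUTCDate()).padStart(2, "0");
  const base = `${desired}_imported_${y}${m}${d}`;
  let candidate = base;
  for (let n = 2; existing.has(candidate); n++) {
    candidate = `${base}_${n}`;
  }
  return candidate;
}
```

Importing `hr-agent` on 5 April 2026 when a local `hr-agent` exists yields `hr-agent_imported_20260405`.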

Note: Agent Registry is a Preview API as of April 2026. The implementation uses SigV4-signed HTTP with REST path mapping. When the Node.js SDK adds native commands, the client can be swapped with minimal changes.

Multimodal RAG Search

embeddingModel: "nova-multimodal" switches the Knowledge Base from text-only (Titan Text Embeddings v2) to cross-modal search across text, images, video, and audio using Amazon Nova Multimodal Embeddings.

The architecture uses two patterns that make model changes painless:

Embedding Model Registry: Model definitions are configuration objects in a catalog. Adding a new model = adding one entry
KB Config Strategy: Dynamically generates KB configuration, IAM policies, and Lambda environment variables from the registry entry
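A registry-plus-strategy layout might look like this. The entry fields and environment variable names are assumptions; `amazon.titan-embed-text-v2:0` is the public Bedrock ID for Titan Text Embeddings v2, while the Nova entry uses a placeholder that must be replaced with the model ID from the Bedrock console. The region list encodes the us-east-1 / us-west-2 availability caveat.

```typescript
// Hypothetical registry entry format.
interface EmbeddingModelEntry {
  modelId: string;
  modalities: Array<"text" | "image" | "video" | "audio">;
  regions?: string[]; // undefined = no restriction recorded here
}

const EMBEDDING_MODELS: Record<string, EmbeddingModelEntry> = {
  "titan-text-v2": { modelId: "amazon.titan-embed-text-v2:0", modalities: ["text"] },
  "nova-multimodal": {
    modelId: "NOVA_MULTIMODAL_MODEL_ID_PLACEHOLDER", // replace with the real model ID
    modalities: ["text", "image", "video", "audio"],
    regions: ["us-east-1", "us-west-2"],
  },
};

interface KbModelConfig {
  embeddingModelArn: string;
  iamActions: string[];
  iamResources: string[];
  lambdaEnv: Record<string, string>;
}

// Derive everything model-specific from the registry entry, so adding a
// model means adding one EMBEDDING_MODELS entry.
function kbConfigFor(key: string, region: string): KbModelConfig {
  const entry = EMBEDDING_MODELS[key];
  if (!entry) throw new Error(`unknown embedding model: ${key}`);
  if (entry.regions && !entry.regions.includes(region)) {
    throw new Error(`${key} is not available in ${region}`);
  }
  const arn = `arn:aws:bedrock:${region}::foundation-model/${entry.modelId}`;
  return {
    embeddingModelArn: arn,
    iamActions: ["bedrock:InvokeModel"],
    iamResources: [arn],
    lambdaEnv: {
      EMBEDDING_MODEL_KEY: key,
      EMBEDDING_MODALITIES: entry.modalities.join(","),
    },
  };
}
```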

For gradual migration, multimodalKbMode: "dual" runs two KBs in parallel — text-only (Titan) + multimodal (Nova) — with a query router that directs text queries to the text KB and image-attached queries to the multimodal KB. Users can toggle between them.
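The dual-mode router reduces to a small decision function. In this sketch, treating any `image/*` attachment as the multimodal trigger and letting an explicit user toggle win are assumptions; `ChatQuery` is a hypothetical request shape.

```typescript
type KbTarget = "text" | "multimodal";

// Hypothetical chat request shape.
interface ChatQuery {
  text: string;
  attachments: Array<{ mimeType: string }>;
}

// A user toggle overrides the automatic choice; otherwise image-attached
// queries go to the multimodal KB and everything else to the text KB.
function routeKb(query: ChatQuery, userOverride?: KbTarget): KbTarget {
  if (userOverride) return userOverride;
  const hasImage = query.attachments.some((a) => a.mimeType.startsWith("image/"));
  return hasImage ? "multimodal" : "text";
}
```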

Caveat: Nova Multimodal Embeddings is currently available in us-east-1 and us-west-2 only. Changing the embedding model requires KB recreation and full data re-ingestion.

Guardrails Organizational Safeguards

enableGuardrails=true with optional guardrailsConfig gives fine-grained control over Bedrock Guardrails:

Content filter strength: Per-category (sexual, violence, hate, insults, misconduct, prompt attack) input/output filter levels (NONE/LOW/MEDIUM/HIGH)
Topic policies: Block specific topics (e.g., competitor information)
PII detection: Per-entity-type actions (BLOCK or ANONYMIZE for email, phone, credit card, etc.)
Contextual grounding: Hallucination prevention with configurable thresholds
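To make the options concrete, here is a sketch that maps a simplified configuration onto the policy sections of a Bedrock `CreateGuardrail` request (name and blocked-message fields omitted). `SimpleGuardrailsConfig` is hypothetical, since the repository's `guardrailsConfig` shape isn't shown; the output field names follow the Bedrock API.

```typescript
type FilterType = "SEXUAL" | "VIOLENCE" | "HATE" | "INSULTS" | "MISCONDUCT" | "PROMPT_ATTACK";
type FilterStrength = "NONE" | "LOW" | "MEDIUM" | "HIGH";
type PiiAction = "BLOCK" | "ANONYMIZE";

// Hypothetical simplified config.
interface SimpleGuardrailsConfig {
  filters?: Array<{ type: FilterType; input: FilterStrength; output: FilterStrength }>;
  deniedTopics?: Array<{ name: string; definition: string }>;
  pii?: Record<string, PiiAction>; // keys are Bedrock PII entity types, e.g. EMAIL, PHONE
  groundingThreshold?: number;
}

function toGuardrailPolicies(cfg: SimpleGuardrailsConfig): Record<string, unknown> {
  const policies: Record<string, unknown> = {};
  const filters = cfg.filters ?? [];
  if (filters.length > 0) {
    policies["contentPolicyConfig"] = {
      filtersConfig: filters.map((f) => ({
        type: f.type,
        inputStrength: f.input,
        // Prompt-attack filtering applies to input only; Bedrock expects NONE on output.
        outputStrength: f.type === "PROMPT_ATTACK" ? "NONE" : f.output,
      })),
    };
  }
  const topics = cfg.deniedTopics ?? [];
  if (topics.length > 0) {
    policies["topicPolicyConfig"] = {
      topicsConfig: topics.map((t) => ({ name: t.name, definition: t.definition, type: "DENY" })),
    };
  }
  const pii = Object.entries(cfg.pii ?? {});
  if (pii.length > 0) {
    policies["sensitiveInformationPolicyConfig"] = {
      piiEntitiesConfig: pii.map(([type, action]) => ({ type, action })),
    };
  }
  if (cfg.groundingThreshold !== undefined) {
    const t = cfg.groundingThreshold;
    // This sketch rejects values outside [0, 1).
    if (t < 0 || t >= 1) throw new Error("grounding threshold must be in [0, 1)");
    policies["contextualGroundingPolicyConfig"] = {
      filtersConfig: [{ type: "GROUNDING", threshold: t }],
    };
  }
  return policies;
}
```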

The UI adds:

GuardrailsStatusBadge on every chat response: ✅ safe / ⚠️ filtered / ⚠️ check unavailable
GuardrailsAdminPanel in the sidebar (admin-only, read-only): shows account guardrails config and detects AWS Organizations Organizational Safeguards
EMF metrics: GuardrailsInputBlocked, GuardrailsOutputFiltered, GuardrailsPassthrough → CloudWatch dashboard + SNS alerts
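EMF metrics are plain JSON log lines that CloudWatch turns into metrics. A sketch of one such line, assuming a hypothetical namespace and a `ChatMode` dimension; printing it with `console.log` from the Lambda is enough for CloudWatch to extract the metric.

```typescript
type GuardrailsOutcome = "GuardrailsInputBlocked" | "GuardrailsOutputFiltered" | "GuardrailsPassthrough";

// Build one Embedded Metric Format record: the _aws block declares the metric,
// and the root-level members carry the dimension and metric values.
function guardrailsEmfLine(outcome: GuardrailsOutcome, chatMode: string, timestampMs: number): string {
  const record = {
    _aws: {
      Timestamp: timestampMs,
      CloudWatchMetrics: [
        {
          Namespace: "PermissionAwareRag/Guardrails", // hypothetical namespace
          Dimensions: [["ChatMode"]],
          Metrics: [{ Name: outcome, Unit: "Count" }],
        },
      ],
    },
    ChatMode: chatMode,
    [outcome]: 1,
  };
  return JSON.stringify(record);
}

// In a Lambda handler: console.log(guardrailsEmfLine("GuardrailsInputBlocked", "kb", Date.now()));
```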

Error handling follows a Fail-Open strategy: if the Guardrails API times out (5s) or returns 5xx, chat continues normally with an error log. The AI never stops working because of a guardrails hiccup.
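The fail-open behavior can be sketched with a timeout race. `check` stands in for a hypothetical wrapper around the Guardrails API call; note this simplified version fails open on any error, whereas the article names timeouts and 5xx responses specifically, and the timed-out call itself is not cancelled.

```typescript
type GuardrailsVerdict = "safe" | "filtered" | "unavailable";

// Reject if `work` has not settled within `ms`; always clear the timer afterwards.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// Fail-open: on timeout or error, log it and report "unavailable" so chat continues.
async function checkFailOpen(
  check: () => Promise<"safe" | "filtered">,
  timeoutMs = 5000,
): Promise<GuardrailsVerdict> {
  try {
    return await withTimeout(check(), timeoutMs);
  } catch (err) {
    console.error("guardrails check failed; continuing (fail-open)", err);
    return "unavailable";
  }
}
```

The "unavailable" verdict maps naturally onto the "⚠️ check unavailable" badge described above.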

Voice Chat (Amazon Nova Sonic)

enableVoiceChat=true adds voice interaction. Click the 🎤 microphone button (or Ctrl+Shift+V), speak your question, and get a text + audio response — all through the same permission-aware RAG pipeline.

Phase 1 (current) uses REST + Bedrock Converse API:

Browser (mic) → POST /api/voice/stream → Converse API (speech→text)
                                        → KB/Agent RAG pipeline
                                        → text + audio response → Browser

Waveform animation (Canvas-based, input=blue, output=green, respects prefers-reduced-motion)
30-second silence timeout with auto-stop
Auto-reconnect (max 3 attempts), then text fallback
Works in KB mode, Single Agent mode, and Multi Agent mode
Permission filtering is input-method-agnostic — voice queries get the same SID/UID/GID filtering as text
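The retry-then-fallback behavior might be structured as follows. `connect` is a hypothetical function that opens a voice session, and the linear backoff between attempts is an assumption; the article specifies only the three-attempt limit and the text fallback.

```typescript
type VoiceSession = { close(): void };

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Try up to maxAttempts times with a growing delay, then fall back to text mode.
async function connectWithFallback(
  connect: () => Promise<VoiceSession>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<{ mode: "voice"; session: VoiceSession } | { mode: "text" }> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return { mode: "voice", session: await connect() };
    } catch (err) {
      console.warn(`voice connect attempt ${attempt} failed`, err);
      if (attempt < maxAttempts) await sleep(baseDelayMs * attempt);
    }
  }
  return { mode: "text" };
}
```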

Phase 2 (planned) will use API Gateway WebSocket + Nova Sonic InvokeModelWithBidirectionalStream for real-time bidirectional streaming.

Estimated monthly cost: $70–$100 (input ~$0.0019/min, output ~$0.0076/min).
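The estimate is easy to sanity-check with the quoted per-minute rates. How minutes split between input and output is an assumption the caller supplies; with equal input and output minutes, one paired minute costs 0.0019 + 0.0076 = 0.0095 USD.

```typescript
// Per-minute rates quoted in the article.
const INPUT_USD_PER_MIN = 0.0019;
const OUTPUT_USD_PER_MIN = 0.0076;

function voiceCostUsd(inputMinutes: number, outputMinutes: number): number {
  return inputMinutes * INPUT_USD_PER_MIN + outputMinutes * OUTPUT_USD_PER_MIN;
}

// Minutes of each direction a budget covers if input and output minutes are equal.
function pairedMinutesForBudget(budgetUsd: number): number {
  return budgetUsd / (INPUT_USD_PER_MIN + OUTPUT_USD_PER_MIN);
}
```

`voiceCostUsd(5000, 5000)` comes to about 47.50 USD, and `pairedMinutesForBudget` returns roughly 7,400 minutes for 70 USD and 10,500 for 100 USD, so the quoted range corresponds to about 120 to 175 hours of two-way conversation per month under the equal-split assumption.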

AgentCore Policy

enableAgentPolicy=true adds agent behavior control. Define boundaries in natural language — what tools the agent can use, what APIs it can call, what data it can access — and the system enforces them in real-time.

3 policy templates: Security-focused, Cost-focused, Flexibility-focused
PolicyEvaluationMiddleware: Evaluates every agent action against the policy (3s timeout)
Fail-open / Fail-closed: policyFailureMode controls behavior when policy evaluation fails
Violation logging: EMF-format metrics (PolicyViolationCount, PolicyEvaluationLatency) → CloudWatch dashboard
PolicySection in Agent create/edit forms: optional natural language policy input (max 2000 chars)
PolicyBadge (🛡️) on agents with active policies

Note: AgentCore Policy reached GA in March 2026 with a Policy Engine + Gateway architecture. Policies are written in Cedar language (with natural language auto-conversion). The implementation uses SigV4-signed HTTP.
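The interaction between the 3-second timeout and `policyFailureMode` can be sketched as below. `evaluate` is a hypothetical stand-in for the actual policy call, and the mode strings `"fail-open"` / `"fail-closed"` are assumed values; the timeout, the two modes, and the 2000-character limit come from the article.

```typescript
type PolicyFailureMode = "fail-open" | "fail-closed"; // assumed values
type PolicyDecision = "ALLOW" | "DENY";

const MAX_POLICY_TEXT_LENGTH = 2000;

// Reject natural-language policies longer than the form's limit.
function validatePolicyText(text: string): void {
  if (text.length > MAX_POLICY_TEXT_LENGTH) {
    throw new Error(`policy text exceeds ${MAX_POLICY_TEXT_LENGTH} characters`);
  }
}

// Evaluate one agent action. Only evaluation failures (errors or timeouts)
// consult the failure mode; an explicit DENY from the evaluator always stands.
async function decideAction(
  evaluate: () => Promise<PolicyDecision>,
  mode: PolicyFailureMode,
  timeoutMs = 3000,
): Promise<PolicyDecision> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("policy evaluation timed out")), timeoutMs);
  });
  try {
    return await Promise.race([evaluate(), timeout]);
  } catch (err) {
    console.error("policy evaluation failed", err);
    return mode === "fail-open" ? "ALLOW" : "DENY";
  } finally {
    clearTimeout(timer);
  }
}
```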

Feature Flags Runtime API

A cross-cutting change that affects all v4 features: the UI no longer relies on NEXT_PUBLIC_* build-time environment variables. Instead, a /api/config/features endpoint reads Lambda environment variables at runtime and returns feature flags. The useFeatureFlags hook caches flags in localStorage for instant page loads.

This means you can enable/disable features by changing CDK parameters and redeploying — without rebuilding the Docker image.
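A sketch of the pattern on both sides. The environment variable names are hypothetical (the article only says the endpoint reads Lambda environment variables), and `KeyValueStore` is a minimal interface that browser `localStorage` satisfies, used here so the cache logic can run outside a browser.

```typescript
interface FeatureFlags {
  agentRegistry: boolean;
  guardrails: boolean;
  voiceChat: boolean;
  agentPolicy: boolean;
  episodicMemory: boolean;
}

// Hypothetical mapping from flag to Lambda environment variable.
const FLAG_ENV: Record<keyof FeatureFlags, string> = {
  agentRegistry: "ENABLE_AGENT_REGISTRY",
  guardrails: "ENABLE_GUARDRAILS",
  voiceChat: "ENABLE_VOICE_CHAT",
  agentPolicy: "ENABLE_AGENT_POLICY",
  episodicMemory: "ENABLE_EPISODIC_MEMORY",
};

// Server side (e.g. inside the /api/config/features route): env -> flags.
function flagsFromEnv(env: Record<string, string | undefined>): FeatureFlags {
  const on = (key: keyof FeatureFlags) => env[FLAG_ENV[key]] === "true";
  return {
    agentRegistry: on("agentRegistry"),
    guardrails: on("guardrails"),
    voiceChat: on("voiceChat"),
    agentPolicy: on("agentPolicy"),
    episodicMemory: on("episodicMemory"),
  };
}

// Client side: any storage with getItem/setItem (localStorage fits).
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const CACHE_KEY = "featureFlags";

// Return cached flags for an instant first render, or null if none/corrupt.
function cachedFlags(store: KeyValueStore): FeatureFlags | null {
  const raw = store.getItem(CACHE_KEY);
  if (raw === null) return null;
  try {
    return JSON.parse(raw) as FeatureFlags;
  } catch {
    return null;
  }
}

// Fetch fresh flags from the API and update the cache.
async function refreshFlags(
  store: KeyValueStore,
  fetchFlags: () => Promise<FeatureFlags>,
): Promise<FeatureFlags> {
  const fresh = await fetchFlags();
  store.setItem(CACHE_KEY, JSON.stringify(fresh));
  return fresh;
}
```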

Multi-Agent Collaboration: Now Default-On

When enableAgent=true, multi-agent collaboration (enableMultiAgent) is now enabled by default. Bedrock Agents have zero standby cost, so this adds no running cost. Token consumption only increases (3-6x) when users actually chat in Multi Agent mode. Set enableMultiAgent: false explicitly to disable.

Multi-Agent Collaboration: Permission-Aware Agent Teams

The system uses Amazon Bedrock Agents' Supervisor + Collaborator pattern. Instead of a single agent handling everything, specialized agents work together:

Supervisor Agent: Detects user intent, routes tasks to the right collaborator
Permission Resolver: Resolves SID/UID/GID from the User Access Table
Retrieval Agent: Executes KB search with permission metadata filters
Analysis Agent: Summarizes and reasons over filtered context (no direct KB access)
Output Agent: Generates reports and documents (no direct KB access)

The key design principle: KB access is restricted to Permission Resolver and Retrieval Agent only. Analysis and Output agents receive "filtered context" — they never touch the knowledge base directly. This preserves the same SID/UID/GID permission boundaries that exist in single-agent mode.
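The access rule can be made explicit as a capability check. This is an illustration under assumptions: role names follow the list above, and `retrieveForUser` and the chunk shape are hypothetical. Downstream agents receive only the `FilteredContext` value, never a handle to the knowledge base.

```typescript
type AgentRole = "supervisor" | "permissionResolver" | "retrieval" | "analysis" | "output";

// Only these roles may query the knowledge base.
const KB_ROLES: ReadonlySet<AgentRole> = new Set<AgentRole>(["permissionResolver", "retrieval"]);

// What Analysis and Output agents receive: already-filtered text, no KB access.
interface FilteredContext {
  readonly chunks: ReadonlyArray<{ text: string; sourceUri: string }>;
}

function assertKbAccess(role: AgentRole): void {
  if (!KB_ROLES.has(role)) {
    throw new Error(`${role} agent is not allowed to query the knowledge base`);
  }
}

// Retrieval runs the KB search and the SID filter, then strips permission metadata.
function retrieveForUser(
  role: AgentRole,
  search: () => Array<{ text: string; sourceUri: string; allowedSids: string[] }>,
  userSids: string[],
): FilteredContext {
  assertKbAccess(role);
  const user = new Set(userSids);
  return {
    chunks: search()
      .filter((c) => c.allowedSids.some((sid) => user.has(sid)))
      .map(({ text, sourceUri }) => ({ text, sourceUri })),
  };
}
```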

Cost Structure

Scenario	Agent Calls	Est. Cost/Request
Single Agent (existing)	1	~$0.02
Multi-Agent (simple query)	2–3	~$0.06
Multi-Agent (complex query)	4–6	~$0.17

Deployment Lessons Learned

CloudFormation AgentCollaboration values: Only DISABLED, SUPERVISOR, and SUPERVISOR_ROUTER are valid. COLLABORATOR is NOT a valid value. Collaborator Agents should not set this property at all.

2-stage deploy is mandatory: You cannot create a Supervisor Agent with SUPERVISOR_ROUTER and collaborators in a single CloudFormation operation. The solution: create with DISABLED first, then a Custom Resource Lambda changes to SUPERVISOR_ROUTER, associates collaborators, and runs PrepareAgent.
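The two stages can be laid out as data. Stage 1 is what the template declares; stage 2 lists the Bedrock Agent API operations the Custom Resource Lambda would likely call (`UpdateAgent`, `AssociateAgentCollaborator`, `PrepareAgent`). The types and helper names are illustrative assumptions, not the repository's code.

```typescript
type AgentCollaboration = "DISABLED" | "SUPERVISOR" | "SUPERVISOR_ROUTER";

// Subset of AWS::Bedrock::Agent properties relevant here.
interface AgentTemplateProps {
  AgentName: string;
  AgentCollaboration?: AgentCollaboration; // omitted entirely for collaborators
}

// Stage 1: the supervisor starts as DISABLED; collaborators never set the property.
function stage1Props(agentName: string, isSupervisor: boolean): AgentTemplateProps {
  return isSupervisor
    ? { AgentName: agentName, AgentCollaboration: "DISABLED" }
    : { AgentName: agentName };
}

type Stage2Step =
  | { op: "UpdateAgent"; agentCollaboration: AgentCollaboration }
  | { op: "AssociateAgentCollaborator"; collaboratorName: string; aliasArn: string }
  | { op: "PrepareAgent" };

// Stage 2 (Custom Resource): switch to SUPERVISOR_ROUTER, attach collaborators, prepare.
function stage2Plan(collaborators: Array<{ collaboratorName: string; aliasArn: string }>): Stage2Step[] {
  if (collaborators.length === 0) throw new Error("a supervisor needs at least one collaborator");
  return [
    { op: "UpdateAgent", agentCollaboration: "SUPERVISOR_ROUTER" },
    ...collaborators.map((c): Stage2Step => ({ op: "AssociateAgentCollaborator", ...c })),
    { op: "PrepareAgent" },
  ];
}
```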

IAM permissions: The Supervisor Agent's IAM role needs bedrock:GetAgentAlias + bedrock:InvokeAgent on agent-alias/*/*. The Custom Resource Lambda needs iam:PassRole for the Supervisor role.
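Expressed as IAM policy documents, the two grants look roughly like this. Region, account, and role ARN are inputs; the resource pattern follows the `agent-alias/*/*` form given above, and the helper names are hypothetical.

```typescript
interface PolicyStatement {
  Effect: "Allow";
  Action: string[];
  Resource: string[];
}

interface PolicyDocument {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

// Supervisor role: look up and invoke any agent alias in the account/region.
function supervisorRolePolicy(region: string, account: string): PolicyDocument {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Action: ["bedrock:GetAgentAlias", "bedrock:InvokeAgent"],
        Resource: [`arn:aws:bedrock:${region}:${account}:agent-alias/*/*`],
      },
    ],
  };
}

// Custom Resource Lambda: allowed to pass the supervisor role when updating the agent.
function customResourcePassRolePolicy(supervisorRoleArn: string): PolicyDocument {
  return {
    Version: "2012-10-17",
    Statement: [{ Effect: "Allow", Action: ["iam:PassRole"], Resource: [supervisorRoleArn] }],
  };
}
```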

Tips for Builders

OpenLDAP `memberOf` Overlay

If you're testing with OpenLDAP, the LDAP Connector reads the memberOf attribute from user entries. Basic OpenLDAP doesn't populate this automatically — you need to add moduleload memberof and overlay memberof to slapd.conf, and create groupOfNames entries (not just posixGroup).

The repo includes setup-openldap.sh that handles all of this automatically.
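With the overlay enabled, `memberOf` holds full group DNs such as `cn=engineering,ou=groups,dc=demo,dc=local`, so a connector has to reduce them to group names. A simplified sketch: it takes the first RDN's value when that RDN is `cn`, honors backslash escapes like `\,`, and does not handle hex-pair escapes or multi-valued RDNs. The example DNs are made up.

```typescript
// Extract the value of a leading "cn=" RDN, or null if the DN starts with another attribute.
function groupNameFromDn(dn: string): string | null {
  const eq = dn.indexOf("=");
  if (eq < 0 || dn.slice(0, eq).trim().toLowerCase() !== "cn") return null;
  let value = "";
  for (let i = eq + 1; i < dn.length; i++) {
    const ch = dn.charAt(i);
    if (ch === "\\" && i + 1 < dn.length) {
      value += dn.charAt(++i); // keep the escaped character literally
    } else if (ch === ",") {
      break; // end of the first RDN
    } else {
      value += ch;
    }
  }
  return value.trim();
}

function groupsFromMemberOf(memberOf: string[]): string[] {
  return memberOf
    .map(groupNameFromDn)
    .filter((g): g is string => g !== null && g.length > 0);
}
```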

Geo Restriction Default

The WAF configuration defaults to Japan-only access (allowedCountries: ["JP"]). If you're deploying outside Japan, update this before deploying:

{ "allowedCountries": ["JP", "US", "DE", "SG"] }

Set to [] for worldwide access.

Existing FSx ONTAP Reuse

If you already have an FSx for ONTAP file system, specify existingFileSystemId, existingSvmId, and existingVolumeId in cdk.context.json to skip FSx creation entirely. This cuts deployment time from 30-40 minutes to under 10 minutes.

Built with Kiro

I used Kiro throughout the entire development lifecycle — specs for requirements-to-code traceability, hooks for automated validation on file saves, and steering files for project-specific rules that persist across sessions. The v4.0.0 release involved 195 files changed, 8-language documentation updates, property-based tests with fast-check, and live AWS environment verification across multiple accounts — all developed with Kiro's assistance. As a solo developer, this level of tooling makes enterprise-quality projects feasible.

Getting Started

git clone https://github.com/Yoshiki0705/FSx-for-ONTAP-Agentic-Access-Aware-RAG.git
cd FSx-for-ONTAP-Agentic-Access-Aware-RAG && npm install

npx cdk bootstrap aws://$(aws sts get-caller-identity --query Account --output text)/ap-northeast-1
npx cdk bootstrap aws://$(aws sts get-caller-identity --query Account --output text)/us-east-1

bash demo-data/scripts/pre-deploy-setup.sh
npx cdk deploy --all --require-approval never
bash demo-data/scripts/post-deploy-setup.sh

Prerequisites: Node.js 22+, Docker, AWS CLI configured with AdministratorAccess. Total deployment time is about 30-40 minutes (FSx ONTAP creation takes 20-30 minutes). Use existingFileSystemId to skip FSx creation if you already have one.

What's Next

The project is at v4.0.0 with 19 implementation aspects and actively evolving. Some directions I'm exploring:

Voice Chat Phase 2: WebSocket via API Gateway + Nova Sonic InvokeModelWithBidirectionalStream for real-time bidirectional streaming (replacing the current REST-based Phase 1)
DynamoDB-driven permission master: Eliminating per-file .metadata.json management for large-scale environments
Multi-volume embedding: Independent S3 Access Points per FSx for ONTAP volume with cross-volume search
Agent Registry GA SDK migration: When the Node.js SDK adds native Agent Registry commands, swap from SigV4 HTTP to SDK calls

I'm looking for feedback on:

Permission models: Are SID/UID-GID/OIDC-group/hybrid strategies sufficient for your use cases?
Voice interaction patterns: What voice-specific workflows would be valuable in enterprise RAG?
Policy templates: What agent behavior boundaries matter most in your organization?
Guardrails configurations: What content filtering rules does your compliance team require?

If you try it out, I'd love to hear about your experience — especially edge cases I haven't considered. PRs and issues are welcome.

👉 GitHub Repository — README available in 8 languages, same as the application UI

