Building an Agentic Access-Aware RAG System with Amazon FSx for NetApp ONTAP, S3 Vectors, and S3 Access Points: Where AI Respects File Permissions

Introduction

Enterprise data lives on file servers. And on those file servers, not everyone can see everything — NTFS ACLs, UNIX permissions, and group policies control who accesses what. But when you plug that data into a Retrieval-Augmented Generation (RAG) system, those permission boundaries tend to disappear. Suddenly, anyone can ask the AI about another team's, division's, or board member's confidential information.

But there's a flip side to this problem that's equally important: without permission awareness, the AI can't fully help the people it should be helping.

Think about it. An engineer has years of design docs, project specs, and team-internal notes in their department's shared folder. A sales lead has pipeline data, customer contracts, and regional forecasts in theirs. When you strip away permissions and dump everything into one vector store, the AI doesn't just leak confidential data — it also drowns each user's results in irrelevant noise from every other team. The engineer gets sales forecasts mixed into their search results. The sales lead gets CI/CD pipeline docs they'll never need.

Permission-aware/Access-aware RAG flips this around. Because the system knows exactly which files each user can access, it delivers personalized, noise-free AI assistance grounded in the data each person actually works with day to day. Your personal folder, your team's shared drive, the cross-functional project space you're part of — the AI sees what you see, nothing more, nothing less.

I built Agentic Access-Aware RAG to make this real. It's an open-source system that lets AI agents autonomously search, analyze, and respond to enterprise data stored on Amazon FSx for NetApp ONTAP — while respecting per-user file-level access permissions. The same question yields different answers depending on who's asking: an admin gets the full financial report, a project member gets their project's restricted docs, and a general user gets public information only. Each user gets an AI assistant that's effectively customized to their role and responsibilities — without any manual configuration.

The entire stack deploys with a single npx cdk deploy --all command, bracketed by short pre-deploy and post-deploy setup scripts.

👉 GitHub: Yoshiki0705/FSx-for-ONTAP-Agentic-Access-Aware-RAG


Architecture at a Glance

Browser → AWS WAF → CloudFront (OAC+Geo) → Lambda Web Adapter (Next.js 15)
                                                    │
              ┌─────────────┬───────────────────────┼──────────────────┐
              ▼             ▼                       ▼                  ▼
        Cognito       Bedrock KB              DynamoDB            DynamoDB
       User Pool    + S3 Vectors /          user-access          perm-cache
                    OpenSearch SL           (SID Data)         (Perm Cache)
                         │
                         ▼
                  FSx for ONTAP
                  (SVM + Volume)
                + S3 Access Point

The system is organized into 7 CDK stacks: WAF, Networking, Security (Cognito), Storage (FSx for ONTAP + DynamoDB), AI (Bedrock KB + vector store), WebApp (Lambda + CloudFront), and an optional Embedding stack.

Architecture — KB Mode Card Grid


The Core Idea: Permission-aware/Access-aware RAG

Traditional RAG retrieves documents based on semantic similarity alone. This system adds a second dimension: SID-based permission filtering.

Here's the flow:

  1. User sends a question via the chat UI
  2. The app retrieves the user's SID list (personal SID + group SIDs) from DynamoDB
  3. Bedrock KB Retrieve API performs vector search — each result carries allowed_group_sids metadata
  4. The app matches each document's SIDs against the user's SIDs
  5. Only permitted documents are passed to the Converse API for answer generation
  6. The user sees a filtered response with citation badges showing access levels

Here is how the matching plays out for three demo users:

■ Admin user: SIDs = [...-512 (Domain Admins), S-1-1-0 (Everyone)]
  public/          → S-1-1-0 match  → ✅ Permitted
  confidential/    → ...-512 match  → ✅ Permitted
  engineering/     → No match       → ❌ Filtered out (no noise from other teams)

■ Engineer (Engineering group member): SIDs = [...-1100 (Engineering), S-1-1-0 (Everyone)]
  public/          → S-1-1-0 match  → ✅ Permitted
  confidential/    → No match       → ❌ Denied
  engineering/     → ...-1100 match → ✅ Their team's docs, front and center

■ Sales user: SIDs = [...-1200 (Sales), S-1-1-0 (Everyone)]
  public/          → S-1-1-0 match  → ✅ Permitted
  confidential/    → No match       → ❌ Denied
  engineering/     → No match       → ❌ No engineering noise in their results

The engineer asking "What's the status of Project X?" gets answers from their team's internal docs — not from sales forecasts or HR policies. The sales lead asking "What are our Q3 targets?" gets their regional data without wading through engineering specs. Each user's AI experience is naturally scoped to the data they work with every day.
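
The matching step in the flow above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the result shape (`uri`, `metadata`), the file `engineering/design.md`, and the `S-1-5-21-EXAMPLE-*` SIDs are all placeholders made up for the example. The only detail taken from the article is that `allowed_group_sids` arrives as a JSON array string. The sketch fails closed, so a document with missing or unreadable metadata is treated as not permitted.

```python
import json

# Placeholder SIDs for illustration only; real domain SIDs look different.
EVERYONE = "S-1-1-0"
DOMAIN_ADMINS = "S-1-5-21-EXAMPLE-512"
ENGINEERING = "S-1-5-21-EXAMPLE-1100"
SALES = "S-1-5-21-EXAMPLE-1200"


def parse_allowed_sids(metadata: dict) -> set:
    """Decode allowed_group_sids, which is stored as a JSON array string."""
    raw = metadata.get("allowed_group_sids", "[]")
    try:
        sids = json.loads(raw)
    except (TypeError, json.JSONDecodeError):
        return set()  # fail closed: unreadable metadata grants nothing
    if not isinstance(sids, list):
        return set()
    return {s for s in sids if isinstance(s, str)}


def filter_results(results: list, user_sids: set) -> list:
    """Keep only results whose allowed SIDs intersect the user's SIDs."""
    return [
        r for r in results
        if parse_allowed_sids(r.get("metadata", {})) & user_sids
    ]


# Hypothetical retrieval results, one per demo folder plus one broken entry.
results = [
    {"uri": "public/company-overview.md",
     "metadata": {"allowed_group_sids": json.dumps([EVERYONE])}},
    {"uri": "confidential/financial-report.md",
     "metadata": {"allowed_group_sids": json.dumps([DOMAIN_ADMINS])}},
    {"uri": "engineering/design.md",
     "metadata": {"allowed_group_sids": json.dumps([ENGINEERING])}},
    {"uri": "broken/no-metadata.md", "metadata": {}},
]

admin_view = [r["uri"] for r in filter_results(results, {DOMAIN_ADMINS, EVERYONE})]
engineer_view = [r["uri"] for r in filter_results(results, {ENGINEERING, EVERYONE})]
sales_view = [r["uri"] for r in filter_results(results, {SALES, EVERYONE})]
```

Using a set intersection keeps the check independent of how many groups a user belongs to, and it makes the "Everyone" SID behave like any other group.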

Chat Response with Citation + Access Level Badges


S3 Access Points: The Bridge Between FSx for ONTAP and Bedrock KB

One of the most impactful recent additions is S3 Access Point integration with FSx for ONTAP. This creates a clean, single-path data ingestion architecture:

FSx for ONTAP Volume (/data)
  ├── public/company-overview.md
  ├── public/company-overview.md.metadata.json
  ├── confidential/financial-report.md
  ├── confidential/financial-report.md.metadata.json
      │
      │  S3 Access Point
      ▼
  Bedrock KB Data Source (S3 AP alias)
      │  Ingestion Job (chunking + Titan Embed v2)
      ▼
  Vector Store (S3 Vectors or OpenSearch Serverless)

Before S3 Access Points, getting data from FSx for ONTAP into Bedrock KB required either a custom Embedding server with CIFS mounts or manual S3 uploads. Now, Bedrock KB reads documents directly from the FSx for ONTAP volume through the S3 Access Point — no intermediate copies, no sync scripts.

The S3 AP user type is automatically selected based on your AD configuration:

| AD Configuration | Volume Style | S3 AP User Type | Behavior |
|---|---|---|---|
| AD configured | NTFS | WINDOWS (Admin) | NTFS ACLs automatically applied |
| No AD | NTFS/UNIX | UNIX (root) | All files accessible; permission control via .metadata.json |

One gotcha I discovered: the S3 AP WindowsUser must not include the domain prefix. DEMO\Admin works for CLI operations but causes AccessDenied on data plane APIs (ListObjects, GetObject). Always specify just Admin.
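
A small guard can catch this before the value reaches the Access Point configuration. The helper below is a hypothetical sketch (the function name is mine, not from the repo) that strips a leading `DOMAIN\` prefix and rejects empty names.

```python
def normalize_windows_user(name: str) -> str:
    """Return the bare account name, dropping any leading DOMAIN\\ prefix."""
    # rpartition splits on the last backslash; without one, the whole
    # string ends up in the third element.
    _, _, user = name.rpartition("\\")
    user = user.strip()
    if not user:
        raise ValueError(f"no user name found in {name!r}")
    return user


bare = normalize_windows_user("DEMO\\Admin")
unchanged = normalize_windows_user("Admin")
```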


S3 Vectors: Low-Cost Vector Storage

The default vector store is Amazon S3 Vectors — a relatively new service that brings vector search costs down to a few dollars per month, compared to ~$700/month for OpenSearch Serverless.

| Configuration | Cost | Latency | Best For |
|---|---|---|---|
| S3 Vectors (default) | ~$2-5/month | 100 ms to sub-second | Demo, dev, cost optimization |
| OpenSearch Serverless | ~$700/month | ~10 ms | High-performance production |

S3 Vectors does have a 2KB filterable metadata limit per vector. Since Bedrock KB's internal metadata already consumes ~1KB, custom metadata is effectively limited to ~1KB. The system handles this by setting all metadata keys (including allowed_group_sids) as non-filterable and performing SID matching on the application side after retrieval.
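
If you ever wanted to keep some keys filterable, a pre-ingestion check along these lines could flag documents whose custom metadata would not fit in the remaining space. This is only a rough sketch: the 1024-byte budget is taken from the approximate figures above, and measuring size as compact UTF-8 JSON is my assumption, since the service may count bytes differently.

```python
import json

# Assumed budget: roughly 1 KB left for custom metadata (approximate figure).
CUSTOM_METADATA_BUDGET_BYTES = 1024


def metadata_size_bytes(attributes: dict) -> int:
    """Estimate size as compact UTF-8 JSON; the real accounting may differ."""
    encoded = json.dumps(attributes, separators=(",", ":"), ensure_ascii=False)
    return len(encoded.encode("utf-8"))


def fits_budget(attributes: dict, budget: int = CUSTOM_METADATA_BUDGET_BYTES) -> bool:
    """True if the estimated size stays within the assumed budget."""
    return metadata_size_bytes(attributes) <= budget


small = {
    "allowed_group_sids": "[\"S-1-1-0\"]",
    "access_level": "public",
    "doc_type": "catalog",
}
small_ok = fits_budget(small)
```

A document shared with many groups can blow through the budget quickly, which is one more reason application-side SID matching is the simpler default.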

If you start with S3 Vectors and later need higher performance, you can export on-demand to OpenSearch Serverless using the included export-to-opensearch.sh script.


Embedding Design: .metadata.json and the Ingestion Pipeline

Permission metadata follows the standard Bedrock KB metadata file specification. Each document has a companion .metadata.json file:

product-catalog.md                    ← Document body
product-catalog.md.metadata.json      ← Permission metadata

The metadata format:

{
  "metadataAttributes": {
    "allowed_group_sids": "[\"S-1-1-0\"]",
    "access_level": "public",
    "doc_type": "catalog"
  }
}

The allowed_group_sids field is a JSON array string of Windows SIDs that are allowed to access the document. S-1-1-0 is the well-known "Everyone" SID.
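
Because `allowed_group_sids` is a JSON array encoded as a string inside a JSON document, it is easy to get the double encoding wrong by hand. The following sketch builds and reads the structure shown above; the helper names are illustrative and not part of the project.

```python
import json


def build_metadata(allowed_sids: list, access_level: str, doc_type: str) -> dict:
    """Build the companion metadata document for one file."""
    return {
        "metadataAttributes": {
            # Stored as a JSON array *string*, matching the format above.
            "allowed_group_sids": json.dumps(allowed_sids),
            "access_level": access_level,
            "doc_type": doc_type,
        }
    }


def companion_name(document_name: str) -> str:
    """Name of the metadata file that sits next to a document."""
    return f"{document_name}.metadata.json"


def read_allowed_sids(metadata: dict) -> list:
    """Decode the SID list back out of the string-encoded field."""
    return json.loads(metadata["metadataAttributes"]["allowed_group_sids"])


meta = build_metadata(["S-1-1-0"], "public", "catalog")
name = companion_name("product-catalog.md")
```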

Bedrock KB Ingestion Jobs automatically read these .metadata.json files alongside documents, chunk the content, vectorize with Amazon Titan Text Embeddings v2 (1024 dimensions), and store everything in the vector store. No custom ETL pipeline needed.

Design Decisions and Trade-offs

At scale (thousands of documents), managing individual .metadata.json files becomes a maintenance burden. There are three approaches, at different stages of maturity:

| Approach | Status | Pros | Cons |
|---|---|---|---|
| .metadata.json (current default) | ✅ Production | Bedrock KB native, no extra infra | Doubles file count, manual management |
| ONTAP REST API auto-generation | ✅ Partially implemented | File server ACLs as source of truth | Requires Embedding server |
| DynamoDB permission master | 🔜 Recommended for scale | DB-driven, easy auditing | Requires pre-Ingestion generation pipeline |

The recommended direction for large-scale environments:

ONTAP REST API (ACL retrieval)
  → DynamoDB document-permissions table
  → Auto-generate .metadata.json before Ingestion Job
  → Ingest via S3 AP into Bedrock KB
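
The "auto-generate .metadata.json" step of that pipeline might look roughly like the sketch below. It is an assumption-heavy illustration: the permission rows are plain dicts standing in for items from a DynamoDB table, the row fields (`path`, `allowed_sids`, `access_level`) are invented for the example, and a temporary directory stands in for the mounted volume.

```python
import json
import tempfile
from pathlib import Path


def write_companions(permission_rows: list, volume_root: Path) -> list:
    """Write <doc>.metadata.json next to each document named in the rows."""
    written = []
    for row in permission_rows:
        doc_path = volume_root / row["path"]
        if not doc_path.is_file():
            continue  # skip stale rows whose document no longer exists
        body = {
            "metadataAttributes": {
                "allowed_group_sids": json.dumps(sorted(row["allowed_sids"])),
                "access_level": row["access_level"],
            }
        }
        target = doc_path.with_name(doc_path.name + ".metadata.json")
        target.write_text(json.dumps(body, indent=2), encoding="utf-8")
        written.append(target)
    return written


def demo() -> list:
    """Run the generator against a throwaway directory."""
    rows = [
        {"path": "public/company-overview.md",
         "allowed_sids": ["S-1-1-0"], "access_level": "public"},
        {"path": "public/deleted.md",
         "allowed_sids": ["S-1-1-0"], "access_level": "public"},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "public").mkdir()
        (root / "public" / "company-overview.md").write_text("# Overview\n", encoding="utf-8")
        return [p.relative_to(root).as_posix() for p in write_companions(rows, root)]
```

Running the generator right before each Ingestion Job keeps the permission table as the single source of truth while the companion files remain a disposable build artifact.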

Multiple Authentication Modes

The system supports 5 authentication configurations, all driven by cdk.context.json parameters:

| Mode | Authentication | Permission Source | Configuration |
|---|---|---|---|
| A: Email/Password | Cognito native | Manual DynamoDB SID registration | Default (no extra config) |
| B: SAML AD Federation | Cognito + SAML IdP | AD Sync Lambda → auto SID retrieval | enableAdFederation=true |
| C: OIDC + LDAP | Cognito + OIDC IdP | LDAP query → auto UID/GID retrieval | oidcProviderConfig + ldapConfig |
| D: OIDC Claims Only | Cognito + OIDC IdP | OIDC token claims → group mapping | oidcProviderConfig + groupClaimName |
| E: SAML + OIDC Hybrid | Both IdPs simultaneously | Combined SID + UID/GID | Both configs + permissionMappingStrategy=hybrid |
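
To make the mapping from configuration to mode concrete, here is a hedged sketch of how the five modes could be derived from context parameters. It assumes the keys named in the table sit at the top level of the context object and that an empty or missing value means "not configured". How the real project resolves incomplete combinations is not covered here, so the sketch simply raises.

```python
def resolve_auth_mode(context: dict) -> str:
    """Map context parameters to mode letters A-E (illustrative only)."""
    saml = bool(context.get("enableAdFederation"))
    oidc = bool(context.get("oidcProviderConfig"))

    if saml and oidc:
        # The hybrid mode also lists permissionMappingStrategy=hybrid.
        if context.get("permissionMappingStrategy") != "hybrid":
            raise ValueError("SAML + OIDC expects permissionMappingStrategy=hybrid")
        return "E"
    if saml:
        return "B"
    if oidc and context.get("ldapConfig"):
        return "C"
    if oidc:
        return "D"  # claims-only; groupClaimName names the group claim
    return "A"


# Hypothetical config values, for illustration only.
oidc_cfg = {"issuer": "https://idp.example.com"}
mode_default = resolve_auth_mode({})
mode_ldap = resolve_auth_mode({"oidcProviderConfig": oidc_cfg, "ldapConfig": {"host": "ldap.example.com"}})
```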

Sign-in Page — SAML + OIDC Hybrid

The OIDC/LDAP federation (added in v3.4.0) enables zero-touch user provisioning: when a user signs in via the OIDC IdP for the first time, the Identity Sync Lambda automatically queries LDAP for their UID/GID/groups and stores them in DynamoDB. No admin intervention required.

For environments with FSx for ONTAP UNIX volumes, the system also supports ONTAP name-mapping — automatically resolving UNIX usernames to Windows users via the ONTAP REST API.


Agentic AI: Beyond Document Search

The system isn't just a search engine. Toggle between two modes with one click:

  • KB Mode: Permission-aware/Access-aware document search and Q&A
  • Agent Mode: Permission-aware/Access-aware autonomous multi-step reasoning and task execution via Bedrock Agents

Agent Directory

Agent mode includes an Agent Directory — a catalog-style management screen where you can create, edit, share, and schedule Bedrock Agents from templates. 14 workflow cards cover research tasks (market analysis, competitive research, etc.) and output tasks (presentations, approval documents, meeting minutes).

Agent Directory

Permission filtering works in both modes. Even when an Agent autonomously searches and reasons across multiple documents, only documents the user is authorized to see are included.

AgentCore Memory (v3.3.0)

With enableAgentCoreMemory=true, the system integrates Amazon Bedrock AgentCore Memory for conversation context maintenance:

  • Short-term memory: In-session conversation history (TTL: 3 days)
  • Long-term memory: Cross-session user preferences and summaries (semantic + summary strategies)

AgentCore Memory Sidebar


Additional Features

Smart Routing (v3.1.0)

Automatic model selection based on query complexity. Short factual queries route to Claude Haiku (fast, cheap); complex analytical queries route to Claude Sonnet (powerful). Toggle ON/OFF in the sidebar.
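
The article does not spell out how complexity is measured, so the following is purely an illustrative heuristic: route on word count and a handful of analytical keywords. The threshold, the keyword list, and the tier labels are all assumptions of this sketch, not the project's routing logic.

```python
import re

# Assumed signals that a query needs deeper reasoning.
ANALYTICAL_HINTS = re.compile(
    r"\b(compare|analy[sz]e|summari[sz]e|explain why|trade-?offs?|step by step)\b",
    re.IGNORECASE,
)


def pick_model_tier(query: str, max_simple_words: int = 15) -> str:
    """Return 'haiku' for short factual queries, 'sonnet' otherwise."""
    words = query.split()
    if len(words) > max_simple_words or ANALYTICAL_HINTS.search(query):
        return "sonnet"
    return "haiku"


simple = pick_model_tier("What is the Q3 revenue target?")
complex_ = pick_model_tier("Compare the Q2 and Q3 forecasts and explain why they differ")
```

Returning a tier label rather than a model ID keeps the routing decision separate from whichever model versions are currently deployed.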

Smart Routing

Image Analysis RAG (v3.1.0)

Drag-and-drop image upload in the chat input. Images are analyzed by a vision-capable model on Bedrock (Claude Haiku 4.5), and the analysis is integrated into the KB search context.

Image Upload

6-Layer Security

| Layer | Technology | Purpose |
|---|---|---|
| L1 | CloudFront Geo Restriction | Geographic access control |
| L2 | AWS WAF (6 rules) | Attack pattern detection |
| L3 | CloudFront OAC (SigV4) | Origin authentication |
| L4 | Lambda Function URL IAM Auth | API-level access control |
| L5 | Cognito JWT / SAML / OIDC | User authentication |
| L6 | SID / UID+GID Filtering | Document-level authorization |

8-Language i18n — Why It Matters

The UI and all documentation (README, guides, setup instructions) are available in 8 languages: Japanese, English, Korean, Simplified Chinese, Traditional Chinese, French, German, and Spanish.

This isn't just a nice-to-have. Enterprise file servers are inherently multi-regional — a global company's FSx for ONTAP volumes serve teams across Tokyo, Seoul, Shanghai, Frankfurt, and New York. If the RAG interface only speaks English, you've created a barrier for the very users who need it most. Non-English-speaking knowledge workers shouldn't need to context-switch languages just to search their own documents.

From a Community Builder perspective, localization also lowers the barrier to adoption. When a solutions architect in São Paulo or a storage admin in Taipei can read the deployment guide in their own language, they're far more likely to actually try it, fork it, and adapt it to their environment. Open-source projects that only document in one language inadvertently limit their community to one language group.

The implementation uses next-intl for Next.js with per-locale message files (src/messages/{locale}.json). Every UI string, from card labels to sign-in buttons to error messages, goes through useTranslations(). The sign-in page even detects the browser's preferred language and auto-redirects to the matching locale.
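
The language detection on the sign-in page boils down to matching the browser's Accept-Language header against the supported locales. The app itself does this in Next.js; the Python sketch below only illustrates the matching logic. Falling back to `en` is my assumption, and the primary-subtag fallback is deliberately naive (a bare `zh` resolves to the first Chinese locale in the list).

```python
SUPPORTED = ("ja", "en", "ko", "zh-CN", "zh-TW", "fr", "de", "es")


def pick_locale(accept_language: str, supported=SUPPORTED, default: str = "en") -> str:
    """Choose the best supported locale for an Accept-Language header."""
    ranked = []
    for position, part in enumerate(accept_language.split(",")):
        tag, _, params = part.strip().partition(";")
        tag = tag.strip()
        if not tag or tag == "*":
            continue
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        # Sort by descending quality, then by position in the header.
        ranked.append((-quality, position, tag.lower()))

    by_lower = {s.lower(): s for s in supported}
    for neg_quality, _, tag in sorted(ranked):
        if neg_quality == 0:
            continue  # q=0 means "not acceptable"
        if tag in by_lower:
            return by_lower[tag]
        primary = tag.split("-")[0]
        # Fall back to the first supported locale sharing the primary subtag.
        for low, original in by_lower.items():
            if low.split("-")[0] == primary:
                return original
    return default


korean = pick_locale("ko-KR,ko;q=0.9,en;q=0.8")
traditional = pick_locale("zh-TW,zh;q=0.9")
```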

Localization doesn't stop at the UI chrome. The AI's chat responses also match the user's language. The system prompt instructs the model to "respond in the same language as the question" — so a Korean user asking in Korean gets a Korean answer with Korean citation labels ("참조 문서", "전체 접근 가능"), and a German user gets "Referenzierte Dokumente" and "Allgemein zugänglich". This end-to-end language consistency — from sign-in screen to card labels to AI-generated answers to citation metadata — means users never hit a jarring language switch mid-workflow.

Here's what the card grid looks like across all 8 languages (one screenshot per locale): 🇯🇵 日本語 (ja), 🇺🇸 English (en), 🇰🇷 한국어 (ko), 🇨🇳 简体中文 (zh-CN), 🇹🇼 繁體中文 (zh-TW), 🇫🇷 Français (fr), 🇩🇪 Deutsch (de), 🇪🇸 Español (es).

Sign-in pages are also fully localized (screenshots shown for 🇯🇵 ja, 🇺🇸 en, 🇫🇷 fr, and 🇩🇪 de).

Tips for Builders

A few things I learned the hard way that might save you time.

OpenLDAP memberOf Overlay

If you're testing with OpenLDAP, the LDAP Connector reads the memberOf attribute from user entries. Basic OpenLDAP doesn't populate this automatically — you need to add moduleload memberof and overlay memberof to slapd.conf, and create groupOfNames entries (not just posixGroup). posixGroup and groupOfNames are different structural classes and can't coexist in the same entry — use a separate OU.

The repo includes setup-openldap.sh that handles all of this automatically.

Geo Restriction Default

The WAF configuration defaults to Japan-only access (allowedCountries: ["JP"]). If you're deploying outside Japan, update this before deploying:

{ "allowedCountries": ["JP", "US", "DE", "SG"] }

Set to [] for worldwide access.
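
The semantics of that setting are worth spelling out, since an empty list means "allow everyone" rather than "allow no one". The enforcement itself happens at the edge, not in application code, so the function below is just a hypothetical model of the rule for reasoning and testing.

```python
def is_country_allowed(country_code: str, allowed_countries: list) -> bool:
    """Model of the allowedCountries rule: an empty list means worldwide."""
    if not allowed_countries:
        return True
    return country_code.upper() in {c.upper() for c in allowed_countries}


default_blocks_us = not is_country_allowed("US", ["JP"])
worldwide = is_country_allowed("BR", [])
```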

Multiple Volumes, One Deployment

If your FSx file system has multiple volumes, specify one as the primary during CDK deployment. Additional volumes can be added as Bedrock KB data sources after deployment — each gets its own S3 Access Point and can be independently synced.


Built with Kiro

I used Kiro throughout the entire development lifecycle — specs for requirements-to-code traceability, hooks for automated validation on file saves, and steering files for project-specific rules that persist across sessions. The 8-language documentation, 130+ unit tests, 52 property-based tests, and the LDAP/ONTAP live environment verification were all developed with Kiro's assistance. As a solo developer, this level of tooling makes enterprise-quality projects feasible.


Getting Started

# Clone the repository and install dependencies
git clone https://github.com/Yoshiki0705/FSx-for-ONTAP-Agentic-Access-Aware-RAG.git
cd FSx-for-ONTAP-Agentic-Access-Aware-RAG && npm install

# Bootstrap CDK in both regions for the current account
npx cdk bootstrap aws://$(aws sts get-caller-identity --query Account --output text)/ap-northeast-1
npx cdk bootstrap aws://$(aws sts get-caller-identity --query Account --output text)/us-east-1

# Run pre-deploy setup, deploy all stacks, then run post-deploy setup
bash demo-data/scripts/pre-deploy-setup.sh
npx cdk deploy --all --require-approval never
bash demo-data/scripts/post-deploy-setup.sh

Prerequisites: Node.js 22+, Docker, AWS CLI configured with AdministratorAccess. Total deployment time is about 30-40 minutes (FSx for ONTAP creation takes 20-30 minutes).

The post-deploy-setup.sh script handles everything after CDK deployment: S3 Access Point creation, demo data upload, Bedrock KB data source registration + sync, DynamoDB SID data, and Cognito demo users.


What's Next

The project is at v3.4.0 and actively evolving. Some directions I'm exploring:

  • DynamoDB-driven permission master for large-scale environments (eliminating per-file .metadata.json management)
  • Multi-volume embedding across multiple FSx for ONTAP volumes with independent S3 Access Points
  • Bedrock KB Custom Data Source integration as an alternative to S3 AP

I'm looking for feedback on:

  • Permission models: Are SID/UID-GID/hybrid strategies sufficient for your use cases?
  • Authentication patterns: What IdP combinations do you need?
  • Document types: Beyond markdown, what formats need Permission-aware/Access-aware handling?

If you try it out, I'd love to hear about your experience — especially edge cases I haven't considered. PRs and issues are welcome.

👉 GitHub Repository: like the application UI, the README.md can be switched between all 8 languages.


Yoshiki Fujiwara
