Yoshiki Fujiwara(藤原善基)@AWS Community Builder for AWS Community Builders

Posted on Jun 21

Amazon Bedrock Knowledge Bases + FSx for ONTAP S3 Access Points: Self-Service AI Curation via Windows Drag & Drop — Phase 16

#aws #amazonbedrock #amazonfsxfornetappontap #s3accesspoints

TL;DR

"Put files in this Windows folder and your AI assistant can use them after the next governed sync" — UC29 reduces the handoff friction around enterprise AI knowledge updates. Business users maintain Amazon Bedrock Knowledge Base content through Windows Explorer drag & drop on an FSx for ONTAP SMB share. No S3 console, no ETL, no copy. Data remains in the file system as the operational source; the S3 Access Point provides a governed access path for ingestion without creating a separate object copy.

Three maturity stages + one opt-in enhancement:

Scenario A (manual): User places files → triggers sync from console/CLI
Scenario B (scheduled): EventBridge Scheduler + Step Functions poll every 15 minutes
Scenario C (event-driven): FPolicy detects file placement → real-time KB sync
Hybrid RAG (opt-in): Internal KB answers augmented with real-time web search via AgentCore Web Search Tool (GA June 2026)

Vector store: Amazon S3 Vectors — a managed, cost-aware vector store option for Bedrock Knowledge Bases, with metadata filtering available for permission-aware retrieval designs.

Repository: github.com/Yoshiki0705/FSx-for-ONTAP-S3AccessPoints-Serverless-Patterns (see solutions/genai/kb-selfservice-curation/ and samconfig.toml.example for deployment parameters)

Why This Pattern?

Challenge	Traditional approach	This pattern
Knowledge updates blocked on IT	Ticket → manual ETL	Business user drags & drops
Dual management (NAS + S3 copy)	Source drifts from S3 replica	S3 AP reads the single source directly
Forgotten re-ingestion	Manual and easy to forget	Automatic (Scenario B/C)
Specialist skills required	ETL / S3 / Bedrock expertise	Familiar Windows folder operations
Answers limited to internal docs	No current external context	Hybrid RAG: internal + web search (opt-in)
Vector store cost	OpenSearch Serverless ~$175/month minimum	S3 Vectors: cost-aware pay-per-use profile

Architecture

Users drag files into an AI-dedicated NTFS volume (SMB share, ACL-separated by department) on FSx for ONTAP. The same volume is exposed via S3 Access Point as a Bedrock Knowledge Base data source:

Users see a familiar Windows folder structure — each department has its own folder with NTFS ACL separation.

Drag & drop a product spec into the sales folder — that is the only content-maintenance step for the user.

Windows Explorer (drag & drop)
  → FSx for ONTAP volume (ai_knowledge/)
  → S3 Access Point (read-only, internet-origin)
  → Bedrock Knowledge Base Data Source (inclusionPrefixes: ai-knowledge/)
  → StartIngestionJob → S3 Vectors (vector store) updated

Seven business roles (sales / marketing / finance / IT / operations / legal / developers) share the volume with NTFS ACL-based permission separation.

Design note: S3 Access Points for FSx for ONTAP let S3-compatible AWS services access file data without copying it to an S3 bucket. Access is authorized twice: first through AWS/IAM policies on the access point path, and then through the file-system identity and permissions associated with the access point.

Hybrid RAG Flow (opt-in)

When EnableWebSearch=true, the Query Lambda augments internal KB answers with real-time web information:

User question
  ├─→ [1] Bedrock KB RetrieveAndGenerate (internal docs via S3 AP → S3 Vectors)
  ├─→ [2] AgentCore Web Search (us-east-1, cross-region MCP call)
  │       Amazon-operated web index (tens of billions of docs, continuously updated)
  └─→ [3] Bedrock Converse → unified answer with dual citations
            [Internal: product-spec.pdf] + [Web: Market Report 2026](https://...)

Scenario B: Scheduled Automation (Step Functions)

The most common production deployment. EventBridge Scheduler triggers a Step Functions workflow every 15 minutes:

EventBridge Scheduler (rate(15 minutes))
  └─→ Step Functions State Machine
       ├─→ DetectAndStartIngestion Lambda
       │     • ListObjectsV2 via S3 AP → compare with last-known state
       │     • If changes detected → StartIngestionJob
       │     • If no changes → skip (cost-zero run)
       ├─→ Wait (30s) → CheckIngestionStatus Lambda (poll loop)
       └─→ NotifySuccess / NotifyFailure → SNS

Key design choices:

Differential detection: Compares current S3 AP listing against prior state; only triggers ingestion when files actually changed
Idempotent: Re-running the same schedule with no file changes is a no-op (no wasted Bedrock ingestion cost)
Observable: Each Step Functions execution is visible in the console with per-state timing and error details
Configurable interval: The ScheduleExpression parameter accepts any EventBridge rate/cron expression

This is the required safety net for Scenario C (see below) — it catches any files missed during the lost-update window.

Scenario C: FPolicy Event-Driven Real-Time Sync

Scenario B's 15-minute polling is replaced by FPolicy real-time detection:

Windows/NFS file operation
  → FPolicy instant detection (CREATE/WRITE/DELETE/RENAME)
  → FPolicy Server → SQS → Bridge Lambda → EventBridge custom bus
  → EventBridge Rule (file_path prefix = ai_knowledge)
  → KB Trigger Lambda (debounce) → StartIngestionJob
  → Bedrock KB → reflected in tens of seconds to minutes

The FPolicy → SQS → EventBridge front-end reuses the existing solutions/event-driven/fpolicy pattern infrastructure. UC29 adds only an EventBridge rule and the KB Trigger Lambda.

Lost-Update Window (Critical)

The EventBridge rule routes only FPolicy events matching the ai_knowledge volume path to the KB Trigger Lambda.

Bedrock Ingestion performs a full-source scan at job start time. Files added during a running job are not included in that execution. Scenario C alone does not guarantee zero missed files.

Mandatory: Always pair Scenario C with Scenario B (periodic reconcile sync) as a safety net. The KB Trigger Lambda skips when a job is already in progress (debounce + ConflictException handling + reserved concurrency = 2).

Namespace Pitfall

FPolicy reports ONTAP volume-path namespace (ai_knowledge/..., underscore). The KB S3 ingestion prefix (ai-knowledge/, hyphen) is a different namespace. Initial implementation confused the two, causing false-skip. The EventBridge rule and Lambda secondary filter now use a dedicated FPOLICY_PATH_FILTER parameter for the volume-path namespace.

Hybrid RAG: Internal KB + Web Search (opt-in)

GA at AWS Summit NYC 2026 (June 17, 2026). Powered by AgentCore Web Search Tool.

Enterprise knowledge from FSx for ONTAP is treated as the primary internal source, while public web context is supplemental and untrusted. For questions that benefit from current external context — regulatory updates, market trends, public market information — the Query Lambda can optionally augment answers with real-time web search results.

How It Works

Internal KB retrieval (always): Bedrock KB searches S3 Vectors for relevant chunks from FSx for ONTAP documents
Web search (opt-in): AgentCore Gateway invokes Amazon's purpose-built web index via MCP protocol
Unified answer: Bedrock Converse merges both contexts, with internal documents as primary source

Key Design Decisions

Decision	Rationale
Opt-in (`EnableWebSearch=false` default)	Most enterprise QA needs internal data only
Graceful degradation	Web Search failure → internal-only answer (no error surfaced to user)
Cross-region (us-east-1 Gateway)	Web Search Tool is us-east-1 only; adds ~100-200ms latency
Query safety	Only user's question text is sent to Web Search — never internal document content
Citation separation	`[Internal: filename]` vs `[Web: title](URL)` — users see exactly which source informed each claim
Prompt injection defense	Web results wrapped in `<web_search_results>` with explicit "untrusted data" instruction
Acceptable Use compliance	Source URLs and titles are always displayed (Web Search Tool TOS requirement)

Deployment

sam deploy --parameter-overrides \
  EnableWebSearch=true \
  AgentCoreGatewayId=<gateway-id> \
  AgentCoreGatewayRegion=us-east-1

Example Response

{
  "status": "completed",
  "query": "What are the latest FISC guidelines for cloud data protection?",
  "answer": "Based on internal documentation, our current FISC compliance posture covers... Additionally, [Web: FISC 2026 Guidelines Update](https://example.com/fisc-2026) published last month introduces...",
  "citations": [{"source": "s3://.../legal/compliance/fisc-overview.pdf"}],
  "web_citations": [{"source": "https://example.com/fisc-2026", "title": "FISC 2026 Guidelines Update", "type": "web"}],
  "web_search_enabled": true
}

Verification Highlights

Windows-Identity S3 Access Point with Dedicated AD

The Windows EC2 domain-joined to AWS Managed Microsoft AD, with the FSx SMB share mapped — proving the literal drag & drop experience works end to end.

To demonstrate the literal Windows drag & drop experience, we built a dedicated AWS Managed Microsoft AD + domain-joined Windows EC2 + AD-joined SVM:

AD-joined SVM OU: AWS Managed AD's OU=Computers lacks delegation rights → use the domain-name OU (OU=<domain>,DC=...)
CIFS share creation: Executes against the filesystem management LIF, not the SVM LIF
Windows-identity S3 AP: Works correctly with a running dedicated AD; files dropped in Explorer are readable via S3 AP

Deletion Lifecycle

The Bedrock KB data source connected to the FSx for ONTAP S3 AP alias. Click "Sync" for manual ingestion, or let Scenario B/C automate it.

Scenario B's Step Functions workflow: detect changes → start ingestion → poll status → notify on completion.

"User deletes a file → AI forgets it" verified end-to-end: file deletion → next sync → numberOfDocumentsDeleted=1 → re-query returns "no information found". Powered by dataDeletionPolicy=DELETE. For urgent revocation between syncs, call the Ingestion API directly.

Performance Considerations

Shared bandwidth: S3 AP reads share the FSx throughput capacity (128/256/512 MBps) with NFS/SMB workloads. Scenario B's 15-minute interval and Scenario C's reserved concurrency (2) throttle ingestion flow
Bulk re-index: For full re-ingestion (e.g., embedding model change), use a FlexClone volume as the Ingestion target — zero impact on production I/O, consistent point-in-time read
Tiering: Frequently accessed AI knowledge should remain on the SSD tier. Capacity Pool retrieval latency affects GetObject time during ingestion
Web Search latency: Cross-region call to us-east-1 adds ~100-200ms. Total hybrid query latency depends on KB size, model, and network conditions (KB retrieve + Web Search + Converse generation)

Access Control — Three Layers

S3 AP boundaries are volume/prefix-level. For per-user visibility:

Search narrowing = Bedrock KB metadata filters (this UC; not AWS authorization)
Document-level ACL = Amazon Quick S3 Knowledge Base (UC30; user/group-level)
Chunk-level permission filter = Custom Permission-Aware RAG (FC3; AD SID/NTFS ACL for regulated industries)

Web Search results are public information — no ACL filtering needed. However, the unified answer that combines internal + web sources is subject to the same access control as internal-only answers (the internal citations remain permission-scoped).

Vector Store: Why S3 Vectors

This pattern uses Amazon S3 Vectors as the Bedrock KB vector store. OpenSearch Serverless remains a valid option when its operational and latency profile fits the workload better.

Criterion	OpenSearch Serverless	S3 Vectors
Minimum monthly cost	~$175 (2 OCU)	Pay-per-use only
Cost at scale	OCU-based	Cost savings for large vector datasets (see AWS documentation)
Metadata filtering	Supported	Supported (department, owner, role)
Permission-Aware RAG compatibility	Supported	Compatible with metadata-filtered retrieval designs; authorization enforced by application layer
Infrastructure management	Managed but OCU scaling required	Managed vector operations
Scale	Millions of vectors	2 billion vectors per index
Query latency	Sub-100ms	Sub-100ms

For this project — 28 industry patterns + PoC-to-production lifecycle — S3 Vectors' pay-per-use model is the right fit. We evaluated Bedrock Managed Knowledge Base (GA June 2026, AWS Summit NYC) but chose Custom KB + S3 Vectors for cost control, ACL metadata flexibility, and FSx for ONTAP lifecycle integration (see ADR: docs/investigations/managed-kb-vs-custom-kb-s3vectors.md).

Data Classification

Output	Classification	Rationale
KB vectors + metadata	INTERNAL	Inherits source file classification
Ingestion job status / SNS	INTERNAL	Operational metadata only
CloudWatch Metrics / Logs	INTERNAL	Aggregate metrics, no file content
Web Search results	PUBLIC	External public information
Hybrid answer (internal + web)	INTERNAL	Contains internal document citations

For regulated workloads (CUI / FISC / HIPAA), extend shared/data_classification.py labels. If retention-period requirements apply, use dataDeletionPolicy=RETAIN and design a separate purge procedure.

Cost

Component	Monthly estimate	Notes
Lambda (sync + query)	< $5	Serverless pay-per-use
S3 API (ListObjects, GetObject)	< $1	S3 AP reads
EventBridge Scheduler	< $1	15-min interval
Bedrock KB Ingestion	Usage-based	Per-document embedding
S3 Vectors	Usage-based	Compare with OpenSearch Serverless for your query volume, latency, and operations requirements
Bedrock LLM (query)	Usage-based	Nova Pro: $0.0008/1K input tokens
FPolicy Server (Scenario C)	~$35	ECS Fargate (set desiredCount=0 when idle)
AgentCore Web Search (opt-in)	Per-query pricing (see AgentCore pricing)	Gateway invocation pricing
Cross-region transfer (opt-in)	< $0.02	us-east-1 ↔ ap-northeast-1

Getting Started

git clone https://github.com/Yoshiki0705/FSx-for-ONTAP-S3AccessPoints-Serverless-Patterns.git
cd FSx-for-ONTAP-S3AccessPoints-Serverless-Patterns/solutions/genai/kb-selfservice-curation

# Install dependencies (shared modules used by Lambda handlers)
pip install -r requirements.txt        # or: uv pip install -r requirements.txt

# Review parameters
cat samconfig.toml.example

# Build and deploy (requires configured AWS credentials + FSx for ONTAP S3 AP)
sam build && sam deploy --guided

# DemoMode=true runs without FSx for ONTAP (regular S3 bucket)

# Optional: Enable Web Search hybrid RAG
sam deploy --parameter-overrides \
  EnableWebSearch=true \
  AgentCoreGatewayId=<gateway-id> \
  AgentCoreGatewayRegion=us-east-1

Governance Note

This article is technical architecture guidance, not legal, compliance, or regulatory advice. Pricing, regional availability, and benchmark numbers are time-sensitive; verify them against current AWS documentation before production use. S3 AP data source boundaries are at volume/prefix granularity — for per-user visibility control, consider Custom Permission-Aware RAG. If retention-period requirements (NARA / FISC) apply, use dataDeletionPolicy=RETAIN and design purge procedures separately. Web Search Tool usage requires compliance with the Acceptable Use Policy (source citations must be displayed).

Yoshiki Fujiwara

DEV Community