Yoshiki Fujiwara(藤原善基)@AWS Community Builder for AWS Community Builders

Posted on May 14 • Edited on May 22

FPolicy Event-Driven Pipeline, Multi-Account StackSets, and Cost Optimization — FSx for ONTAP S3 Access Points, Phase 10

#aws #amazonfsxfornetappontap #serverless #fpolicy

TL;DR

This is Phase 10 of the FSx for ONTAP S3 Access Points serverless pattern library. Building on Phase 9, Phase 10 delivers:

FPolicy event-driven integration: ONTAP FPolicy → ECS Fargate TCP server → SQS → EventBridge custom bus. The shared event-ingestion pipeline is verified end-to-end; UC-specific dispatch follows in Phase 11.
Multi-account StackSets: All 17 UC templates validated for StackSets compatibility (0 errors) + admin/execution role templates
UC-specific alarm profiles: BATCH / REALTIME / HIGH_VOLUME — three profiles with workload-appropriate thresholds
Cost optimization: Dynamic MaxConcurrency controller + business-hours scheduling (rate(1 hour) during business hours vs rate(6 hours) off-hours)
E2E verification: NFSv3 ✅, NFSv4.0 ✅, NFSv4.1 ✅, SMB ✅, NFSv4.2 ❌ (unsupported by ONTAP FPolicy)

In short: Phase 9 completed the operational baseline. Phase 10 builds and verifies the shared event-ingestion pipeline that the pattern library has needed since Phase 1 — without waiting for AWS to ship native S3AP notifications. UC-specific dispatch wiring follows in Phase 11.

📊 Repository stats: 17 industry use cases + event-driven FPolicy + 6 FlexCache/FlexClone patterns | 1,499+ tests | 126 test files | Python 3.12 + CloudFormation (SAM Transform)

Repository: github.com/Yoshiki0705/FSx-for-ONTAP-S3AccessPoints-Serverless-Patterns

1a. Trigger Mode Decision Guide

Before diving into the FPolicy implementation, here is the decision framework for choosing between the three trigger modes this library supports:

Mode	Choose when	Avoid when
POLLING	Hourly or batch processing is acceptable; simplest operating model	Sub-minute detection is required
EVENT_DRIVEN	Near-real-time ingestion is required and event loss during reconnect is acceptable	Compliance requires durable event capture without Persistent Store
HYBRID	You need faster detection plus periodic reconciliation to fill gaps	You want the simplest operating model

Dimension	POLLING	EVENT_DRIVEN	HYBRID
Detection latency	Minutes to hours	Seconds	Seconds + periodic catch-up
Monthly cost (infra only)	~$6-21	~$32-60	~$42-86
Operational complexity	Low	High	Highest
Event durability	High (full scan each time)	Medium (gap during restart)	High (reconciliation fills gaps)
ONTAP dependency	None (S3 AP only)	High (FPolicy config)	High

Decision flow:

Real-time detection not required → POLLING (start here for most workloads)
Real-time required + Persistent Store available (ONTAP 9.14.1+) → EVENT_DRIVEN
Real-time required + no Persistent Store → HYBRID (polling fills gaps)
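The decision flow above can be sketched as a small function. This is an illustration only: `choose_trigger_mode` and its two boolean inputs are hypothetical names, not part of the repository; the three mode values come from the tables above.

```python
from enum import Enum


class TriggerMode(str, Enum):
    POLLING = "POLLING"
    EVENT_DRIVEN = "EVENT_DRIVEN"
    HYBRID = "HYBRID"


def choose_trigger_mode(realtime_required: bool,
                        persistent_store_available: bool) -> TriggerMode:
    """Mirror the three-branch decision flow (hypothetical helper)."""
    if not realtime_required:
        # Most workloads start here.
        return TriggerMode.POLLING
    if persistent_store_available:
        # Persistent Store (ONTAP 9.14.1+) covers reconnect gaps.
        return TriggerMode.EVENT_DRIVEN
    # No Persistent Store: periodic polling reconciles missed events.
    return TriggerMode.HYBRID
```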

Full guide: Trigger Mode Decision Guide

1. FPolicy Event-Driven Architecture

Background: why FPolicy

Every UC in this pattern library runs on a polling model: EventBridge Scheduler → Discovery Lambda → ListObjectsV2. This works, but it means latency is bounded by the polling interval (typically 1 hour). AWS still does not support GetBucketNotificationConfiguration for S3 Access Points attached to FSx for ONTAP volumes (FR-2 remains open).

ONTAP FPolicy is a file-operation notification framework built into every ONTAP system. In external server mode, it sends TCP notifications for create/write/delete/rename events to a registered server. By connecting this to AWS services, we get near-real-time event-driven processing without waiting for FR-2.

This implementation builds on Shengyu Fang's reference implementation, adapted for the 17-UC pattern library architecture.

S3 API Does Not Remove File-System Semantics

S3 Access Points for FSx for ONTAP expose file data through S3 APIs, but authorization is a two-layer model:

AWS-side authorization: IAM identity-based policy, S3 Access Point resource policy, VPC endpoint policy, SCP — all relevant policies are evaluated and all must permit the request
File-system-side authorization: The file system identity (UNIX UID or Windows domain\user) associated with the access point determines what file operations are authorized based on that user's permissions on the underlying volume

This means that least-privilege design must cover both AWS IAM and ONTAP file permissions. A common mistake is securing only the IAM layer while using a root-equivalent file system identity (UID 0), which grants full access to all files regardless of IAM restrictions.

Key behaviors:

If the file system user has read-only access, write requests through the access point are blocked — even if IAM permits s3:PutObject
Attaching an S3 access point does not change the volume's behavior when accessed via NFS or SMB
Block Public Access is always enabled and cannot be changed for FSx for ONTAP access points
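As a mental model, the two layers combine with a logical AND. The sketch below is purely conceptual; `AccessDecision` is a hypothetical type for illustration and does not call any AWS or ONTAP API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessDecision:
    # Combined result of IAM, access point policy, VPC endpoint policy, and SCP.
    aws_side_allows: bool
    # Result for the UNIX or Windows identity associated with the access point.
    filesystem_allows: bool

    @property
    def allowed(self) -> bool:
        # A request succeeds only when both layers permit it.
        return self.aws_side_allows and self.filesystem_allows


# IAM permits s3:PutObject, but the file system user is read-only.
blocked_write = AccessDecision(aws_side_allows=True, filesystem_allows=False)
```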

For the full authorization model documentation, see S3AP Authorization Model.

Architecture

FSx ONTAP SVM (file operations: create/write/delete/rename)
│
│ TCP (port 9898, async mode)
▼
FPolicy External Server (ECS Fargate, ARM64 Python 3.12)
│
├─ [Near-real-time] → SQS Ingestion Queue
│                        │
│                        │ Event Source Mapping
│                        ▼
│                     Bridge Lambda → EventBridge Custom Bus
│                                          │
│                                   UC1 reference rule (Phase 10)
│                                          │
│                                   UC1 Step Functions
│
│                                   ── Phase 11 ──
│                                   UC-specific dispatch rules
│                                   → Step Functions / Lambda (per-UC)
│
└─ [Batch] → JSON Lines log (FSxN S3AP) → Log Query Lambda

ONTAP initiates the TCP connection to the FPolicy server — not the other way around. This means the server simply listens on a port. Because ONTAP maintains a persistent TCP control channel with keep-alive, Lambda is not viable (15-minute timeout). ECS Fargate provides the long-running TCP listener without OS management overhead.

Why not NLB?

Initial design placed an NLB in front of Fargate for IP stability. In our AWS verification, the NLB path established a TCP connection but the FPolicy handshake did not complete.

Additional verification (2026-05-14): We tested both preserve_client_ip.enabled=true and false on the NLB target group. In both configurations, ONTAP did not establish an FPolicy session through the NLB. The only connections observed from the NLB IP were health checks (TCP connect → immediate close at 10-second intervals). No FPolicy NEGO_REQ was received via the NLB path.

One plausible explanation is that ONTAP FPolicy's external-engine expects a direct TCP connection to the primary-servers IP. When the NLB forwards the connection to a Fargate task with a different IP, the FPolicy session establishment conditions are not met — possibly because ONTAP validates the connection endpoint or because the NLB's connection lifecycle (idle timeout, deregistration delay) interferes with the persistent control channel that FPolicy requires.

This remains documented as an observed deployment limitation in our environment (FSxN ONTAP 9.17.1P6, internal NLB with IP targets), not a universal NLB claim. If your environment differs, testing the NLB path is straightforward — set the NLB IP as the external-engine primary-servers and check vserver fpolicy show-engine for connection state.

Solution: Fargate task direct IP connection. IP stability is handled by an EventBridge-triggered Lambda that updates the ONTAP external-engine configuration when the Fargate task IP changes:

ECS Task State Change (RUNNING) → EventBridge Rule → IP Updater Lambda
→ ONTAP REST API: disable policy → update engine primary_servers → enable policy
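A minimal sketch of the first half of that flow, assuming the standard ECS Task State Change event shape in which the ENI attachment carries a `privateIPv4Address` detail. The function names are hypothetical, and the step labels returned by `plan_engine_update` are placeholders for the disable/update/enable sequence, not ONTAP REST endpoints.

```python
from typing import Optional


def extract_task_private_ip(event: dict) -> Optional[str]:
    """Return the task ENI's private IPv4 address for a RUNNING task, else None."""
    detail = event.get("detail", {})
    if detail.get("lastStatus") != "RUNNING":
        return None
    for attachment in detail.get("attachments", []):
        for item in attachment.get("details", []):
            if item.get("name") == "privateIPv4Address":
                return item.get("value")
    return None


def plan_engine_update(new_ip: str) -> list[tuple[str, dict]]:
    """Ordered steps the updater would perform (labels are placeholders)."""
    return [
        ("disable_policy", {}),
        ("update_engine_primary_servers", {"primary_servers": [new_ip]}),
        ("enable_policy", {}),
    ]
```

Keeping the event parsing separate from the ONTAP calls makes the parsing testable without network access to the management endpoint.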

The direct-IP model assumes a single active Fargate task (DesiredCount: 1) and requires network reachability from the FSxN SVM data LIFs to the task ENI on the FPolicy TCP port. This design prioritizes connection stability over horizontal scalability; multi-task active-active configurations are not supported due to FPolicy session constraints. Security groups must allow ONTAP-initiated inbound connections on port 9898. During Fargate task restarts, event handling depends on the FPolicy policy's is-mandatory setting: with is-mandatory=false (our configuration), file operations continue unblocked but notifications are dropped until the new task connects. See the Event durability note below for Persistent Store guidance.

TriggerMode parameter

Phase 10 introduces the TriggerMode parameter scaffolding and verifies the shared FPolicy → SQS → EventBridge pipeline end-to-end. A reference implementation is deployed in the legal-compliance (UC1) template. UC-specific Step Functions dispatch rules are intentionally deferred to Phase 11.

Value	Phase 10 behavior
`POLLING` (default)	Existing EventBridge Scheduler + Discovery Lambda
`EVENT_DRIVEN`	Shared FPolicy event pipeline enabled; UC-specific dispatch wiring is Phase 11
`HYBRID`	Polling remains active; event-driven deduplication path prepared for Phase 11

Default POLLING ensures zero impact on existing deployments.

NFSv3 write-complete delay

When FPolicy fires a notification, the file write may not be complete — particularly with NFSv3 which lacks close semantics. The server inserts a configurable delay (WRITE_COMPLETE_DELAY_SEC, default 5s) after receiving NOTI_REQ, and Step Functions include retry logic for incomplete files.
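The sketch below shows one way to read the delay setting and wait before forwarding an event. Only `WRITE_COMPLETE_DELAY_SEC` and its 5-second default come from the text; the size-stability check is an assumption added for illustration and is not described as the repository's method.

```python
import os
import time
from typing import Callable, Mapping, Optional


def get_write_complete_delay(env: Mapping[str, str] = os.environ) -> float:
    """Read the configurable delay, defaulting to 5 seconds."""
    return float(env.get("WRITE_COMPLETE_DELAY_SEC", "5"))


def wait_until_size_stable(get_size: Callable[[], Optional[int]],
                           delay_sec: float,
                           max_checks: int = 3,
                           sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True once two consecutive size readings match (assumed heuristic)."""
    previous: Optional[int] = None
    for _ in range(max_checks):
        sleep(delay_sec)
        current = get_size()
        if current is not None and current == previous:
            return True
        previous = current
    # Still changing: leave it to downstream retry logic.
    return False
```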

Event durability note

This Phase 10 implementation is designed for near-real-time processing, not end-to-end durable event capture during Fargate task restarts. With is-mandatory=false, ONTAP drops notifications when no FPolicy server is connected — file operations continue unblocked but events are lost. Environments that cannot tolerate event loss should evaluate ONTAP FPolicy Persistent Store (ONTAP 9.14.1+), available for asynchronous non-mandatory external FPolicy policies. Persistent Store queues events on the SVM during server disconnection and can replay them when the external server reconnects. Note that queue sizing, replay handling, and deduplication require application-level design. This is a Phase 11+ candidate (design-dependent).

Note (Phase 12 update): This Phase 10 article documents the initial event-durability boundary. Persistent Store replay validation is covered in Phase 12, where replay behavior was tested for 5-event and 20-event disconnect scenarios with zero event loss confirmed. Use the Deployment Profiles guide to choose the appropriate durability level for your workload.

Deployment Profiles — From PoC to Compliance

The event-driven FPolicy pattern supports three deployment profiles, each with clear boundaries for event loss tolerance and operational complexity:

Dimension	PoC/Demo	Production	Compliance-sensitive
FPolicy Server	Fargate (direct IP)	EC2 static IP or NLB	EC2 static IP + NLB
`is-mandatory`	`false`	`true` (ONTAP 9.15.1+)	`true` (ONTAP 9.15.1+)
Persistent Store	Not required	Recommended	Required (ONTAP 9.14.1+)
Retry / Dedup	Best-effort	DynamoDB idempotency	DynamoDB + S3 Object Lock lineage
Alarm Profile	Minimal (error only)	Full (latency + error + backlog)	Full + audit trail
Event Loss Tolerance	Acceptable (30-60s gap)	Near-zero (retry compensates)	Zero (Persistent Store + audit)

Key design decisions:

is-mandatory=true (ONTAP 9.15.1+): Blocks file operations when the FPolicy server is unavailable — prevents event loss but impacts availability. Use only with redundant server deployment.
Persistent Store (ONTAP 9.14.1+): Buffers events in a dedicated SVM volume during server disconnection. Events are replayed in order upon reconnection. Sizing: 1 GB ≈ 2M events at ~500 bytes each.
Replay recovery time: 100K buffered events at 100 events/sec = ~17 minutes to catch up.
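The sizing and replay figures above follow from simple arithmetic. A short sketch, assuming a decimal gigabyte (10^9 bytes) and the ~500-byte average event size quoted above; the function names are illustrative.

```python
def persistent_store_capacity_events(volume_bytes: int,
                                     avg_event_bytes: int = 500) -> int:
    """Approximate number of events a Persistent Store volume can buffer."""
    return volume_bytes // avg_event_bytes


def replay_minutes(buffered_events: int, events_per_sec: float) -> float:
    """Time to drain the buffer at a steady replay rate, in minutes."""
    return buffered_events / events_per_sec / 60


capacity = persistent_store_capacity_events(10**9)   # 2,000,000 events
catch_up = replay_minutes(100_000, 100)              # about 16.7 minutes
```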

The progression path is incremental: PoC → Production → Compliance-sensitive, adding capabilities at each stage without redesigning the core architecture.

Full profile documentation: Deployment Profiles

2. E2E Verification Results

Protocol support matrix

NFS Version	Mount Option	FPolicy NOTI_REQ	Result
NFSv3	`vers=3`	✅ Immediate	Works
NFSv4.0	`vers=4.0`	✅ Immediate	Works
NFSv4.1	`vers=4.1`	✅ Immediate	Works
NFSv4.2	`vers=4.2`	❌ Not sent	Unsupported
NFSv4 (auto)	`vers=4`	❌ Not sent	Negotiates to 4.2
SMB/CIFS	—	✅	Works

Key finding: mount -o vers=4 on modern Linux negotiates to NFSv4.2, which ONTAP FPolicy does not support. Always use vers=4.1 explicitly. This is documented in NetApp's FPolicy Auditing FAQ.

ONTAP version note: NFSv4.1 FPolicy monitoring support was introduced in ONTAP 9.15.1. Earlier versions support SMB, NFSv3, and NFSv4.0 only. Our test environment runs ONTAP 9.17.1P6, which includes NFSv4.1 support. See NetApp FPolicy event configuration documentation for the full protocol support matrix by ONTAP version.
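Because the `vers=4` failure is silent, it can help to check what a client actually negotiated. The sketch below parses one line in `/proc/mounts` format and applies the support matrix above; the helper names and the sample lines are illustrative.

```python
from typing import Optional

# Per the matrix above, FPolicy notifications are not sent for NFSv4.2.
FPOLICY_UNSUPPORTED_NFS_VERSIONS = {"4.2"}


def negotiated_nfs_version(mounts_line: str) -> Optional[str]:
    """Extract the vers= option from a /proc/mounts line for an NFS mount."""
    fields = mounts_line.split()
    if len(fields) < 4 or not fields[2].startswith("nfs"):
        return None
    for option in fields[3].split(","):
        if option.startswith("vers="):
            return option[len("vers="):]
    return None


def is_fpolicy_compatible_mount(mounts_line: str) -> bool:
    version = negotiated_nfs_version(mounts_line)
    return version is not None and version not in FPOLICY_UNSUPPORTED_NFS_VERSIONS


sample = "svm.example:/vol1 /mnt/fsxn nfs4 rw,relatime,vers=4.2,rsize=65536 0 0"
```

Running this check against each line of `/proc/mounts` on a client flags mounts that negotiated 4.2 despite a `vers=4` mount option.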

Path extraction bug fix

ONTAP sends file paths in XML format within NOTI_REQ:

<PathNameType>WIN_NAME</PathNameType><PathName>\file.txt</PathName>

The initial regex extraction left residual XML tags in the file_path field. Fixed by adding an _extract_xml_value() helper with multi-tag fallback and residual tag stripping.

Before fix:

{"file_path": "<PathNameType>WIN_NAME</PathNameType><PathName>\\file.txt</PathName>"}

After fix:

{"file_path": "file.txt"}
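A minimal sketch of this kind of extraction, assuming the tag layout shown in the NOTI_REQ fragment above. It is not the repository's `_extract_xml_value()`; the function names, the tuple-of-tags fallback, and the leading-separator stripping are illustrative choices.

```python
import re
from typing import Optional

_TAG_RE = re.compile(r"<[^>]+>")


def extract_xml_value(xml: str, tags: tuple[str, ...]) -> Optional[str]:
    """Return the text of the first tag in `tags` that is present and non-empty."""
    for tag in tags:
        pattern = rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>"
        match = re.search(pattern, xml, re.DOTALL)
        if match:
            # Strip any residual nested tags left inside the captured value.
            value = _TAG_RE.sub("", match.group(1)).strip()
            if value:
                return value
    return None


def normalize_path(raw: str) -> str:
    """Drop leading path separators so the result is relative to the volume."""
    return raw.lstrip("\\/")


body = r"<PathNameType>WIN_NAME</PathNameType><PathName>\file.txt</PathName>"
path = normalize_path(extract_xml_value(body, ("PathName",)) or "")
# path == "file.txt"
```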

volume_name / svm_name resolution

ONTAP's NOTI_REQ body does not always include volume and SVM names in a parseable location. Resolution strategy:

Extract from NEGO_REQ session context (SVM name available at handshake)
Fall back to environment variables (SVM_NAME, VOLUME_NAME) set in the ECS task definition
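The two-step resolution can be sketched as follows. The `session_context` dictionary and its keys are hypothetical; only the `SVM_NAME` and `VOLUME_NAME` environment variables come from the text.

```python
import os
from typing import Mapping


def resolve_names(session_context: Mapping[str, str],
                  env: Mapping[str, str] = os.environ) -> tuple[str, str]:
    """Prefer values captured at handshake, then fall back to the task environment."""
    svm = session_context.get("svm_name") or env.get("SVM_NAME", "unknown")
    volume = session_context.get("volume_name") or env.get("VOLUME_NAME", "unknown")
    return svm, volume
```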

Complete E2E flow (verified)

NFSv3 file create (tee /mnt/fsxn/file.txt)
→ ONTAP FPolicy NOTI_REQ
→ Fargate FPolicy Server receives event
→ SQS SendMessage
→ Bridge Lambda → EventBridge Custom Bus

Actual EventBridge event:

{
  "detail-type": "FPolicy File Operation",
  "source": "fsxn.fpolicy",
  "detail": {
    "event_id": "2175e878-1e0c-48ef-a8b3-53664d5d5b06",
    "operation_type": "create",
    "file_path": "test-eb-e2e-1778707951.txt",
    "volume_name": "vol1",
    "svm_name": "FSxN_OnPre",
    "timestamp": "2026-05-13T21:32:37.680626+00:00",
    "client_ip": "10.0.10.67"
  }
}
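A bridge function producing events of this shape might look like the sketch below. It assumes each SQS message body is the JSON `detail` object and that an EventBridge client is passed in by the caller; the function names are hypothetical. The batching reflects the PutEvents limit of 10 entries per call.

```python
import json


def build_eventbridge_entries(sqs_event: dict, bus_name: str) -> list[dict]:
    """Convert SQS records into PutEvents entries for the custom bus."""
    entries = []
    for record in sqs_event.get("Records", []):
        detail = json.loads(record["body"])  # assumption: body is the detail JSON
        entries.append({
            "EventBusName": bus_name,
            "Source": "fsxn.fpolicy",
            "DetailType": "FPolicy File Operation",
            "Detail": json.dumps(detail),
        })
    return entries


def forward_to_eventbridge(sqs_event: dict, events_client, bus_name: str) -> int:
    """Send entries in batches of 10 and return the number of failed entries."""
    entries = build_eventbridge_entries(sqs_event, bus_name)
    failed = 0
    for start in range(0, len(entries), 10):
        response = events_client.put_events(Entries=entries[start:start + 10])
        failed += response.get("FailedEntryCount", 0)
    return failed
```

Returning the failure count lets the caller decide whether to raise so that SQS redelivers the batch.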

3. Unified UC Directory Structure

Phase 10 introduces event-driven-fpolicy/ as a first-class shared pattern directory, using the same structure as the UC directories. It is not counted as one of the 17 industry UCs — it is a shared event-ingestion reference implementation that any UC can consume via EventBridge rules.

event-driven-fpolicy/
├── docs/                    # 8 languages (ja, en, ko, zh-CN, zh-TW, fr, de, es)
│   ├── architecture.md      # + .en.md, .ko.md, etc.
│   └── demo-guide.md
├── functions/
│   ├── ip_updater/          # Fargate IP → ONTAP REST API
│   └── sqs_to_eventbridge/  # Bridge Lambda
├── schemas/
│   └── fpolicy-event-schema.json
├── server/
│   ├── Dockerfile           # ARM64 Python 3.12
│   ├── fpolicy_server.py    # TCP listener + SQS sender
│   └── requirements.txt
├── tests/
├── README.md                # + 7 language variants
├── template.yaml            # Fargate deployment (ComputeType=fargate)
└── template-ec2.yaml        # EC2 deployment (ComputeType=ec2)

A single template.yaml with a ComputeType parameter (fargate/ec2) uses CloudFormation Conditions to select the appropriate resource set. The EC2 variant uses a t4g.micro with a static private IP, so no IP update Lambda is needed, and costs roughly $4/month. The Fargate variant avoids EC2 management but requires task-IP tracking and has a higher baseline cost (~$10/month for Fargate compute alone, plus VPC Endpoints). Actual cost varies by region, runtime hours, and VPC Endpoint configuration.

4. Multi-Account StackSets

StackSets compatibility validator

The new validator, scripts/check_stacksets_compatibility.py, checks all 17 UC templates for:

Hardcoded Account IDs — 12-digit numeric strings that would break in other accounts
Resource name uniqueness — names must include !Sub with AccountId or StackName
Export name collisions — exports that would conflict across accounts
VPC/Subnet/SecurityGroup parameterization — must not be hardcoded

Result: 17/17 templates, 0 errors, 0 warnings.
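The first check is the easiest to picture. The sketch below is a simplified stand-in, not the logic of check_stacksets_compatibility.py: it flags any standalone 12-digit number outside comment lines, which a real validator would likely refine to reduce false positives.

```python
import re

# Exactly 12 digits, not embedded in a longer run of digits.
ACCOUNT_ID_RE = re.compile(r"(?<!\d)\d{12}(?!\d)")


def find_hardcoded_account_ids(template_text: str) -> list[tuple[int, str]]:
    """Return (line_number, match) pairs for suspected hardcoded account IDs."""
    findings = []
    for line_no, line in enumerate(template_text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # ignore YAML comment lines
        for match in ACCOUNT_ID_RE.finditer(line):
            findings.append((line_no, match.group()))
    return findings
```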

StackSets role templates

Template	Purpose
`shared/cfn/stacksets-admin.yaml`	Admin account role for StackSet management
`shared/cfn/stacksets-execution.yaml`	Target account execution role (least-privilege)

The execution role uses an Organization ID condition in its trust policy — accounts outside the Organization cannot assume it. Permissions are scoped to Lambda, Step Functions, DynamoDB, S3, CloudWatch, EventBridge, SNS, and Secrets Manager only.
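An Organization-scoped trust policy of this kind typically relies on the `aws:PrincipalOrgID` condition key. The sketch below builds such a policy as a Python dictionary; it is an assumption about the general shape, not a copy of stacksets-execution.yaml, and the account and Organization IDs are placeholders.

```python
def build_execution_trust_policy(admin_account_id: str, org_id: str) -> dict:
    """Trust policy allowing the admin account to assume the role, within one Organization."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{admin_account_id}:root"},
            "Action": "sts:AssumeRole",
            # Principals outside this Organization fail the condition.
            "Condition": {"StringEquals": {"aws:PrincipalOrgID": org_id}},
        }],
    }


policy = build_execution_trust_policy("111122223333", "o-exampleorgid")
```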

Automatic deployment

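With automatic deployment enabled on the StackSet (AutoDeployment with Enabled: true, which CloudFormation offers under the service-managed permission model), accounts added to a target organizational unit automatically receive the UC templates. No manual intervention is required.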

Scope note: Phase 10 validates that templates can be distributed safely across accounts via StackSets (deployment compatibility). It does not yet validate cross-account FSxN S3AP data access, shared VPC ownership, or centralized operations across accounts. Those runtime cross-account patterns are Phase 11+ work.

5. Alarm Profiles and Cost Optimization

UC-specific alarm profiles

Not all UCs have the same latency requirements or failure tolerance. A batch genomics pipeline (UC3) tolerates higher failure rates than a real-time compliance monitor (UC12). Phase 10 introduces three profiles:

Profile	Failure Rate Threshold	Error Threshold	Target Workloads
BATCH	10%	3/hour	Periodic batch processing (UC1-5, UC9)
REALTIME	5%	1/hour	Real-time processing (UC10-14)
HIGH_VOLUME	15%	5/hour	High-volume file processing (UC6-8, UC15-17)

Each UC template now has an AlarmProfile parameter (BATCH / REALTIME / HIGH_VOLUME / CUSTOM). The CUSTOM option exposes CustomFailureThreshold and CustomErrorThreshold for fine-grained control.
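The profile lookup can be sketched as a small table plus a CUSTOM override. The threshold values are taken from the table above; the dictionary layout and the `resolve_thresholds` helper are illustrative and do not reflect how the CloudFormation templates implement the mapping.

```python
from typing import Optional

ALARM_PROFILES = {
    "BATCH": {"failure_rate_pct": 10.0, "errors_per_hour": 3},
    "REALTIME": {"failure_rate_pct": 5.0, "errors_per_hour": 1},
    "HIGH_VOLUME": {"failure_rate_pct": 15.0, "errors_per_hour": 5},
}


def resolve_thresholds(profile: str,
                       custom_failure_threshold: Optional[float] = None,
                       custom_error_threshold: Optional[int] = None) -> dict:
    """Return alarm thresholds for a named profile or for CUSTOM values."""
    if profile == "CUSTOM":
        if custom_failure_threshold is None or custom_error_threshold is None:
            raise ValueError("CUSTOM requires both custom thresholds")
        return {"failure_rate_pct": custom_failure_threshold,
                "errors_per_hour": custom_error_threshold}
    if profile not in ALARM_PROFILES:
        raise ValueError(f"unknown alarm profile: {profile}")
    return dict(ALARM_PROFILES[profile])
```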

Dynamic MaxConcurrency controller

shared/max_concurrency_controller.py calculates optimal Map state parallelism based on actual file volume:

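def calculate_max_concurrency(
    detected_file_count: int,
    ontap_rate_limit: int = 100,
    api_calls_per_file: int = 3,
    upper_bound: int = 40,
) -> int:
    """Return the Map state parallelism for the detected file count."""
    # Cap by the number of files, the ONTAP call budget per file, and a hard ceiling.
    optimal = min(
        detected_file_count,
        ontap_rate_limit // api_calls_per_file,
        upper_bound,
    )
    # Clamp to at least 1 (a MaxConcurrency of 0 would mean unlimited in Step Functions).
    return max(optimal, 1)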

This replaces the static MaxConcurrency: 10 from Phase 8. For 500 files with default settings, it calculates min(500, 100 // 3, 40) = min(500, 33, 40) = 33, which is 3.3x the previous concurrency ceiling while staying within the configured ONTAP rate limit.

Business-hours cost scheduling

With EnableCostScheduling=true, two EventBridge Schedulers dynamically adjust the polling frequency:

Time Period	Schedule
Business hours (weekday 09:00-18:00 JST)	`rate(1 hour)`
Off-hours (weekday 18:00-09:00 + weekends)	`rate(6 hours)`

BusinessHoursStart and BusinessHoursEnd parameters allow customization. The Cost Scheduler emits an EstimatedMonthlySavings CloudWatch metric for visibility.
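To make the schedule split concrete, the sketch below picks the rate expression for a given moment and counts weekly invocations under the default window. It approximates the behavior described above and is not the Cost Scheduler's implementation; JST is modeled as a fixed UTC+9 offset, and the function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

JST = timezone(timedelta(hours=9))  # Japan has no daylight saving time


def schedule_for(now: datetime, start_hour: int = 9, end_hour: int = 18) -> str:
    """Return the rate expression that applies at `now` (timezone-aware)."""
    local = now.astimezone(JST)
    is_weekday = local.weekday() < 5  # Monday=0 ... Friday=4
    if is_weekday and start_hour <= local.hour < end_hour:
        return "rate(1 hour)"
    return "rate(6 hours)"


def weekly_invocations(start_hour: int = 9, end_hour: int = 18) -> float:
    """Approximate polling invocations per week with cost scheduling enabled."""
    business_hours = 5 * (end_hour - start_hour)
    off_hours = 7 * 24 - business_hours
    return business_hours / 1 + off_hours / 6


# Default window: 45 hourly runs + 123 off-hours / 6 = 65.5 per week,
# compared with 168 per week for a constant rate(1 hour).
```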

S3 Access Points Performance Considerations

Key performance characteristics (from AWS documentation):

Latency: Tens of milliseconds (consistent with S3 bucket access)
Throughput: Depends on the FSx file system's provisioned throughput capacity — S3 AP, NFS, and SMB all share the same throughput pool
Object size limit: 5 GB for uploads (PutObject); downloads (GetObject) can be larger
Storage class: FSX_ONTAP only; SSE-FSX encryption only

Design implications for serverless pipelines:

Lambda memory → network bandwidth: Higher Lambda memory allocates more network bandwidth. For 10 MB file processing, 1,769 MB (1 vCPU) provides ~600 Mbps.
Step Functions Map concurrency: Limit MaxConcurrency based on FSx provisioned throughput. Formula: fsxn_throughput / per_lambda_throughput. Example: 512 MBps ÷ 50 MBps per Lambda ≈ 10 concurrent executions.
ListObjectsV2 pagination: MaxKeys=1000 per page. For 10,000 files = 10 pages × ~50ms = ~500ms minimum. Use Prefix filtering to reduce scope.
Shared throughput: S3 AP, NFS, and SMB all share the same FSx throughput capacity. Account for existing NFS/SMB workloads when sizing Map concurrency.
Retry strategy: Use botocore.config.Config(retries={"mode": "adaptive"}) for automatic backoff on SlowDown (503) responses.
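The two formulas in this list reduce to a few lines of arithmetic. A sketch under the example figures quoted above (512 MBps provisioned, 50 MBps per Lambda, ~50 ms per ListObjectsV2 page); the function names are illustrative.

```python
import math


def map_concurrency_limit(fsx_throughput_mbps: float,
                          per_lambda_mbps: float) -> int:
    """fsxn_throughput / per_lambda_throughput, rounded down, at least 1."""
    return max(1, int(fsx_throughput_mbps // per_lambda_mbps))


def min_listing_latency_ms(object_count: int,
                           page_size: int = 1000,
                           per_page_ms: float = 50.0) -> float:
    """Lower bound for sequential ListObjectsV2 pagination."""
    pages = math.ceil(object_count / page_size)
    return pages * per_page_ms


concurrency = map_concurrency_limit(512, 50)   # 10
listing_ms = min_listing_latency_ms(10_000)    # 500.0
```

Because S3 AP, NFS, and SMB share the same throughput pool, the first argument should be the headroom left after existing NFS/SMB load, not the full provisioned figure.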

Full analysis: S3AP Performance Considerations

6. Test Results

Category	Count	Result
Phase 10 new tests	62	All PASS ✅
Property-based tests (Hypothesis)	7 properties × 100-200 iterations	All PASS ✅
Existing tests (Phase 1-9)	982	No regressions ✅
Total	1044+	All PASS

Property-based tests

Property	What it verifies
FPolicy event round-trip	Serialize → deserialize produces equivalent object
MaxConcurrency bounds	Result always ≥ 1 and ≤ upper_bound
MaxConcurrency correctness	Result matches the min() formula
Zero files → 1	Empty input never produces 0
StackSets Account ID detection	Known violations are always caught
Cost savings non-negativity	Estimated savings ≥ 0 for all inputs
Same rate → ~0 savings	Equal business/off-hours rates produce near-zero savings
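For readers unfamiliar with property-based testing, the sketch below checks the MaxConcurrency bounds property over random inputs. It uses the standard library's `random` module as a stand-in so the example runs without extra dependencies; the repository's tests use Hypothesis, and this is not their code. The function under test is restated from section 5.

```python
import random


def calculate_max_concurrency(detected_file_count: int,
                              ontap_rate_limit: int = 100,
                              api_calls_per_file: int = 3,
                              upper_bound: int = 40) -> int:
    optimal = min(detected_file_count,
                  ontap_rate_limit // api_calls_per_file,
                  upper_bound)
    return max(optimal, 1)


def check_bounds_property(iterations: int = 200, seed: int = 0) -> bool:
    """Property: the result is always >= 1 and <= upper_bound."""
    rng = random.Random(seed)
    for _ in range(iterations):
        files = rng.randint(0, 1_000_000)
        upper = rng.randint(1, 100)
        result = calculate_max_concurrency(files, upper_bound=upper)
        if not (1 <= result <= upper):
            return False
    return True
```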

Validator results

Validator	Result
`check_s3ap_iam_patterns.py`	17/17 clean ✅
`check_handler_names.py`	87 handlers, 0 issues ✅
`check_conditional_refs.py`	17 templates, 0 issues ✅
`check_stacksets_compatibility.py`	17 templates, 0 errors ✅
`_check_sensitive_leaks.py`	160 images, 0 leaks ✅
cfn-guard IAM security	Advisory, 0 new violations ✅

7. Deployment Learnings

Several issues surfaced during AWS verification that are worth documenting:

Issue	Root Cause	Fix
NLB path: FPolicy handshake fails	ONTAP FPolicy expects direct TCP to primary-servers IP; NLB target routing does not satisfy session establishment (tested with preserve_client_ip true and false)	Direct Fargate IP + EventBridge IP auto-update
jsonschema 4.18+ fails on ARM64 Lambda	rpds-py native dependency	Pin to 4.17.x
SCHEMA_PATH differs between Lambda and local	Different working directories	Fallback path resolution
Guard Hook rejects Condition-based `Resource: "*"`	Overly strict rule	Updated rule to allow `Condition exists`
ECR pull fails in private subnet	Missing VPC Endpoints	Added ECR, STS, S3, Logs, SQS endpoints
KEEP_ALIVE timeout race	Server timeout = keep_alive_interval	Increased to 300s
NFSv4 events not firing	`vers=4` negotiates to unsupported 4.2	Explicit `vers=4.1`

7a. Beyond AI/ML — Enterprise Workload Examples

This pattern is not limited to AI/ML demos. The S3 Access Points architecture applies to any enterprise file data on FSx for ONTAP:

SAP peripheral files and exported business documents — IDoc exports, ABAP report outputs, BW data extracts. Process without changing SAP file interfaces.
EDI / HULFT landing zones — Automatic validation and format conversion of received files. No changes to existing EDI/HULFT infrastructure.
Audit evidence and compliance reports — Periodic integrity checks, retention management, with NTFS permissions preserved.
Batch output from EC2-based business applications — Add serverless post-processing pipelines without changing application output paths.
Scanned documents and regulated records — OCR, classification, PII detection on documents stored for long-term retention.

The design principle: File data stays on FSx for ONTAP. S3 Access Points provide the bridge to AWS-native automation, AI/ML, and analytics services — without data movement, without changing existing NFS/SMB access patterns. Existing backup (SnapMirror), DR, and access controls remain unchanged.

This positioning matters for partner and SI proposals: the value is not "replace your file server" but "connect your existing file data to AWS services without migration."

Full examples with architecture diagrams: Enterprise Workload Examples

8. Next Phase Outlook

Phase 10 established the shared event-ingestion pipeline (FPolicy → SQS → EventBridge). Phase 11 will wire those events into UC-specific processing. Candidates:

TriggerMode rollout to all 17 UCs: Expand the reference implementation from UC1 to all templates, with UC-specific EventBridge dispatch rules
FPolicy → UC-specific Step Functions dispatch: EventBridge rules matching file path prefixes/extensions to UC targets
protobuf format evaluation: ONTAP 9.15.1+ supports protobuf for higher-performance notifications
Cross-Account Observability live verification: Deploy the shared-services-observability template and validate metric aggregation
Persistent Store evaluation: Phase 11+ design-dependent work for compliance-sensitive environments that cannot tolerate event loss during Fargate task restarts
FR-2 migration path: When AWS ships native S3AP notifications, the TriggerMode parameter provides a clean migration — switch from EVENT_DRIVEN to native events without changing UC logic
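As a preview of the dispatch candidate, an EventBridge event pattern can match on a path prefix in the event detail. The sketch below builds such a pattern and includes a tiny local matcher that covers only the fields used here. Both functions are hypothetical and say nothing about how Phase 11 implements dispatch; the `source` and `detail-type` values come from the verified event in section 2.

```python
def build_dispatch_pattern(path_prefix: str) -> dict:
    """Event pattern matching FPolicy events whose file_path starts with a prefix."""
    return {
        "source": ["fsxn.fpolicy"],
        "detail-type": ["FPolicy File Operation"],
        "detail": {"file_path": [{"prefix": path_prefix}]},
    }


def matches_locally(pattern: dict, event: dict) -> bool:
    """Local approximation of pattern matching, limited to the fields above."""
    if event.get("source") not in pattern["source"]:
        return False
    if event.get("detail-type") not in pattern["detail-type"]:
        return False
    path = event.get("detail", {}).get("file_path", "")
    return any(path.startswith(rule["prefix"])
               for rule in pattern["detail"]["file_path"])
```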

Why Native S3AP Notifications Still Matter

This FPolicy-based pipeline proves that customers need event-driven processing for FSx for ONTAP S3 Access Points. However, it also quantifies where a native AWS-managed notification feature would eliminate undifferentiated heavy lifting:

Operational burden	Current (FPolicy)	With native notifications
Long-running TCP listener	Fargate 24/7 (~$30-50/month)	Not needed
Fargate task IP tracking	IP Updater Lambda + ONTAP REST API	Not needed
ONTAP external-engine reconfiguration	On every deployment	Not needed
FPolicy protocol dependency	NFSv4.2 not supported	Protocol-independent
Event durability semantics	Requires Persistent Store (ONTAP 9.14.1+)	S3-equivalent at-least-once
Cross-account event routing	SQS → Bridge Lambda → EventBridge → cross-account	Standard EventBridge rules

Implementation complexity: 15-20 CloudFormation resources and 2 Lambda functions (IP Updater + Bridge) for FPolicy, vs an estimated 3-5 resources for native EventBridge integration.

The FPolicy implementation is not a replacement for native S3AP notifications — it is evidence of customer demand and an interim event-driven pattern. The operational complexity documented here directly maps to the value a native feature would deliver.

Full analysis: Native S3AP Notifications Evidence

Partner/SI Delivery Checklist

For partners and SIs proposing this pattern to enterprise customers, a structured delivery checklist is available covering:

Customer workload classification — SAP-adjacent / file server / regulated records / AI analytics
Trigger mode selection — POLLING / EVENT_DRIVEN / HYBRID based on latency and durability requirements
Deployment profile — PoC / Production / Compliance-sensitive with clear boundaries
Access model design — IAM + S3 AP policy + ONTAP file permissions (dual-layer)
Network model — Private VPC / VPC Origin AP / Cross-Account / Shared Services
Operating model — Customer-operated / partner-operated / managed service
Success criteria — Latency, throughput, cost, auditability, recovery behavior

The checklist also includes a 4-phase PoC implementation guide (environment prep → POLLING verification → EVENT_DRIVEN verification → evaluation) and FAQ for common partner questions.

Full checklist: Partner/SI Delivery Checklist

Who should care about Phase 10?

Platform teams get an event-driven alternative to polling — near-real-time latency instead of hourly polling intervals
Security teams get StackSets compatibility validation ensuring no hardcoded account IDs leak across environments
Operations teams get workload-appropriate alarm thresholds that reduce alert fatigue
Finance teams get fewer off-hours polling invocations through business-hours scheduling, with savings surfaced as a CloudWatch metric
Storage teams get a documented FPolicy integration pattern with protocol-level verification results
Multi-account teams get ready-to-deploy StackSets admin/execution roles with Organization-scoped trust
Partners and SIs get a PoC-ready event-driven alternative for customers who cannot wait for native S3AP notifications
Regulated workload owners get a clear event-durability boundary: near-real-time by default, Persistent Store required when event loss is unacceptable
SAP / ERP teams get a pattern for connecting peripheral files (IDoc, HULFT, batch output) to AWS AI/analytics without changing existing file interfaces

Conclusion

Phase 10 solves the problem that has been deferred since Phase 1: how do you get event-driven processing from FSx for ONTAP when S3AP native notifications don't exist?

The answer is ONTAP FPolicy — a mature notification framework that predates S3 Access Points by over a decade. By connecting it to ECS Fargate → SQS → EventBridge, Phase 10 established the shared event-ingestion pipeline and the TriggerMode parameter foundation needed to support polling, event-driven, and hybrid modes. UC-specific dispatch remains the main Phase 11 focus. The default remains POLLING, so existing deployments are unaffected.

The E2E verification confirmed that NFSv3, NFSv4.0, NFSv4.1 (ONTAP 9.15.1+), and SMB all work. NFSv4.2 does not — and the most common failure mode is mount -o vers=4 silently negotiating to 4.2. This is now documented and the setup guide recommends explicit version pinning.

Beyond FPolicy, Phase 10 matures the operational model: StackSets deployment compatibility for multi-account distribution, alarm profiles for workload-appropriate monitoring, and cost scheduling for environments that don't need 24/7 polling. Combined with the 6-validator CI pipeline and 1044+ passing tests, the pattern library is ready for production-style multi-account template distribution, while runtime cross-account data-path validation remains Phase 11+ work.

Design Guides

The following design guides have been added to the repository:

Document	Description
S3AP Authorization Model	Dual-layer authorization (IAM + file system)
Deployment Profiles	PoC / Production / Compliance-sensitive
Trigger Mode Decision Guide	POLLING / EVENT_DRIVEN / HYBRID
Enterprise Workload Examples	SAP, EDI, audit, batch output
S3AP Performance	Throughput, Lambda sizing, concurrency
Native Notifications Evidence	Feature request evidence
Partner/SI Delivery Checklist	Partner/SI proposal and delivery guide

Repository: github.com/Yoshiki0705/FSx-for-ONTAP-S3AccessPoints-Serverless-Patterns

Update Note

This article describes the Phase 10 baseline — the first verified shared FPolicy ingestion pipeline. The event-driven pipeline is expanded across all 17 UCs in Phase 11 and operationally hardened in Phase 12 with Persistent Store replay validation, SLO observability, capacity guardrails, and secrets rotation. Phase 13 adds FlexClone/FlexCache serverless automation.

Use the FPolicy event-driven mode for PoC and near-real-time ingestion. For regulated or compliance-sensitive workloads, evaluate Persistent Store, replay handling, deduplication, and operational runbooks before treating the pipeline as durable. See Deployment Profiles for guidance.

Previous phases: Phase 1 · Phase 7 · Phase 8 · Phase 9

Next phases: Phase 11 (UC-specific dispatch) · Phase 12 (Persistent Store replay + SLO hardening) · Phase 13 (FlexClone/FlexCache automation)

📢 Update (2026-05-23)

This article is part of the FSx for ONTAP S3 Access Points series.
The latest addition — Phase 13: From Serverless Patterns to Field-Ready Reference Architecture — is now available:

👉 Read Phase 13

Phase 13 adds FlexCache/FlexClone serverless automation, split-path S3AP monitoring, SLO runbooks, Partner/SI delivery checklist, and a complete field-ready baseline for informed evaluation.