Yoshiki Fujiwara(藤原善基)@AWS Community Builder for AWS Community Builders

Posted on May 1 • Edited on May 21

FSx for ONTAP S3 Access Points as a Serverless Automation Boundary — AI Data Pipelines, Volume-Level SnapMirror DR, and Capacity Guardrails

#aws #s3accesspoints #automation #amazonfsxfornetappontap

TL;DR: FSx for ONTAP S3 Access Points let you treat NAS data as an S3-facing automation boundary — without moving data — making serverless AI pipelines and ops workflows practical.

This is a continuation of Building an Agentic Access-Aware RAG System with Amazon FSx for NetApp ONTAP. While the previous article focused on the RAG application itself, this one covers the operational automation layer built around FSx for ONTAP S3 Access Points.

The Shift: S3 Access Points Make Serverless NAS Automation Practical

Enterprise file data lives on FSx for NetApp ONTAP, accessed via SMB/NFS by users and applications every day. Automating operations around that data has traditionally meant mounting NFS from compute instances, managing file system connections, and absorbing the cold-start and connection-limit penalties of Lambda functions that need file-system access from inside a VPC.

FSx for ONTAP S3 Access Points change this equation. They expose ONTAP file data through supported S3 object APIs — GetObject, PutObject, ListObjectsV2, and others — while keeping the data in FSx and preserving concurrent SMB/NFS access. The practical shift is that Lambda no longer needs mounted NAS access for the data path. S3 Access Points provide the supported object-facing operations, and ONTAP REST API supplies the storage-system metadata that S3 cannot expose.

This is the architectural pivot that makes the automation suite in this article possible:

Serverless file inventory without NFS/SMB mounts from Lambda
AI / RAG preprocessing directly against ONTAP file data through supported S3 object APIs
Metadata sidecar generation by combining S3-listed objects with ONTAP ACL, export policy, and security-style metadata via REST API
Scheduled governance scans over NAS data using IAM + file-system authorization
Reuse of S3-speaking application components (Bedrock KB, analytics tools) without moving data out of ONTAP

ONTAP REST API complements this by providing the control-plane and storage-system context — volume management, SnapMirror orchestration, capacity monitoring, snapshot operations — that S3 Access Points are not designed to handle.

👉 Code: automation/fsxn-ops/

S3 Access Point Compatibility Model

Update: The repository now includes a dedicated S3AP Authorization Model document. When using FSx for ONTAP S3 Access Points, authorization is evaluated both by AWS IAM/S3 access point policies and by the file-system identity associated with the access point. This dual authorization model should be reviewed before using the patterns with sensitive or regulated data.

FSx for ONTAP S3 Access Points support a subset of S3 APIs, not full S3 bucket semantics. Understanding this boundary is essential for building reliable automation.

Supported operations used in this automation:

ListObjectsV2 — file inventory and scanning
GetObject — file content access for preprocessing
PutObject — writing metadata sidecars and manifests

Notable unsupported operations:

GetBucketNotificationConfiguration — this is why the design uses scheduled polling instead of S3 notification-driven triggers
GetObjectAcl / PutObjectAcl — object ACL management is not available
Presign — presigned URL generation is not supported
Bucket-style management APIs and bucket-notification semantics do not apply in the same way here

Authorization operates at two layers: IAM permissions control AWS-level access, while file-system-level authorization uses the mapped user identity (UNIX or Windows) configured on the S3 Access Point. File-level ACLs are retrieved via the ONTAP REST API; S3 APIs do not expose file-system ACLs.
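Because bucket notifications are unavailable, each scheduled run has to work out for itself which files changed since the previous run. A minimal sketch of that selection step, assuming the listing entries look like ListObjectsV2 `Contents` items and that the previous run's timestamp is stored somewhere (the checkpoint and the `select_changed` name are illustrative, not taken from the repository):

```python
from datetime import datetime, timezone


def select_changed(objects, last_run):
    """Return objects modified strictly after the previous polling run.

    `objects` is a list of dicts shaped like ListObjectsV2 `Contents`
    entries (`Key`, `LastModified`); `last_run` is a timezone-aware
    datetime. Where the checkpoint is persisted is left open here.
    """
    return [o for o in objects if o["LastModified"] > last_run]


# Illustrative listing with made-up keys and timestamps.
listing = [
    {"Key": "docs/a.md", "LastModified": datetime(2025, 5, 1, 9, 0, tzinfo=timezone.utc)},
    {"Key": "docs/b.pdf", "LastModified": datetime(2025, 5, 2, 9, 0, tzinfo=timezone.utc)},
]
checkpoint = datetime(2025, 5, 1, 12, 0, tzinfo=timezone.utc)
changed = select_changed(listing, checkpoint)
print([o["Key"] for o in changed])
```

Comparing timestamps keeps the poller stateless apart from one checkpoint value; comparing ETags against a stored inventory would be an alternative if modification times are not trusted.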

Architecture

The architecture has four layers, with S3 Access Points as the data-path boundary:

┌─────────────────────────────────────────────────────────┐
│  Orchestration: EventBridge Scheduler / Step Functions   │
├─────────────────────────────────────────────────────────┤
│  Compute: Lambda (Python 3.12, VPC-deployed)            │
├──────────────────────┬──────────────────────────────────┤
│  Data Path:          │  Control Plane:                  │
│  FSx ONTAP S3 AP     │  ONTAP REST API                  │
│  (ListObjectsV2,     │  (volumes, SnapMirror,           │
│   GetObject,         │   snapshots, exports,            │
│   PutObject)         │   security_style, ACLs)          │
├──────────────────────┴──────────────────────────────────┤
│  Storage: FSx for NetApp ONTAP (SMB/NFS + S3 AP)       │
└─────────────────────────────────────────────────────────┘

S3 Access Points = data-path (file listing, content access, sidecar writes)
ONTAP REST API = storage metadata + control-plane (resize, SnapMirror, snapshots, ACLs, export policies)
Step Functions / Lambda = orchestration and compute
EventBridge = scheduled execution (polling, not file-change triggers)

Implementation

1. Data Preprocessor: The S3 Access Point Workflow

This is the core use case that S3 Access Points enable. The preprocessor scans FSx ONTAP volumes through S3 Access Points and enriches the results with ONTAP-specific storage metadata.

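class DataPreprocessor:
    def list_source_objects(self, prefix="", suffix_filter=None):
        """List files on FSx ONTAP via S3 Access Point (ListObjectsV2)"""
        # ListObjectsV2 returns at most 1,000 keys per call, so paginate.
        paginator = self.s3.get_paginator("list_objects_v2")
        pages = paginator.paginate(
            Bucket=self.s3_access_point_arn,  # FSx ONTAP S3 AP ARN
            Prefix=prefix,
        )
        objects = []
        for page in pages:
            for obj in page.get("Contents", []):
                # Filter by extension, e.g. suffix_filter=(".md", ".pdf")
                if suffix_filter and not obj["Key"].endswith(tuple(suffix_filter)):
                    continue
                # Collect basic object metadata
                objects.append({
                    "key": obj["Key"],
                    "size": obj["Size"],
                    "last_modified": obj["LastModified"],
                    "etag": obj["ETag"],
                })
        return objects

    def collect_ontap_metadata(self, volume_name):
        """Get storage-system metadata via ONTAP REST API"""
        # Resolve the name to a UUID first. get_volume_uuid is an assumed
        # helper name; the excerpt does not show how vol_uuid is obtained.
        vol_uuid = self.ontap.get_volume_uuid(volume_name)
        vol_detail = self.ontap.get_volume(vol_uuid)
        return {
            "security_style": vol_detail["nas"]["security_style"],
            "export_policy": vol_detail["nas"]["export_policy"]["name"],
            "snapshot_count": len(self.ontap.list_snapshots(vol_uuid)),
        }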

S3 Access Points give you data-path operations and basic object-facing metadata (key, size, last modified, ETag). ONTAP REST API provides storage-system metadata and control-plane attributes — security style, export policies, snapshot information, and NAS/storage context that S3 APIs cannot expose.

The preprocessor combines both to generate task manifests for downstream AI/analytics pipelines:

S3 AP: ListObjectsV2 → file inventory (.md, .pdf, .docx)
ONTAP REST: GET /storage/volumes → security_style, export_policy, snapshots
  → Generate preprocessing tasks (batch_size=10)
  → Write manifest to S3 (PutObject)
  → Downstream: Bedrock KB Ingestion, analytics, governance
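The batching step in the flow above can be sketched as follows. This is an illustration only: the manifest layout (`volume`, `task_count`, `tasks`) and the `build_manifest` name are assumptions, and the repository's actual manifest format may differ.

```python
import json


def build_manifest(objects, ontap_metadata, batch_size=10):
    """Group listed objects into fixed-size preprocessing tasks."""
    tasks = []
    for start in range(0, len(objects), batch_size):
        batch = objects[start:start + batch_size]
        tasks.append({
            "task_id": start // batch_size,
            "keys": [obj["key"] for obj in batch],
        })
    return {
        "volume": ontap_metadata,  # security_style, export_policy, snapshot_count
        "task_count": len(tasks),
        "tasks": tasks,
    }


# 25 made-up keys produce three tasks of 10, 10, and 5 files.
objects = [{"key": f"docs/file{i}.md"} for i in range(25)]
meta = {"security_style": "ntfs", "export_policy": "default", "snapshot_count": 3}
manifest = build_manifest(objects, meta)
body = json.dumps(manifest).encode("utf-8")  # bytes suitable as a PutObject Body
print(manifest["task_count"])
```

Keeping the volume-level metadata once per manifest, rather than per file, avoids repeating the same ONTAP REST lookup for every object in the volume.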

2. ONTAP REST API Client

The shared Python client handles control-plane operations. Credentials come from Secrets Manager.

import logging
from typing import Optional

logger = logging.getLogger(__name__)


class OntapClient:
    def __init__(
        self,
        management_lif: str,
        secret_id: str,
        verify_ssl: bool = True,              # Default: TLS verification enabled
        ca_cert_path: Optional[str] = None,   # CA bundle for production
    ):
        if not verify_ssl:
            logger.warning("TLS verification disabled: lab/PoC only")

TLS verification is enabled by default. FSx for ONTAP management LIFs use self-signed certificates, so production deployments should provide a CA bundle via ca_cert_path. For lab/PoC environments, verify_ssl=False can be set explicitly — the client logs a warning. The ONTAP_VERIFY_SSL and ONTAP_CA_CERT_PATH environment variables control this at the Lambda level.
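One way the two environment variables could be turned into a single TLS setting is sketched below. It assumes an HTTP client whose `verify` option accepts either a boolean or a CA bundle path (as `requests` does); the accepted spellings for a disabled flag and the `resolve_verify` name are assumptions, not the repository's implementation.

```python
def resolve_verify(env):
    """Map ONTAP_VERIFY_SSL / ONTAP_CA_CERT_PATH to a `verify` value.

    Returns False (verification off), a CA bundle path, or True
    (verify against the default trust store).
    """
    flag = env.get("ONTAP_VERIFY_SSL", "true").strip().lower()
    if flag in ("false", "0", "no"):  # assumed spellings for "disabled"
        return False
    ca_path = env.get("ONTAP_CA_CERT_PATH", "").strip()
    return ca_path or True


print(resolve_verify({}))
print(resolve_verify({"ONTAP_CA_CERT_PATH": "/opt/certs/ontap-ca.pem"}))
print(resolve_verify({"ONTAP_VERIFY_SSL": "false"}))
```

In a Lambda function the argument would typically be `os.environ`. Passing the mapping in explicitly keeps the function easy to unit test.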

3. Capacity Monitor with Guardrails

Runs every 5 minutes via EventBridge Scheduler. Checks filesystem-level capacity (FSx API + CloudWatch) and volume-level usage (ONTAP REST API).

| Guardrail | Default | Purpose |
|---|---|---|
| `DRY_RUN` | `true` | Safe default: logs actions without executing |
| `MAX_GROW_PER_ACTION_PCT` | 50% | Prevents a single run from doubling a volume |
| `MAX_GROW_PER_DAY_GIB` | 500 GiB | Caps total daily expansion |
| `VOL_THRESHOLD_PCT` | 80% | Aligned with AWS recommendation to keep SSD utilization below 80% |
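To make the interaction between these guardrails concrete, here is a minimal sketch of a growth decision. The sizing rule (grow just enough to bring utilization back to the threshold) and the `plan_growth` name are assumptions for illustration; only the four guardrails and their defaults come from the table.

```python
def plan_growth(size_gib, used_pct, grown_today_gib, *,
                threshold_pct=80, max_grow_pct=50,
                max_grow_per_day_gib=500, dry_run=True):
    """Decide how much to grow a volume under the guardrails."""
    if used_pct <= threshold_pct:
        return {"action": "none", "grow_gib": 0.0, "dry_run": dry_run}
    # Assumed sizing rule: bring utilization back down to the threshold.
    used_gib = size_gib * used_pct / 100
    wanted = used_gib / (threshold_pct / 100) - size_gib
    per_action_cap = size_gib * max_grow_pct / 100
    daily_remaining = max(0, max_grow_per_day_gib - grown_today_gib)
    grow = min(wanted, per_action_cap, daily_remaining)
    # "blocked" means the daily budget is exhausted, so nothing can be added.
    action = "grow" if daily_remaining > 0 else "blocked"
    return {"action": action, "grow_gib": round(grow, 1), "dry_run": dry_run}


print(plan_growth(1000, 90, 0))    # wants 125 GiB, no cap applies
print(plan_growth(1000, 95, 450))  # wants 187.5 GiB, daily budget leaves 50
print(plan_growth(1000, 95, 500))  # daily budget exhausted
```

With `dry_run=True` the caller would log the returned plan and stop; only when it is explicitly disabled would the plan be sent to the ONTAP REST API.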

Observed behavior: CloudWatch StorageCapacityUtilization metrics are not always available for new filesystems or those with minimal data. The monitor falls back to ONTAP REST API for volume-level monitoring when CloudWatch data is unavailable.

4. Volume-Level SnapMirror Failover Orchestration

Scope: This automation handles planned failover for volume-level SnapMirror relationships — breaking replication, recreating selected CIFS shares and NFS exports on the DR side, and reversing the process for failback. It is not a complete SVM-DR solution. ONTAP's SVM-DR includes additional considerations such as identity preservation and replicated configuration scope. See NetApp's SVM-DR documentation for the full picture.

The Step Functions state machine orchestrates 10 actions through a single Lambda function with action routing.
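Action routing inside a single Lambda function is commonly done with a dispatch table. The sketch below shows the idea with the two action names mentioned in this article; the stub bodies, the `action` event field, and the return shape are assumptions, and the remaining actions are omitted.

```python
def do_initialize(event):
    # Placeholder body; the real action would call the ONTAP REST API.
    return {"action": "initialize", "status": "started"}


def do_final_transfer(event):
    # Placeholder body; the real action would start a SnapMirror transfer.
    return {"action": "final_transfer", "status": "started"}


# Only the two actions named in the article; the others are not shown.
ACTIONS = {
    "initialize": do_initialize,
    "final_transfer": do_final_transfer,
}


def handler(event, context=None):
    """Route a Step Functions task to the function for event['action']."""
    action = event.get("action")
    if action not in ACTIONS:
        # Raising lets the state machine's Retry/Catch rules take over.
        raise ValueError(f"Unsupported action: {action!r}")
    return ACTIONS[action](event)


print(handler({"action": "initialize"}))
```

Routing through one function keeps IAM permissions, VPC configuration, and the ONTAP client in one place, at the cost of a broader permission set than per-action functions would need.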

SnapMirror Initialization: Two Paths

ONTAP offers two ways to initialize a SnapMirror relationship. In supported create flows, the relationship can be created and initialized in one request by specifying "state": "snapmirrored". Alternatively, the transfers API starts an initialize or update operation depending on the current relationship state.

The automation implements both:

initialize action: Creates the relationship with "state": "snapmirrored" in supported create flows. In workflows using a pre-existing destination volume, explicit transfer may be more reliable.
final_transfer action: Explicit POST /snapmirror/relationships/{uuid}/transfers — starts an initialize or update transfer depending on the current state.

Observed in my tested environment (ONTAP 9.17.1P4D3, FSx SINGLE_AZ_1 with pre-existing destination volume): The create-with-state path resulted in a job failure, so the automation fell back to explicit transfer. Both paths are implemented and tested.
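The fallback described above can be sketched as follows. The client interface (`create_relationship`, `start_transfer`), the `JobFailed` exception, and the fake client are hypothetical stand-ins, and the sketch assumes a failed create leaves no relationship behind; a real implementation would check for an existing relationship before creating a second one.

```python
class JobFailed(Exception):
    """Raised by the (hypothetical) client when an ONTAP job fails."""


def initialize_relationship(client, source, destination):
    """Try create-with-state first, then fall back to an explicit transfer."""
    body = {"source": {"path": source}, "destination": {"path": destination}}
    try:
        uuid = client.create_relationship({**body, "state": "snapmirrored"})
        return uuid, "create_with_state"
    except JobFailed:
        uuid = client.create_relationship(body)
        # Corresponds to POST /snapmirror/relationships/{uuid}/transfers
        client.start_transfer(uuid)
        return uuid, "explicit_transfer"


class FakeClient:
    """Test double that fails whenever "state" is part of the create body."""

    def __init__(self):
        self.transfers = []

    def create_relationship(self, body):
        if "state" in body:
            raise JobFailed("create-with-state job failed")
        return "rel-uuid-1"

    def start_transfer(self, uuid):
        self.transfers.append(uuid)


fake = FakeClient()
result = initialize_relationship(fake, "svm1:vol_src", "svm1:vol_dst")
print(result, fake.transfers)
```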

Enabling the S3 Access Point Pattern: Private-Subnet Networking

If you want S3 Access Point-driven serverless automation for ONTAP data in a private-subnet design (no NAT Gateway), this is the endpoint footprint you need:

| Service | Type | Why |
|---|---|---|
| `secretsmanager` | Interface | ONTAP credentials |
| `fsx` | Interface | FSx API (describe, update) |
| `monitoring` | Interface | CloudWatch metrics |
| `sns` | Interface | SNS notifications |
| `s3` | Gateway | S3 Access Point data-path; must be associated with Lambda subnet's route table (no hourly endpoint charge) |

This applies to private-subnet / no-NAT deployments. If your Lambda functions have another egress path, the endpoint requirements differ. The S3 Gateway endpoint specifically needs to be associated with the route table used by the Lambda subnet.
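The route table requirement is easy to verify programmatically. The sketch below works on the response shapes of the EC2 `describe_vpc_endpoints` and `describe_route_tables` calls; the function name and the sample IDs are made up for illustration.

```python
def gateway_endpoint_covers_subnet(endpoint, route_tables, subnet_id):
    """Check that a gateway endpoint is attached to the subnet's route table.

    `endpoint` is one item of describe_vpc_endpoints()["VpcEndpoints"];
    `route_tables` is describe_route_tables()["RouteTables"] for the VPC.
    """
    explicit = None
    main = None
    for table in route_tables:
        for assoc in table.get("Associations", []):
            if assoc.get("SubnetId") == subnet_id:
                explicit = table["RouteTableId"]
            if assoc.get("Main"):
                main = table["RouteTableId"]
    # A subnet without an explicit association uses the VPC's main route table.
    effective = explicit or main
    return effective in endpoint.get("RouteTableIds", [])


# Made-up IDs: the Lambda subnet uses rtb-lambda, the endpoint is on rtb-main only.
tables = [
    {"RouteTableId": "rtb-main", "Associations": [{"Main": True}]},
    {"RouteTableId": "rtb-lambda",
     "Associations": [{"Main": False, "SubnetId": "subnet-lambda"}]},
]
s3_endpoint = {"VpcEndpointType": "Gateway", "RouteTableIds": ["rtb-main"]}
print(gateway_endpoint_covers_subnet(s3_endpoint, tables, "subnet-lambda"))
```

A check like this fits naturally into a deployment smoke test, since a missing association surfaces at runtime only as S3 timeouts.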

What I Learned from AWS Verification

Deployed and tested against FSx for ONTAP (ONTAP 9.17.1P4D3). The following are empirical observations from my tested deployment pattern.

SNS VPC endpoint is required for alerts — without it, SNS Publish silently times out in private-subnet Lambda. This is documented AWS behavior for VPC-deployed Lambda, but easy to overlook.
fsxadmin password sync is not automatic — Secrets Manager and FSx ONTAP store the password independently. If someone changes it via the console, Lambda gets 401 errors.
S3 Gateway endpoint route table association matters — it must be the specific route table used by the Lambda subnet, not just any route table in the VPC.

Same-SVM SnapMirror on SINGLE_AZ_1: Test Harness Only

⚠️ This is a low-cost automation test harness, not a real DR architecture. Source and destination volumes remain in the same failure domain (same filesystem, same AZ). Use this only for validating automation logic.

I tested SnapMirror within the same SVM on a SINGLE_AZ_1 deployment. It works for validating the automation without the cost of a second filesystem (~$200/month minimum).

Cost

| Component | Monthly |
|---|---|
| Lambda (4 functions) | ~$1.65 |
| Step Functions | ~$0.05 |
| EventBridge Scheduler | ~$0.00 |
| Secrets Manager | ~$0.40 |
| CloudWatch Logs | ~$0.50 |
| Serverless subtotal | ~$2.60 |
| VPC Interface Endpoints (4 × ~$7.30/AZ) | ~$29-58 |
| S3 Gateway Endpoint | $0.00 |
| Total (with dedicated endpoints) | ~$32-61 |

If your VPC already has these endpoints, the incremental cost is the serverless subtotal only. The S3 Gateway endpoint has no hourly endpoint charge, so the dominant networking cost comes from the four interface endpoints.
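The range in the table follows from whether the interface endpoints are placed in one or two Availability Zones. A short calculation using only the figures above (per-GB endpoint data processing charges are not included, and actual prices vary by Region):

```python
# Monthly figures copied from the cost table (USD, approximate).
serverless = {
    "Lambda": 1.65,
    "Step Functions": 0.05,
    "EventBridge Scheduler": 0.00,
    "Secrets Manager": 0.40,
    "CloudWatch Logs": 0.50,
}
ENDPOINT_PER_AZ = 7.30   # per interface endpoint, per AZ
INTERFACE_ENDPOINTS = 4  # secretsmanager, fsx, monitoring, sns


def monthly_total(az_count):
    """Serverless subtotal plus interface endpoints across `az_count` AZs."""
    endpoints = INTERFACE_ENDPOINTS * ENDPOINT_PER_AZ * az_count
    return round(sum(serverless.values()) + endpoints, 2)


print(round(sum(serverless.values()), 2))  # serverless subtotal
print(monthly_total(1), monthly_total(2))  # one AZ vs. two AZs
```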

Extending the S3 Access Point Pattern

Automated Permission Metadata Pipeline

The strongest extension connects S3 Access Points to the RAG system's permission pipeline:

EventBridge (daily)
  → S3 AP: ListObjectsV2 → file inventory
  → ONTAP REST: GET ACL metadata per file
  → Generate .metadata.json with allowed_group_sids
  → S3 AP: PutObject → write sidecars alongside source files
  → Trigger Bedrock KB Ingestion Job

This eliminates manual .metadata.json management — the automation reads NTFS ACLs from ONTAP REST API and generates permission metadata automatically, writing it back through the S3 Access Point.
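A minimal sketch of the sidecar generation step, assuming the ACL entries have already been fetched and reduced to simple dicts with `sid` and `access` fields (that input shape is an assumption; the raw ONTAP REST response needs its own parsing). The output follows the Bedrock Knowledge Bases convention of a `<file>.metadata.json` document with a `metadataAttributes` object.

```python
import json


def build_sidecar(object_key, acl_entries):
    """Build the sidecar key and JSON body for one source file."""
    # Keep only allow entries; deny entries must not grant retrieval access.
    allowed = sorted({
        entry["sid"]
        for entry in acl_entries
        if entry.get("access") == "access_allow"
    })
    sidecar_key = f"{object_key}.metadata.json"
    body = json.dumps({"metadataAttributes": {"allowed_group_sids": allowed}})
    return sidecar_key, body


# Made-up SIDs for illustration.
acl = [
    {"sid": "S-1-5-21-1111-2222-3333-513", "access": "access_allow"},
    {"sid": "S-1-5-21-1111-2222-3333-1105", "access": "access_deny"},
]
key, body = build_sidecar("docs/handbook.pdf", acl)
print(key)
print(body)
```

The returned key and body map directly onto a PutObject call through the S3 Access Point, which places the sidecar next to its source file.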

Multi-Volume Ingestion Orchestration

For environments with multiple FSx ONTAP volumes, each with its own S3 Access Point:

Step Functions Map:
  → Per volume: S3 AP scan → ONTAP metadata → generate sidecars
  → Per volume: Trigger Bedrock KB Ingestion Job
  → Wait for all → validate vector counts → notify

ONTAP Operations Chatbot

Combine ontap_api_executor with a Bedrock Agent for natural-language ONTAP management. Its security controls (method restrictions, blocked paths) are what make a read-only chatbot a reasonable use case.
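The kind of guard such an executor needs can be sketched as follows. The allowed methods and blocked path prefixes here are hypothetical examples, not the actual rules implemented in ontap_api_executor.

```python
ALLOWED_METHODS = {"GET"}                                # read-only
BLOCKED_PREFIXES = ("/security", "/cluster/licensing")   # hypothetical examples


def is_request_allowed(method, path):
    """Return True only for read-only calls outside the blocked paths."""
    if method.upper() not in ALLOWED_METHODS:
        return False
    normalized = "/" + path.strip("/")
    # Reject path traversal instead of trying to resolve it.
    if ".." in normalized.split("/"):
        return False
    return not any(
        normalized == prefix or normalized.startswith(prefix + "/")
        for prefix in BLOCKED_PREFIXES
    )


print(is_request_allowed("GET", "/storage/volumes"))
print(is_request_allowed("DELETE", "/storage/volumes"))
print(is_request_allowed("GET", "/security/accounts"))
```

A deny-by-default allowlist of paths would be stricter than a blocklist. Whichever form is used, the check belongs in the executor itself, so that it holds regardless of what the agent generates.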

Testing

38 unit tests covering TLS verification modes, SnapMirror dual initialization paths, capacity guardrails, and S3 AP operations:

pip install -r automation/fsxn-ops/requirements.txt
pytest automation/fsxn-ops/tests/ -v

# AWS integration tests (auto-deploys, tests, cleans up)
bash automation/fsxn-ops/tests/integration/run_aws_verification.sh

Wrapping Up

FSx for ONTAP S3 Access Points are the architectural enabler that makes this automation suite practical:

AI / data pipelines — ONTAP file data becomes accessible to serverless workflows through S3 Access Points (supported S3 object APIs), enriched with ONTAP REST API storage-system metadata
Management — ONTAP REST API handles control-plane automation (volumes, snapshots, exports)
Capacity — monitored with guardrails and safe expansion defaults
Volume-level DR — planned failover / failback for volume-level SnapMirror relationships (not full SVM-DR)

The serverless compute cost is ~$2.60/month. The code is open source and deploys with a single CloudFormation command.

👉 GitHub: automation/fsxn-ops/
📖 Full project: Yoshiki0705/FSx-for-ONTAP-Agentic-Access-Aware-RAG
