FSx for ONTAP S3 Access Points Lakehouse — What Works, What Doesn't, and Why

TL;DR

Amazon FSx for ONTAP S3 Access Points let you access NAS file data through S3-compatible APIs — without first copying source files to S3.

I tested multiple analytics, AI/ML, and lakehouse access patterns across AWS-native services, open-source engines, and third-party platforms. The results fall into four categories:

| Verified in this series ✅ | Candidate (AWS-documented) 🔎 | Partially resolved, not production-ready ⚠️ | Not suitable for this path ❌ |
|---|---|---|---|
| Athena, Glue, EMR Spark, Redshift Spectrum, DuckDB Lambda, Trino, Snowflake (with AWS_ACCESS_POINT_ARN) | Bedrock KB, Lake Formation, Quick | Databricks UC (session policy partially resolved; UC table creation and directory listing still blocked) | Delta / Iceberg / Hudi transactional write paths |

The pattern: Read-oriented analytics and flat-file writes (such as Parquet append) worked reliably in my validation environment. Transactional table-format write paths failed in this validation because they require commit semantics (atomic rename, conditional metadata update) that were not satisfied through the FSx S3 AP path.

GitHub Repository: fsxn-lakehouse-integrations


Validation Vocabulary

| Term | Meaning |
|---|---|
| Verified | Worked in my test environment with evidence in verification-pack/ |
| Candidate | AWS-documented or related-series path that still requires workload-specific validation |
| Blocked | Failed due to integration-layer behavior observed in validation |
| Not suitable | Failed because required table-format semantics were unavailable or incompatible |

When this article says "Verified," it means the behavior was observed in my test environment and evidence is available. It does not imply production certification or a vendor support guarantee.


Why This Matters

Enterprise organizations store petabytes of file data on NAS (NFS/SMB). To analyze this data with modern tools, they typically:

  1. Copy data from NAS to S3 (ETL pipeline)
  2. Register in a catalog (Glue, Unity Catalog)
  3. Query with an analytics platform

FSx for ONTAP S3 Access Points eliminate step 1. The same files accessible via NFS/SMB are now queryable via the S3 API, with no source-file movement, no sync pipeline, and no duplicate storage.

Before: NFS/SMB → [ETL Copy] → S3 → Analytics Platform
After:  NFS/SMB ←→ FSx for ONTAP ←→ S3 Access Point → Analytics Platform
                    (same data, same volume)

Note for regulated workloads: "Zero data movement" means source files do not need to be copied from FSx for ONTAP to S3 for the tested access paths. However, metadata, query results, logs, embeddings, temporary files, and derived datasets may still be created by the consuming service. See Note for Regulated Workloads below.


Security Model

Every request to FSx for ONTAP S3 Access Points must pass two authorization layers (AWS documentation):

  1. S3-side authorization: IAM identity policy, S3 Access Point policy, VPC endpoint policy (if applicable), SCP
  2. FSx-side authorization: Associated UNIX or Windows file system user permissions on the underlying volume

Both layers must permit the request. A permissive IAM policy does not override restrictive file system permissions, and vice versa.
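
A minimal sketch of the dual-layer rule, treating each layer as a single boolean. This deliberately simplifies real IAM evaluation (explicit denies, SCPs, and endpoint policies are collapsed into one flag), and the names `AccessDecision` and `is_request_allowed` are hypothetical, not part of any AWS or ONTAP API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessDecision:
    """Outcome of each authorization layer for one request (illustrative only)."""
    s3_side_allows: bool   # IAM identity policy, access point policy, VPC endpoint policy, SCP
    fsx_side_allows: bool  # UNIX or Windows file system permissions of the associated user


def is_request_allowed(decision: AccessDecision) -> bool:
    # Both layers must permit the request; neither one overrides the other.
    return decision.s3_side_allows and decision.fsx_side_allows


if __name__ == "__main__":
    # Print the full truth table: only the (True, True) case is allowed.
    for s3_ok in (True, False):
        for fsx_ok in (True, False):
            allowed = is_request_allowed(AccessDecision(s3_ok, fsx_ok))
            print(f"s3_side={s3_ok} fsx_side={fsx_ok} -> allowed={allowed}")
```

When a request is denied, this model suggests checking both layers separately rather than widening the IAM policy first.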


The Compatibility Map

Verified (Evidence in verification-pack)

| Platform | Pattern | Benchmark | Cost/Query |
|---|---|---|---|
| Athena | Serverless SQL via Glue Catalog | 54.8 MB/s (5M rows in 2.2 s) | ~$0.0005 |
| DuckDB Lambda | In-process analytics (arm64) | 10K rows in 452 ms (warm) | ~$0.00001 |
| EMR Spark | Distributed Spark SQL | 10K rows read+write in 16 s | ~$0.001 |
| Redshift Spectrum | DWH + external data JOIN | 5M rows in 4.3 s | ~$0.005 |
| Trino | Open-source distributed SQL | 5M rows in 1.5 s | Compute cost only |
| Glue ETL | PySpark medallion pipeline | 10K rows transform in 64 s | ~$0.02 |

Candidate (AWS-documented, requires workload validation)

| Platform | Pattern | Notes |
|---|---|---|
| Lake Formation | Governance overlay | Table/column-level access behavior observed; production workload validation needed |
| Bedrock KB | RAG document ingestion | Per AWS tutorial; permission-aware retrieval requires separate validation |

Blocked in Validation (Third-Party Platforms)

| Platform | Symptom | Root Cause | Workaround |
|---|---|---|---|
| Databricks (Unity Catalog) | Subdirectory ls → AccessDenied; CREATE TABLE → fails | Session policy partially resolved with access_point field; prefix-level listing and UC table creation still blocked | Explicit-path spark.read works but without UC table registration, governance features (lineage, tags, fine-grained access) cannot be applied; Instance Profile + boto3 for full access (bypasses UC entirely) |
| Snowflake (External Stage) | ✅ Works with AWS_ACCESS_POINT_ARN | AWS_ACCESS_POINT_ARN stage parameter resolves session policy for GetObject | SELECT, External Table, LIST all work; Iceberg write-back TBD |

Databricks update (2026-05-24): Setting the access_point field on the UC External Location partially resolves the session policy issue. Top-level dbutils.fs.ls, dbutils.fs.head, and spark.read with explicit file paths now succeed. However, UC table creation (CREATE TABLE LOCATION) fails, subdirectory listing is blocked, and write operations are denied. Without UC table registration, Unity Catalog governance features — lineage tracking, fine-grained access control, governance tags, and audit — cannot be applied to the data. This means the data is technically readable but not governable through UC. Support case active — awaiting guidance on table creation and prefix-level access.

Support cases filed with both vendors.

Not Suitable for This Path (Table Format Constraints)

| Format | Write Operation | Why It Failed in Validation |
|---|---|---|
| Delta Lake | INSERT/MERGE/VACUUM | Tested engines required atomic rename for _delta_log/ commit; not available via S3 API |
| Apache Iceberg | CREATE TABLE/INSERT | S3FileIO could not handle AP alias for metadata write/verify in tested configuration |
| Apache Hudi | Upsert/Compaction | Same atomic rename requirement as Delta (timeline commit) |

In this validation, transactional table writes failed because the tested engines required commit semantics (atomic rename, conditional metadata update, or table-log behaviors) that were not satisfied through the FSx S3 AP path. See API support documentation.

What DOES work for writes: Flat Parquet/CSV append via PutObject (Athena CTAS, Glue ETL write-back, EMR Spark write, DuckDB COPY TO).
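
As a rough illustration of why flat-file appends fit this path, the sketch below writes each file under a new unique key with a single PutObject call, so no rename or conditional metadata update is involved. The helper `put_flat_file`, the `RecordingClient` stand-in, and the alias `example-ap-alias` are hypothetical; with boto3 you would pass `boto3.client("s3")` instead of the stand-in.

```python
import uuid


def put_flat_file(s3_client, access_point_alias: str, prefix: str,
                  payload: bytes, suffix: str = ".parquet") -> str:
    """Write one immutable file under a unique key with a single PutObject call."""
    name = f"part-{uuid.uuid4().hex}{suffix}"
    prefix = prefix.strip("/")
    key = f"{prefix}/{name}" if prefix else name
    # The access point alias is passed where a bucket name would normally go.
    s3_client.put_object(Bucket=access_point_alias, Key=key, Body=payload)
    return key


class RecordingClient:
    """Stand-in for an S3 client so the sketch runs without AWS access."""

    def __init__(self):
        self.calls = []

    def put_object(self, **kwargs):
        self.calls.append(kwargs)
        return {}


if __name__ == "__main__":
    client = RecordingClient()
    written_key = put_flat_file(client, "example-ap-alias", "curated/events", b"example bytes")
    print(written_key)
```

Because every call creates a new object and never replaces or renames an existing one, readers only ever see complete files, which is the property the transactional formats try to obtain through their commit protocols.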


Benchmark Methodology

All benchmark numbers should be read with the following context:

| Parameter | Value |
|---|---|
| FSx for ONTAP deployment type | Single-AZ |
| Provisioned throughput | 128 MB/s |
| Region | ap-northeast-1 |
| Dataset shape | 10K rows (250 KB) and 5M rows (103 MB), single Parquet file |
| Run type | Warm (unless noted as cold start) |
| Network path | Internet-origin AP (no VPC attachment for managed services) |

Future benchmark runs will also capture: prefix depth, file count per prefix, average object size, p50/p90/p95/p99 latency where available, and cold/warm/repeated run count.

FSx S3 AP latency is in the tens of milliseconds range, and throughput depends on the file system's provisioned throughput capacity (AWS documentation). These benchmarks are sizing references from one test environment, not service limits or guarantees.
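
To relate a query time to the provisioned throughput, a simple end-to-end average can be computed from the dataset size and wall-clock time. The sketch below uses the Athena figures from this article (103 MB, 2.2 s, 128 MB/s provisioned); the function name is hypothetical. The article reports 54.8 MB/s as a peak figure, so the end-to-end average computed here is expected to come out lower.

```python
def average_throughput_mb_s(size_mb: float, elapsed_s: float) -> float:
    """End-to-end average throughput: bytes scanned divided by wall-clock time."""
    return size_mb / elapsed_s


SCAN_MB = 103.0          # 5M-row Parquet file, from the methodology table
ELAPSED_S = 2.2          # Athena query time reported in this article
PROVISIONED_MB_S = 128.0 # provisioned throughput of the test file system

average = average_throughput_mb_s(SCAN_MB, ELAPSED_S)
share_of_provisioned = average / PROVISIONED_MB_S

if __name__ == "__main__":
    print(f"average: {average:.1f} MB/s")                       # average: 46.8 MB/s
    print(f"share of provisioned: {share_of_provisioned:.0%}")  # share of provisioned: 37%
```

The same calculation can be repeated for your own dataset and throughput tier to judge whether a query is close to the file system's ceiling or limited elsewhere.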


Architecture Decision Guide

Q: Do you need to WRITE transactional tables (Delta/Iceberg)?
  → Yes: Use native S3 for write path; FSx S3 AP for read-only source data
  → No: FSx S3 AP can handle the read-oriented and flat-file write patterns validated in this series

Q: Do you need single-digit-millisecond latency or concurrency beyond the provisioned throughput?
  → Yes: Use native S3 (for example, S3 Express One Zone for latency-sensitive access)
  → No: FSx S3 AP (tens of ms, provisioned throughput)

Q: Do you have existing NAS data you want to analyze?
  → Yes: FSx S3 AP eliminates the copy pipeline
  → No: Native S3 may be simpler

Q: Do you need NFS/SMB access alongside S3 analytics?
  → Yes: FSx S3 AP (multi-protocol on same data)
  → No: Evaluate based on above
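
The four questions above can be sketched as a small helper that collects one note per answer. This is only an encoding of the guide for quick what-if checks; the function name and parameter names are hypothetical, and the output is not a substitute for workload validation.

```python
from typing import List


def recommend_access_path(needs_transactional_writes: bool,
                          needs_lower_latency_or_more_concurrency: bool,
                          has_existing_nas_data: bool,
                          needs_nfs_smb_alongside_s3: bool) -> List[str]:
    """Return the guide's recommendation for each answered question."""
    notes = []
    if needs_transactional_writes:
        notes.append("Native S3 for the table write path; FSx S3 AP for read-only source data")
    else:
        notes.append("FSx S3 AP for the validated read-oriented and flat-file write patterns")
    if needs_lower_latency_or_more_concurrency:
        notes.append("Native S3 for latency- or concurrency-sensitive access")
    if has_existing_nas_data:
        notes.append("FSx S3 AP eliminates the copy pipeline for existing NAS data")
    else:
        notes.append("Native S3 may be simpler")
    if needs_nfs_smb_alongside_s3:
        notes.append("FSx S3 AP for multi-protocol access on the same data")
    return notes


if __name__ == "__main__":
    for note in recommend_access_path(True, False, True, True):
        print("-", note)
```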

Decision Criteria

Scale when:

  • Business metric improves (freshness, cost, time-to-insight)
  • Governance path is approved
  • Performance impact is within threshold

Adjust when:

  • Engine works but governance or performance needs redesign
  • Staging to native S3 is required for write path

Stop when:

  • Transactional table write semantics are mandatory on the same path
  • Vendor session policy blocks production path with no approved workaround
  • Security owner rejects the access model

Business Value Hypotheses

| Business issue | Baseline metric | Expected value | Validation path | Decision owner |
|---|---|---|---|---|
| NAS analytics requires nightly copy to S3 | Copy pipeline runtime, freshness lag | Reduce data freshness lag to near-zero | Athena / Glue / EMR direct query | Data platform owner |
| Enterprise documents are hard to search | Avg search time per user | Faster document discovery | Bedrock KB / permission-aware RAG | Information management owner |
| ETL pipeline duplicates storage | Duplicate storage cost | Lower copy and storage overhead | Glue / EMR write-back to same volume | Storage / FinOps owner |
| Platform selection is unclear | Weeks spent on PoC | Faster architecture decision | This compatibility map | Architecture lead |

Partner Offer Paths

| Customer need | Suggested offer | Exit decision |
|---|---|---|
| Query NAS data without copy | Athena / Redshift Spectrum validation pilot | Scale / adjust / stop |
| ETL from NAS to curated Parquet | Glue or EMR Serverless validation sprint | Production design / stage to S3 |
| RAG over enterprise documents | Bedrock KB / permission-aware RAG assessment | Proceed only with authorization model validated |
| Databricks lakehouse integration | UC External Location with access_point field for read; staging to native S3 for Delta write | File-level read works under UC; subdirectory listing and table creation pending vendor resolution |
| Transactional table write | Native S3 table storage design | FSx S3 AP as source, not table log storage |

The purpose of these offers is not to force every workload onto FSx S3 AP, but to quickly identify the right access path, the right engine, and the right stop condition.


Key Technical Findings

1. Internet-Origin AP Required for Managed Services

In this validation, managed service paths (Athena, Glue, Redshift Spectrum, Bedrock) required internet-origin access points because the service access path did not originate from the customer VPC. Validate this per service, region, and network configuration.

2. Parquet Timestamp Compatibility

pandas and DuckDB generate Parquet with nanosecond timestamps by default. Spark (Glue, EMR) cannot read these files. Always use microsecond resolution for cross-engine compatibility.
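
A minimal sketch of forcing microsecond timestamps when writing from pandas, assuming the pyarrow engine. `coerce_timestamps` and `allow_truncated_timestamps` are pyarrow Parquet writer options that pandas passes through; the constant and function names are hypothetical.

```python
# Keyword arguments understood by pyarrow.parquet.write_table and passed through
# by pandas.DataFrame.to_parquet when engine="pyarrow".
SPARK_SAFE_PARQUET_OPTIONS = {
    "coerce_timestamps": "us",           # store timestamps at microsecond resolution
    "allow_truncated_timestamps": True,  # do not raise when nanoseconds are dropped
}


def write_spark_safe_parquet(df, path: str) -> None:
    """Write a pandas DataFrame so timestamp columns use microsecond resolution."""
    df.to_parquet(path, engine="pyarrow", index=False, **SPARK_SAFE_PARQUET_OPTIONS)
```

Usage would look like `write_spark_safe_parquet(events_df, "events.parquet")`. On the DuckDB side, casting nanosecond columns to `TIMESTAMP` (microsecond precision) before `COPY ... TO` is one way to get the same effect.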

3. EMRFS vs S3A

EMR's EMRFS (s3://) natively supports S3 AP aliases. The S3A FileSystem (s3a://) does NOT work with AP aliases (URL parsing error). Use s3:// prefix in EMR.
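
If job configurations or shared notebooks already contain `s3a://` paths, a small normalization step can rewrite them before they reach Spark on EMR. This is a sketch with a hypothetical helper name and alias; it changes only the URI scheme and leaves everything else untouched.

```python
def to_emrfs_uri(uri: str) -> str:
    """Rewrite s3a:// or s3n:// URIs to the s3:// scheme handled by EMRFS."""
    for scheme in ("s3a://", "s3n://"):
        if uri.startswith(scheme):
            return "s3://" + uri[len(scheme):]
    return uri  # already s3:// or not an S3 URI; leave unchanged


if __name__ == "__main__":
    # "example-ap-alias" is a placeholder for your access point alias.
    print(to_emrfs_uri("s3a://example-ap-alias/raw/events.parquet"))
```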

4. DuckDB httpfs Configuration

DuckDB requires s3_url_style = 'path' and explicit s3_endpoint to work with S3 AP aliases. In Lambda, also set home_directory = '/tmp'.
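
The settings above can be applied as a short list of statements run once per connection. The sketch below only builds the statements; the helper name is hypothetical, the endpoint value in the example is an assumption to replace with the one valid for your region, and credential configuration is omitted.

```python
from typing import List


def duckdb_s3ap_settings(endpoint: str, region: str, in_lambda: bool = False) -> List[str]:
    """Build the DuckDB statements for reading through an S3 AP alias."""
    statements = []
    if in_lambda:
        # Lambda only allows writes under /tmp, so set this before INSTALL.
        statements.append("SET home_directory = '/tmp';")
    statements += [
        "INSTALL httpfs;",
        "LOAD httpfs;",
        f"SET s3_region = '{region}';",
        f"SET s3_endpoint = '{endpoint}';",  # host name only, no scheme
        "SET s3_url_style = 'path';",
    ]
    return statements


if __name__ == "__main__":
    # Assumed endpoint for ap-northeast-1; verify against your environment.
    for stmt in duckdb_s3ap_settings("s3.ap-northeast-1.amazonaws.com",
                                     "ap-northeast-1", in_lambda=True):
        print(stmt)
```

With the `duckdb` Python package, each statement can be passed to `connection.execute(...)` before the first `read_parquet('s3://<alias>/...')` call.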

5. Trino Hive Connector

Trino requires hive.s3.path-style-access=true and explicit hive.s3.endpoint to resolve S3 AP aliases. Same pattern as DuckDB — path-style access is the key.
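
For Trino, the two properties named above go into the Hive catalog properties file. The sketch below renders just those two lines so they can be appended to an existing catalog file; the helper name is hypothetical and the endpoint in the example is an assumption for ap-northeast-1.

```python
def trino_hive_s3ap_properties(endpoint: str) -> str:
    """Render the Hive connector S3 properties needed for S3 AP aliases."""
    lines = [
        "hive.s3.path-style-access=true",
        f"hive.s3.endpoint={endpoint}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumed endpoint URL; replace with the one valid for your region.
    print(trino_hive_s3ap_properties("https://s3.ap-northeast-1.amazonaws.com"), end="")
```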

6. S3 Gateway Endpoint Routing

VPC-attached compute (Lambda in VPC, EC2) may experience timeouts when accessing FSx S3 AP through an S3 Gateway VPC Endpoint. The FSx S3 AP alias resolves to s3-r-w.<region>.amazonaws.com which may not route correctly through the Gateway endpoint. Workaround: use NAT Gateway or place compute outside VPC. See FSx S3 AP Networking Considerations.

7. Session Policy Is the Common Blocker for Third-Party Platforms

The session policy issue is not unique to one vendor in this validation. It may affect any analytics platform that applies restrictive AssumeRole session policies designed around standard S3 bucket ARN patterns. AWS-native services work because they use IAM roles directly without intermediary session policies.
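
One way to picture the mismatch: a session policy whose `Resource` entries follow the bucket ARN pattern never matches an access point object ARN, so the request is denied even when the underlying role allows it. The sketch below is a simplified matcher, not IAM's real evaluation logic; the ARNs, account ID, and names are placeholders, and the vendors' actual session policies are not shown in this article.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def resource_patterns_cover(patterns: Iterable[str], resource_arn: str) -> bool:
    """Return True if any wildcard pattern matches the ARN (simplified IAM-style match)."""
    return any(fnmatchcase(resource_arn, pattern) for pattern in patterns)


# Placeholder ARNs for illustration only.
AP_OBJECT_ARN = ("arn:aws:s3:ap-northeast-1:111122223333:"
                 "accesspoint/example-ap/object/data/file.parquet")

BUCKET_STYLE_PATTERNS = [
    "arn:aws:s3:::example-bucket",
    "arn:aws:s3:::example-bucket/*",
]
ACCESS_POINT_PATTERNS = [
    "arn:aws:s3:ap-northeast-1:111122223333:accesspoint/example-ap",
    "arn:aws:s3:ap-northeast-1:111122223333:accesspoint/example-ap/object/*",
]

if __name__ == "__main__":
    print("bucket-style policy covers AP object:",
          resource_patterns_cover(BUCKET_STYLE_PATTERNS, AP_OBJECT_ARN))
    print("access-point policy covers AP object:",
          resource_patterns_cover(ACCESS_POINT_PATTERNS, AP_OBJECT_ARN))
```

This matches the Snowflake and Databricks observations above, where supplying the access point explicitly (AWS_ACCESS_POINT_ARN, access_point) changed what the platform-generated policy permitted.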


Note for Regulated Workloads

"Zero data movement" means source files do not need to be copied from FSx for ONTAP to S3 for the tested access paths. However, metadata, query results, logs, embeddings, temporary files, and derived datasets may still be created by the consuming service.

For regulated workloads, validate:

  • Data classification of source and derived data
  • Derived data location (query results, embeddings, temp files)
  • Encryption and key ownership at each layer
  • Audit log coverage (CloudTrail, platform logs, ONTAP audit)
  • Retention and deletion policy
  • Approval owner and expiration date

Bedrock KB is a strong candidate for RAG over NAS documents, but regulated use cases must validate permission-aware retrieval, data classification, human review requirements, and residual risk acceptance before production use.

For regulated workloads, do not start a PoC until the data owner, security owner, and platform owner agree on the allowed prefixes, derived data locations, logging scope, rollback plan, and approval expiration date.

Assurance artifacts to prepare:

  • Non-technical overview for stakeholders
  • Data flow diagram (source → AP → service → output)
  • Access control summary (dual-layer authorization)
  • Audit evidence summary
  • Rollback plan
  • Residual risk register

Store these artifacts with an approval ID, owner, review date, and expiration date so the PoC decision can be audited later.


GenAI / RAG Evaluation Metrics

For GenAI and RAG workloads on FSx for ONTAP data, measure:

  • Retrieval accuracy (relevant documents returned)
  • Permission-aware retrieval pass rate (unauthorized documents NOT returned)
  • Hallucination reduction vs baseline
  • Data freshness lag (NFS write → S3 AP availability)
  • Human review workload
  • User time saved vs previous search method
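
The permission-aware retrieval pass rate in the list above can be computed from a log of test queries. In this sketch a query passes only if every returned document is in the asking user's allowed set; the function name and the shape of the inputs are assumptions, not part of Bedrock KB or any other service API.

```python
from typing import Dict, Iterable, List, Set, Tuple


def permission_aware_pass_rate(results: Iterable[Tuple[str, List[str]]],
                               allowed_docs_by_user: Dict[str, Set[str]]) -> float:
    """Fraction of queries that returned no unauthorized documents.

    results: (user, returned_document_ids) per test query.
    allowed_docs_by_user: documents each user is permitted to read.
    """
    total = 0
    passed = 0
    for user, returned in results:
        total += 1
        allowed = allowed_docs_by_user.get(user, set())
        if set(returned) <= allowed:  # every returned document is authorized
            passed += 1
    return passed / total if total else 0.0


if __name__ == "__main__":
    log = [("alice", ["d1", "d2"]), ("bob", ["d3"]), ("carol", [])]
    acl = {"alice": {"d1", "d2"}, "bob": {"d1"}}
    print(f"pass rate: {permission_aware_pass_rate(log, acl):.2f}")
```

For a regulated PoC, anything below 1.0 on this metric is worth treating as a stop condition rather than a tuning target.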

Start with a read-only, permission-aware PoC with human review attached before moving to production deployment.


Series Index

This is the series overview for "FSx for ONTAP S3 Access Points × Lakehouse Deep Dive."

| Part | Platform | Status | URL |
|---|---|---|---|
| Part 1 | Athena: Query NAS Data In Place | ✅ Published | dev.to |
| Part 2 | Databricks: A Layer-by-Layer Validation of Observed Boundaries | ✅ Published | |
| Part 3 | Snowflake: From 'Access Denied' to Working External Tables | ✅ Resolved | |
| Part 4 | DuckDB Lambda: Serverless for $0.00001/query | Ready to publish | |
| Part 5 | EMR Spark: Read-Write ETL Pipeline | Ready to publish | |
| Part 6 | Redshift Spectrum: DWH Meets NAS Data | Coming soon | |
| Part 7 | Trino: Open-Source SQL on NAS Data | Coming soon | |
| Summary | This article (Overview: What Works and What Doesn't) | Ready to publish | |

Note: This overview article can be published as the final "summary" post in the series, or as a standalone reference.

Update to Part 1 (Athena)

Since Part 1 was published, additional verification has been completed and published as a v1.1 update:

  • CTAS write-back: Verified as WORKING (3.7s, writes Parquet back to FSxN S3 AP)
  • Partition projection: Verified with Hive-style partitioning
  • Benchmark: 54.8 MB/s peak throughput (5M rows, 103 MB scan in 2.2s)
  • 9/9 negative tests pass: Unauthorized access correctly denied

Try It Yourself

git clone https://github.com/Yoshiki0705/fsxn-lakehouse-integrations.git
cd fsxn-lakehouse-integrations

# Deploy base infrastructure
aws cloudformation deploy \
  --template-file shared/cloudformation/fsxn-s3ap-base.yaml \
  --stack-name fsxn-lakehouse-base \
  --capabilities CAPABILITY_IAM

# Validate connectivity
python shared/scripts/validate-access.py --access-point-alias <your-ap-alias>

# Choose your platform: integrations/athena/, integrations/duckdb/, etc.

Each integration directory includes a README, CloudFormation template, deployment script, and sample queries.


What's Next

  • Databricks UC + access_point field: partial success confirmed (2026-05-24); awaiting vendor guidance on subdirectory listing and table creation
  • Snowflake AWS_ACCESS_POINT_ARN: resolved (2026-05-24); SELECT and External Table work with the stage parameter
  • Apache Iceberg community engagement (S3FileIO + AP alias support)
  • ONTAP feature quantification (dedup ratio, snapshot RTO): blocker resolved (orphaned DNS/AD configuration removed, S3 AP recovered 2026-05-24)
  • Redshift Spectrum and Trino deep-dive articles
  • Customer PoC execution with measured business outcomes

Operational Lessons Learned

S3 AP Timeout Caused by Orphaned DNS/AD Configuration (2026-05-24)

During this series validation, all S3 APs on one SVM became unresponsive for 7+ days. Root cause: the SVM had DNS servers configured for an AD domain that no longer existed. When the S3 AP backend processes requests on an AD-joined SVM, ONTAP's name-service stack attempts DNS resolution for user-mapping — if DNS is unreachable, requests block until timeout.

Key findings:

  • Disabling customer-configured FPolicy did NOT fix the issue
  • A separate SVM without DNS/AD worked normally on the same file system
  • Removing the orphaned CIFS/DNS configuration restored S3 AP instantly

Prevention: Do not leave orphaned DNS/AD configurations on SVMs used for S3 AP access. If AD is decommissioned, clean up vserver cifs and vserver services dns settings. See FSx S3 AP Networking — Section 7 for full details.



This series is based on hands-on verification, not documentation review. Every "Verified" claim has a corresponding evidence record in the verification-pack/ directory.

Disclaimer: This article is an independent validation report and does not represent AWS, NetApp, Databricks, or Snowflake official guidance. Product behavior, support status, and platform capabilities may change. Always validate in your own environment and consult vendor documentation and support channels.
