Yoshiki Fujiwara(藤原善基)@AWS Community Builder for AWS Community Builders

Posted on May 26 • Edited on Jun 22

Why Delta, Iceberg, and Hudi Can't Write to FSx S3 Access Points — And What Works Instead

#aws #deltalake #iceberg #amazonfsxfornetappontap

TL;DR

In Parts 1–6 of this series, we validated read paths across six engines. Part 7 answers the write question: can you use Delta Lake, Apache Iceberg, or Apache Hudi on FSx for ONTAP S3 Access Points?

No. All three transactional table formats fail to write on FSx S3 AP due to fundamental S3 API limitations:

Format	Failure Point	Error	Root Cause
Delta Lake	Commit (write)	`501 Not Implemented`	No conditional writes (If-None-Match)
Apache Hudi	Timeline commit	Not Supported (by design)	No atomic rename (.inflight → .commit)
Apache Iceberg	Metadata write	`NullPointerException`	S3FileIO appears unable to handle the AP alias for metadata (exact cause unconfirmed)

What DOES work for write on FSx S3 AP:

✅ Flat Parquet append (PutObject)
✅ Athena CTAS (write-back)
✅ DuckDB COPY TO (write-back)
✅ EMR Spark df.write.parquet() (flat Parquet)

Quick Decision Guide:

Need transactional table format → Write to native S3, read FSxN data via S3 AP separately
Need write-back to FSxN → Use flat Parquet (append-only, no transactions)
Need ACID on NAS data → Use NFS/SMB protocol directly (not S3 AP)

GitHub: fsxn-lakehouse-integrations

How to Read This Article

This article is:

A consolidated "Not Supported" evidence report for transactional writes
Root cause analysis for each table format
Architecture guidance for working alternatives

Read by role:

Data engineer: Failure Evidence → What Works Instead
Architect: Root Cause Analysis → Architecture Patterns
Partner / SA: Partner Decision Card → Discovery Questions
Storage engineer: S3 API Compatibility → Why This Is Fundamental

Prerequisite Concepts

Before reading this article, it helps to understand:

Delta Lake — Databricks' open table format using _delta_log/ JSON commits with conditional writes
Apache Iceberg — Netflix's table format using metadata files with atomic commit protocol
Apache Hudi — Uber's table format using .hoodie/ timeline with atomic rename
Atomic rename — renaming a file in one atomic operation (S3 does not support this)
Conditional writes — writing only if a condition is met (e.g., If-None-Match); FSx S3 AP returns 501
PutObject — S3's basic write operation (supported by FSx S3 AP for files ≤ 5 GB)

Why Transactional Table Formats Need Special S3 Operations

All three formats solve the same problem: concurrent write safety on object storage. They each use a commit protocol that requires operations beyond basic PutObject:

Delta Lake:    PutObject with If-None-Match header (conditional write)
               → Prevents two writers from creating the same commit file

Apache Hudi:   Rename .inflight → .commit (atomic rename)
               → Marks a commit as complete atomically

Apache Iceberg: PutObject + HeadObject/GetObject for metadata verification
                → Verifies metadata was written correctly before commit

FSx for ONTAP S3 AP does NOT support:

Conditional writes (If-None-Match → returns 501 Not Implemented)
Atomic rename (S3 API has no rename operation)
Reliable metadata verification on AP alias (NullPointerException in S3FileIO)

This is not a configuration issue — it's a fundamental API limitation.
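To make the difference concrete, here is a minimal sketch of a commit protocol that relies on a conditional write. It uses a toy in-memory store, not a real S3 client: `InMemoryObjectStore` and `commit` are hypothetical names introduced for illustration, and the 412 response for an existing key reflects how native S3 answers `If-None-Match` conflicts. With conditional puts available, the second writer is told about the conflict; without them, no writer can even attempt the protocol.

```python
class InMemoryObjectStore:
    """Toy object store for illustration; not a model of any real S3 endpoint."""

    def __init__(self, supports_conditional_put):
        self.objects = {}
        self.supports_conditional_put = supports_conditional_put

    def put(self, key, body, if_none_match=False):
        if if_none_match:
            if not self.supports_conditional_put:
                raise NotImplementedError("501 Not Implemented")
            if key in self.objects:
                raise FileExistsError("412 Precondition Failed")
        self.objects[key] = body


def commit(store, writer, version):
    """Try to claim one commit slot, as a conditional-write protocol would."""
    key = f"_delta_log/{version:020d}.json"
    try:
        store.put(key, writer, if_none_match=True)
        return "committed"
    except FileExistsError:
        return "conflict"
    except NotImplementedError:
        return "unsupported"


# Store that honours If-None-Match: exactly one writer wins version 0.
native = InMemoryObjectStore(supports_conditional_put=True)
results_native = [commit(native, w, 0) for w in ("writer-a", "writer-b")]

# Store that rejects conditional puts, as the access point is reported to do.
access_point = InMemoryObjectStore(supports_conditional_put=False)
results_ap = [commit(access_point, w, 0) for w in ("writer-a", "writer-b")]

print(results_native)  # ['committed', 'conflict']
print(results_ap)      # ['unsupported', 'unsupported']
```

Dropping the condition and calling plain `put` would let the second writer silently overwrite the first, which is the outcome the commit protocol exists to prevent.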

Architecture: What's Supported vs What's Not

                    FSx for ONTAP S3 Access Point
                              │
              ┌───────────────┼───────────────┐
              │               │               │
         ✅ READ          ✅ WRITE         ❌ WRITE
         (all engines)    (flat Parquet)   (transactional)
              │               │               │
    ┌─────────┤         ┌─────┤         ┌─────┤
    │         │         │     │         │     │
  Athena   Redshift   EMR  DuckDB   Delta  Iceberg  Hudi
  Snowflake Spectrum  Spark Lambda   Lake
  DuckDB
  EMR

Failure Evidence: Delta Lake

Test: delta-rs (Rust) write to FSx S3 AP
Date: 2026-05-23
Result: 501 Not Implemented

Error: Generic S3 error: Error performing put request to
s3://verification-tes-...-ext-s3alias/_delta_log/00000000000000000000.json:
response error "501 Not Implemented"

Root cause: Delta Lake's commit protocol uses If-None-Match header on PutObject to ensure only one writer creates each commit file. FSx for ONTAP S3 AP does not implement conditional writes and returns 501.

CloudShell reproduction: delta-rs write to FSx S3 AP returns "501 Not Implemented" — conditional writes (If-None-Match) are not supported.

Spark Delta fallback: Spark's Delta writer uses CopyObject + DeleteObject as a rename fallback, but this is not atomic — two concurrent writers can corrupt the log.

Conclusion: Delta Lake write is Not Supported on FSx S3 AP. This is a fundamental limitation, not a configuration issue.
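The 501 response can, in principle, be checked without delta-rs by sending a single conditional PutObject. The sketch below is an assumption-laden probe rather than the test used in this article. `probe_conditional_write` is a hypothetical helper that expects an object shaped like `boto3.client("s3")`, and it assumes a boto3 release recent enough to accept `IfNoneMatch` on `put_object`. The fake client is there only so the snippet runs offline.

```python
def probe_conditional_write(s3_client, bucket, key):
    """Return 'supported', 'not_implemented', 'exists', or 'error:<status>'.

    On success this leaves a small probe object behind at `key`.
    """
    try:
        s3_client.put_object(Bucket=bucket, Key=key, Body=b"{}", IfNoneMatch="*")
        return "supported"
    except Exception as exc:
        # botocore's ClientError exposes the HTTP status via a .response dict.
        response = getattr(exc, "response", None)
        if not isinstance(response, dict):
            raise  # not an HTTP-level error; surface it unchanged
        status = response.get("ResponseMetadata", {}).get("HTTPStatusCode")
        if status == 501:
            return "not_implemented"
        if status == 412:
            return "exists"
        return f"error:{status}"


class FakeClientError(Exception):
    """Hypothetical stand-in for botocore's ClientError."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.response = {"ResponseMetadata": {"HTTPStatusCode": status}}


class FakeAccessPointClient:
    """Mimics the 501 behaviour reported above; for offline illustration only."""

    def put_object(self, **kwargs):
        if "IfNoneMatch" in kwargs:
            raise FakeClientError(501)
        return {}


print(probe_conditional_write(FakeAccessPointClient(), "example-ap-alias", "_probe/check.json"))
```

Against a real endpoint, the first argument would be `boto3.client("s3")` and the bucket would be the access point alias. The bucket and key names here are placeholders.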

Failure Evidence: Apache Hudi

Test: Logical deduction from Delta Lake verification + Hudi architecture analysis
Date: 2026-05-24
Result: Not Supported (by design)

Root cause: Apache Hudi's commit protocol requires atomic rename for its timeline:

.hoodie/[instant].inflight → .hoodie/[instant].commit

This rename must be atomic to prevent partial commits from being visible. S3 has no rename operation — the only way to "rename" is CopyObject + DeleteObject, which is not atomic.
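The gap between the two steps is easy to see in a small simulation. This sketch emulates a rename as copy followed by delete on a plain dictionary and records what a concurrent reader could list in between. The instant name `001` and the helper `rename_via_copy_delete` are hypothetical, and no Hudi code is involved.

```python
def rename_via_copy_delete(objects, src, dst, observe):
    """Emulate rename as CopyObject + DeleteObject on a dict-backed store.

    `observe` is called between the two steps with the keys a concurrent
    reader would see at that moment.
    """
    objects[dst] = objects[src]   # step 1: CopyObject
    observe(sorted(objects))      # a reader listing the timeline right now
    del objects[src]              # step 2: DeleteObject


timeline = {".hoodie/001.inflight": b"commit metadata"}
snapshots = []
rename_via_copy_delete(
    timeline, ".hoodie/001.inflight", ".hoodie/001.commit", snapshots.append
)

print(snapshots[0])      # ['.hoodie/001.commit', '.hoodie/001.inflight']
print(sorted(timeline))  # ['.hoodie/001.commit']
```

The intermediate snapshot shows both names at once. A crash after step 1 would leave that state in place permanently, which is the partial-commit visibility that an atomic rename rules out.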

Attempted verification: An EMR Serverless Hudi write could not be run because the Hudi catalog plugin is not available in the EMR 7.1.0 default configuration. This result is therefore inferred rather than observed, but the underlying constraint (no atomic rename) makes the outcome predictable.

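Conclusion: Apache Hudi write is Not Supported on FSx S3 AP. As with Delta Lake, the blocker is a commit primitive the S3 AP interface does not offer; here it is atomic rename rather than conditional writes.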

Failure Evidence: Apache Iceberg

Test: EMR Serverless (emr-7.1.0) with Iceberg S3FileIO + Glue Catalog
Date: 2026-05-24
Result: NullPointerException

java.lang.NullPointerException: Cannot invoke
"org.apache.iceberg.TableMetadata.metadataFileLocation()"
because "metadata" is null

Root cause: Iceberg's S3FileIO attempts to write metadata files to the warehouse path on FSx S3 AP. The NullPointerException occurs during the commit phase when Iceberg tries to verify the metadata file was written successfully. Possible causes:

S3FileIO may not correctly handle S3 AP alias as bucket name
The metadata write (PutObject) may succeed but subsequent HeadObject/GetObject to verify fails due to AP alias resolution
Iceberg's commit protocol may require operations not fully supported by FSx S3 AP

Note: Iceberg READ of pre-existing tables (metadata in Glue, data files on S3 AP) may still work — this was not tested in this run.

Conclusion: Apache Iceberg write is Not Supported on FSx S3 AP. The failure is in metadata write/verify, not data file write.

Summary: Why All Three Fail

Requirement	Delta Lake	Apache Hudi	Apache Iceberg	FSx S3 AP Support
Basic PutObject	✅ Uses	✅ Uses	✅ Uses	✅ Supported
Conditional write (If-None-Match)	✅ Required	❌ Not used	❌ Not used	❌ 501 Not Implemented
Atomic rename	❌ Not used	✅ Required	❌ Not used	❌ No S3 rename API
Metadata write + verify on AP alias	❌ Not needed	❌ Not needed	✅ Required	❌ NullPointerException
Write result	❌ Failed	❌ Not Supported	❌ Failed	—

Common thread: Each format requires at least one operation beyond basic PutObject that FSx S3 AP does not support.
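The summary table can be read as a set comparison: a format is writable only if every operation it requires is in the supported set. The sketch below encodes the table that way. The operation labels and the `missing_ops` helper are names chosen for illustration, and the Flat Parquet row is added from the "What DOES work" list.

```python
# Operations each write path requires, transcribed from the summary table.
REQUIRED_OPS = {
    "Delta Lake": {"put_object", "conditional_write"},
    "Apache Hudi": {"put_object", "atomic_rename"},
    "Apache Iceberg": {"put_object", "metadata_verify_on_alias"},
    "Flat Parquet": {"put_object"},
}

# Operations the article reports as working on FSx S3 AP.
FSX_S3_AP_SUPPORTED = {"put_object"}


def missing_ops(fmt):
    """Return the required operations that FSx S3 AP does not provide."""
    return sorted(REQUIRED_OPS[fmt] - FSX_S3_AP_SUPPORTED)


for fmt in REQUIRED_OPS:
    gaps = missing_ops(fmt)
    verdict = "writable" if not gaps else "blocked by " + ", ".join(gaps)
    print(f"{fmt}: {verdict}")
```

If the supported set grows, for example with conditional writes, only `FSX_S3_AP_SUPPORTED` needs to change for the verdicts to update.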

AWS Support Confirmation (May 2026): The lack of conditional writes on FSx for ONTAP S3 Access Points has been confirmed as a product-level limitation by AWS Support. A feature request has been submitted for parity with S3 native conditional writes (available since August 2024). If implemented, Delta Lake transactional writes would become possible on FSx S3 AP; Iceberg would also depend on resolving the separate metadata failure described in its evidence section. No implementation timeline has been committed. This is not a bug; it is a design boundary of the S3 AP interface.

What Works Instead

✅ Flat Parquet Append (PutObject)

All engines can write flat Parquet files to FSx S3 AP:

# EMR Spark
df.write.mode("append").parquet("s3://<AP>/gold/output/")

# DuckDB
COPY (SELECT * FROM result) TO 's3://<AP>/gold/output.parquet' (FORMAT PARQUET);

Limitations: No ACID transactions, no schema evolution, no time travel. Append-only pattern.
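Because a plain PutObject replaces any existing object at the same key, a custom append-only writer (one that does not rely on Spark's own part-file naming) may want to generate a unique key for every file. The following is one possible sketch. `append_key` and its naming scheme are assumptions for illustration, not a convention used by this series.

```python
import uuid
from datetime import datetime, timezone


def append_key(prefix, writer_id, now=None):
    """Build a collision-resistant object key for one appended Parquet file."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    # The random suffix keeps keys distinct even within the same second.
    return f"{prefix.rstrip('/')}/part-{stamp}-{writer_id}-{uuid.uuid4().hex}.parquet"


key = append_key("gold/output/", "writer-a")
print(key)
```

Unique keys avoid accidental overwrites, but they do not add transactions: readers can still observe a partially written batch of files.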

✅ Athena CTAS (Write-back)

CREATE TABLE fsxn_gold.aggregated_sensors
WITH (
  external_location = 's3://<AP>/gold/athena-output/',
  format = 'PARQUET'
) AS
SELECT status, COUNT(*) AS reading_count, AVG(temperature) AS avg_temperature
FROM fsxn_athena_verification.sensor_readings
GROUP BY status;

✅ DuckDB COPY TO

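import duckdb

conn = duckdb.connect()
conn.execute("INSTALL httpfs")  # S3 support; credentials and region must be configured separately
conn.execute("LOAD httpfs")
conn.execute("""
    COPY (SELECT * FROM read_parquet('s3://<AP>/sensor-data/*.parquet')
          WHERE temperature > 30)
    TO 's3://<AP>/gold/hot_sensors.parquet' (FORMAT PARQUET)
""")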

✅ EMR Spark Write (Flat Parquet)

agg_df.write.mode("overwrite").parquet("s3://<AP>/gold/emr_output/")

For regulated workloads (Takizawa-san lens): Flat Parquet on FSx for ONTAP S3 AP does NOT provide ACID guarantees, schema evolution, or time travel. If your compliance framework requires transactional consistency (e.g., SOX audit trail, HIPAA data integrity), flat Parquet is insufficient. Use DataSync → S3 → Delta/Iceberg with Lake Formation governance (Part 6) for regulated workloads that need both FSx for ONTAP as source and ACID guarantees on the analytics layer.

Architecture Patterns for Transactional Workloads

If you need transactional table formats AND FSx for ONTAP data:

Sync mechanism note (verified May 2026): SnapMirror S3 (ONTAP S3 bucket → AWS S3 replication) is not available on FSx for ONTAP — the snapmirror object-store commands are disabled as a managed service restriction. AWS DataSync (NFS → S3) is the only validated sync mechanism for moving FSx for ONTAP data to standard S3 buckets where Delta/Iceberg/Hudi can write safely.

Pattern 1: Read from FSx for ONTAP, Write to Native S3

FSx for ONTAP (source) ──S3 AP──▶ EMR Spark (read + transform)
                                        │
                                        ▼
                              Native S3 (Delta/Iceberg table)
                                        │
                                        ▼
                              Athena / Databricks / Redshift (query)

Use when: You need Delta/Iceberg for downstream analytics but source data lives on FSxN.

Pattern 2: Write via NFS, Read via S3 AP

Application ──NFS/SMB──▶ FSx for ONTAP Volume (write files)
                                        │
                              S3 Access Point (read-only)
                                        │
                                        ▼
                              Athena / Redshift / DuckDB (query)

Use when: Applications write via NFS/SMB and analytics engines read via S3 AP. No transactional format needed because NFS provides POSIX semantics.

Pattern 3: Hybrid (FSxN for raw, S3 for curated)

FSx for ONTAP (raw/bronze) ──S3 AP──▶ EMR Spark ──▶ S3 (silver/gold, Iceberg)
         │                                                    │
         └── NFS/SMB access for apps                          └── Athena + Lake Formation

Use when: Raw data stays on FSxN (multi-protocol access), curated data goes to S3 with full lakehouse capabilities.

Pattern 4: Snowflake Managed Iceberg (Confirmed May 2026)

FSx for ONTAP (raw) ──S3 AP──▶ Snowflake External Stage (AWS_ACCESS_POINT_ARN)
                                        │
                                   COPY INTO
                                        │
                                        ▼
                          Snowflake Managed Iceberg Table
                          (open Iceberg format on customer S3)
                                        │
                          ┌─────────────┼─────────────┐
                          │             │             │
                    Databricks UC  AWS Athena    EMR Spark
                    (read Iceberg) (read Iceberg) (read Iceberg)

Use when: You need Iceberg table format AND FSx for ONTAP as source AND multi-engine access. Snowflake manages the Iceberg lifecycle (OPTIMIZE, Time Travel, governance) while writing in open format to customer-owned S3. External engines read the same Iceberg metadata directly.

Support confirmed (May 2026): COPY INTO from FSx for ONTAP S3 AP External Stage → Snowflake Managed Iceberg Table is supported. Dynamic Tables with External Table source also work (REFRESH_MODE = FULL, min TARGET_LAG 60s). This provides a governed, AI-ready Iceberg path without writing Iceberg directly to FSx S3 AP.

Comparison with Other Engines in This Series

Engine	Read from FSxN S3 AP	Write flat Parquet	Write Delta/Iceberg/Hudi
Athena (Part 1)	✅	✅ CTAS	❌
Databricks (Part 2)	⚠️ Partial	❌ (UC blocks)	❌
Snowflake (Part 3)	✅	⚠️ TBD	❌
DuckDB Lambda (Part 4)	✅	✅ COPY TO	❌
EMR Spark (Part 5)	✅	✅ df.write	❌
Redshift Spectrum (Part 6)	✅	❌ (read-only)	❌

Key insight: All engines can read from FSx S3 AP (Databricks only partially). Most can write flat Parquet. None can write transactional table formats. This is a property of the S3 AP interface, not the engines.

Partner Decision Card

Customer requirement	FSx S3 AP path	Recommended alternative
Read NAS data from analytics engines	✅ Works (all engines)	Use any engine from Parts 1-6
Write flat Parquet back to NAS	✅ Works (EMR, DuckDB, Athena CTAS)	Use EMR Spark or DuckDB
Delta Lake on NAS data	❌ Not Supported	Write Delta to native S3; read FSxN separately
Iceberg on NAS data	❌ Not Supported	Write Iceberg to native S3; read FSxN separately
Hudi on NAS data	❌ Not Supported	Write Hudi to native S3; read FSxN separately
ACID transactions on NAS	❌ Not via S3 AP	Use NFS/SMB protocol directly
Schema evolution on NAS data	❌ Not via S3 AP	Use Glue Catalog for schema management

Discovery Questions for Partners

When a customer asks about transactional table formats on FSx for ONTAP S3 AP:

Is the requirement for transactional WRITE or just READ? (Read of pre-existing tables may work for Iceberg)
Can the transactional table live on native S3 while source data stays on FSxN? (Pattern 1)
Is the write pattern append-only or does it require updates/deletes? (Append-only works with flat Parquet)
Does the application already write via NFS/SMB? (Pattern 2 — no S3 AP write needed)
Is schema evolution required? (Use Glue Catalog for schema management without table format)
What is the concurrency requirement? (Single-writer flat Parquet is safe; multi-writer needs transactions)

Governance Impact

Write pattern	Governance model	Concurrency safety	Production suitability
Flat Parquet (single writer)	IAM + S3 AP + Glue Catalog	✅ Safe (single writer)	Production-ready
Flat Parquet (multi-writer)	IAM + S3 AP + Glue Catalog	⚠️ Risk (no conflict detection)	Use with caution
Delta/Iceberg/Hudi	N/A	N/A	❌ Not Supported
NFS/SMB write + S3 AP read	POSIX + IAM + S3 AP	✅ Safe (POSIX locking)	Production-ready

For multi-writer scenarios: If multiple processes need to write to the same prefix on FSxN via S3 AP, use a coordination mechanism (e.g., Step Functions, SQS queue) to serialize writes. Without transactional table formats, there is no built-in conflict detection.
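The serialization idea can be sketched locally with a queue and a single consumer thread. This stands in for a managed mechanism such as SQS or Step Functions and is not a production design. `run_single_writer` is a hypothetical helper, and the `write` callback here only records the key where a real job would issue the PutObject.

```python
import queue
import threading


def run_single_writer(jobs, write):
    """Drain `jobs` on one thread so that writes never overlap."""
    while True:
        job = jobs.get()
        if job is None:   # sentinel: no more work
            break
        write(job)


jobs = queue.Queue()
written = []
writer = threading.Thread(target=run_single_writer, args=(jobs, written.append))
writer.start()


def producer(name):
    """Enqueue three hypothetical output keys for one upstream process."""
    for i in range(3):
        jobs.put(f"{name}/part-{i}.parquet")


producers = [threading.Thread(target=producer, args=(n,)) for n in ("etl-a", "etl-b")]
for p in producers:
    p.start()
for p in producers:
    p.join()

jobs.put(None)   # all producers are done; tell the writer to stop
writer.join()

print(len(written))  # 6
```

Many processes may enqueue concurrently, but only one thread ever performs a write, so no two writes to the prefix can interleave.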

AI Readiness Score

Pattern	Governance	Performance	AI Capability	Cost	Operational Simplicity	Overall
Flat Parquet + Glue Catalog	★★★☆☆	★★★★☆	★★☆☆☆	★★★★★	★★★★★	3.8
NFS write + S3 AP read	★★★☆☆	★★★★☆	★★☆☆☆	★★★★★	★★★★☆	3.6
Hybrid (FSxN raw + S3 Iceberg)	★★★★★	★★★★★	★★★★☆	★★★☆☆	★★☆☆☆	3.8

Scoring methodology: Flat Parquet + Glue Catalog scores highest on Cost and Simplicity (no additional infrastructure). Hybrid pattern scores highest on Governance and Performance (full lakehouse on S3) but lower on Simplicity (two storage tiers to manage).
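The Overall column is consistent with an unweighted mean of the five star ratings. That is an inference from the numbers in the table, not a formula stated in the methodology. The check below reproduces it.

```python
from statistics import mean

# Star counts per pattern, in table order:
# Governance, Performance, AI Capability, Cost, Operational Simplicity.
STAR_RATINGS = {
    "Flat Parquet + Glue Catalog": [3, 4, 2, 5, 5],
    "NFS write + S3 AP read": [3, 4, 2, 5, 4],
    "Hybrid (FSxN raw + S3 Iceberg)": [5, 5, 4, 3, 2],
}

# Assumed formula: plain average of the five ratings, rounded to one decimal.
overall = {pattern: round(mean(stars), 1) for pattern, stars in STAR_RATINGS.items()}

for pattern, score in overall.items():
    print(f"{pattern}: {score}")
```

Under that assumption, the flat Parquet and hybrid patterns tie at 3.8 for opposite reasons, so weighting the dimensions by your own priorities may separate them.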

Cost Analysis

Pattern	Additional cost beyond FSxN	Notes
Flat Parquet (append-only)	$0	Just PutObject to existing FSxN
Hybrid (FSxN + S3 Iceberg)	S3 storage + Glue Catalog	Duplicate storage for curated layer
NFS write + S3 AP read	$0	Same FSxN volume, two access paths

Key insight: The cheapest write pattern is flat Parquet directly to FSxN via S3 AP. If you need transactional capabilities, the cost is maintaining a separate S3 tier for the curated layer.

Known Failure Signatures

Symptom	Format	Root cause	Resolution
`501 Not Implemented`	Delta Lake	Conditional write (If-None-Match) not supported	Use flat Parquet instead
`NullPointerException` on metadata	Iceberg	S3FileIO appears unable to handle the AP alias (exact cause unconfirmed)	Write Iceberg to native S3
Rename fails / timeline corrupt	Hudi	No atomic rename in S3 API	Write Hudi to native S3
`CopyObject` + `DeleteObject` (non-atomic)	Delta (Spark fallback)	Spark uses copy+delete as rename	Not safe for concurrent writes
Write succeeds but table is corrupt	Any format (if forced)	Missing concurrency control	Do not force transactional writes

What's Next

This article concludes the core engine validation series (Parts 1-7). The series has validated:

✅ 6 read engines (Athena, Databricks, Snowflake, DuckDB, EMR, Redshift)
✅ 3 write patterns (flat Parquet via EMR, DuckDB, Athena CTAS)
❌ 3 table formats that don't work (Delta, Iceberg, Hudi)
✅ Enterprise governance (Lake Formation fine-grained: column, row, tag)
✅ AI/ML integration (Snowflake Cortex: 8/10 functions, Bedrock KB: zero-copy RAG)

Start Here: 3 Steps to Validate in Your Environment

1. Choose your engine using the comparison tables in this series:
   - Cheapest: DuckDB Lambda (Part 4), $0.00001/query
   - Most governed: Redshift Spectrum + Lake Formation (Part 6)
   - Best AI: Snowflake External Table + Cortex (Part 3)
   - Best ETL: EMR Serverless (Part 5)
   - Best for Databricks customers: DataSync → S3 → UC (Part 2)
2. Deploy the verification template from GitHub. Each engine has a CloudFormation template and setup guide.
3. Record evidence using the verification-pack templates for consistent, reviewable results across environments.

PoC Cost Summary (1-day validation)

Engine	PoC Cost (1 day)	What you validate
DuckDB Lambda	~$0.01	Read + write Parquet, sub-second queries
Athena	~$0.05	Serverless SQL, Glue catalog integration
EMR Serverless	~$0.50	Spark ETL, write-back, distributed processing
Redshift Serverless	~$1.50	DWH JOINs, Lake Formation governance
Snowflake	~$5 (1 credit)	External Table, Cortex AI, governance tags

References

Key achievement: This validation conclusively established the write boundary for FSx for ONTAP S3 Access Points — transactional table formats (Delta, Iceberg, Hudi) are not supported due to fundamental S3 API limitations (no conditional writes, no atomic rename). The working alternative is flat Parquet append via PutObject, which is supported by EMR Spark, DuckDB, and Athena CTAS. For teams that need transactional capabilities, the recommended pattern is hybrid: raw data on FSxN (multi-protocol access) with curated Iceberg/Delta tables on native S3.

Evidence from verification-pack: delta-lake/, iceberg/, hudi/

Disclaimer: This article is an independent validation report and does not represent AWS, NetApp, Databricks, or Apache Software Foundation official guidance. Product behavior and platform capabilities may change. Always validate in your own environment.
