Why Delta, Iceberg, and Hudi Can't Write to FSx S3 Access Points — And What Works Instead

TL;DR

In Parts 1-6 of this series, we validated read paths across six engines. Part 7 answers the write question: can you use Delta Lake, Apache Iceberg, or Apache Hudi on FSx for ONTAP S3 Access Points?

No. None of the three transactional table formats can write to FSx S3 AP, because of fundamental S3 API limitations. Delta Lake and Iceberg writes failed in testing; Hudi is ruled out by design analysis:

| Format | Failure Point | Error | Root Cause |
| --- | --- | --- | --- |
| Delta Lake | Commit (write) | 501 Not Implemented | No conditional writes (If-None-Match) |
| Apache Hudi | Timeline commit | Not Supported (by design) | No atomic rename (.inflight → .commit) |
| Apache Iceberg | Metadata write | NullPointerException | S3FileIO cannot handle AP alias for metadata |

What DOES work for write on FSx S3 AP:

  • ✅ Flat Parquet append (PutObject)
  • ✅ Athena CTAS (write-back)
  • ✅ DuckDB COPY TO (write-back)
  • ✅ EMR Spark df.write.parquet() (flat Parquet)

Quick Decision Guide:

  • Need transactional table format → Write to native S3, read FSxN data via S3 AP separately
  • Need write-back to FSxN → Use flat Parquet (append-only, no transactions)
  • Need ACID on NAS data → Use NFS/SMB protocol directly (not S3 AP)

GitHub: fsxn-lakehouse-integrations


How to Read This Article

This article is:

  • A consolidated "Not Supported" evidence report for transactional writes
  • Root cause analysis for each table format
  • Architecture guidance for working alternatives

Read by role:

  • Data engineer: Failure Evidence → What Works Instead
  • Architect: Root Cause Analysis → Architecture Patterns
  • Partner / SA: Partner Decision Card → Discovery Questions
  • Storage engineer: S3 API Compatibility → Why This Is Fundamental

Prerequisite Concepts

Before reading this article, it helps to understand:

  • Delta Lake — Databricks' open table format using _delta_log/ JSON commits with conditional writes
  • Apache Iceberg — Netflix's table format using metadata files with atomic commit protocol
  • Apache Hudi — Uber's table format using .hoodie/ timeline with atomic rename
  • Atomic rename — renaming a file in one atomic operation (S3 does not support this)
  • Conditional writes — writing only if a condition is met (e.g., If-None-Match); FSx S3 AP returns 501
  • PutObject — S3's basic write operation (supported by FSx S3 AP for files ≤ 5 GB)

Why Transactional Table Formats Need Special S3 Operations

All three formats solve the same problem: concurrent write safety on object storage. They each use a commit protocol that requires operations beyond basic PutObject:

Delta Lake:    PutObject with If-None-Match header (conditional write)
               → Prevents two writers from creating the same commit file

Apache Hudi:   Rename .inflight → .commit (atomic rename)
               → Marks a commit as complete atomically

Apache Iceberg: PutObject + HeadObject/GetObject for metadata verification
                → Verifies metadata was written correctly before commit

FSx for ONTAP S3 AP does NOT support:

  1. Conditional writes (If-None-Match → returns 501 Not Implemented)
  2. Atomic rename (S3 API has no rename operation)
  3. Reliable metadata verification on AP alias (NullPointerException in S3FileIO)

This is not a configuration issue — it's a fundamental API limitation.
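
To check the first limitation in your own environment, a probe along these lines may help. It is a minimal sketch, not the test harness used in this article: `probe_conditional_write` is a hypothetical helper, and it assumes a boto3-style S3 client recent enough to accept the `IfNoneMatch` parameter on `put_object`. Errors are classified by HTTP status code, so the same function works against any client object with that shape.

```python
def probe_conditional_write(s3_client, bucket: str, key: str) -> str:
    """Classify how an S3 endpoint reacts to a create-if-absent PutObject.

    `s3_client` is assumed to behave like a boto3 S3 client, for example
    boto3.client("s3"). `bucket` may be an S3 Access Point alias.
    """
    try:
        # If-None-Match: "*" asks the server to write only if the key is absent.
        s3_client.put_object(Bucket=bucket, Key=key, Body=b"{}", IfNoneMatch="*")
    except Exception as exc:
        # botocore's ClientError carries the parsed HTTP response in `.response`.
        response = getattr(exc, "response", None) or {}
        status = response.get("ResponseMetadata", {}).get("HTTPStatusCode")
        if status == 501:
            return "not_implemented"      # the behavior reported here for FSx S3 AP
        if status == 412:
            return "precondition_failed"  # key already exists; conditional writes work
        raise                             # anything else is not a capability signal
    return "created"
```

Run it twice against the same key: an endpoint that supports conditional writes should return `created` and then `precondition_failed`, while `not_implemented` on the first call matches the 501 described above.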


Architecture: What's Supported vs What's Not

                    FSx for ONTAP S3 Access Point
                              │
              ┌───────────────┼───────────────┐
              │               │               │
         ✅ READ          ✅ WRITE         ❌ WRITE
         (all engines)    (flat Parquet)   (transactional)
              │               │               │
    ┌─────────┤         ┌─────┤         ┌─────┤
    │         │         │     │         │     │
  Athena   Redshift   EMR  DuckDB   Delta  Iceberg  Hudi
  Snowflake Spectrum  Spark Lambda   Lake
  DuckDB
  EMR

Failure Evidence: Delta Lake

Test: delta-rs (Rust) write to FSx S3 AP
Date: 2026-05-23
Result: 501 Not Implemented

Error: Generic S3 error: Error performing put request to
s3://verification-tes-...-ext-s3alias/_delta_log/00000000000000000000.json:
response error "501 Not Implemented"

Root cause: Delta Lake's commit protocol uses If-None-Match header on PutObject to ensure only one writer creates each commit file. FSx for ONTAP S3 AP does not implement conditional writes and returns 501.

CloudShell reproduction: delta-rs write to FSx S3 AP returns "501 Not Implemented" — conditional writes (If-None-Match) are not supported.

Spark Delta fallback: Spark's Delta writer uses CopyObject + DeleteObject as a rename fallback, but this is not atomic — two concurrent writers can corrupt the log.
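
The difference between a conditional and an unconditional commit can be illustrated with a small in-memory simulation. This is a sketch only: `ObjectStore` is a hypothetical stand-in, not an S3 client, and it models a generic check-then-write race rather than Spark's exact code path.

```python
class ObjectStore:
    """Tiny in-memory stand-in for an object store (not a real S3 client)."""

    def __init__(self):
        self.objects = {}

    def put(self, key, body):
        # Unconditional PutObject: silently replaces any existing object.
        self.objects[key] = body

    def put_if_absent(self, key, body):
        # Models PutObject with If-None-Match "*": refuses to overwrite.
        if key in self.objects:
            raise FileExistsError(key)
        self.objects[key] = body


COMMIT = "_delta_log/00000000000000000001.json"


def simulate_unconditional():
    store = ObjectStore()
    # Both writers check for the commit file before either one writes.
    a_sees_free = COMMIT not in store.objects
    b_sees_free = COMMIT not in store.objects
    if a_sees_free:
        store.put(COMMIT, "writer-A")
    if b_sees_free:
        store.put(COMMIT, "writer-B")  # overwrites writer-A without any error
    return a_sees_free, b_sees_free, store.objects[COMMIT]


def simulate_conditional():
    store = ObjectStore()
    outcomes = []
    for writer in ("writer-A", "writer-B"):
        try:
            store.put_if_absent(COMMIT, writer)
            outcomes.append((writer, "committed"))
        except FileExistsError:
            outcomes.append((writer, "conflict"))  # loser must retry with the next version
    return outcomes


print(simulate_unconditional())
print(simulate_conditional())
```

In the unconditional run both writers believe they committed, yet only one commit survives. In the conditional run the second writer gets an explicit conflict, which is the signal the commit protocol depends on.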

Conclusion: Delta Lake write is Not Supported on FSx S3 AP. This is a fundamental limitation, not a configuration issue.


Failure Evidence: Apache Hudi

Test: Logical deduction from Delta Lake verification + Hudi architecture analysis
Date: 2026-05-24
Result: Not Supported (by design)

Root cause: Apache Hudi's commit protocol requires atomic rename for its timeline:

.hoodie/[instant].inflight → .hoodie/[instant].commit

This rename must be atomic to prevent partial commits from being visible. S3 has no rename operation — the only way to "rename" is CopyObject + DeleteObject, which is not atomic.
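
A minimal sketch of why copy-then-delete is not a safe substitute for rename. It uses a plain dict as a stand-in for the timeline directory; the instant name `001` and the helper `rename_via_copy_delete` are made up for illustration and are not Hudi code.

```python
def rename_via_copy_delete(objects, src, dst, crash_after_copy=False):
    """Emulate a 'rename' as two separate steps: copy, then delete."""
    objects[dst] = objects[src]          # step 1: CopyObject
    if crash_after_copy:
        # The writer can die here, or a reader can list the prefix right now.
        raise RuntimeError("writer died between copy and delete")
    del objects[src]                     # step 2: DeleteObject


# Hypothetical timeline with one in-flight instant.
timeline = {".hoodie/001.inflight": b"commit metadata"}

try:
    rename_via_copy_delete(
        timeline, ".hoodie/001.inflight", ".hoodie/001.commit", crash_after_copy=True
    )
except RuntimeError:
    pass

leftover = sorted(timeline)
print(leftover)
```

After the simulated crash both the `.inflight` and the `.commit` object exist, so a reader cannot tell whether the instant is complete. An atomic rename never exposes that intermediate state.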

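Attempted verification: An EMR Serverless Hudi write was attempted, but the Hudi catalog plugin is not available in the EMR 7.1.0 default configuration, so the test could not run. The result above is therefore a deduction rather than an observed failure; the underlying constraint (no atomic rename) makes the outcome predictable.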

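Conclusion: Apache Hudi write is Not Supported on FSx S3 AP. The missing primitive differs from Delta Lake's (atomic rename rather than conditional write), but the pattern is the same: the commit protocol depends on an operation the S3 AP interface does not provide.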


Failure Evidence: Apache Iceberg

Test: EMR Serverless (emr-7.1.0) with Iceberg S3FileIO + Glue Catalog
Date: 2026-05-24
Result: NullPointerException

java.lang.NullPointerException: Cannot invoke
"org.apache.iceberg.TableMetadata.metadataFileLocation()"
because "metadata" is null

Root cause: Iceberg's S3FileIO attempts to write metadata files to the warehouse path on FSx S3 AP. The NullPointerException occurs during the commit phase when Iceberg tries to verify the metadata file was written successfully. Possible causes:

  1. S3FileIO may not correctly handle S3 AP alias as bucket name
  2. The metadata write (PutObject) may succeed but subsequent HeadObject/GetObject to verify fails due to AP alias resolution
  3. Iceberg's commit protocol may require operations not fully supported by FSx S3 AP

Note: Iceberg READ of pre-existing tables (metadata in Glue, data files on S3 AP) may still work — this was not tested in this run.

Conclusion: Apache Iceberg write is Not Supported on FSx S3 AP. The failure is in metadata write/verify, not data file write.
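
Possible cause 2 above can be examined in isolation, outside Iceberg. The sketch below is a hypothetical helper, not part of the verification pack, and it assumes a boto3-style client exposing `put_object` and `head_object`. A clean result would not explain the NullPointerException by itself, but it would help rule out a plain write-then-read failure on the AP alias.

```python
def put_then_head(s3_client, bucket: str, key: str, body: bytes) -> dict:
    """Write an object, then read its metadata back through the same bucket name.

    `bucket` can be an S3 Access Point alias. `s3_client` is assumed to be
    boto3-like, for example boto3.client("s3").
    """
    result = {"put_ok": False, "head_ok": False, "size_matches": False}
    try:
        s3_client.put_object(Bucket=bucket, Key=key, Body=body)
        result["put_ok"] = True
        head = s3_client.head_object(Bucket=bucket, Key=key)
        result["head_ok"] = True
        result["size_matches"] = head.get("ContentLength") == len(body)
    except Exception as exc:
        # Keep the error text so the failing step can be recorded as evidence.
        result["error"] = repr(exc)
    return result
```

If `put_ok` is true and `head_ok` is false, the write landed but verification through the alias failed, which points toward cause 2. If all three flags are true, causes 1 and 3 become more likely.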


Summary: Why All Three Fail

| Requirement | Delta Lake | Apache Hudi | Apache Iceberg | FSx S3 AP Support |
| --- | --- | --- | --- | --- |
| Basic PutObject | ✅ Uses | ✅ Uses | ✅ Uses | ✅ Supported |
| Conditional write (If-None-Match) | ✅ Required | ❌ Not used | ❌ Not used | ❌ 501 Not Implemented |
| Atomic rename | ❌ Not used | ✅ Required | ❌ Not used | ❌ No S3 rename API |
| Metadata write + verify on AP alias | ❌ Not needed | ❌ Not needed | ✅ Required | ❌ NullPointerException |
| Write result | ❌ Failed | ❌ Not Supported | ❌ Failed | |

Common thread: Each format requires at least one operation beyond basic PutObject that FSx S3 AP does not support. This is not a bug — it's a design boundary of the S3 AP interface.


What Works Instead

✅ Flat Parquet Append (PutObject)

All engines can write flat Parquet files to FSx S3 AP:

# EMR Spark (PySpark)
df.write.mode("append").parquet("s3://<AP>/gold/output/")

-- DuckDB (SQL)
COPY (SELECT * FROM result) TO 's3://<AP>/gold/output.parquet' (FORMAT PARQUET);

Limitations: No ACID transactions, no schema evolution, no time travel. Append-only pattern.

✅ Athena CTAS (Write-back)

CREATE TABLE fsxn_gold.aggregated_sensors
WITH (
  external_location = 's3://<AP>/gold/athena-output/',
  format = 'PARQUET'
) AS
SELECT status,
       COUNT(*) AS reading_count,          -- CTAS output columns need explicit names
       AVG(temperature) AS avg_temperature
FROM fsxn_athena_verification.sensor_readings
GROUP BY status;

✅ DuckDB COPY TO

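import duckdb
conn = duckdb.connect()
conn.execute("INSTALL httpfs")
conn.execute("LOAD httpfs")  # enables s3:// paths; credential setup omitted here
conn.execute("""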
    COPY (SELECT * FROM read_parquet('s3://<AP>/sensor-data/*.parquet')
          WHERE temperature > 30)
    TO 's3://<AP>/gold/hot_sensors.parquet' (FORMAT PARQUET)
""")

✅ EMR Spark Write (Flat Parquet)

agg_df.write.mode("overwrite").parquet("s3://<AP>/gold/emr_output/")

For regulated workloads (Takizawa-san lens): Flat Parquet on FSx for ONTAP S3 AP does NOT provide ACID guarantees, schema evolution, or time travel. If your compliance framework requires transactional consistency (e.g., SOX audit trail, HIPAA data integrity), flat Parquet is insufficient. Use DataSync → S3 → Delta/Iceberg with Lake Formation governance (Part 6) for regulated workloads that need both FSx for ONTAP as source and ACID guarantees on the analytics layer.


Architecture Patterns for Transactional Workloads

If you need transactional table formats AND FSx for ONTAP data:

Sync mechanism note (verified May 2026): SnapMirror S3 (ONTAP S3 bucket → AWS S3 replication) is not available on FSx for ONTAP — the snapmirror object-store commands are disabled as a managed service restriction. AWS DataSync (NFS → S3) is the only validated sync mechanism for moving FSx for ONTAP data to standard S3 buckets where Delta/Iceberg/Hudi can write safely.

Pattern 1: Read from FSx for ONTAP, Write to Native S3

FSx for ONTAP (source) ──S3 AP──▶ EMR Spark (read + transform)
                                        │
                                        ▼
                              Native S3 (Delta/Iceberg table)
                                        │
                                        ▼
                              Athena / Databricks / Redshift (query)

Use when: You need Delta/Iceberg for downstream analytics but source data lives on FSxN.
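
For Pattern 1 with Iceberg, the Spark side could be configured roughly as follows. This is a sketch under assumptions: the catalog name, bucket name, and the helper `iceberg_on_native_s3_conf` are made up for illustration, and the configuration keys are the standard Iceberg Spark and Glue catalog properties rather than settings taken from this series' templates. The guard rejects warehouse locations that look like an access point alias, since aliases end in `-s3alias`.

```python
def iceberg_on_native_s3_conf(catalog: str, warehouse_bucket: str) -> dict:
    """Build Spark settings for an Iceberg catalog whose warehouse is a native S3 bucket."""
    if warehouse_bucket.endswith("-s3alias"):
        # Access point aliases end in "-s3alias"; the table must not live there.
        raise ValueError("warehouse must be a native S3 bucket, not an S3 AP alias")
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"{prefix}.warehouse": f"s3://{warehouse_bucket}/warehouse/",
    }


# Hypothetical usage inside a Spark job (requires pyspark and the Iceberg runtime):
#   builder = SparkSession.builder
#   for k, v in iceberg_on_native_s3_conf("glue", "my-lakehouse-bucket").items():
#       builder = builder.config(k, v)
#   spark = builder.getOrCreate()
#   df = spark.read.parquet("s3://<AP>/sensor-data/")       # read via FSx S3 AP
#   df.writeTo("glue.analytics.sensors").createOrReplace()  # write to native S3
```

The read path still points at the FSx S3 AP; only the table's warehouse and metadata live on native S3, where the commit protocol has the operations it needs.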

Pattern 2: Write via NFS, Read via S3 AP

Application ──NFS/SMB──▶ FSx for ONTAP Volume (write files)
                                        │
                              S3 Access Point (read-only)
                                        │
                                        ▼
                              Athena / Redshift / DuckDB (query)

Use when: Applications write via NFS/SMB and analytics engines read via S3 AP. No transactional format needed because NFS provides POSIX semantics.

Pattern 3: Hybrid (FSxN for raw, S3 for curated)

FSx for ONTAP (raw/bronze) ──S3 AP──▶ EMR Spark ──▶ S3 (silver/gold, Iceberg)
         │                                                    │
         └── NFS/SMB access for apps                          └── Athena + Lake Formation

Use when: Raw data stays on FSxN (multi-protocol access), curated data goes to S3 with full lakehouse capabilities.


Comparison with Other Engines in This Series

Engine Read from FSxN S3 AP Write flat Parquet Write Delta/Iceberg/Hudi
Athena (Part 1) ✅ CTAS
Databricks (Part 2) ⚠️ Partial ❌ (UC blocks)
Snowflake (Part 3) ⚠️ TBD
DuckDB Lambda (Part 4) ✅ COPY TO
EMR Spark (Part 5) ✅ df.write
Redshift Spectrum (Part 6) ❌ (read-only)

Key insight: ALL engines can read from FSx S3 AP. MOST can write flat Parquet. NONE can write transactional table formats. This is a property of the S3 AP interface, not the engines.


Partner Decision Card

| Customer requirement | FSx S3 AP path | Recommended alternative |
| --- | --- | --- |
| Read NAS data from analytics engines | ✅ Works (all engines) | Use any engine from Parts 1-6 |
| Write flat Parquet back to NAS | ✅ Works (EMR, DuckDB, Athena CTAS) | Use EMR Spark or DuckDB |
| Delta Lake on NAS data | ❌ Not Supported | Write Delta to native S3; read FSxN separately |
| Iceberg on NAS data | ❌ Not Supported | Write Iceberg to native S3; read FSxN separately |
| Hudi on NAS data | ❌ Not Supported | Write Hudi to native S3; read FSxN separately |
| ACID transactions on NAS | ❌ Not via S3 AP | Use NFS/SMB protocol directly |
| Schema evolution on NAS data | ❌ Not via S3 AP | Use Glue Catalog for schema management |

Discovery Questions for Partners

When a customer asks about transactional table formats on FSx for ONTAP S3 AP:

  1. Is the requirement for transactional WRITE or just READ? (Read of pre-existing tables may work for Iceberg)
  2. Can the transactional table live on native S3 while source data stays on FSxN? (Pattern 1)
  3. Is the write pattern append-only or does it require updates/deletes? (Append-only works with flat Parquet)
  4. Does the application already write via NFS/SMB? (Pattern 2 — no S3 AP write needed)
  5. Is schema evolution required? (Use Glue Catalog for schema management without table format)
  6. What is the concurrency requirement? (Single-writer flat Parquet is safe; multi-writer needs transactions)

Governance Impact

| Write pattern | Governance model | Concurrency safety | Production suitability |
| --- | --- | --- | --- |
| Flat Parquet (single writer) | IAM + S3 AP + Glue Catalog | ✅ Safe (single writer) | Production-ready |
| Flat Parquet (multi-writer) | IAM + S3 AP + Glue Catalog | ⚠️ Risk (no conflict detection) | Use with caution |
| Delta/Iceberg/Hudi | N/A | N/A | ❌ Not Supported |
| NFS/SMB write + S3 AP read | POSIX + IAM + S3 AP | ✅ Safe (POSIX locking) | Production-ready |

For multi-writer scenarios: If multiple processes need to write to the same prefix on FSxN via S3 AP, use a coordination mechanism (e.g., Step Functions, SQS queue) to serialize writes. Without transactional table formats, there is no built-in conflict detection.
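
The serialization idea can be sketched locally with a queue and a single consumer. This is an illustration only: `queue.Queue` stands in for a managed queue such as SQS, `put_object` is any callable that performs the write, and the key names are invented.

```python
import queue
import threading


def run_serialized_writer(requests, put_object):
    """Drain write requests one at a time so only one write is in flight."""
    written = []
    while True:
        item = requests.get()
        if item is None:          # sentinel: all producers are done
            break
        key, body = item
        put_object(key, body)     # e.g. a thin wrapper around an S3 PutObject call
        written.append(key)
    return written


def demo():
    requests = queue.Queue()
    store = {}                    # stand-in for the target prefix

    def producer(name):
        # Each job uses its own key prefix, so jobs never target the same object.
        for i in range(3):
            requests.put((f"gold/output/{name}-part-{i}.parquet", b"..."))

    threads = [threading.Thread(target=producer, args=(n,)) for n in ("job-a", "job-b")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    requests.put(None)
    written = run_serialized_writer(requests, store.__setitem__)
    return store, written


store, written = demo()
print(len(written), "objects written by a single writer")
```

Producers only enqueue; the single consumer is the only component that touches the prefix. Giving each job its own key prefix is a second safeguard, because without conflict detection an identical key would be overwritten silently.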


AI Readiness Score

| Pattern | Governance | Performance | AI Capability | Cost | Operational Simplicity | Overall |
| --- | --- | --- | --- | --- | --- | --- |
| Flat Parquet + Glue Catalog | ★★★☆☆ | ★★★★☆ | ★★☆☆☆ | ★★★★★ | ★★★★★ | 3.8 |
| NFS write + S3 AP read | ★★★☆☆ | ★★★★☆ | ★★☆☆☆ | ★★★★★ | ★★★★☆ | 3.6 |
| Hybrid (FSxN raw + S3 Iceberg) | ★★★★★ | ★★★★★ | ★★★★☆ | ★★★☆☆ | ★★☆☆☆ | 3.8 |

Scoring methodology: Flat Parquet + Glue Catalog scores highest on Cost and Simplicity (no additional infrastructure). Hybrid pattern scores highest on Governance and Performance (full lakehouse on S3) but lower on Simplicity (two storage tiers to manage).
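
The Overall column is consistent with an unweighted mean of the five star ratings. That formula is inferred from the numbers rather than stated in the methodology, so treat the sketch below as a check on the arithmetic, not as the official scoring rule.

```python
# Star counts per pattern, in column order:
# Governance, Performance, AI Capability, Cost, Operational Simplicity
SCORES = {
    "Flat Parquet + Glue Catalog": [3, 4, 2, 5, 5],
    "NFS write + S3 AP read": [3, 4, 2, 5, 4],
    "Hybrid (FSxN raw + S3 Iceberg)": [5, 5, 4, 3, 2],
}


def overall(stars):
    """Unweighted mean of the star ratings, rounded to one decimal place."""
    return round(sum(stars) / len(stars), 1)


OVERALL = {pattern: overall(stars) for pattern, stars in SCORES.items()}
for pattern, score in OVERALL.items():
    print(f"{pattern}: {score}")
```

Because the mean is unweighted, the flat Parquet and hybrid patterns tie at 3.8 even though they excel on different axes. A team that weights Governance more heavily than Cost would rank them differently.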


Cost Analysis

| Pattern | Additional cost beyond FSxN | Notes |
| --- | --- | --- |
| Flat Parquet (append-only) | $0 | Just PutObject to existing FSxN |
| Hybrid (FSxN + S3 Iceberg) | S3 storage + Glue Catalog | Duplicate storage for curated layer |
| NFS write + S3 AP read | $0 | Same FSxN volume, two access paths |

Key insight: The cheapest write pattern is flat Parquet directly to FSxN via S3 AP. If you need transactional capabilities, the cost is maintaining a separate S3 tier for the curated layer.


Known Failure Signatures

| Symptom | Format | Root cause | Resolution |
| --- | --- | --- | --- |
| 501 Not Implemented | Delta Lake | Conditional write (If-None-Match) not supported | Use flat Parquet instead |
| NullPointerException on metadata | Iceberg | S3FileIO cannot handle AP alias | Write Iceberg to native S3 |
| Rename fails / timeline corrupt | Hudi | No atomic rename in S3 API | Write Hudi to native S3 |
| CopyObject + DeleteObject (non-atomic) | Delta (Spark fallback) | Spark uses copy+delete as rename | Not safe for concurrent writes |
| Write succeeds but table is corrupt | Any format (if forced) | Missing concurrency control | Do not force transactional writes |
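
For log triage, the two signatures that were actually observed can be matched mechanically. This is a minimal sketch with a hypothetical `diagnose` helper; it covers only the Delta Lake and Iceberg error strings quoted in this article. Hudi is left out because no literal error message was captured for it.

```python
# (substrings that must all appear, format, resolution from the table above)
SIGNATURES = [
    (("501 Not Implemented",), "Delta Lake", "Use flat Parquet instead"),
    (("NullPointerException", "iceberg"), "Iceberg", "Write Iceberg to native S3"),
]


def diagnose(log_text):
    """Return (format, resolution) for the first matching signature, else None."""
    for needles, table_format, resolution in SIGNATURES:
        if all(needle in log_text for needle in needles):
            return table_format, resolution
    return None


print(diagnose('response error "501 Not Implemented"'))
```

Substring matching is deliberately crude: a 501 can come from other unsupported S3 operations too, so a match is a starting point for investigation rather than a confirmed diagnosis.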

What's Next

This article concludes the core engine validation series (Parts 1-7). The series has validated:

  • ✅ 6 read engines (Athena, Databricks, Snowflake, DuckDB, EMR, Redshift)
  • ✅ 3 write patterns (flat Parquet via EMR, DuckDB, Athena CTAS)
  • ❌ 3 table formats that don't work (Delta, Iceberg, Hudi)
  • ✅ Enterprise governance (Lake Formation fine-grained: column, row, tag)
  • ✅ AI/ML integration (Snowflake Cortex: 8/10 functions, Bedrock KB: zero-copy RAG)

Start Here: 3 Steps to Validate in Your Environment

  1. Choose your engine using the comparison tables in this series:

    • Cheapest: DuckDB Lambda (Part 4) — $0.00001/query
    • Most governed: Redshift Spectrum + Lake Formation (Part 6)
    • Best AI: Snowflake External Table + Cortex (Part 3)
    • Best ETL: EMR Serverless (Part 5)
    • Best for Databricks customers: DataSync → S3 → UC (Part 2)
  2. Deploy the verification template from GitHub — each engine has a CloudFormation template and setup guide

  3. Record evidence using the verification-pack templates — consistent, reviewable results across environments

PoC Cost Summary (1-day validation)

| Engine | PoC Cost (1 day) | What you validate |
| --- | --- | --- |
| DuckDB Lambda | ~$0.01 | Read + write Parquet, sub-second queries |
| Athena | ~$0.05 | Serverless SQL, Glue catalog integration |
| EMR Serverless | ~$0.50 | Spark ETL, write-back, distributed processing |
| Redshift Serverless | ~$1.50 | DWH JOINs, Lake Formation governance |
| Snowflake | ~$5 (1 credit) | External Table, Cortex AI, governance tags |

Key achievement: This validation conclusively established the write boundary for FSx for ONTAP S3 Access Points — transactional table formats (Delta, Iceberg, Hudi) are not supported due to fundamental S3 API limitations (no conditional writes, no atomic rename). The working alternative is flat Parquet append via PutObject, which is supported by EMR Spark, DuckDB, and Athena CTAS. For teams that need transactional capabilities, the recommended pattern is hybrid: raw data on FSxN (multi-protocol access) with curated Iceberg/Delta tables on native S3.

Evidence from verification-pack: delta-lake/, iceberg/, hudi/

Disclaimer: This article is an independent validation report and does not represent AWS, NetApp, Databricks, or Apache Software Foundation official guidance. Product behavior and platform capabilities may change. Always validate in your own environment.
