Yoshiki Fujiwara(藤原善基)@AWS Community Builder for AWS Community Builders

Posted on May 24 • Edited on May 26

Databricks and FSx for ONTAP S3 Access Points — A Layer-by-Layer Validation of Observed Boundaries

#aws #databricks #amazonfsxfornetappontap #lakehouse

TL;DR

Connecting Databricks to FSx for ONTAP S3 Access Points is significantly harder than connecting Athena (Part 1). I tested every approach I could find: Unity Catalog External Locations, NFS mounts, Instance Profiles, and multiple VPC configurations. Here is what I found:

Unity Catalog's session policy initially blocked the FSx for ONTAP S3 AP ARN pattern → 403
Setting the access_point field on the External Location partially lifts the session policy block: explicit-path file read succeeds, but UC table creation, subdirectory listing, and write operations remain blocked, so UC governance features (lineage, tags, fine-grained access) cannot yet be applied. Databricks Support later confirmed that this field is not a supported code path (see the Critical Update in Approach 1)
NFS kernel mount is blocked by seccomp by design (confirmed by Databricks Support)
Instance Profile + boto3 works for direct S3 AP access (bypassing Unity Catalog)
Spark read with explicit file path works under UC governance — 1000 rows of sensor data readable with full schema inference, proving data access is possible even if table creation is blocked

Quick Decision Guide:

Read-only SQL analytics on NAS data → Use Athena (Part 1) or Snowflake External Table (Part 3)
Governed Databricks lakehouse on NAS data → Stage via FPolicy → Lambda → S3 → Auto Loader → UC Managed Table
Exploratory PoC (time-limited) → Instance Profile + boto3 with compensating controls

This article is a layer-by-layer validation of observed integration boundaries between Databricks and FSx for ONTAP S3 Access Points. It is not an argument against Databricks, which remains a strong platform for lakehouse, ML, and production Delta workloads. The scope is deliberately narrow: direct access from Databricks to FSx for ONTAP S3 Access Points.

This article documents the full troubleshooting journey, including the strace analysis that identified the root cause of NFS mount failures.

This article documents observed behavior in one validated environment. It should not be interpreted as a general compatibility statement for all Databricks configurations or future platform versions.

GitHub Repository: fsxn-lakehouse-integrations

If you want to reproduce this validation, the repository's integrations/databricks/ directory contains environment setup notes, and verification-pack/ contains test templates and evidence recording formats. The verification pack is intentionally template-first, so validation runs can produce consistent, reviewable evidence across environments. Actual result files will be added as validation runs are completed.

This article also includes a Snowflake ↔ Databricks concept mapping table (showing which capabilities work on each platform) and an AI Readiness Score to help teams quantitatively compare pattern options for FSx for ONTAP integration.

How to Read This Article

This article is:

A reproduction-focused validation report
Evidence from one environment (DBR 17.3 LTS, ap-northeast-1)
A starting point for vendor confirmation and architecture discussion

This article is not:

A general compatibility statement
A production certification
A statement on behalf of Databricks

Read by role:

Databricks admin: Unity Catalog External Location → Governance Impact Summary
Storage engineer: NFS Mount investigation → Evidence Matrix
Data engineer: Instance Profile + boto3 → Next Validation Metrics
Partner / SA: Decision Matrix → Discovery Questions → Partner Conversation Guide
Opening a support case: Databricks Support Case Packet

Prerequisite Concepts

Before reading this article, it helps to understand:

Unity Catalog Storage Credential — an object that stores a reference to a cloud IAM role for accessing external storage
Unity Catalog External Location — maps a cloud storage path to a storage credential for governed access
Instance Profile on AWS — an IAM role attached to an EC2 instance, providing credentials via IMDS
Databricks-managed VPC vs Customer-managed VPC — whether Databricks or the customer controls the workspace network
Cluster access modes — Standard (shared, multi-user with UC governance) and Dedicated (single-user with sudo access). Unity Catalog requires standard or dedicated access mode
S3 Access Point ARN vs S3 bucket ARN — S3 AP uses arn:aws:s3:<region>:<account>:accesspoint/<name>, not arn:aws:s3:::<bucket>
Driver vs executor behavior in Spark — the driver orchestrates; executors run distributed tasks. Credentials and network paths may differ between them
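The difference between the two ARN shapes in the list above is easy to miss in logs and policies. The following is a minimal sketch of a classifier for the two shapes, assuming a 12-digit account ID; the function name and regular expressions are illustrative and are not part of any AWS or Databricks API.

```python
import re

# Access point ARNs carry a region and an account ID; bucket ARNs leave both empty.
ACCESS_POINT_ARN = re.compile(
    r"^arn:(?P<partition>[^:]+):s3:(?P<region>[^:]+):(?P<account>\d{12})"
    r":accesspoint/(?P<name>.+)$"
)
BUCKET_ARN = re.compile(
    r"^arn:(?P<partition>[^:]+):s3:::(?P<bucket>[^/]+)(?:/(?P<key>.*))?$"
)


def classify_s3_arn(arn: str) -> str:
    """Return 'access_point', 'bucket', or 'unknown' for an S3 ARN string."""
    if ACCESS_POINT_ARN.match(arn):
        return "access_point"
    if BUCKET_ARN.match(arn):
        return "bucket"
    return "unknown"


for arn in (
    "arn:aws:s3:ap-northeast-1:123456789012:accesspoint/example-ap",  # placeholder
    "arn:aws:s3:::example-bucket/*",                                  # placeholder
):
    print(classify_s3_arn(arn), arn)
```

A check like this can be useful when reading AccessDenied messages, because the resource ARN in the error shows which shape the request was evaluated against.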

Key Concepts: Databricks Storage & Ingestion Architecture

Understanding Databricks' storage and ingestion concepts is essential for evaluating FSx for ONTAP S3 AP integration boundaries.

Partner Quick Reference: If your customer asks "Can Databricks read our NAS data via S3 Access Points?" — the answer is "partially, with limitations." File-level read works under UC governance, but table creation and directory listing are blocked. For governed analytics on NAS data today, recommend Snowflake External Table or Athena. For Databricks-specific workloads, recommend staged ingestion to S3 → UC Managed Table (see Recommended Architecture Pattern). If the customer already uses Databricks, the FPolicy → Lambda → S3 → Auto Loader pattern preserves full UC governance on ingested data. See Discovery Questions for customer qualification.

Storage Credential → External Location → External Table/Volume

Storage Credential (IAM Role ARN + External ID)
    │
    └── External Location (cloud storage path + credential + access_point field)
            │
            ├── External Table (tabular data: Parquet, Delta, Iceberg)
            └── External Volume (non-tabular: images, documents, audio)

Concept	Description	FSx S3 AP Status
Storage Credential	IAM Role that Databricks assumes to access cloud storage. During AssumeRole, Databricks generates a session policy that restricts what the assumed session can do — even if the IAM role itself has broader permissions.	✅ Created
External Location	Maps S3 path to a Storage Credential; defines access boundary	✅ Created (with `access_point` field)
External Table	UC-governed table whose data resides in External Location	❌ CREATE TABLE blocked
External Volume	UC-governed volume for unstructured files in External Location	❌ Blocked (same session policy issue)

External Volume is the Databricks equivalent of Snowflake's Directory Table — it provides governed access to non-tabular files (images, documents, audio, video). Since External Volume requires External Location creation with full subdirectory access, it is currently blocked by the same session policy limitation that blocks External Table creation.

Auto Loader (Incremental Ingestion)

Auto Loader is Databricks' equivalent of Snowflake's Snowpipe — it incrementally processes new files as they arrive in cloud storage.

Mode	Description	FSx S3 AP Status
Directory Listing	Periodically lists directory to find new files	⚠️ Requires External Location (blocked)
File Notification	Uses S3 Event Notifications + SQS for real-time detection	❌ Not possible (FSx S3 AP doesn't support S3 Events)

Auto Loader supported formats (8 formats): JSON, CSV, Parquet, Avro, ORC, XML, TEXT, BINARYFILE.

FSx S3 AP latency context: Even if Directory Listing mode were unblocked, FSx S3 AP ListObjectsV2 latency is significantly higher than native S3 (tens of seconds to minutes for large directories). This would impact Auto Loader polling intervals and new-file detection speed. Plan for minutes-level detection latency, not seconds.

Concept Mapping: Snowflake ↔ Databricks

Snowflake Concept	Databricks Equivalent	FSx S3 AP (Snowflake)	FSx S3 AP (Databricks)
Storage Integration	Storage Credential	✅	✅
External Stage	External Location	✅	✅ (partial)
External Table	External Table	✅	❌ Blocked
Directory Table	External Volume	✅	❌ Blocked
Snowpipe	Auto Loader	⚠️ (no S3 Events)	❌ Blocked
COPY INTO	COPY INTO / Auto Loader	✅	❌ Blocked
`AWS_ACCESS_POINT_ARN`	`access_point` field	✅ (resolves all)	⚠️ (partial resolution)
Cortex Search (RAG)	Mosaic AI / MLflow	✅ (via COPY INTO)	⚠️ (boto3 + external)

Data Ingestion Alternatives for FSx for ONTAP (When Auto Loader Is Blocked)

Throughput constraint: All S3 AP operations are bounded by the FSx for ONTAP file system's provisioned throughput capacity (e.g., 128 MB/s in this validation environment). This throughput is shared with NFS/SMB workloads on the same file system. Plan ingestion windows and concurrent access accordingly.
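To make the throughput constraint concrete, here is a back-of-the-envelope estimate of the minimum ingestion window. It is a rough sketch: the 100 GB data size and the 50% share left over after NFS/SMB traffic are illustrative assumptions, and real transfers will be slower because of per-request latency.

```python
def transfer_seconds(size_mb: float, throughput_mb_s: float, share: float = 1.0) -> float:
    """Lower-bound transfer time when only `share` of the throughput is available."""
    if throughput_mb_s <= 0 or not 0 < share <= 1:
        raise ValueError("throughput must be positive and share must be in (0, 1]")
    return size_mb / (throughput_mb_s * share)


# 100 GB (taken as 100,000 MB) at the 128 MB/s used in this validation
full = transfer_seconds(100_000, 128)
half = transfer_seconds(100_000, 128, share=0.5)
print(f"{full / 60:.1f} min at full throughput, {half / 60:.1f} min at a 50% share")
# prints: 13.0 min at full throughput, 26.0 min at a 50% share
```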

Since Auto Loader requires External Location (currently blocked on FSx S3 AP), use these alternatives:

Method	Description	Latency	Governance
FPolicy → Lambda → S3 → Auto Loader	FPolicy detects file changes → Lambda copies to S3 → Auto Loader ingests	Seconds	✅ Full UC (on S3 copy)
AWS Glue ETL	Glue job reads from FSx S3 AP → writes to S3/Delta	Minutes	AWS-side
EMR Serverless	Spark job reads from FSx S3 AP → writes to S3/Delta	Minutes	AWS-side
AWS DataSync	Scheduled sync from FSx NFS → S3 bucket	Minutes-Hours	AWS-side
SnapMirror to S3	ONTAP-native replication to S3 bucket	Minutes	ONTAP-side

SnapMirror to S3 caveat: Object metadata in SnapMirror S3 targets differs from NFS file metadata. Validate schema compatibility and file naming conventions before using SnapMirror S3 as an ingestion path for analytics engines.

Recommended production pattern:

FSx for ONTAP ──FPolicy──▶ Lambda ──▶ S3 Bucket ──▶ Auto Loader ──▶ Delta Table (UC governed)

Iceberg interoperability note: Once data is in UC as a managed Delta or Iceberg table, external engines can access it via UC's Iceberg REST Catalog, which lets Athena, EMR, and Trino query the same governed table without data duplication. This makes the staged S3 → UC path a hub for multi-engine access.

AI Readiness Score

Pattern	Governance	Performance	AI Capability	Cost	Operational Simplicity	Overall
Athena + FSx S3 AP	★★★☆☆	★★★★☆	★☆☆☆☆ (SQL only)	★★★★★	★★★★★	3.6
Snowflake External Table	★★★★☆	★★★☆☆	★★★★☆ (Cortex AI)	★★★★★	★★★★☆	4.0
Staged to S3 → UC Table	★★★★★	★★★★★	★★★★★ (full Mosaic AI)	★★☆☆☆	★★☆☆☆	3.8
boto3 PoC (Databricks)	★☆☆☆☆	★★☆☆☆	★★★☆☆ (driver-only)	★★★★★	★★★☆☆	2.8
Bedrock KB + FSx S3 AP	★★★☆☆	★★★★☆	★★★★☆ (RAG)	★★★★☆	★★★★☆	3.8

Governance: UC lineage, tags, masking, row filters
Performance: Query latency, distributed processing
AI Capability: Breadth of AI/ML functions available
Cost: Storage efficiency, compute cost
Operational Simplicity: Setup, maintenance, pipeline complexity

Scoring methodology: Each dimension rated by the author based on validated evidence in this article series. This is not an official AWS assessment or certification. Scores reflect observed capabilities in one test environment.

Performance note: Performance scores reflect relative comparison within FSx S3 AP access patterns, not comparison with native S3 bucket performance. All patterns accessing FSx S3 AP have higher latency than equivalent native S3 operations.

How to use this score: Use Overall score as a starting point for pattern selection. Scores ≥ 4.0 indicate strong fit for governed production workloads. Scores 3.5–3.9 indicate viable paths with trade-offs. Scores < 3.0 indicate PoC-only paths requiring compensating controls.
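The Overall values in the table are consistent with an unweighted mean of the five star ratings. That is an inference from the numbers, not a stated methodology. The sketch below recomputes the scores and applies the thresholds from the paragraph above. The article does not assign a tier to scores from 3.0 to 3.4, so the code reports that band as unclassified rather than guessing.

```python
# Star counts transcribed from the table, in the order:
# (Governance, Performance, AI Capability, Cost, Operational Simplicity)
SCORES = {
    "Athena + FSx S3 AP": (3, 4, 1, 5, 5),
    "Snowflake External Table": (4, 3, 4, 5, 4),
    "Staged to S3 -> UC Table": (5, 5, 5, 2, 2),
    "boto3 PoC (Databricks)": (1, 2, 3, 5, 3),
    "Bedrock KB + FSx S3 AP": (3, 4, 4, 4, 4),
}


def overall(stars) -> float:
    """Unweighted mean of the star ratings, rounded to one decimal place."""
    return round(sum(stars) / len(stars), 1)


def tier(score: float) -> str:
    """Map a score to the tiers described in the text."""
    if score >= 4.0:
        return "strong fit for governed production workloads"
    if 3.5 <= score <= 3.9:
        return "viable with trade-offs"
    if score < 3.0:
        return "PoC only, needs compensating controls"
    return "not classified in the article"  # 3.0 to 3.4 has no stated tier


for pattern, stars in SCORES.items():
    print(f"{pattern}: {overall(stars)} ({tier(overall(stars))})")
```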

When to choose which:

Choose Snowflake External Table (4.0) when governed AI on NAS data without copying is the priority
Choose Staged to S3 → UC Table (3.8) when maximum Databricks performance and full Mosaic AI are required (accepts data duplication cost)
Choose Bedrock KB (3.8) when AWS-native RAG with zero-copy on FSx is the primary requirement
Choose boto3 PoC (2.8) only for time-limited exploration with explicit approval; with compensating controls (see Compensating Controls section), governance risk can be partially mitigated for PoC scope. Post-expiration actions must be defined: terminate cluster, remove instance profile, archive evidence.

The Goal

Process unstructured data (images, documents, audio) stored on FSx for ONTAP from Databricks — without copying data to S3. FSx for ONTAP S3 Access Points should make this possible by exposing NFS/SMB file data via S3 API.

In Part 1, Athena worked cleanly in my validation using the official AWS tutorial pattern. Databricks, however, has multiple security layers that interact with S3 AP in unexpected ways.

Test Environment

I tested across two workspace configurations:

Runtime scope: Only DBR 17.3 LTS (Spark 4.0.0) was tested. This article does not compare DBR 16.x, 18.x, ML runtimes, GPU runtimes, serverless compute, or access modes beyond those listed in this section. Behavior may differ across runtime versions and compute types.

┌─────────────────────────────────────────────────────────────────────┐
│ Workspace 1: Databricks-managed VPC                                 │
│ - VPC created and managed by Databricks                             │
│ - Limited network control                                           │
│ - VPC Peering to FSx for ONTAP VPC                                  │
└─────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────┐
│ Workspace 2: Customer-managed VPC (same VPC as FSx for ONTAP)       │
│ - Full network control                                              │
│ - Direct connectivity to FSx for ONTAP (no peering needed)          │
│ - NAT Gateway for Databricks control plane                          │
└─────────────────────────────────────────────────────────────────────┘

Cluster modes tested:

Standard (Shared Access)
Dedicated (Single User) — provides sudo/root access
Dedicated with Instance Profile

All tests used DBR 17.3 LTS (Spark 4.0.0), ap-northeast-1.

Approach 1: Unity Catalog External Location

The Setup

The Databricks-governed path for S3 data access is to create a Storage Credential and External Location. I tested whether the same pattern could work with an FSx for ONTAP S3 Access Point.

# What I expected to work
files = dbutils.fs.ls("s3://<FSx-S3-AP-alias>/")

The Error

AccessDenied: User: arn:aws:sts::<ACCOUNT>:assumed-role/databricks-...-cross-account-role/
  databricks-unity-catalog-credential-<WORKSPACE_ID>
is not authorized to perform: s3:ListBucket on resource:
  "arn:aws:s3:<REGION>:<ACCOUNT>:accesspoint/<AP_NAME>"
because no session policy allows the s3:ListBucket action

Observed Boundary

Unity Catalog applies a session policy when it calls AssumeRole. The effective permissions of the assumed session are the intersection of the role's identity-based policies and the session policy, so even if the IAM role has s3:* on *, the session can only do what the session policy also allows.

The evidence narrows the failure domain, but does not identify Databricks internal implementation details.

In this validation, the generated session policy behavior allowed access to a standard S3 bucket path but did not allow the FSx for ONTAP S3 Access Point ARN pattern:

arn:aws:s3:::bucket-name       ✅ Allowed
arn:aws:s3:::bucket-name/*     ✅ Allowed

But FSx for ONTAP S3 AP uses a different ARN format:

arn:aws:s3:<region>:<account>:accesspoint/<name>    ❌ Not in session policy
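The intersection behavior can be illustrated with a deliberately simplified model. The sketch below is not the IAM evaluation engine: it ignores explicit Deny, conditions, and resource policies. The session policy contents are hypothetical, because the policy Databricks actually generates is not visible to the customer. The example ARNs are placeholders.

```python
from fnmatch import fnmatchcase


def allows(policy, action: str, resource: str) -> bool:
    """True if any (action_pattern, resource_pattern) statement matches."""
    return any(
        fnmatchcase(action, a) and fnmatchcase(resource, r) for a, r in policy
    )


def effective_allow(role_policy, session_policy, action: str, resource: str) -> bool:
    """A request passes only if the role policy AND the session policy allow it."""
    return allows(role_policy, action, resource) and allows(session_policy, action, resource)


ROLE_POLICY = [("s3:*", "*")]  # broad identity policy, as described in the text
SESSION_POLICY = [             # hypothetical: bucket-style ARNs only
    ("s3:ListBucket", "arn:aws:s3:::bucket-name"),
    ("s3:GetObject", "arn:aws:s3:::bucket-name/*"),
]

bucket = "arn:aws:s3:::bucket-name"
access_point = "arn:aws:s3:ap-northeast-1:123456789012:accesspoint/example-ap"

print(effective_allow(ROLE_POLICY, SESSION_POLICY, "s3:ListBucket", bucket))
print(effective_allow(ROLE_POLICY, SESSION_POLICY, "s3:ListBucket", access_point))
```

Under this model the first call is allowed and the second is denied, which matches the shape of the observed error: the role is permissive, but no session policy statement covers the access point ARN.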

Proof

The same IAM role works fine for regular S3 buckets through Unity Catalog:

# This works — regular S3 bucket
dbutils.fs.ls("s3://my-workspace-storage-bucket/")
# SUCCESS

# This fails — FSx for ONTAP S3 Access Point
dbutils.fs.ls("s3://<FSx-S3-AP-alias>/")
# AccessDenied: no session policy allows...

Status

In my initial validation, this behaved like a platform boundary in Unity Catalog's generated session policy. I opened a support case to confirm whether S3 Access Point ARN patterns can be supported for external locations.

Before (access_point field not set) — Unity Catalog session policy blocks all S3 AP operations:

Without the access_point field, dbutils.fs.ls on the S3 AP alias returns UNAUTHORIZED_ACCESS. The session policy only allows standard S3 bucket ARNs.

Update (2026-05-24): `access_point` Field Partially Resolves Session Policy

Critical Update (2026-05-26): Databricks Support subsequently confirmed that the access_point field was never released as a generally available feature and has been removed from documentation. The partial success described below is "a side effect of incomplete internal handling, not a supported code path." Unity Catalog External Locations do not currently support S3 Access Points. See the full support confirmation at the end of this section.

Databricks Support initially indicated (May 2026) that Unity Catalog External Locations accept an access_point field. In this validation, setting the field added the S3 AP ARN to the generated session policy.

Configuration that works:

External Location:
  URL: s3://<FSx-S3-AP-alias>/
  Credential: <storage-credential-name>
  access_point: arn:aws:s3:<region>:<account>:accesspoint/<ap-name>

API call to set the field:

curl -X PATCH \
  https://<workspace>/api/2.1/unity-catalog/external-locations/<location-name> \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{"access_point": "arn:aws:s3:<region>:<account>:accesspoint/<ap-name>"}'

What now works under UC governance:

Operation	Result	Notes
`dbutils.fs.ls("s3://<alias>/")`	✅	Top-level listing (287 items)
`dbutils.fs.head("s3://<alias>/file.txt")`	✅	Read file content
`spark.read.text("s3://<alias>/file.txt")`	✅	Spark read with explicit file path
`spark.read.csv("s3://<alias>/path/to/file.csv")`	✅	1000 rows, schema inferred

After (access_point field set) — Top-level listing succeeds, 287 items visible:

With the access_point field configured, dbutils.fs.ls at the top level returns 287 items from the FSx for ONTAP volume.

Sensor data read via Spark — 1000 rows with schema inference:

spark.read.csv with explicit file path successfully reads 1000 sensor readings with full schema inference (timestamp, machine_id, temperature_c, vibration_mm_s, pressure_bar, rpm, status, location).

What still does NOT work:

Operation	Result	Error
`dbutils.fs.ls("s3://<alias>/subdir/")`	❌	AccessDenied on getFileStatus
`spark.read.load("s3://<alias>/subdir/")`	❌	Forbidden (directory-level access)
`CREATE TABLE LOCATION 's3://<alias>/...'`	❌	UC_CLOUD_STORAGE_ACCESS_FAILURE
`dbutils.fs.cp` (PutObject)	❌	AccessDenied

Remaining blockers — Subdirectory listing and UC table creation fail:

Subdirectory dbutils.fs.ls returns UNAUTHORIZED_ACCESS. CREATE TABLE LOCATION fails with UC_CLOUD_STORAGE_ACCESS_FAILURE. Without a UC table, governance features (lineage, tags, fine-grained access control) cannot be applied.

Summary: Data is readable but not governable. The critical blocker is CREATE TABLE LOCATION failure, which prevents Unity Catalog governance from being applied to the data.

Key pattern: File-level read operations succeed (GetObject with explicit key). Directory-level operations (ListObjectsV2 with prefix, HeadObject on prefix) fail for subdirectories. This suggests the session policy scopes ListObjectsV2 to the root prefix only.

Implication: Explicit-path file read works, but without UC table creation, Unity Catalog governance features — lineage, fine-grained access control, governance tags, column masking, row filtering — cannot be applied. The data is technically readable through the External Location path but not registerable as a governed UC table. This limits the practical value for production governance use cases until the subdirectory listing and table creation issues are resolved.

Requirements for this path:

Customer-managed VPC workspace (same VPC as FSx for ONTAP)
External Location with access_point field set
Storage Credential IAM role with S3 AP permissions
NAT Gateway for control plane connectivity

Approach 2: NFS Mount (Managed VPC)

The Idea

If S3 AP doesn't work through Unity Catalog, mount the FSx for ONTAP volume directly via NFS.

The Setup

Created VPC Peering between Databricks-managed VPC and FSx for ONTAP VPC. Updated route tables and security groups.

The Result

%sh
timeout 3 bash -c 'echo > /dev/tcp/10.0.3.133/2049' && echo "REACHABLE" || echo "NOT REACHABLE"
# NOT REACHABLE

NFS port (TCP 2049) is unreachable from Databricks-managed VPC, even with VPC Peering configured. From the customer-controlled routing perspective, route tables and FSx for ONTAP-side security groups were configured to allow NFS. However, cluster-side egress remained governed by the Databricks-managed environment, and NFS egress was not permitted.

Lesson

Databricks-managed VPC gives you limited network control. The egress rules on cluster instances are managed by Databricks, not by customer-added security group rules.

Approach 3: NFS Mount (Customer-managed VPC)

The Setup

Deployed a new workspace in the same VPC as FSx for ONTAP. No peering needed — direct L3 connectivity.

Network Verification (All Pass)

%sh
echo "TCP 2049 (NFS):"
timeout 3 bash -c 'echo > /dev/tcp/10.0.3.133/2049' && echo "REACHABLE"
echo "TCP 111 (portmapper):"
timeout 3 bash -c 'echo > /dev/tcp/10.0.3.133/111' && echo "REACHABLE"
echo "TCP 635 (mountd):"
timeout 3 bash -c 'echo > /dev/tcp/10.0.3.133/635' && echo "REACHABLE"

TCP 2049 (NFS): REACHABLE ✅
TCP 111 (portmapper): REACHABLE ✅
TCP 635 (mountd): REACHABLE ✅

Note: The /dev/tcp test confirms TCP reachability. NFSv3 mountd may use TCP or UDP depending on configuration. The exact transport should be validated with rpcinfo if needed.
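The same TCP check can be run from a Python notebook cell, which makes it easier to record results as structured evidence. This is a minimal sketch using the standard library only. The dictionary keys are my own naming, and like the /dev/tcp test it says nothing about UDP reachability.

```python
import errno
import socket


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> dict:
    """Attempt a TCP connect and return the OS error code instead of raising.

    Name resolution failures can still raise, so pass an IP address.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        code = sock.connect_ex((host, port))
    return {
        "host": host,
        "port": port,
        "success": code == 0,
        "result_code": code,
        "errno_name": errno.errorcode.get(code, "OK" if code == 0 else "UNKNOWN"),
    }


if __name__ == "__main__":
    # Ports taken from the shell checks above: NFS, portmapper, mountd.
    for port in (2049, 111, 635):
        print(tcp_probe("10.0.3.133", port))
```

A non-zero result_code distinguishes an active refusal from a timeout, which helps separate a closed port from silently dropped egress.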

sudo Access (Dedicated Mode)

%sh
sudo whoami
# root ✅

NFS Client Installation and Export Verification

%sh
sudo apt-get install -y nfs-common
showmount -e 10.0.3.133

Export list for 10.0.3.133:
/vol1 (everyone) ✅

Everything looks perfect. Network connected, root access available, NFS exports visible. Let's mount:

The Mount Attempt

%sh
sudo mkdir -p /mnt/fsxn
sudo mount -t nfs -o nfsvers=3,nolock 10.0.3.133:/vol1 /mnt/fsxn

mount.nfs: access denied by server while mounting 10.0.3.133:/vol1

Wait, what? The server is showing the export to everyone, we have root access, the network is connected... why "access denied by server"?

The Investigation: Why NFS Mount Fails

This is where it gets interesting. The error message says "access denied by server" — but is it really the server?

Step 1: Verify ONTAP Export Policy

Via ONTAP REST API (accessible from the same cluster):

{
  "rules": [{
    "clients": [{"match": "0.0.0.0/0"}],
    "ro_rule": ["any"],
    "rw_rule": ["any"],
    "superuser": ["any"],
    "protocols": ["any"]
  }]
}

The export policy is maximally permissive: all clients, all protocols, read-write, superuser. The export policy is therefore not what denies access.

Important: This permissive export policy was used only to eliminate ONTAP export restrictions as a variable during troubleshooting. It is not a production recommendation. For production, restrict: client CIDR, protocol, read/write rule, superuser mapping, and volume/junction path scope.

ONTAP Production Hardening Checklist

For production deployments, harden the ONTAP configuration:

[ ] Restrict export policy client CIDR to known analytics subnets only
[ ] Avoid rw=any and superuser=any — use explicit security flavors
[ ] Map S3 Access Point file system user to a least-privilege NAS user (not root/UID 0)
[ ] Validate NFS/SMB ACL behavior when S3 AP is active
[ ] Validate S3 API access against file-level permissions
[ ] Capture ONTAP audit evidence where required (ONTAP FPolicy)
[ ] Document junction path and volume scope
[ ] Isolate analytics volumes from production NFS/SMB workloads if throughput contention is a concern

Step 2: strace the mount command

%sh
sudo strace -f -e trace=mount mount -t nfs -o nfsvers=3,nolock 10.0.3.133:/vol1 /mnt/fsxn 2>&1

mount.nfs: trying 10.0.3.133 prog 100003 vers 3 prot TCP port 2049
mount.nfs: trying 10.0.3.133 prog 100005 vers 3 prot UDP port 635
mount("10.0.3.133:/vol1", "/mnt/fsxn", "nfs", ...) = -1 EACCES (Permission denied)
mount.nfs: mount(2): Permission denied

Key finding: mount.nfs successfully connects to both NFS (port 2049) and mountd (port 635), but the mount() syscall returns EACCES. The denial happens at the kernel level, not at the server.

TCP/UDP note: The initial reachability check used /dev/tcp, confirming TCP reachability. During the actual mount attempt, mount.nfs tried mountd over UDP as shown in the strace output. This is not a contradiction — NFSv3 mountd may use either transport. For production troubleshooting, use rpcinfo and packet capture to confirm the actual protocol and port mapping.
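When collecting evidence across several mount attempts, it helps to pull the failing syscall out of the strace text automatically. The sketch below parses output in the form shown above. The regular expression is my own and only targets mount() lines, optionally prefixed with the [pid N] marker that strace -f adds.

```python
import re

MOUNT_SYSCALL = re.compile(
    r'^(?:\[pid\s+\d+\]\s+)?mount\("(?P<source>[^"]*)",\s*"(?P<target>[^"]*)",\s*'
    r'"(?P<fstype>[^"]*)".*\)\s*=\s*(?P<ret>-?\d+)(?:\s+(?P<errno>[A-Z]+))?'
)


def failed_mounts(strace_output: str):
    """Yield (fstype, errno) for each mount() syscall that returned an error."""
    for line in strace_output.splitlines():
        match = MOUNT_SYSCALL.match(line.strip())
        if match and int(match.group("ret")) < 0:
            yield match.group("fstype"), match.group("errno")


sample = '''mount.nfs: trying 10.0.3.133 prog 100003 vers 3 prot TCP port 2049
mount.nfs: trying 10.0.3.133 prog 100005 vers 3 prot UDP port 635
mount("10.0.3.133:/vol1", "/mnt/fsxn", "nfs", ...) = -1 EACCES (Permission denied)
mount.nfs: mount(2): Permission denied'''

print(list(failed_mounts(sample)))  # [('nfs', 'EACCES')]
```

The mount.nfs diagnostic lines are ignored on purpose: only the syscall line shows what the kernel returned.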

Step 3: Manual NFS RPC Calls (User-space)

To prove ONTAP is granting access, I performed manual NFS RPC calls using Python sockets:

import socket, struct

# MOUNT RPC (program 100005, version 3, procedure MNT)
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.settimeout(5)
sock.sendto(mount_rpc_packet, ("10.0.3.133", 635))
response = sock.recv(4096)
# Parse: status=0 (MNT3_OK), file_handle=44 bytes
print("MOUNT RPC: SUCCESS ✅")

# NFS3 FSINFO, GETATTR, READDIRPLUS — all succeed
print("NFS3 FSINFO: SUCCESS ✅")
print("NFS3 GETATTR: SUCCESS ✅")
print("NFS3 READDIRPLUS: SUCCESS ✅")

All NFS operations succeed at user-space level. ONTAP grants full access. The problem is not the server.
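The snippet above leaves out how mount_rpc_packet is built. The following is one way to construct and parse the messages, sketched from the ONC RPC and MOUNT version 3 wire formats. It is not the code used in the validation. The AUTH_SYS credential with uid 0 and gid 0 and the machine name "client" are my assumptions, and the packet is meant for UDP, since TCP would additionally need record marking.

```python
import struct

MOUNT_PROGRAM, MOUNT_VERSION, MOUNTPROC3_MNT = 100005, 3, 1
RPC_CALL, RPC_REPLY, RPC_VERSION = 0, 1, 2
AUTH_NONE, AUTH_SYS = 0, 1


def xdr_opaque(data: bytes) -> bytes:
    """XDR variable-length opaque: 4-byte length, data, zero padding to 4 bytes."""
    return struct.pack(">I", len(data)) + data + b"\x00" * (-len(data) % 4)


def build_mnt_call(xid: int, export_path: str, uid: int = 0, gid: int = 0) -> bytes:
    """Build an ONC RPC call for MOUNT v3 procedure MNT with AUTH_SYS credentials."""
    cred_body = (
        struct.pack(">I", 0)                # stamp
        + xdr_opaque(b"client")             # machine name (arbitrary label)
        + struct.pack(">III", uid, gid, 0)  # uid, gid, zero auxiliary gids
    )
    header = struct.pack(">IIIIII", xid, RPC_CALL, RPC_VERSION,
                         MOUNT_PROGRAM, MOUNT_VERSION, MOUNTPROC3_MNT)
    cred = struct.pack(">I", AUTH_SYS) + xdr_opaque(cred_body)
    verf = struct.pack(">I", AUTH_NONE) + xdr_opaque(b"")
    return header + cred + verf + xdr_opaque(export_path.encode())


def parse_mnt_reply(reply: bytes, xid: int):
    """Return (mount_status, file_handle) from an accepted RPC reply."""
    rxid, msg_type, reply_stat, _flavor, verf_len = struct.unpack_from(">IIIII", reply, 0)
    if rxid != xid or msg_type != RPC_REPLY or reply_stat != 0:
        raise ValueError("not an accepted reply for this xid")
    offset = 20 + verf_len + (-verf_len % 4)  # skip the verifier body
    (accept_stat,) = struct.unpack_from(">I", reply, offset)
    if accept_stat != 0:
        raise ValueError(f"RPC accept_stat={accept_stat}")
    (mount_status,) = struct.unpack_from(">I", reply, offset + 4)
    if mount_status != 0:  # anything other than MNT3_OK carries no handle
        return mount_status, b""
    (fh_len,) = struct.unpack_from(">I", reply, offset + 8)
    return mount_status, reply[offset + 12: offset + 12 + fh_len]


mount_rpc_packet = build_mnt_call(xid=0x1234, export_path="/vol1")
```

With these helpers, the response from sock.recv(4096) can be passed to parse_mnt_reply(response, 0x1234). A status of 0 with a non-empty file handle corresponds to the MNT3_OK result described above.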

Step 4: tmpfs Mount Test

%sh
sudo mkdir -p /tmp/test_mount
sudo mount -t tmpfs tmpfs /tmp/test_mount && echo "SUCCESS" || echo "FAILED"

SUCCESS ✅

The mount() syscall itself is allowed, at least for tmpfs. Of the filesystem types tested, only NFS is blocked.

Step 5: Seccomp Status

%sh
cat /proc/self/status | grep Seccomp

Seccomp:        2
Seccomp_filters:        1

Seccomp: 2 = BPF filter mode active.
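For evidence recording, the Seccomp field can be decoded in Python rather than read by eye. This small sketch maps the documented values of the field in /proc/<pid>/status (0 disabled, 1 strict, 2 filter). The function name and the sample text are illustrative.

```python
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def seccomp_mode(status_text: str) -> str:
    """Extract and decode the Seccomp field from /proc/<pid>/status content."""
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key == "Seccomp":
            return SECCOMP_MODES.get(int(value.strip()), "unknown")
    return "not reported"


sample = "Name:\tbash\nSeccomp:\t2\nSeccomp_filters:\t1\n"
print(seccomp_mode(sample))  # filter
```

On a cluster, the same function can be applied to the contents of /proc/self/status. Note that it reports the mode of the calling process only; it does not reveal which syscalls or arguments the filter rejects.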

The Conclusion

┌─────────────────────────────────────────────────────────────────┐
│ Evidence Chain:                                                 │
│                                                                 │
│ 1. Network connectivity      → ✅ All NFS ports reachable       │
│ 2. ONTAP export policy       → ✅ 0.0.0.0/0, rw=any, su=any     │
│ 3. NFS RPC (user-space)      → ✅ All operations succeed        │
│ 4. mount() with type="nfs"   → ❌ EACCES                        │
│ 5. mount() with type="tmpfs" → ✅ Success                       │
│ 6. Seccomp                   → Active (BPF filter mode)         │
│                                                                 │
│ Conclusion: The evidence points to a local platform security    │
│ boundary, likely seccomp filtering or an equivalent runtime     │
│ restriction, blocking the NFS mount path.                       │
└─────────────────────────────────────────────────────────────────┘

The error message "access denied by server" is misleading. The mount.nfs program interprets the kernel's EACCES as a server-side denial, but strace reveals the truth: the denial is local.

If sharing this finding: This is not a Databricks compatibility verdict. It is a layer-by-layer validation of observed boundaries in one environment (DBR 17.3 LTS, ap-northeast-1). Platform behavior may differ across runtime versions, access modes, and configurations.

Important: Because Databricks does not publicly document this specific syscall/filesystem-type behavior, treat this as validation evidence rather than an official platform statement until confirmed by Databricks Support.

All Mount Options Tested

Options	Result
`-o nfsvers=3,nolock`	access denied
`-o nfsvers=4.1`	access denied
`-o nfsvers=3,nolock,resvport`	access denied
`-o nfsvers=3,nolock,noresvport`	access denied
`-o sec=sys`	access denied
(no options)	access denied
tmpfs	SUCCESS

Evidence Matrix

Layer	Evidence	Result	Interpretation
Network	TCP 2049 / TCP 111 / TCP 635 reachable	✅ Pass	Network path exists between cluster and FSx for ONTAP
ONTAP export	Export policy allows 0.0.0.0/0, rw=any, su=any	✅ Pass	Export policy is not the blocker
NFS server RPC	MOUNT / FSINFO / GETATTR / READDIRPLUS succeed via user-space	✅ Pass	ONTAP grants NFS operations to this client
Local syscall	`mount(type=nfs)` returns EACCES	❌ Fail	Evidence points to a local runtime boundary affecting kernel NFS mount
Local syscall control	`mount(type=tmpfs)` succeeds	✅ Pass	`mount()` syscall is not universally blocked
Runtime security	Seccomp mode 2 observed in the tested process context	Observed	Runtime filtering may restrict NFS-specific mount
Unity Catalog S3	External Location test on S3 AP ARN → AccessDenied	❌ Fail	Session policy does not allow S3 AP ARN pattern
Instance Profile S3	boto3 GetObject on S3 AP → Success	✅ Pass	IAM role itself has correct permissions

showmount -e confirms only that the export is visible through mountd. It does not guarantee that the local runtime allows the kernel NFS mount operation to complete.

It also does not validate the file system user identity associated with the S3 Access Point. For S3 AP authorization, record the associated UNIX or Windows identity and verify file-level permissions separately, because these are independent authorization paths.

FSx for ONTAP S3 AP Authorization Path

FSx for ONTAP S3 Access Points use a dual-layer authorization model that combines AWS IAM permissions with file system-level permissions:

Layer 1 — S3-side authorization:

IAM identity-based policy (caller's permissions)
S3 Access Point resource policy
VPC endpoint policy (if applicable)
SCP / RCP (if applicable)

Layer 2 — FSx for ONTAP-side authorization:

File system user associated with the access point
UNIX mode-bits / NFSv4 ACLs (for UNIX security style volumes)
Windows ACLs (for NTFS security style volumes)

In the Databricks validation, the failure occurs before Layer 2 — Unity Catalog's generated session policy restricts the assumed role session at the S3 API level, preventing the request from reaching FSx for ONTAP-side authorization. The Instance Profile + boto3 path bypasses Unity Catalog's session policy, allowing both layers to be evaluated normally.

For production, both layers must be configured with least-privilege. A permissive file system user (e.g., root / UID 0) combined with a broad IAM policy creates an overly permissive access path.

Approach 4: Instance Profile + boto3

The Setup

Customer-managed VPC workspace, Dedicated cluster with an Instance Profile attached.

IMDS Access

import urllib.request

# IMDSv2 token
req = urllib.request.Request(
    "http://169.254.169.254/latest/api/token",
    headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
    method="PUT"
)
token = urllib.request.urlopen(req, timeout=2).read().decode()
print(f"Token: {token[:20]}...")  # ✅ Success
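Once a token is available, the same IMDSv2 session can show which role the instance profile resolves to. The sketch below uses the standard IMDS credential path. The helper names are my own, and the function deliberately returns only the role name and expiry so that secret values never appear in notebook output.

```python
import json
import urllib.request

IMDS = "http://169.254.169.254/latest"


def imds_get(path: str, token: str) -> urllib.request.Request:
    """Build an IMDSv2 GET request that carries the session token."""
    return urllib.request.Request(
        f"{IMDS}/{path}",
        headers={"X-aws-ec2-metadata-token": token},
        method="GET",
    )


def describe_instance_role(token: str, timeout: float = 2.0) -> dict:
    """Return the role name and credential expiry, never the secret values."""
    base = "meta-data/iam/security-credentials/"
    with urllib.request.urlopen(imds_get(base, token), timeout=timeout) as resp:
        role = resp.read().decode().strip()
    with urllib.request.urlopen(imds_get(base + role, token), timeout=timeout) as resp:
        creds = json.loads(resp.read())
    return {"role": role, "code": creds.get("Code"), "expiration": creds.get("Expiration")}
```

Calling describe_instance_role(token) with the token obtained above is a quick way to confirm that the cluster is using the intended instance profile before testing S3 access.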

Regular S3 Access

import boto3
s3 = boto3.client("s3", region_name="ap-northeast-1")
buckets = s3.list_buckets()
print(f"ListBuckets: {len(buckets['Buckets'])} buckets")  # ✅ 58 buckets

FSx for ONTAP S3 AP Access

response = s3.list_objects_v2(
    Bucket="<FSx-S3-AP-alias>",
    MaxKeys=10
)
print(f"Objects: {response['KeyCount']}")  # ✅ Works

This works. Instance Profile credentials bypass Unity Catalog's session policy entirely. boto3 talks directly to the S3 API with the EC2 instance's IAM role.
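For listings larger than one page, the ListObjectsV2 continuation token has to be followed explicitly. The helper below is a sketch that accepts any client exposing list_objects_v2, such as the boto3 client created earlier. The function name, the default page size, and the limit are my own choices; a cap is included because listing large directories through the S3 AP can be slow.

```python
def list_keys(s3_client, bucket: str, prefix: str = "",
              page_size: int = 1000, limit: int = 10_000) -> list:
    """Collect up to `limit` object keys under `prefix`, following continuation tokens."""
    keys, token = [], None
    while len(keys) < limit:
        kwargs = {"Bucket": bucket, "Prefix": prefix, "MaxKeys": page_size}
        if token:
            kwargs["ContinuationToken"] = token
        page = s3_client.list_objects_v2(**kwargs)
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
        if not page.get("IsTruncated"):
            break  # last page reached
        token = page["NextContinuationToken"]
    return keys[:limit]
```

With the client from the previous cell, list_keys(s3, "<FSx-S3-AP-alias>", limit=100) returns at most 100 keys. As with the other boto3 calls in this section, this runs on the driver and bypasses Unity Catalog.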

Governance warning
Instance Profile + boto3 is a pragmatic workaround for PoC and controlled experiments. It bypasses Unity Catalog governance, including fine-grained access control, lineage, and centralized data access auditing. Do not treat this as a production lakehouse governance pattern without a separate security and compliance review. Databricks recommends Unity Catalog external locations as the standard governed access mechanism.

Scope note
The Instance Profile + boto3 sample above runs on the driver node only (single-node PoC pattern). Whether the same credential, network path, and concurrency behavior applies to Spark executors in a multi-node cluster requires separate validation.

Approach 5: S3 AP + Instance Profile (Managed VPC with VPC Peering)

The Hypothesis

If Instance Profile + boto3 works on a Customer-managed VPC (Approach 4), does it also work from a Databricks-managed VPC with VPC Peering to the FSx for ONTAP VPC? This would validate whether the S3 Gateway Endpoint in the Databricks-managed VPC can route S3 AP requests to the FSx for ONTAP backend.

The Setup

Databricks-managed VPC (vpc-060209589cbe4c298, CIDR: 10.53.0.0/16)
FSx for ONTAP VPC (vpc-0ae01826f906191af, CIDR: 10.0.0.0/16)
VPC Peering: pcx-02167ddf900a30782 (active)
Route tables: updated in both directions
FSx for ONTAP security group: allows all traffic (0.0.0.0/0)
S3 Gateway Endpoint: vpce-020b59ab4da0b44b8 (full access policy)
Cluster: m5.large × 3, DBR 17.3 LTS, Dedicated mode, Instance Profile attached

The Result

{
  "dns_resolution": {"success": true, "ip": "52.219.151.110"},
  "vpc_peering_443": {"success": false, "result_code": 11},
  "vpc_peering_nfs": {"success": false, "result_code": 11},
  "s3_ap_access": {"success": false, "error": "Read timeout"},
  "imds": {"success": true}
}

Analysis

Layer	Result	Interpretation
DNS resolution	✅	S3 AP alias resolves to S3 endpoint IP (52.219.x.x)
VPC Peering (TCP 443)	❌	FSx for ONTAP management IP unreachable — egress blocked
VPC Peering (NFS 2049)	❌	NFS port unreachable — egress blocked
S3 AP via S3 Gateway Endpoint	❌	Read timeout — S3 service reachable but FSx for ONTAP backend connection fails
IMDS / Instance Profile	✅	Credentials available and valid
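The DNS and TCP layers in the table can be checked with a small probe. This is a sketch under assumptions, not the original test script (which is not shown in this article); the function names are hypothetical. On Linux, a `connect_ex` call that times out returns errno 11 (EAGAIN), which is consistent with the `result_code: 11` values above.

```python
import socket


def dns_probe(hostname: str) -> dict:
    """Resolve a hostname and report the first IPv4 address, if any."""
    try:
        return {"success": True, "ip": socket.gethostbyname(hostname)}
    except OSError as exc:
        return {"success": False, "error": str(exc)}


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> dict:
    """Attempt a TCP connect; result_code 0 means the handshake completed."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            code = sock.connect_ex((host, port))
        except OSError as exc:  # for example, name resolution failure
            return {"success": False, "error": str(exc)}
    return {"success": code == 0, "result_code": code}
```

Running `tcp_probe` against the FSx for ONTAP management IP on 443 and the NFS LIF on 2049, and `dns_probe` against the S3 AP alias hostname, gives one result per layer so that a failure can be attributed to a specific hop.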

Key finding: Even with VPC Peering established, routes configured, and security groups permissive, the Databricks-managed VPC's egress restrictions block connectivity to the FSx for ONTAP backend. The S3 Gateway Endpoint routes requests to the S3 service, but FSx for ONTAP S3 AP requires the S3 service to reach the FSx for ONTAP file system — which is in a different VPC from the Databricks cluster. The S3 service-side routing to the FSx for ONTAP backend is not affected by customer-side VPC Peering.

Important: In this validation, FSx for ONTAP S3 AP access succeeded only when the requester (the Databricks cluster) ran in the same VPC as the FSx for ONTAP file system. The result suggests that the requester must either sit in that VPC or use a network configuration where the S3 service can reach the FSx for ONTAP backend. VPC Peering between the requester VPC and the FSx for ONTAP VPC does not help because S3 AP requests are routed through the S3 service, not directly to the FSx for ONTAP IP.

Lesson

S3 AP requests do not traverse VPC Peering. They are routed through the S3 service endpoint. For FSx for ONTAP S3 AP to work, the S3 service must be able to reach the FSx for ONTAP file system's internal endpoint. This is handled by AWS internally when the request originates from the same region, but the Databricks-managed VPC's egress restrictions appear to interfere with this path.

Customer-managed VPC (same VPC as FSx for ONTAP) remains the only validated path for Instance Profile + boto3 access to FSx for ONTAP S3 AP from Databricks.

IMDS Access Matrix

Cluster Mode	Workspace Type	IMDS	boto3 S3	boto3 S3 AP
Standard (Shared)	Managed VPC	❌	❌	❌
Dedicated	Managed VPC	❌	❌	❌
Dedicated	Customer VPC	❌	❌	❌
Dedicated + Instance Profile	Managed VPC (VPC Peering)	✅	⚠️	❌
Dedicated + Instance Profile	Customer VPC	✅	✅	✅

Row 4 note: IMDS works and Instance Profile credentials are valid, but S3 AP access times out because the Databricks-managed VPC egress restrictions block FSx for ONTAP backend connectivity. Regular S3 bucket access is marked ⚠️ because it was not tested with a permissive policy (the AccessDenied response came from an intentionally scoped IAM policy, not from the network).

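IMDS is blocked on every configuration without an explicitly registered Instance Profile. With Dedicated mode and a registered Instance Profile, IMDS is reachable on both workspace types, but only the Customer-managed VPC workspace completed regular S3 and FSx for ONTAP S3 AP requests.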

Complete Results Summary

#	Approach	Result	Blocker
1	UC External Location + dbutils.fs (without `access_point` field)	❌	Generated session policy did not allow S3 AP ARN
1b	UC External Location + `access_point` field (file-level read)	✅	Top-level ls, head, spark.read with explicit path all work
1c	UC External Location + `access_point` field (subdirectory ls)	❌	Prefix-based ListObjectsV2 still blocked for subdirectories
1d	UC External Location + CREATE TABLE LOCATION	❌	UC_CLOUD_STORAGE_ACCESS_FAILURE during internal validation
2	UC External Location + Spark read (directory)	❌	Same prefix-level access issue
3	NFS mount (Managed VPC, VPC Peering)	❌	Egress blocked (port 2049)
4	NFS mount (Customer VPC, Dedicated)	❌	NFS mount blocked by seccomp by design (confirmed by Databricks Support)
5	boto3 (Managed VPC, no Instance Profile)	❌	IMDS blocked
6	boto3 (Customer VPC, no Instance Profile)	❌	IMDS blocked
7	Instance Profile + boto3 (Customer VPC)	✅	Works (bypasses UC governance)
8	NFS RPC user-space (Customer VPC)	✅	Works but impractical for production
9	No Isolation Shared mode	❌	Legacy access mode; not pursued
10	S3 AP + Instance Profile + boto3 (Managed VPC, VPC Peering)	❌	Managed VPC egress blocks FSx for ONTAP backend connectivity

Governance Impact Summary

Documentation status (Updated 2026-05-26): Databricks Support confirmed that the access_point field was never released as GA and has been removed from documentation. Unity Catalog External Locations do not currently support S3 Access Points as storage targets. The partial success observed is a side effect, not a supported code path. Feature gap reported to UC engineering — no timeline available.

Access path	Governance model	Auditability	Production suitability
Unity Catalog External Location	Centralized UC governance (fine-grained, lineage)	High (if supported)	Preferred, but blocked in this validation
Instance Profile + boto3	EC2 IAM role based	AWS-side logs possible if enabled; UC lineage not captured	PoC only unless separately approved
Kernel NFS mount	Filesystem / OS level	Outside UC governance	Not viable in this validation
User-space NFS RPC	Custom application path	Custom logging required	Experimental only
Athena + FSx for ONTAP S3 AP	IAM / S3 AP / Athena workgroup	AWS-side evidence possible	Best current read-only SQL analytics fit
Bedrock Knowledge Bases + FSx for ONTAP S3 AP	IAM / S3 AP / Bedrock Knowledge Base role / guardrails where used	AWS-side evidence possible	AWS-documented RAG / GenAI path; validated with permission-aware retrieval in related series
Glue / EMR Serverless + FSx for ONTAP S3 AP	IAM / S3 AP / Glue / EMR job roles	AWS-side evidence possible	Validated ETL / Spark path in this broader series where verification-pack evidence is available; validate production write-back semantics separately

AWS-side audit events, such as CloudTrail data events where enabled and applicable, may show S3 API access by the instance profile, but they do not replace Unity Catalog lineage, table-level privileges, or centralized Databricks governance controls.

MLOps Boundary

Using boto3 to read objects from FSx for ONTAP S3 AP does not automatically make the downstream ML workflow governed.

If the data retrieved via Instance Profile + boto3 is used for ML or GenAI:

Register derived datasets in governed storage (Unity Catalog managed location)
Track experiments with MLflow
Register models in Unity Catalog where applicable
Document source data access path (S3 AP alias, prefix, timestamp)
Record whether training data lineage is captured or externalized
Ensure the ML compute uses an access mode compatible with Unity Catalog governance
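One way to document the source access path is a small provenance record written next to the derived dataset or experiment. The sketch below is illustrative only; the class `SourceAccessRecord`, its fields, and the helper names are assumptions, not part of any Databricks or MLflow API.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class SourceAccessRecord:
    """Hypothetical provenance record for data read outside Unity Catalog."""
    s3_ap_alias: str
    prefix: str
    access_path: str              # for example "instance-profile+boto3"
    retrieved_at: str             # ISO 8601 timestamp in UTC
    lineage_captured_in_uc: bool  # False when the read bypassed UC


def make_record(alias: str, prefix: str,
                now: Optional[datetime] = None) -> SourceAccessRecord:
    """Build a record for a boto3 read; `now` is injectable for testing."""
    ts = (now or datetime.now(timezone.utc)).isoformat()
    return SourceAccessRecord(alias, prefix, "instance-profile+boto3", ts, False)


def to_json(record: SourceAccessRecord) -> str:
    """Serialize the record with stable key order for diff-friendly storage."""
    return json.dumps(asdict(record), sort_keys=True)
```

The resulting JSON could be stored with the experiment run or alongside the derived table, so that a reviewer can see that lineage for this input is externalized rather than captured by Unity Catalog.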

Models in Unity Catalog provides centralized access control, auditing, lineage, and model discovery across workspaces. If the PoC data path bypasses UC, the model lifecycle should still be governed through the UC model registry.

AI / RAG Data Readiness Checklist

If the FSx for ONTAP S3 AP data is intended for AI, RAG, or GenAI pipelines:

[ ] Are documents classified by sensitivity (PHI, PII, financial, internal, public)?
[ ] Are file-level permissions preserved or re-modeled for the AI pipeline?
[ ] Is metadata available for filtering and retrieval (file type, date, owner)?
[ ] Is the freshness requirement defined (real-time, daily, weekly)?
[ ] Is read-only access sufficient, or does the pipeline need write-back?
[ ] Is human review required for generated output before downstream use?
[ ] Is permission-aware retrieval required (user A sees only their authorized documents)?

If permission-aware retrieval is required, define one of:

Enforce at source access path — use per-user or per-group S3 Access Points with scoped file system users
Re-model permissions in metadata index — extract file-level ACLs into a searchable metadata store and filter at query time
Filter retrieval results by user/group claims — apply post-retrieval filtering based on authenticated user identity
Do not proceed until the authorization model is validated and approved by the security owner
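The third option, filtering retrieval results by user or group claims, can be sketched as follows. This is a simplified illustration that assumes each retrieved document already carries the set of groups allowed to read it (for example, extracted from source ACLs); `RetrievedDoc` and `filter_by_claims` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievedDoc:
    key: str
    allowed_groups: frozenset  # groups permitted to read this document (assumed metadata)


def filter_by_claims(docs, user_groups):
    """Keep only documents whose allowed groups intersect the user's groups.

    Documents with no allowed groups are dropped (deny by default).
    """
    groups = frozenset(user_groups)
    return [doc for doc in docs if doc.allowed_groups & groups]
```

Post-retrieval filtering only hides results; the retriever itself still read every document, so this option does not reduce what the pipeline's own credentials can access.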

Instance Profile + boto3 approval requirements (for regulated workloads):

Data owner approval
Security owner approval
Platform owner approval
Compliance reviewer approval (if regulated data involved)
Defined: allowed prefix, allowed operations, logging requirements, expiration date
Approval record location (where the decision is stored)
Review / expiration date (when the approval must be re-evaluated)
Incident escalation contact

For regulated workloads, do not use Instance Profile + boto3 for:

Patient-facing responses or clinical decision support
Financial decision automation
Unreviewed access to regulated datasets
Writeback to source-controlled data locations
Workloads requiring Unity Catalog lineage

Decision Matrix

Requirement	Recommended path today	Notes	Next validation action
SQL query on structured files	Athena + FSx for ONTAP S3 AP (Part 1)	Verified, simple, governed	Scale test with production data sizes
RAG / GenAI over NAS documents	Bedrock Knowledge Bases + FSx for ONTAP S3 AP	AWS-documented tutorial	Validate retrieval accuracy, permission-aware filtering, and sync freshness
ETL pipeline on NAS data	Glue or EMR Serverless + FSx for ONTAP S3 AP	Validated in this broader series where verification-pack evidence is available	Validate throughput impact and production write-back semantics
Serverless file processing	Lambda + FSx for ONTAP S3 AP	AWS-documented tutorial	Validate concurrency and throughput for your workload
Databricks governance with Unity Catalog	Wait for platform support	UC session policy currently blocks S3 AP ARN in my validation	Monitor Databricks support case response
Databricks unstructured data PoC	Dedicated cluster + Instance Profile + boto3	Works, but bypasses UC governance	Validate executor-scale behavior separately
Production Databricks lakehouse tables	Use supported cloud storage (S3 bucket)	Required for Delta write semantics	N/A — use standard pattern
Databricks distributed processing over FSx for ONTAP S3 AP	Not validated yet	Driver-only boto3 success does not prove executor-scale behavior	Test with multi-node cluster and Spark mapPartitions
Enterprise read-only analytics	Athena / Glue / EMR Serverless + FSx for ONTAP S3 AP	Best current fit for AWS-native path	Production workload isolation test
Video streaming from NAS	CloudFront + FSx for ONTAP S3 AP	AWS-documented tutorial	Validate caching and latency for your content

This article does not recommend bypassing Unity Catalog for production governed lakehouse workloads. The Instance Profile + boto3 path is documented because it worked in a controlled validation environment, not because it is the preferred governance model.

Architecture Decision Guidance

Databricks remains the recommended platform for curated lakehouse workloads, governed Delta tables, ML pipelines, and multi-step data engineering. FSx for ONTAP S3 AP should be treated as a source integration boundary that may require staging, validation, or an alternate read path depending on governance requirements.

Use Databricks when:

Data is already in supported object storage (S3 bucket)
Delta Lake write semantics are required (INSERT, MERGE, OPTIMIZE, VACUUM)
Unity Catalog lineage and fine-grained governance are mandatory
Large-scale Spark processing is required
ML/AI workloads need integrated compute

Use AWS-native services + FSx for ONTAP S3 AP when:

The primary requirement is read-only SQL analytics over NAS data → Athena (validated in Part 1)
RAG / GenAI over enterprise documents → Bedrock Knowledge Bases (AWS-documented path)
ETL pipelines reading/transforming NAS data → Glue (validated in this broader series where verification-pack evidence is available)
Spark-scale processing without persistent clusters → EMR Serverless (validated in this broader series where verification-pack evidence is available)
Serverless file processing (thumbnails, text extraction, transcription) → Lambda (AWS-documented path)
Video streaming from NAS → CloudFront (AWS-documented path)
External partner file exchange → Transfer Family (AWS-documented path)
BI and AI-assisted analytics → QuickSight candidate path, typically via Athena or Glue Catalog
Source data copy should be minimized
Workload isolation and governance can be validated with AWS-side controls
Serverless, pay-per-query or pay-per-invocation cost model is preferred

Use controlled boto3 PoC only when:

The workload is exploratory and time-limited
Unity Catalog lineage is not required for the PoC scope
Explicit approval is obtained from data owner, security owner, and platform owner
Compensating controls are defined and documented

FSx for ONTAP Sizing Considerations

Before selecting an analytics engine, validate FSx for ONTAP-side capacity:

Throughput capacity — S3 API throughput is bounded by the FSx for ONTAP file system's provisioned throughput
Expected S3 API request rate — high-frequency small object reads may hit IOPS limits
File count and average object size — large directories with many small files may increase listing latency
Prefix layout — flat vs hierarchical prefix design affects listing performance
NFS/SMB production workload window — analytics queries share throughput with existing file workloads
Snapshot / backup / replication schedule — SnapMirror and backup operations consume throughput
Isolation strategy — consider a dedicated volume or SVM for analytics access to avoid contention
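File count and average object size per prefix can be estimated before choosing an engine. The sketch below assumes a boto3 S3 client is passed in and that prefix-scoped listing succeeds on your access path, which was not the case for every path in this validation; `prefix_stats` is a hypothetical helper name.

```python
def prefix_stats(s3_client, bucket: str, prefix: str = "") -> dict:
    """Count objects and compute the average object size under a prefix."""
    paginator = s3_client.get_paginator("list_objects_v2")
    count = 0
    total_bytes = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages for empty prefixes carry no "Contents" key.
        for obj in page.get("Contents", []):
            count += 1
            total_bytes += obj["Size"]
    avg = total_bytes / count if count else 0.0
    return {
        "prefix": prefix,
        "objects": count,
        "total_bytes": total_bytes,
        "avg_bytes": avg,
    }
```

Because each page is a separate ListObjectsV2 request against the file system, running this over very large prefixes is itself a load test; consider running it outside the production NFS/SMB workload window.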

Delta Lake production workloads require more than object read access. They require validated behavior for transaction log writes, atomic commit assumptions, concurrent writers, checkpointing, recovery, and lifecycle operations. This article does not validate FSx for ONTAP S3 AP for Delta write-path semantics.

Compensating Controls for Controlled boto3 PoC

If Instance Profile + boto3 is approved for a controlled PoC, define:

Dedicated cluster only (no shared compute)
Single-purpose instance profile (not reused across workloads)
Least-privilege S3 Access Point policy (specific prefix only)
Read-only permissions by default
Allowed prefix list (explicitly documented)
CloudTrail data event coverage where enabled and applicable
Notebook/job owner (named individual)
Approval expiration date
No production writeback
No regulated data unless separately approved with compensating controls
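A read-only, single-prefix identity policy for the instance profile role might look like the sketch below. It assumes the FSx for ONTAP-attached access point uses the standard S3 access point ARN layout (`.../accesspoint/<name>` for the access point and `.../accesspoint/<name>/object/<key>` for objects); verify this against your own access point before relying on it. The function name is hypothetical, and FSx for ONTAP file system user authorization applies in addition to IAM.

```python
import json


def read_only_prefix_policy(access_point_arn: str, prefix: str) -> dict:
    """Build an identity policy allowing list and get under one prefix only."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListApprovedPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": access_point_arn,
                # Restrict listing to the approved prefix.
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                "Sid": "ReadApprovedPrefix",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"{access_point_arn}/object/{prefix}/*",
            },
        ],
    }
```

Printing the result with `json.dumps(policy, indent=2)` gives a document that can be reviewed and attached to the approval record. No write actions are granted, which matches the "read-only by default" and "no production writeback" controls.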

Recommended Databricks-side controls:

Restrict instance profile usage to an approved group via workspace admin settings
Enforce dedicated access mode through cluster policy
Restrict cluster creation permissions to approved users
Tag PoC clusters with owner, approval ID, and expiration date
Disable or terminate clusters after approval expiration
Review workspace audit logs for cluster and instance profile usage
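Several of these controls can be expressed as a Databricks cluster policy definition. The sketch below builds one as a Python dict. It is an assumption-laden example rather than a tested policy: the attribute paths (`data_security_mode`, `aws_attributes.instance_profile_arn`, `autotermination_minutes`, `custom_tags.*`) and the `SINGLE_USER` value for dedicated access mode should be checked against the current compute policy reference, and the tag names are hypothetical.

```python
import json


def poc_cluster_policy(instance_profile_arn: str, owner: str,
                       approval_id: str, expires_on: str,
                       max_idle_minutes: int = 60) -> dict:
    """Build a policy definition that pins access mode, profile, and tags."""
    def fixed(value):
        return {"type": "fixed", "value": value}

    return {
        # Dedicated (single-user) access mode only.
        "data_security_mode": fixed("SINGLE_USER"),
        # Only the single-purpose PoC instance profile may be attached.
        "aws_attributes.instance_profile_arn": fixed(instance_profile_arn),
        # Force auto-termination within a bounded idle window.
        "autotermination_minutes": {
            "type": "range", "minValue": 10, "maxValue": max_idle_minutes,
        },
        # Tags that tie the cluster to its approval record.
        "custom_tags.owner": fixed(owner),
        "custom_tags.approval_id": fixed(approval_id),
        "custom_tags.expires_on": fixed(expires_on),
    }
```

The policy limits what approved users can create; it does not terminate clusters after the approval expires, so the post-expiration actions still need an owner.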

Post-expiration mandatory actions:

Terminate all PoC clusters using the instance profile
Remove the instance profile from workspace admin settings
Archive all evidence (notebooks, logs, results) to approved storage
Update approval record with completion date and findings
Confirm no residual access paths remain (audit workspace settings)

Data Protection Considerations

FSx for ONTAP S3 AP exposes access to file data; it does not replace ONTAP volume-level protection. When analytics workloads access source data via S3 AP, validate:

Snapshot schedule impact — analytics reads do not conflict with scheduled snapshots, but heavy write-back could
SnapMirror replication policy — source volume replication continues regardless of S3 AP access
Backup window vs analytics query window — concurrent backup and analytics may compete for throughput
Write-back isolation — analytics results should be written to a separate volume or prefix, not the source-of-record volume
Recovery behavior — if analytics workload reads during a failover event, understand the RPO/RTO implications

ONTAP S3 NAS bucket data is protected by volume-level SnapMirror asynchronous replication, not by S3-level replication. Plan DR at the volume level.

Discovery Questions for Partners

When a customer asks about Databricks + FSx for ONTAP S3 Access Points:

Are the target files currently stored on NFS, SMB, or both?
Is the workload read-only analytics, unstructured object processing, or Delta write?
Is Unity Catalog lineage mandatory for this use case?
Is this a regulated dataset (PHI, PII, financial)?
Can the PoC run with a dedicated instance profile and limited prefix?
What is the required concurrency and data size?
Is executor-scale Spark processing required, or is driver-only sufficient?
What rollback action is acceptable if FSx for ONTAP throughput impact is observed?
Who approves non-Unity Catalog access paths?
What evidence is required for security review?

Troubleshooting Playbook

When Databricks access to FSx for ONTAP S3 AP fails, isolate one layer at a time:

IAM — Can the instance profile call s3:ListBucket on the S3 AP ARN? Can it call s3:GetObject?
Unity Catalog — Does the same role work for a standard S3 bucket? Does it fail only for the FSx for ONTAP S3 AP ARN?
Network — Is the workspace customer-managed or Databricks-managed? Can the cluster reach NFS TCP 2049? Are route tables and security groups correct?
NFS server — Does showmount -e work? Does the ONTAP export policy allow the client?
Local runtime — Does strace show mount() returning EACCES? Does tmpfs mount succeed? Does user-space NFS RPC succeed?
Workaround — Does Dedicated + Instance Profile + boto3 work? Is bypassing Unity Catalog acceptable for this PoC?
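For the local runtime layer, the seccomp mode of the current process can be read from `/proc/self/status`. The sketch below parses the `Seccomp:` field; on Linux, 0 means disabled, 1 strict mode, and 2 filter mode. The function names are hypothetical, and this is not the exact command used in this validation.

```python
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def parse_seccomp_mode(status_text: str):
    """Return the Seccomp field from /proc/<pid>/status text as an int, or None."""
    for line in status_text.splitlines():
        # Match "Seccomp:" exactly; newer kernels also emit "Seccomp_filters:".
        if line.startswith("Seccomp:"):
            return int(line.split(":", 1)[1].strip())
    return None


def describe_seccomp(path: str = "/proc/self/status") -> str:
    """Read the status file and translate the mode into a label."""
    with open(path) as fh:
        mode = parse_seccomp_mode(fh.read())
    return SECCOMP_MODES.get(mode, "unknown")
```

A result of "filter" shows that a seccomp BPF filter is active for the process. It does not show which syscalls are filtered, which is why the strace output is still needed as evidence.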

Known Failure Signatures

Symptom	Likely layer	Next step
`no session policy allows s3:ListBucket`	Unity Catalog session policy	Compare regular S3 bucket vs FSx for ONTAP S3 AP with the same role
TCP 2049 unreachable	Network / managed VPC boundary	Test from customer-managed VPC
`mount.nfs: access denied by server` with `mount()` EACCES in strace	Local runtime restriction	Capture strace and `/proc/self/status` seccomp output
boto3 `NoCredentialsError`	Instance profile / IMDS blocked	Verify cluster mode is Dedicated and instance profile is registered
boto3 `ReadTimeoutError` on S3 AP	FSx for ONTAP backend or VPC endpoint routing	Test with a fresh SVM/volume to isolate; check FSx for ONTAP CPU utilization
boto3 `ReadTimeoutError` on S3 AP from Managed VPC (IMDS works)	Managed VPC egress restriction blocking FSx for ONTAP backend	Deploy in Customer-managed VPC (same VPC as FSx for ONTAP); VPC Peering does not resolve this
Driver-only boto3 works, but Spark job fails	Executor credential/network path	Validate credentials, routing, and concurrency from executors separately
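The table can be turned into a first-pass triage helper for log lines. This is a sketch only: the substrings come from the symptoms listed above, the layer labels are abbreviated, and `likely_layer` is a hypothetical name. A match indicates where to look next, not a confirmed root cause.

```python
# (substring to look for, likely layer) in the order they should be checked
SIGNATURES = [
    ("no session policy allows", "Unity Catalog session policy"),
    ("access denied by server", "Local runtime restriction (confirm with strace)"),
    ("NoCredentialsError", "Instance profile / IMDS blocked"),
    ("ReadTimeoutError", "FSx for ONTAP backend, VPC endpoint routing, or managed VPC egress"),
]


def likely_layer(message: str) -> str:
    """Map an error message to the likely layer, case-insensitively."""
    lowered = message.lower()
    for needle, layer in SIGNATURES:
        if needle.lower() in lowered:
            return layer
    return "Unclassified: isolate one layer at a time"
```

For example, `likely_layer("mount.nfs: access denied by server while mounting ...")` points at the local runtime rather than the ONTAP export policy, which is the misdirection described in this article.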

What This Article Does Not Conclude

This article does not conclude that Databricks cannot ever support FSx for ONTAP S3 AP. It documents the behavior observed in one validated environment and identifies the platform boundaries that need vendor confirmation or additional support.

What to Tell Stakeholders

Current recommendation:

Use AWS-documented native service paths where they match the workload: Athena for SQL, Bedrock Knowledge Bases for RAG/GenAI, Glue or EMR Serverless for ETL/Spark, Lambda for serverless file processing, CloudFront for streaming, and Transfer Family for partner file exchange
Treat Athena as the validated read-oriented SQL path in Part 1. Treat Glue / EMR Serverless as validated ETL / Spark paths only where corresponding verification-pack evidence is available.
Treat Bedrock Knowledge Bases, Lambda (file processing), CloudFront, and Transfer Family as AWS-documented candidate paths that still require workload-specific validation
Use Databricks + Instance Profile + boto3 only for controlled PoC or unstructured data experiments
Do not position Unity Catalog + FSx for ONTAP S3 AP as production-ready until the session policy supports S3 Access Point ARN patterns
Do not rely on kernel NFS mounts inside Databricks until the platform explicitly supports this path
For Delta Lake production tables, continue to use supported object storage patterns

This validation should be used to guide architecture selection, not to disqualify Databricks from lakehouse workloads.

This validation should not be used to compare AWS-native services and Databricks as competing platforms. AWS-native services (Athena, Bedrock, Glue, EMR Serverless, Lambda) each have AWS-documented integration paths with FSx for ONTAP S3 AP — some validated in this series, others requiring workload-specific validation. Databricks is strong for governed lakehouse, Delta, ML, and production-scale data engineering workloads. The right choice depends on the access pattern, governance requirement, and workload type.

Key contributions of this validation:

Identified the root cause of NFS mount failure (seccomp BPF filter, not server-side denial) via strace analysis
Discovered the access_point field on External Location (via Databricks Support) that partially resolves the session policy
Proved that file-level read under UC governance is possible (1000 rows, schema inference)
Mapped the complete evidence chain: network → ONTAP → NFS RPC → kernel → seccomp
Established that Customer-managed VPC (same VPC as FSx) is the only validated network path
Provided a reusable troubleshooting playbook for future S3 AP integration attempts

Lessons Learned

1. "S3-compatible" ≠ "works everywhere S3 works"

FSx for ONTAP S3 AP is S3-compatible at the API level, but platform security layers (session policies, VPC restrictions) may not recognize the ARN format. S3 API compatibility and platform-integrated S3 governance are different things.

2. Error messages can be misleading

The error `mount.nfs: access denied by server` made me spend hours checking ONTAP export policies. The real issue was a local runtime restriction. Always use strace when mount fails unexpectedly.

3. Platform security boundaries are not always documented

You discover these boundaries by hitting them. The troubleshooting playbook above can save you time.

4. Customer-managed VPC is essential for storage integration

If you need to connect Databricks to anything beyond standard S3 buckets, deploy in a Customer-managed VPC. Databricks-managed VPC provides limited customer control over cluster networking compared with a customer-managed VPC.

This was further confirmed by testing S3 AP access from a Databricks-managed VPC with VPC Peering: even with VPC Peering active, routes configured, security groups permissive, and an S3 Gateway Endpoint present, S3 AP requests to FSx for ONTAP timed out. The Databricks-managed VPC egress restrictions block not only direct IP communication but also S3 AP backend connectivity.

S3 AP routing note: S3 AP requests are routed through the S3 service endpoint, not directly to the FSx for ONTAP IP. VPC Peering between the requester VPC and the FSx for ONTAP VPC does not help because the S3 service needs internal connectivity to the FSx for ONTAP file system. Customer-managed VPC (same VPC as FSx for ONTAP) is the only validated path.

Databricks Control Plane (SaaS)
        ^
        | NAT Gateway (required outbound)
        |
Databricks Cluster ENI (Customer VPC, private subnet)
        |
        | Private VPC routing (no internet required)
        v
FSx for ONTAP ENI / SVM (same VPC, private subnet)

For the Databricks Support Case Packet, include network evidence: cluster subnet ID, FSx for ONTAP subnet ID, route table IDs, security group rules, and DNS resolution for FSx for ONTAP endpoint.

5. Instance Profile is a pragmatic PoC workaround

Use Instance Profile + boto3 as a controlled PoC workaround. Do not use it as a substitute for Unity Catalog governance without a formal security review.

6. Always isolate variables when troubleshooting

When FSx for ONTAP S3 AP wasn't responding, I created a new SVM and volume to isolate the issue. This confirmed the problem was SVM-specific rather than a platform-wide limitation.

7. Negative validation creates value

A failed integration path can still create value when it prevents the wrong production architecture. This validation helps teams avoid assuming S3 API compatibility equals platform governance compatibility, choose the right engine for the right access pattern, and reduce time spent on ambiguous troubleshooting.

Databricks Support Case Packet

If you open a support case with Databricks, include:

Workspace type: Databricks-managed VPC or customer-managed VPC
Cluster access mode and DBR version
IAM role / instance profile configuration
Unity Catalog storage credential and external location configuration
Full AccessDenied error message (including the ARN and "no session policy" text)
S3 AP ARN and alias format
Network test results for NFS ports (TCP 2049, TCP 111, TCP 635)
strace output showing mount() EACCES
/proc/self/status showing seccomp mode
User-space NFS RPC success evidence (if applicable)
Instance Profile boto3 success evidence (if applicable)
showmount -e output (confirms export visibility)
tmpfs mount success evidence (proves mount syscall itself is allowed)

Use Case Fit Matrix

When this article says "validated in this broader series," it refers to evidence captured in the linked verification-pack or related articles, not to Databricks-specific validation in this Part 2 article.

Use case	Best current path	Why
SQL analytics on structured NAS files	Athena + FSx for ONTAP S3 AP	Verified read-oriented path with AWS-side governance controls, serverless
Enterprise IT RAG over documents	Bedrock Knowledge Bases + FSx for ONTAP S3 AP	AWS-documented tutorial; also validated in related series with permission-aware retrieval
ETL / data transformation	Glue or EMR Serverless + FSx for ONTAP S3 AP	Validated in this broader series where verification-pack evidence is available; validate production write-back semantics separately
Serverless file processing (thumbnails, OCR, transcription)	Lambda + FSx for ONTAP S3 AP	AWS-documented tutorial; validate for your workload
Large-scale Spark ETL	EMR Serverless + FSx for ONTAP S3 AP or standard S3 bucket	Validated in this series; Databricks executor-scale not validated on S3 AP
Production Delta Lake tables	Supported object storage (S3 bucket)	Required for Delta write semantics and UC governance
Unstructured data experimentation (Databricks)	Instance Profile + boto3 PoC	Works in driver-only pattern, needs governance review
Video streaming from NAS	CloudFront + FSx for ONTAP S3 AP	AWS-documented tutorial; validate caching, latency, and file size for your content
External partner file exchange	Transfer Family + FSx for ONTAP S3 AP	AWS-documented path; also validated in related series; validate file operation limitations (rename, append, upload size)
Lightweight serverless analytics	DuckDB Lambda + FSx for ONTAP S3 AP	Planned Part 3 validation; candidate for lightweight, low-idle-cost analytics
BI / dashboarding over NAS data	Candidate: QuickSight via Athena or Glue Catalog	AWS positions BI as a candidate use case; validate whether access path is Athena-backed or catalog-mediated

Cost Model Considerations

Engine	Primary cost driver	Best fit
Athena	Data scanned (per TB)	Occasional SQL queries, serverless
Bedrock Knowledge Bases	Model invocation + embedding + retrieval	RAG / GenAI over enterprise documents
Glue	DPU-hours	ETL pipelines, data transformation
Databricks	DBU + cloud compute instance hours	Lakehouse pipelines, ML, Delta workloads
EMR Serverless	vCPU / memory × runtime duration	Spark ETL without persistent clusters
Lambda + DuckDB	Invocation duration × memory	Lightweight serverless analytics, event-driven
CloudFront	Data transfer + requests	Video/media streaming from NAS

Cost comparison is not the focus of this article. Each engine has a fundamentally different pricing model. Databricks provides compute policies to control cluster creation, instance types, auto-termination, and cost-related attributes. For cost optimization, evaluate based on workload pattern (interactive vs batch, frequency, data volume) rather than unit price alone.

Partner / Customer Conversation Guide

If a customer asks whether Databricks can directly process FSx for ONTAP S3 Access Point data:

AWS-native service paths such as Athena, Bedrock Knowledge Bases, Glue, EMR Serverless, Lambda, CloudFront, and Transfer Family have AWS-documented integration patterns with FSx for ONTAP S3 AP. In this series, Athena (Part 1), Glue, and EMR Serverless have been validated; the other paths should be validated per workload, Region, IAM model, FSx for ONTAP-side authorization, and governance requirement.
Databricks Unity Catalog integration requires vendor confirmation for S3 Access Point ARN handling
Instance Profile + boto3 can be used for controlled PoC experiments, but it bypasses Unity Catalog governance and is classified as a legacy data access pattern by Databricks
Production Delta Lake workloads should continue to use supported object storage patterns
Any Databricks integration should be validated per workspace type, cluster mode, runtime version, IAM path, and governance requirement

Next Validation Metrics

Current blocker: Executor-scale validation requires a Customer-managed VPC workspace (same VPC as FSx for ONTAP). The Databricks-managed VPC workspace was tested with VPC Peering and Instance Profile (2026-05-24), and S3 AP access timed out due to managed VPC egress restrictions. Creation of a Customer-managed VPC workspace is pending resolution of a Databricks support ticket.

For executor-scale validation (not yet performed):

Object listing latency per executor
Total objects processed across cluster
Per-executor success/failure rate
Throughput per executor
Retry count and S3 API error rate
FSx for ONTAP throughput utilization during distributed access
Cost per processed GB

Driver-only boto3 success is not sufficient for Spark workloads. The next validation should run boto3 calls from executors using mapPartitions and compare credential, routing, latency, and error behavior across workers.

Executor-scale validation should not only test success/failure. It should capture per-executor latency, retry count, error code, and object count so that routing and concurrency behavior can be reviewed.

Benchmark run guidance:

Cold run: at least 1 (first access after cluster start, no metadata cache)
Warm metadata run: at least 1 (after initial listing populates metadata cache)
Repeated run: at least 3 (steady-state measurement)
Report: p50, p90, p95, p99 latency, plus average, min, max, and outliers
Include: object count, average object size, prefix depth, concurrent executor count
Include: FSx for ONTAP throughput utilization during test window
Note: S3 AP via FSx for ONTAP may exhibit metadata warm-up effects and prefix layout sensitivity. Cold vs warm differences should be documented explicitly.
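The reporting step can be sketched as a small summary function. It uses the nearest-rank method for percentiles, which is one of several common definitions; state the method in the report so that runs are comparable. `latency_summary` is a hypothetical helper name.

```python
import statistics


def latency_summary(latencies_ms) -> dict:
    """Summarize latencies with nearest-rank percentiles plus avg/min/max."""
    data = sorted(latencies_ms)
    if not data:
        raise ValueError("at least one latency sample is required")
    n = len(data)

    def pct(p: int):
        # Nearest rank: smallest value with at least p% of samples at or below it.
        rank = max(1, (p * n + 99) // 100)  # integer ceiling of p*n/100
        return data[rank - 1]

    return {
        "count": n,
        "min": data[0],
        "max": data[-1],
        "avg": statistics.fmean(data),
        "p50": pct(50),
        "p90": pct(90),
        "p95": pct(95),
        "p99": pct(99),
    }
```

Keeping cold, warm-metadata, and repeated runs in separate summaries avoids averaging the warm-up effect into the steady-state numbers.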

Additional FSx for ONTAP metrics to capture:

FSx for ONTAP throughput utilization (% of provisioned capacity)
FSx for ONTAP CPU utilization
Network throughput (inbound/outbound)
S3 API request count by operation (List, Get, Head)
File count per prefix
Average object size
NFS/SMB latency during concurrent S3 API reads (contention indicator)

Expected output format (JSONL per executor):

{"executor_host": "ip-10-0-xx-yy", "partition_id": 3, "operation": "list_objects_v2", "status": "success", "latency_ms": 183, "objects_seen": 100, "error_code": null}
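A per-partition probe that emits records in this shape could look like the sketch below. It has not been run on a multi-node cluster (that validation is still pending, as noted above). The function name `probe_prefixes` is hypothetical, the S3 client is injected so the logic can be exercised without AWS access, and each input element is assumed to be a prefix string.

```python
import json
import socket
import time


def probe_prefixes(partition_id, prefixes, s3_client, bucket, max_keys=100):
    """Yield one JSON line per list_objects_v2 attempt in this partition."""
    host = socket.gethostname()
    for prefix in prefixes:
        started = time.perf_counter()
        status, objects_seen, error_code = "success", 0, None
        try:
            resp = s3_client.list_objects_v2(
                Bucket=bucket, Prefix=prefix, MaxKeys=max_keys
            )
            objects_seen = resp.get("KeyCount", 0)
        except Exception as exc:
            status = "error"
            # botocore ClientError exposes response["Error"]["Code"];
            # other exceptions fall back to the class name.
            error_code = (
                getattr(exc, "response", {})
                .get("Error", {})
                .get("Code", type(exc).__name__)
            )
        yield json.dumps({
            "executor_host": host,
            "partition_id": partition_id,
            "operation": "list_objects_v2",
            "status": status,
            "latency_ms": round((time.perf_counter() - started) * 1000),
            "objects_seen": objects_seen,
            "error_code": error_code,
        })
```

On Spark, this could be wired up with `rdd.mapPartitionsWithIndex(lambda i, it: probe_prefixes(i, it, boto3.client("s3"), alias))`, where `alias` is the access point alias, so that the client is created on the executor and picks up whatever credentials are available there. Whether executors actually receive Instance Profile credentials is exactly what the run is meant to reveal.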

Adoption Success Metrics

For a controlled Databricks + FSx for ONTAP S3 AP PoC, define success criteria beyond technical pass/fail:

Baseline metrics (capture before validation):

Average search/access time (minutes) for target documents
Monthly document access count via current path
Current copy pipeline runtime (if applicable)
Current data freshness lag (hours)
Current support ticket count related to data access

PoC outcome metrics:

Number of target datasets evaluated
Number of successful read operations
Number of governance exceptions required
Time to first successful access
Number of support issues raised
Whether the customer selected Athena, Databricks, or another engine after validation
Decision outcome: proceed / adjust / stop
Time saved by early boundary identification (vs discovering in production)

Stop criteria:

No measurable business value after validation period
Governance exception required for production path with no compensating control available
Executor-scale validation fails with unacceptable error rate (define threshold before PoC)
FSx for ONTAP workload impact exceeds approved threshold (e.g., throughput utilization > 80%)
Vendor confirmation indicates unsupported path with no roadmap commitment
Security review rejects the access path without remediation option
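
The two quantitative stop criteria can be checked mechanically once thresholds are agreed. The sketch below is an assumption-laden illustration: `error_rate` and `stop_reasons` are hypothetical helpers, the input is JSONL probe records with a `status` field as in the expected output format, the 80% default mirrors the example threshold above, and the error-rate threshold must be supplied by the team before the PoC starts.

```python
import json


def error_rate(jsonl_lines):
    """Fraction of probe records whose status is not "success"."""
    records = [json.loads(line) for line in jsonl_lines if line.strip()]
    if not records:
        raise ValueError("no probe records to evaluate")
    failed = sum(1 for record in records if record.get("status") != "success")
    return failed / len(records)


def stop_reasons(jsonl_lines, peak_utilization_pct, max_error_rate,
                 max_utilization_pct=80.0):
    """Return the quantitative stop criteria that were triggered, if any."""
    reasons = []
    rate = error_rate(jsonl_lines)
    if rate > max_error_rate:
        reasons.append(
            f"executor error rate {rate:.1%} exceeds {max_error_rate:.1%}"
        )
    if peak_utilization_pct > max_utilization_pct:
        reasons.append(
            f"FSx for ONTAP throughput utilization {peak_utilization_pct:.0f}% "
            f"exceeds {max_utilization_pct:.0f}%"
        )
    return reasons
```

An empty list means neither quantitative criterion fired. The remaining criteria (business value, governance exceptions, vendor confirmation, security review) still need a human decision.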

Series Evaluation Criteria

Across this series, each engine is evaluated by:

Read-path compatibility
Write-path compatibility
Governance model
Operational impact
Performance evidence
Production readiness gap
Best-fit use case

Well-Architected Mapping

These criteria map to the AWS Well-Architected pillars, as applied in the Data Analytics Lens:

Pillar	Evaluation focus in this series
Security	Governance model, IAM/AP policy, audit evidence, session policy behavior
Reliability	Failure modes, rollback path, support case evidence, DR considerations
Performance Efficiency	Throughput, executor-scale behavior, FSx for ONTAP utilization, latency
Cost Optimization	Engine-specific cost model, idle cost, cost per processed GB
Operational Excellence	Runbook, evidence template, support packet, monitoring

Business Value of Negative Validation

Negative validation is not failure. It is risk reduction.

A failed integration path can still create value when it prevents the wrong production architecture. This validation helps teams:

Avoid assuming S3 API compatibility equals platform governance compatibility
Choose the right engine for the right access pattern (Athena for read-only SQL, Databricks for lakehouse/ML)
Identify early when vendor confirmation is required before committing architecture
Reduce time spent on ambiguous troubleshooting by providing reproducible evidence
Prevent wasted PoC investment by documenting boundaries before production design
Enable informed conversations with vendors, partners, and security reviewers

For enterprise customers, early boundary identification can save weeks of engineering time and prevent costly architecture rework after production deployment.

What's Next

Series index:

Part 1: Athena — Query NAS Data In Place (validated read-oriented path, 9/9 negative tests pass)
Part 2: Databricks (this article) — session policy deep dive
Part 3: Snowflake — LIST Works, SELECT Doesn't (same session policy pattern)
Part 4: DuckDB Lambda — lightweight serverless analytics validation
Part 5: EMR Spark — read-write ETL pipeline (coming soon)
Part 6: Redshift Spectrum — DWH meets NAS data (coming soon)
Part 7: Trino — open-source SQL on NAS data (coming soon)

Open items:

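Support cases: Waiting for Databricks response on the session policy behavior and the remaining NFS mount questions (the seccomp restriction on kernel mounts is already confirmed as by design)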
FUSE NFS client: Investigating whether a user-space NFS client can bypass the runtime restriction

Caution on FUSE/user-space NFS: FUSE or user-space NFS clients should be treated as experimental only. They require separate validation for POSIX semantics, caching behavior, consistency, performance, failure recovery, and vendor supportability. Do not treat user-space NFS RPC success as a production workaround.

References

Related series by the same author (FSx for ONTAP S3 Access Points with other AWS services):

Building an Agentic Access-Aware RAG System with Amazon FSx for NetApp ONTAP, S3 Vectors, and S3 Access Points — Bedrock Knowledge Bases + permission-aware retrieval (GitHub)
FSx for ONTAP S3 Access Points as a Serverless Automation Boundary — AI Data Pipelines, Volume-Level SnapMirror DR, and Capacity Guardrails — Lambda, Bedrock, SageMaker, 17 industry use cases (GitHub)
Smart Routing, Transfer Family Ingestion, and Voice Chat — Permission-Aware RAG v4.2 — Transfer Family + SFTP ingestion for RAG pipeline

ONTAP S3 Multiprotocol vs FSx for ONTAP S3 Access Points:

ONTAP S3 multiprotocol (ONTAP 9.12.1+): S3 NAS bucket model on ONTAP SVM, enabling S3 clients to access NAS data directly on the ONTAP cluster
FSx for ONTAP S3 Access Points: AWS-managed S3 Access Point endpoint attached to FSx for ONTAP volume, integrating with AWS IAM, VPC, and S3-compatible services
Both expose NAS data via S3-style access, but the authorization path, service integration, and operational model differ. This article focuses on FSx for ONTAP S3 Access Points.

This article is part of the "FSx for ONTAP S3 Access Points × Lakehouse Deep Dive" series. All tests were performed in a real AWS environment with FSx for ONTAP (ONTAP 9.17.1, ap-northeast-1) and Databricks (DBR 17.3 LTS, Premium tier) in May 2026.

Scope reminder: This article documents observed behavior in one validated environment. It does not validate production readiness, distributed executor-scale processing, or all Databricks runtime versions. The wording is deliberately "observed in this environment" rather than "unsupported" or "incompatible", because platform behavior may change with future updates.

Future updates: If Databricks platform behavior changes or vendor confirmation becomes available, this article should be updated with the new validation result rather than treated as a permanent compatibility statement.

Disclaimer: This article is an independent validation report and does not represent Databricks, AWS, or NetApp official guidance. Product behavior, support status, and platform capabilities may change. Always validate in your own environment and consult vendor documentation and support channels.
