Databricks and FSx for ONTAP S3 Access Points — A Layer-by-Layer Validation of Observed Boundaries

TL;DR

Connecting Databricks to FSx for ONTAP S3 Access Points is significantly harder than Athena (Part 1). After testing every approach I could find — Unity Catalog External Locations, NFS mounts, Instance Profiles, multiple VPC configurations — here is what I found:

  • Unity Catalog's session policy initially blocked the FSx for ONTAP S3 AP ARN pattern → 403
  • Setting the access_point field on the External Location partially resolves the session policy: explicit-path file read succeeds, but UC table creation, subdirectory listing, and write operations remain blocked — meaning UC governance features (lineage, tags, fine-grained access) cannot yet be applied
  • NFS kernel mount is blocked by seccomp by design (confirmed by Databricks Support)
  • Instance Profile + boto3 works for direct S3 AP access (bypassing Unity Catalog)
  • Spark read with explicit file path works under UC governance (sensor CSV: 1000 rows read successfully)

This article is a layer-by-layer validation of observed integration boundaries between Databricks and FSx for ONTAP S3 Access Points. It focuses narrowly on that one boundary: direct access from Databricks to the access points. It is not an argument against Databricks, which remains a strong platform for lakehouse, ML, and production Delta workloads.

This article documents the full troubleshooting journey, including the strace analysis that identified the root cause of NFS mount failures.

This article documents observed behavior in one validated environment. It should not be interpreted as a general compatibility statement for all Databricks configurations or future platform versions.

GitHub Repository: fsxn-lakehouse-integrations

If you want to reproduce this validation, the repository's integrations/databricks/ directory contains environment setup notes, and verification-pack/ contains test templates and evidence recording formats. The verification pack is intentionally template-first, so validation runs can produce consistent, reviewable evidence across environments. Actual result files will be added as validation runs are completed.


How to Read This Article

This article is:

  • A reproduction-focused validation report
  • Evidence from one environment (DBR 17.3 LTS, ap-northeast-1)
  • A starting point for vendor confirmation and architecture discussion

This article is not:

  • A general compatibility statement
  • A production certification
  • A statement on behalf of Databricks

Read by role:

  • Databricks admin: Unity Catalog External Location → Governance Impact Summary
  • Storage engineer: NFS Mount investigation → Evidence Matrix
  • Data engineer: Instance Profile + boto3 → Next Validation Metrics
  • Partner / SA: Decision Matrix → Discovery Questions → Partner Conversation Guide
  • Opening a support case: Databricks Support Case Packet

Prerequisite Concepts

Before reading this article, it helps to understand:

  • Unity Catalog Storage Credential — an object that stores a reference to a cloud IAM role for accessing external storage
  • Unity Catalog External Location — maps a cloud storage path to a storage credential for governed access
  • Instance Profile on AWS — an IAM role attached to an EC2 instance, providing credentials via IMDS
  • Databricks-managed VPC vs Customer-managed VPC — whether Databricks or the customer controls the workspace network
  • Cluster access modes — Standard (shared, multi-user with UC governance) and Dedicated (single-user with sudo access). Unity Catalog requires standard or dedicated access mode
  • S3 Access Point ARN vs S3 bucket ARN — S3 AP uses arn:aws:s3:<region>:<account>:accesspoint/<name>, not arn:aws:s3:::<bucket>
  • Driver vs executor behavior in Spark — the driver orchestrates; executors run distributed tasks. Credentials and network paths may differ between them
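
The ARN distinction in the list above drives the session-policy behavior discussed later. As a minimal sketch, assuming only the two ARN shapes quoted above, the function below tells them apart. The name `classify_s3_arn` is hypothetical and is not part of any Databricks or AWS tooling.

```python
def classify_s3_arn(arn: str) -> str:
    """Return 'access_point', 'bucket', or 'other' for an S3-style ARN."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "s3":
        return "other"
    region, account, resource = parts[3], parts[4], parts[5]
    # Access point ARNs carry a region and an account ID.
    if resource.startswith("accesspoint/") and region and account:
        return "access_point"
    # Bucket (and object) ARNs leave region and account empty.
    if not region and not account and resource:
        return "bucket"
    return "other"


# Example values are placeholders, not real resources.
print(classify_s3_arn("arn:aws:s3:ap-northeast-1:123456789012:accesspoint/my-ap"))
print(classify_s3_arn("arn:aws:s3:::my-bucket"))
```

A policy written only against the second shape will not match a resource expressed in the first, which is the mismatch explored in Approach 1.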

The Goal

Process unstructured data (images, documents, audio) stored on FSx for ONTAP from Databricks — without copying data to S3. FSx for ONTAP S3 Access Points should make this possible by exposing NFS/SMB file data via S3 API.

In Part 1, Athena worked cleanly in my validation using the official AWS tutorial pattern. Databricks, however, has multiple security layers that interact with S3 AP in unexpected ways.


Test Environment

I tested across two workspace configurations:

Runtime scope: Only DBR 17.3 LTS (Spark 4.0.0) was tested. This article does not compare DBR 16.x, 18.x, ML runtimes, GPU runtimes, serverless compute, or access modes beyond those listed below. Behavior may differ across runtime versions and compute types.

┌─────────────────────────────────────────────────────────────────────┐
│ Workspace 1: Databricks-managed VPC                                 │
│ - VPC created and managed by Databricks                             │
│ - Limited network control                                           │
│ - VPC Peering to FSx for ONTAP VPC                                  │
└─────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────┐
│ Workspace 2: Customer-managed VPC (same VPC as FSx for ONTAP)       │
│ - Full network control                                              │
│ - Direct connectivity to FSx for ONTAP (no peering needed)          │
│ - NAT Gateway for Databricks control plane                          │
└─────────────────────────────────────────────────────────────────────┘

Cluster modes tested:

  • Standard (Shared Access)
  • Dedicated (Single User) — provides sudo/root access
  • Dedicated with Instance Profile

All tests used DBR 17.3 LTS (Spark 4.0.0), ap-northeast-1.


Approach 1: Unity Catalog External Location

The Setup

The Databricks-governed path for S3 data access is to create a Storage Credential and External Location. I tested whether the same pattern could work with an FSx for ONTAP S3 Access Point.

# What I expected to work
files = dbutils.fs.ls("s3://<FSx-S3-AP-alias>/")

The Error

AccessDenied: User: arn:aws:sts::<ACCOUNT>:assumed-role/databricks-...-cross-account-role/
  databricks-unity-catalog-credential-<WORKSPACE_ID>
is not authorized to perform: s3:ListBucket on resource:
  "arn:aws:s3:<REGION>:<ACCOUNT>:accesspoint/<AP_NAME>"
because no session policy allows the s3:ListBucket action

Observed Boundary

Unity Catalog applies a session policy when it calls AssumeRole. A session policy caps what the assumed session can do: the effective permissions are the intersection of the role's policies and the session policy. Even if the IAM role has s3:* on *, an action the session policy does not allow is denied.

The evidence narrows the failure domain, but does not identify Databricks internal implementation details.

In this validation, the generated session policy allowed a standard S3 bucket path but not the FSx for ONTAP S3 Access Point ARN pattern:

arn:aws:s3:::bucket-name       ✅ Allowed
arn:aws:s3:::bucket-name/*     ✅ Allowed

But FSx for ONTAP S3 AP uses a different ARN format:

arn:aws:s3:<region>:<account>:accesspoint/<name>    ❌ Not in session policy
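
To make the intersection behavior concrete, here is a simplified sketch. It is not the real IAM evaluation logic and not the policy Databricks generates: the `session_policy` contents are hypothetical, modeled only on the allowed patterns shown above, and matching uses plain wildcards.

```python
from fnmatch import fnmatchcase


def allows(policy, action, resource):
    """True if any (action_pattern, resource_pattern) pair in the policy matches."""
    return any(fnmatchcase(action, a) and fnmatchcase(resource, r) for a, r in policy)


def effective_allow(role_policy, session_policy, action, resource):
    # An assumed-role session gets only what BOTH policies allow.
    return allows(role_policy, action, resource) and allows(session_policy, action, resource)


role_policy = [("s3:*", "*")]  # broad role policy, as in the text
session_policy = [             # hypothetical: bucket-style ARNs only
    ("s3:*", "arn:aws:s3:::bucket-name"),
    ("s3:*", "arn:aws:s3:::bucket-name/*"),
]
ap_arn = "arn:aws:s3:ap-northeast-1:123456789012:accesspoint/example-ap"  # placeholder

print(effective_allow(role_policy, session_policy, "s3:ListBucket", "arn:aws:s3:::bucket-name"))  # True
print(effective_allow(role_policy, session_policy, "s3:ListBucket", ap_arn))  # False
```

The role policy alone would allow both calls; the second is denied only because the session policy has no statement matching the access point ARN.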

Proof

The same IAM role works fine for regular S3 buckets through Unity Catalog:

# This works — regular S3 bucket
dbutils.fs.ls("s3://my-workspace-storage-bucket/")
# SUCCESS

# This fails — FSx for ONTAP S3 Access Point
dbutils.fs.ls("s3://<FSx-S3-AP-alias>/")
# AccessDenied: no session policy allows...

Status

In my initial validation, this behaved like a platform boundary in Unity Catalog's generated session policy. I opened a support case to confirm whether S3 Access Point ARN patterns can be supported for external locations.

Before (access_point field not set) — Unity Catalog session policy blocks all S3 AP operations:

Without the access_point field, dbutils.fs.ls on the S3 AP alias returns UNAUTHORIZED_ACCESS. The session policy only allows standard S3 bucket ARNs.

Update (2026-05-24): access_point Field Resolves Session Policy

Databricks Support (Case #00921422) confirmed that Unity Catalog External Locations support an access_point field. Setting this field includes the S3 AP ARN in the generated session policy.

Configuration that works:

External Location:
  URL: s3://<FSx-S3-AP-alias>/
  Credential: <storage-credential-name>
  access_point: arn:aws:s3:<region>:<account>:accesspoint/<ap-name>

API call to set the field:

curl -X PATCH \
  https://<workspace>/api/2.1/unity-catalog/external-locations/<location-name> \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{"access_point": "arn:aws:s3:<region>:<account>:accesspoint/<ap-name>"}'
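
The same request can be assembled from Python. This is a sketch under stated assumptions: the endpoint path and the `access_point` field are taken from the curl example above and were not checked against the API reference, and the helper name, host, and location name are hypothetical placeholders. The request is built but not sent.

```python
import json
import urllib.parse
import urllib.request


def build_access_point_patch(workspace_host, location_name, access_point_arn, token):
    """Build (but do not send) the PATCH request that sets access_point."""
    url = (
        f"https://{workspace_host}/api/2.1/unity-catalog/external-locations/"
        f"{urllib.parse.quote(location_name, safe='')}"
    )
    body = json.dumps({"access_point": access_point_arn}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


req = build_access_point_patch(
    "example.cloud.databricks.com",  # placeholder workspace host
    "fsxn_location",                 # placeholder External Location name
    "arn:aws:s3:ap-northeast-1:123456789012:accesspoint/example-ap",
    "dummy-token",
)
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req)  (intentionally not executed here)
```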

What now works under UC governance:

| Operation | Result | Notes |
|---|---|---|
| dbutils.fs.ls("s3://<alias>/") | ✅ | Top-level listing (287 items) |
| dbutils.fs.head("s3://<alias>/file.txt") | ✅ | Read file content |
| spark.read.text("s3://<alias>/file.txt") | ✅ | Spark read with explicit file path |
| spark.read.csv("s3://<alias>/path/to/file.csv") | ✅ | 1000 rows, schema inferred |

After (access_point field set) — Top-level listing succeeds, 287 items visible:

With the access_point field configured, dbutils.fs.ls at the top level returns 287 items from the FSx for ONTAP volume.

Sensor data read via Spark — 1000 rows with schema inference:

spark.read.csv with explicit file path successfully reads 1000 sensor readings with full schema inference (timestamp, machine_id, temperature_c, vibration_mm_s, pressure_bar, rpm, status, location).

What still does NOT work:

| Operation | Result | Error |
|---|---|---|
| dbutils.fs.ls("s3://<alias>/subdir/") | ❌ | AccessDenied on getFileStatus |
| spark.read.load("s3://<alias>/subdir/") | ❌ | Forbidden (directory-level access) |
| CREATE TABLE LOCATION 's3://<alias>/...' | ❌ | UC_CLOUD_STORAGE_ACCESS_FAILURE |
| dbutils.fs.cp (PutObject) | ❌ | AccessDenied |

Remaining blockers — Subdirectory listing and UC table creation fail:

Subdirectory dbutils.fs.ls returns UNAUTHORIZED_ACCESS. CREATE TABLE LOCATION fails with UC_CLOUD_STORAGE_ACCESS_FAILURE. Without a UC table, governance features (lineage, tags, fine-grained access control) cannot be applied.

Summary: Data is readable but not governable. The critical blocker is CREATE TABLE LOCATION failure, which prevents Unity Catalog governance from being applied to the data.

Key pattern: File-level read operations succeed (GetObject with explicit key). Directory-level operations (ListObjectsV2 with prefix, HeadObject on prefix) fail for subdirectories. This suggests the session policy scopes ListObjectsV2 to the root prefix only.
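
Because only explicit keys were readable in this validation, one way to work within the boundary is to keep a list of known object keys and turn them into full URIs before reading. The sketch below assumes such a key list exists (for example, a manifest maintained elsewhere); the helper name and sample keys are hypothetical.

```python
def explicit_paths(ap_alias, keys):
    """Turn known object keys into s3:// URIs, skipping empty keys and prefixes."""
    return [
        f"s3://{ap_alias}/{key.lstrip('/')}"
        for key in keys
        if key and not key.endswith("/")  # a trailing slash means a directory-style prefix
    ]


# Placeholder alias and keys for illustration only.
paths = explicit_paths(
    "example-ap-alias",
    ["sensors/2026-05-01.csv", "/sensors/2026-05-02.csv", "sensors/"],
)
print(paths)

# In a Databricks notebook the list could then be passed to Spark, for example:
#   df = spark.read.csv(paths, header=True, inferSchema=True)
# Whether a multi-path read avoids the blocked listing calls was not tested in this article.
```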

Implication: Explicit-path file read works, but without UC table creation, Unity Catalog governance features — lineage, fine-grained access control, governance tags, column masking, row filtering — cannot be applied. The data is technically readable through the External Location path but not registerable as a governed UC table. This limits the practical value for production governance use cases until the subdirectory listing and table creation issues are resolved.

Requirements for this path:

  • Customer-managed VPC workspace (same VPC as FSx for ONTAP)
  • External Location with access_point field set
  • Storage Credential IAM role with S3 AP permissions
  • NAT Gateway for control plane connectivity

Approach 2: NFS Mount (Managed VPC)

The Idea

If S3 AP doesn't work through Unity Catalog, mount the FSx for ONTAP volume directly via NFS.

The Setup

Created VPC Peering between Databricks-managed VPC and FSx for ONTAP VPC. Updated route tables and security groups.

The Result

%sh
timeout 3 bash -c 'echo > /dev/tcp/10.0.3.133/2049' && echo "REACHABLE" || echo "NOT REACHABLE"
# NOT REACHABLE

NFS port (TCP 2049) is unreachable from Databricks-managed VPC, even with VPC Peering configured. From the customer-controlled routing perspective, route tables and FSx for ONTAP-side security groups were configured to allow NFS. However, cluster-side egress remained governed by the Databricks-managed environment, and NFS egress was not permitted.

Lesson

Databricks-managed VPC gives you limited network control. The egress rules on cluster instances are managed by Databricks, not by customer-added security group rules.


Approach 3: NFS Mount (Customer-managed VPC)

The Setup

Deployed a new workspace in the same VPC as FSx for ONTAP. No peering needed — direct L3 connectivity.

Network Verification (All Pass)

%sh
echo "TCP 2049 (NFS):"
timeout 3 bash -c 'echo > /dev/tcp/10.0.3.133/2049' && echo "REACHABLE"
echo "TCP 111 (portmapper):"
timeout 3 bash -c 'echo > /dev/tcp/10.0.3.133/111' && echo "REACHABLE"
echo "TCP 635 (mountd):"
timeout 3 bash -c 'echo > /dev/tcp/10.0.3.133/635' && echo "REACHABLE"
Output:
TCP 2049 (NFS): REACHABLE ✅
TCP 111 (portmapper): REACHABLE ✅
TCP 635 (mountd): REACHABLE ✅

Note: The /dev/tcp test confirms TCP reachability. NFSv3 mountd may use TCP or UDP depending on configuration. The exact transport should be validated with rpcinfo if needed.

sudo Access (Dedicated Mode)

%sh
sudo whoami
# root ✅

NFS Client Installation and Export Verification

%sh
sudo apt-get install -y nfs-common
showmount -e 10.0.3.133
Output:
Export list for 10.0.3.133:
/vol1 (everyone) ✅

Everything looks perfect. Network connected, root access available, NFS exports visible. Let's mount:

The Mount Attempt

%sh
sudo mkdir -p /mnt/fsxn
sudo mount -t nfs -o nfsvers=3,nolock 10.0.3.133:/vol1 /mnt/fsxn
Output:
mount.nfs: access denied by server while mounting 10.0.3.133:/vol1

Wait, what? The server is showing the export to everyone, we have root access, the network is connected... why "access denied by server"?


The Investigation: Why NFS Mount Fails

This is where it gets interesting. The error message says "access denied by server" — but is it really the server?

Step 1: Verify ONTAP Export Policy

Via ONTAP REST API (accessible from the same cluster):

{
  "rules": [{
    "clients": [{"match": "0.0.0.0/0"}],
    "ro_rule": ["any"],
    "rw_rule": ["any"],
    "superuser": ["any"],
    "protocols": ["any"]
  }]
}

The export policy is maximally permissive — all clients, all protocols, read-write, superuser. ONTAP is not denying access.

Important: This permissive export policy was used only to eliminate ONTAP export restrictions as a variable during troubleshooting. It is not a production recommendation. For production, restrict: client CIDR, protocol, read/write rule, superuser mapping, and volume/junction path scope.

ONTAP Production Hardening Checklist

For production deployments, harden the ONTAP configuration:

  • [ ] Restrict export policy client CIDR to known analytics subnets only
  • [ ] Avoid rw=any and superuser=any — use explicit security flavors
  • [ ] Map S3 Access Point file system user to a least-privilege NAS user (not root/UID 0)
  • [ ] Validate NFS/SMB ACL behavior when S3 AP is active
  • [ ] Validate S3 API access against file-level permissions
  • [ ] Capture ONTAP audit evidence where required (ONTAP FPolicy)
  • [ ] Document junction path and volume scope
  • [ ] Isolate analytics volumes from production NFS/SMB workloads if throughput contention is a concern

Step 2: strace the mount command

%sh
sudo strace -f -e trace=mount mount -v -t nfs -o nfsvers=3,nolock 10.0.3.133:/vol1 /mnt/fsxn 2>&1
Output:
mount.nfs: trying 10.0.3.133 prog 100003 vers 3 prot TCP port 2049
mount.nfs: trying 10.0.3.133 prog 100005 vers 3 prot UDP port 635
mount("10.0.3.133:/vol1", "/mnt/fsxn", "nfs", ...) = -1 EACCES (Permission denied)
mount.nfs: mount(2): Permission denied

Key finding: mount.nfs successfully connects to both NFS (port 2049) and mountd (port 635), but the mount() syscall returns EACCES. The denial happens at the kernel level, not at the server.

TCP/UDP note: The initial reachability check used /dev/tcp, confirming TCP reachability. During the actual mount attempt, mount.nfs tried mountd over UDP as shown in the strace output. This is not a contradiction — NFSv3 mountd may use either transport. For production troubleshooting, use rpcinfo and packet capture to confirm the actual protocol and port mapping.

Step 3: Manual NFS RPC Calls (User-space)

To prove ONTAP is granting access, I performed manual NFS RPC calls using Python sockets:

import socket, struct

# MOUNT RPC (program 100005, version 3, procedure MNT).
# mount_rpc_packet is the XDR-encoded call message, built beforehand
# (its construction is omitted from this excerpt).
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.settimeout(5)
sock.sendto(mount_rpc_packet, ("10.0.3.133", 635))
response = sock.recv(4096)
# Parsed reply: status=0 (MNT3_OK), file_handle=44 bytes
print("MOUNT RPC: SUCCESS ✅")

# NFS3 FSINFO, GETATTR, READDIRPLUS calls (omitted here) all succeeded
print("NFS3 FSINFO: SUCCESS ✅")
print("NFS3 GETATTR: SUCCESS ✅")
print("NFS3 READDIRPLUS: SUCCESS ✅")

All NFS operations succeed at user-space level. ONTAP grants full access. The problem is not the server.
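
For readers who want to repeat the user-space check, the following sketch shows one way a MOUNT v3 MNT call could be encoded and its reply decoded. It is my reconstruction from the ONC RPC and MOUNT protocol layouts, not the author's original script. The function names, the machine name "client", and the AUTH_SYS uid/gid defaults are assumptions.

```python
import struct


def xdr_opaque(data: bytes) -> bytes:
    """XDR variable-length opaque/string: length, data, zero padding to 4 bytes."""
    return struct.pack(">I", len(data)) + data + b"\x00" * (-len(data) % 4)


def build_mnt_call(xid: int, export_path: str, uid: int = 0, gid: int = 0) -> bytes:
    # RPC call header: xid, CALL (0), RPC version 2,
    # program 100005 (mountd), version 3, procedure 1 (MNT).
    header = struct.pack(">6I", xid, 0, 2, 100005, 3, 1)
    # AUTH_SYS body: stamp, machine name, uid, gid, zero auxiliary gids.
    cred_body = struct.pack(">I", 0) + xdr_opaque(b"client") + struct.pack(">3I", uid, gid, 0)
    cred = struct.pack(">I", 1) + xdr_opaque(cred_body)  # flavor 1 = AUTH_SYS
    verf = struct.pack(">2I", 0, 0)                       # AUTH_NONE verifier, empty body
    return header + cred + verf + xdr_opaque(export_path.encode())


def parse_mnt_reply(reply: bytes):
    """Return (mountstat3, file_handle) from an accepted MNT reply."""
    _xid, msg_type, reply_stat, _verf_flavor, verf_len = struct.unpack(">5I", reply[:20])
    if msg_type != 1 or reply_stat != 0:
        raise ValueError("not an accepted RPC reply")
    offset = 20 + verf_len + (-verf_len % 4)  # skip the (padded) verifier body
    accept_stat, mount_stat = struct.unpack(">2I", reply[offset:offset + 8])
    if accept_stat != 0:
        raise ValueError(f"RPC accept_stat={accept_stat}")
    if mount_stat != 0:
        return mount_stat, b""  # e.g. 13 = MNT3ERR_ACCES
    (fh_len,) = struct.unpack(">I", reply[offset + 8:offset + 12])
    return 0, reply[offset + 12:offset + 12 + fh_len]


packet = build_mnt_call(xid=7, export_path="/vol1")
print(len(packet), "bytes")
```

The resulting bytes would take the place of `mount_rpc_packet` in the UDP snippet above, and `parse_mnt_reply(response)` would yield the status and file handle. A status of 0 corresponds to MNT3_OK.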

Step 4: tmpfs Mount Test

%sh
sudo mkdir -p /tmp/test_mount && sudo mount -t tmpfs tmpfs /tmp/test_mount && echo "SUCCESS" || echo "FAILED"
Output:
SUCCESS ✅

The mount() syscall itself is allowed. Only NFS filesystem type is blocked.

Step 5: Seccomp Status

%sh
grep Seccomp /proc/self/status
Output:
Seccomp:        2
Seccomp_filters:        1

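Seccomp: 2 means the process runs in seccomp filter mode (SECCOMP_MODE_FILTER), and Seccomp_filters: 1 shows that one BPF filter is attached.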
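
The same check can be scripted so it is easy to record as evidence. A minimal sketch, assuming the standard Linux /proc status format shown above; the function name is hypothetical.

```python
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def seccomp_mode(status_text: str) -> str:
    """Map the Seccomp field of /proc/<pid>/status text to a mode name."""
    for line in status_text.splitlines():
        if line.startswith("Seccomp:"):  # does not match the Seccomp_filters line
            value = int(line.split(":", 1)[1])
            return SECCOMP_MODES.get(value, "unknown")
    return "unknown"


sample = "Name:\tbash\nSeccomp:\t2\nSeccomp_filters:\t1\n"
print(seccomp_mode(sample))  # filter
```

On a Linux cluster node, `seccomp_mode(open("/proc/self/status").read())` would report the mode of the calling process. Other processes on the same node may run under different filters.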

The Conclusion

┌─────────────────────────────────────────────────────────────────┐
│ Evidence Chain:                                                 │
│                                                                 │
│ 1. Network connectivity      → ✅ All NFS ports reachable       │
│ 2. ONTAP export policy       → ✅ 0.0.0.0/0, rw=any, su=any     │
│ 3. NFS RPC (user-space)      → ✅ All operations succeed        │
│ 4. mount() with type="nfs"   → ❌ EACCES                        │
│ 5. mount() with type="tmpfs" → ✅ Success                       │
│ 6. Seccomp                   → Active (BPF filter mode)         │
│                                                                 │
│ Conclusion: The evidence points to a local platform security    │
│ boundary, likely seccomp filtering or an equivalent runtime     │
│ restriction, blocking the NFS mount path.                       │
└─────────────────────────────────────────────────────────────────┘

The error message "access denied by server" is misleading. The mount.nfs program interprets the kernel's EACCES as a server-side denial, but strace reveals the truth: the denial is local.
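
When collecting this kind of evidence repeatedly, it can help to pull the syscall result out of the strace output rather than rely on the mount.nfs message. The sketch below only extracts the return value and errno name from the last mount() line. The interpretation that the denial is local remains the reasoning laid out above. The function name is hypothetical.

```python
import re

# Matches lines such as: mount("src", "dst", "nfs", ...) = -1 EACCES (Permission denied)
MOUNT_RESULT = re.compile(r"\bmount\(.*\)\s*=\s*(-?\d+)(?:\s+(E[A-Z]+))?")


def mount_syscall_result(strace_output: str):
    """Return (return_value, errno_name) for the last mount() syscall line, or None."""
    result = None
    for line in strace_output.splitlines():
        match = MOUNT_RESULT.search(line)
        if match:
            result = (int(match.group(1)), match.group(2))
    return result


sample = (
    "mount.nfs: trying 10.0.3.133 prog 100003 vers 3 prot TCP port 2049\n"
    'mount("10.0.3.133:/vol1", "/mnt/fsxn", "nfs", ...) = -1 EACCES (Permission denied)\n'
    "mount.nfs: mount(2): Permission denied\n"
)
print(mount_syscall_result(sample))  # (-1, 'EACCES')
```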

If sharing this finding: This is not a Databricks compatibility verdict. It is a layer-by-layer validation of observed boundaries in one environment (DBR 17.3 LTS, ap-northeast-1). Platform behavior may differ across runtime versions, access modes, and configurations.

Important: Because Databricks does not publicly document this specific syscall/filesystem-type behavior, treat this as validation evidence rather than an official platform statement until confirmed by Databricks Support.

All Mount Options Tested

| Options | Result |
|---|---|
| -o nfsvers=3,nolock | access denied |
| -o nfsvers=4.1 | access denied |
| -o nfsvers=3,nolock,resvport | access denied |
| -o nfsvers=3,nolock,noresvport | access denied |
| -o sec=sys | access denied |
| (no options) | access denied |
| tmpfs | SUCCESS |

Evidence Matrix

| Layer | Evidence | Result | Interpretation |
|---|---|---|---|
| Network | TCP 2049 / TCP 111 / TCP 635 reachable | ✅ Pass | Network path exists between cluster and FSx for ONTAP |
| ONTAP export | Export policy allows 0.0.0.0/0, rw=any, su=any | ✅ Pass | Export policy is not the blocker |
| NFS server | RPC MOUNT / FSINFO / GETATTR / READDIRPLUS succeed via user-space | ✅ Pass | ONTAP grants NFS operations to this client |
| Local syscall | mount(type=nfs) returns EACCES | ❌ Fail | Evidence points to a local runtime boundary affecting kernel NFS mount |
| Local syscall control | mount(type=tmpfs) succeeds | ✅ Pass | mount() syscall is not universally blocked |
| Runtime security | Seccomp mode 2 observed in the tested process context | Observed | Runtime filtering may restrict NFS-specific mount |
| Unity Catalog S3 | External Location test on S3 AP ARN → AccessDenied | ❌ Fail | Session policy does not allow S3 AP ARN pattern |
| Instance Profile S3 | boto3 GetObject on S3 AP → Success | ✅ Pass | IAM role itself has correct permissions |

showmount -e confirms only that the export is visible through mountd. It does not guarantee that the local runtime lets the kernel NFS mount complete, and it does not validate the file system user identity associated with the S3 Access Point. For S3 AP authorization, record the associated UNIX or Windows identity and verify file-level permissions separately; these are independent authorization paths.


FSx for ONTAP S3 AP Authorization Path

FSx for ONTAP S3 Access Points use a dual-layer authorization model that combines AWS IAM permissions with file system-level permissions:

Layer 1 — S3-side authorization:

  • IAM identity-based policy (caller's permissions)
  • S3 Access Point resource policy
  • VPC endpoint policy (if applicable)
  • SCP / RCP (if applicable)

Layer 2 — FSx for ONTAP-side authorization:

  • File system user associated with the access point
  • UNIX mode-bits / NFSv4 ACLs (for UNIX security style volumes)
  • Windows ACLs (for NTFS security style volumes)

In the Databricks validation, the failure occurs before Layer 2 — Unity Catalog's generated session policy restricts the assumed role session at the S3 API level, preventing the request from reaching FSx for ONTAP-side authorization. The Instance Profile + boto3 path bypasses Unity Catalog's session policy, allowing both layers to be evaluated normally.

For production, both layers must be configured with least-privilege. A permissive file system user (e.g., root / UID 0) combined with a broad IAM policy creates an overly permissive access path.


Approach 4: Instance Profile + boto3

The Setup

Customer-managed VPC workspace, Dedicated cluster with an Instance Profile attached.

IMDS Access

import urllib.request, json

# IMDSv2 token
req = urllib.request.Request(
    "http://169.254.169.254/latest/api/token",
    headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
    method="PUT"
)
token = urllib.request.urlopen(req, timeout=2).read().decode()
print(f"Token: {token[:20]}...")  # ✅ Success

Regular S3 Access

import boto3
s3 = boto3.client("s3", region_name="ap-northeast-1")
buckets = s3.list_buckets()
print(f"ListBuckets: {len(buckets['Buckets'])} buckets")  # ✅ 58 buckets

FSx for ONTAP S3 AP Access

response = s3.list_objects_v2(
    Bucket="<FSx-S3-AP-alias>",
    MaxKeys=10
)
print(f"Objects: {response['KeyCount']}")  # ✅ Works

This works. Instance Profile credentials bypass Unity Catalog's session policy entirely. boto3 talks directly to the S3 API with the EC2 instance's IAM role.
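
Listing is only half of the read path. The sketch below shows how a single object could be fetched through the same client by passing the access point alias where a bucket name would go. The helper name is hypothetical, and the client is passed in so the function can be exercised without AWS access.

```python
def read_object_head(s3_client, ap_alias: str, key: str, max_bytes: int = 1024) -> bytes:
    """Fetch one object via GetObject and return at most max_bytes from its start."""
    response = s3_client.get_object(
        Bucket=ap_alias,  # the S3 AP alias is used in place of a bucket name
        Key=key,
    )
    body = response["Body"]
    try:
        return body.read(max_bytes)
    finally:
        body.close()


# Usage on the cluster (placeholders, not executed here):
#   import boto3
#   s3 = boto3.client("s3", region_name="ap-northeast-1")
#   print(read_object_head(s3, "<FSx-S3-AP-alias>", "path/to/file.csv")[:200])
```

Like the listing call above, this path relies on the Instance Profile credentials and therefore sits outside Unity Catalog governance.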

Governance warning
Instance Profile + boto3 is a pragmatic workaround for PoC and controlled experiments. It bypasses Unity Catalog governance, including fine-grained access control, lineage, and centralized data access auditing. Do not treat this as a production lakehouse governance pattern without a separate security and compliance review. Databricks recommends Unity Catalog external locations as the standard governed access mechanism.

Scope note
The Instance Profile + boto3 sample above runs on the driver node only (single-node PoC pattern). Whether the same credential, network path, and concurrency behavior applies to Spark executors in a multi-node cluster requires separate validation.


Approach 5: S3 AP + Instance Profile (Managed VPC with VPC Peering)

The Hypothesis

If Instance Profile + boto3 works on a Customer-managed VPC (Approach 4), does it also work from a Databricks-managed VPC with VPC Peering to the FSx for ONTAP VPC? This would validate whether the S3 Gateway Endpoint in the Databricks-managed VPC can route S3 AP requests to the FSx for ONTAP backend.

The Setup

  • Databricks-managed VPC (vpc-<databricks-managed>, CIDR: 10.53.0.0/16)
  • FSx for ONTAP VPC (vpc-<fsxn-vpc>, CIDR: 10.0.0.0/16)
  • VPC Peering: pcx-<peering-id> (active)
  • Route tables: updated in both directions
  • FSx for ONTAP security group: allows all traffic (0.0.0.0/0)
  • S3 Gateway Endpoint: vpce-<s3-gateway> (full access policy)
  • Cluster: m5.large × 3, DBR 17.3 LTS, Dedicated mode, Instance Profile attached

The Result

{
  "dns_resolution": {"success": true, "ip": "52.219.151.110"},
  "vpc_peering_443": {"success": false, "result_code": 11},
  "vpc_peering_nfs": {"success": false, "result_code": 11},
  "s3_ap_access": {"success": false, "error": "Read timeout"},
  "imds": {"success": true}
}
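
The probe script behind this JSON is not shown in the article. As a rough sketch of how a report with this shape could be produced, the helpers below use only the standard library; the function names are hypothetical. On Linux, a connect_ex call that times out typically returns 11 (EAGAIN), which would be consistent with the result_code values above, though that mapping is my reading rather than something the article states.

```python
import json
import socket


def resolve(host: str) -> dict:
    """Resolve a host name (or echo back an IPv4 literal)."""
    try:
        return {"success": True, "ip": socket.gethostbyname(host)}
    except OSError as exc:
        return {"success": False, "error": str(exc)}


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> dict:
    """Attempt a TCP connect; connect_ex returns 0 or an errno instead of raising."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        code = sock.connect_ex((host, port))
    return {"success": code == 0, "result_code": code}


if __name__ == "__main__":
    # Loopback placeholders only; substitute the S3 AP alias host and FSx IPs when reproducing.
    report = {
        "dns_resolution": resolve("127.0.0.1"),
        "loopback_tcp": tcp_probe("127.0.0.1", 9, timeout=1.0),
    }
    print(json.dumps(report, indent=2))
```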

Analysis

| Layer | Result | Interpretation |
|---|---|---|
| DNS resolution | ✅ | S3 AP alias resolves to S3 endpoint IP (52.219.x.x) |
| VPC Peering (TCP 443) | ❌ | FSx for ONTAP management IP unreachable; egress blocked |
| VPC Peering (NFS 2049) | ❌ | NFS port unreachable; egress blocked |
| S3 AP via S3 Gateway Endpoint | ❌ | Read timeout; S3 service reachable but FSx for ONTAP backend connection fails |
| IMDS / Instance Profile | ✅ | Credentials available and valid |

Key finding: Even with VPC Peering established, routes configured, and security groups permissive, the Databricks-managed VPC's egress restrictions block connectivity to the FSx for ONTAP backend. The S3 Gateway Endpoint routes requests to the S3 service, but FSx for ONTAP S3 AP requires the S3 service to reach the FSx for ONTAP file system — which is in a different VPC from the Databricks cluster. The S3 service-side routing to the FSx for ONTAP backend is not affected by customer-side VPC Peering.

Important: In this validation, the result suggests that FSx for ONTAP S3 AP access requires the requesting service (the Databricks cluster) to be in the same VPC as the FSx for ONTAP file system, or to use a network configuration where the S3 service can reach the FSx for ONTAP backend. VPC Peering between the requester VPC and the FSx for ONTAP VPC did not help, because S3 AP requests are routed through the S3 service, not directly to the FSx for ONTAP IP.

Lesson

S3 AP requests do not traverse VPC Peering. They are routed through the S3 service endpoint. For FSx for ONTAP S3 AP to work, the S3 service must be able to reach the FSx for ONTAP file system's internal endpoint. This is handled by AWS internally when the request originates from the same region, but the Databricks-managed VPC's egress restrictions appear to interfere with this path.

Customer-managed VPC (same VPC as FSx for ONTAP) remains the only validated path for Instance Profile + boto3 access to FSx for ONTAP S3 AP from Databricks.


IMDS Access Matrix

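| Cluster Mode | Workspace Type | IMDS | boto3 S3 | boto3 S3 AP |
|---|---|---|---|---|
| Standard (Shared) | Managed VPC | ❌ | ❌ | ❌ |
| Dedicated | Managed VPC | ❌ | ❌ | ❌ |
| Dedicated | Customer VPC | ❌ | ❌ | ❌ |
| Dedicated + Instance Profile | Managed VPC (VPC Peering) | ✅ | ⚠️ | ❌ |
| Dedicated + Instance Profile | Customer VPC | ✅ | ✅ | ✅ |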

Row 4 note: IMDS works and Instance Profile credentials are valid, but S3 AP access times out because the Databricks-managed VPC egress restrictions block FSx for ONTAP backend connectivity. Regular S3 bucket access was not tested with a permissive policy (AccessDenied was due to intentionally scoped IAM policy, not network).

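IMDS is blocked on every tested configuration without an explicitly registered Instance Profile. With one attached in Dedicated mode, IMDS responds on both workspace types, but S3 AP access succeeded only from the Customer-managed VPC workspace.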


Complete Results Summary

| # | Approach | Result | Blocker / notes |
|---|---|---|---|
| 1 | UC External Location + dbutils.fs (without access_point field) | ❌ Fail | Generated session policy did not allow S3 AP ARN |
| 1b | UC External Location + access_point field (file-level read) | ✅ Pass | Top-level ls, head, spark.read with explicit path all work |
| 1c | UC External Location + access_point field (subdirectory ls) | ❌ Fail | Prefix-based ListObjectsV2 still blocked for subdirectories |
| 1d | UC External Location + CREATE TABLE LOCATION | ❌ Fail | UC_CLOUD_STORAGE_ACCESS_FAILURE during internal validation |
| 2 | UC External Location + Spark read (directory) | ❌ Fail | Same prefix-level access issue |
| 3 | NFS mount (Managed VPC, VPC Peering) | ❌ Fail | Egress blocked (port 2049) |
| 4 | NFS mount (Customer VPC, Dedicated) | ❌ Fail | NFS mount blocked by seccomp by design (confirmed by Databricks Support) |
| 5 | boto3 (Managed VPC, no Instance Profile) | ❌ Fail | IMDS blocked |
| 6 | boto3 (Customer VPC, no Instance Profile) | ❌ Fail | IMDS blocked |
| 7 | Instance Profile + boto3 (Customer VPC) | ✅ Pass | Works (bypasses UC governance) |
| 8 | NFS RPC user-space (Customer VPC) | ✅ Pass | Works but impractical for production |
| 9 | No Isolation Shared mode | Not pursued | Legacy access mode; not pursued |
| 10 | S3 AP + Instance Profile + boto3 (Managed VPC, VPC Peering) | ❌ Fail | Managed VPC egress blocks FSx for ONTAP backend connectivity |

Governance Impact Summary

| Access path | Governance model | Auditability | Production suitability |
|---|---|---|---|
| Unity Catalog External Location | Centralized UC governance (fine-grained, lineage) | High (if supported) | Preferred, but blocked in this validation |
| Instance Profile + boto3 | EC2 IAM role based | AWS-side logs possible if enabled; UC lineage not captured | PoC only unless separately approved |
| Kernel NFS mount | Filesystem / OS level | Outside UC governance | Not viable in this validation |
| User-space NFS RPC | Custom application path | Custom logging required | Experimental only |
| Athena + FSx for ONTAP S3 AP | IAM / S3 AP / Athena workgroup | AWS-side evidence possible | Best current read-only SQL analytics fit |
| Bedrock Knowledge Bases + FSx for ONTAP S3 AP | IAM / S3 AP / Bedrock Knowledge Base role / guardrails where used | AWS-side evidence possible | AWS-documented RAG / GenAI path; validated with permission-aware retrieval in related series |
| Glue / EMR Serverless + FSx for ONTAP S3 AP | IAM / S3 AP / Glue / EMR job roles | AWS-side evidence possible | Validated ETL / Spark path in this broader series where verification-pack evidence is available; validate production write-back semantics separately |

AWS-side audit events, such as CloudTrail data events where enabled and applicable, may show S3 API access by the instance profile, but they do not replace Unity Catalog lineage, table-level privileges, or centralized Databricks governance controls.

MLOps Boundary

Using boto3 to read objects from FSx for ONTAP S3 AP does not automatically make the downstream ML workflow governed.

If the data retrieved via Instance Profile + boto3 is used for ML or GenAI:

  • Register derived datasets in governed storage (Unity Catalog managed location)
  • Track experiments with MLflow
  • Register models in Unity Catalog where applicable
  • Document source data access path (S3 AP alias, prefix, timestamp)
  • Record whether training data lineage is captured or externalized
  • Ensure the ML compute uses an access mode compatible with Unity Catalog governance

Models in Unity Catalog provides centralized access control, auditing, lineage, and model discovery across workspaces. Even if the PoC data path bypasses UC, the model lifecycle should still be governed through the UC model registry.

AI / RAG Data Readiness Checklist

If the FSx for ONTAP S3 AP data is intended for AI, RAG, or GenAI pipelines:

  • [ ] Are documents classified by sensitivity (PHI, PII, financial, internal, public)?
  • [ ] Are file-level permissions preserved or re-modeled for the AI pipeline?
  • [ ] Is metadata available for filtering and retrieval (file type, date, owner)?
  • [ ] Is a freshness requirement defined (real-time, daily, weekly)?
  • [ ] Is read-only access sufficient, or does the pipeline need write-back?
  • [ ] Is human review required for generated output before downstream use?
  • [ ] Is permission-aware retrieval required (user A sees only their authorized documents)?

If permission-aware retrieval is required, define one of:

  • Enforce at source access path — use per-user or per-group S3 Access Points with scoped file system users
  • Re-model permissions in metadata index — extract file-level ACLs into a searchable metadata store and filter at query time
  • Filter retrieval results by user/group claims — apply post-retrieval filtering based on authenticated user identity
  • Whichever option is chosen, do not proceed until the authorization model is validated and approved by the security owner
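As an illustration of the "filter retrieval results by user/group claims" option, here is a minimal sketch of post-retrieval filtering. It assumes each retrieved chunk already carries the groups extracted from the source file's ACL; the `RetrievedChunk` type and its `allowed_groups` field are hypothetical and not taken from any specific retrieval service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievedChunk:
    doc_id: str
    text: str
    allowed_groups: frozenset  # groups extracted from source ACLs (hypothetical field)


def filter_by_claims(chunks, user_groups):
    """Keep only chunks whose ACL groups intersect the caller's group claims.

    Chunks with an empty ACL are dropped (deny by default).
    """
    claims = frozenset(user_groups)
    return [c for c in chunks if c.allowed_groups & claims]
```

The deny-by-default choice is deliberate: a chunk whose ACL could not be extracted is never returned. Post-retrieval filtering still means unauthorized content reached the retrieval layer, which is one reason the security owner should review the model before use.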

Instance Profile + boto3 approval requirements (for regulated workloads):

  • Data owner approval
  • Security owner approval
  • Platform owner approval
  • Compliance reviewer approval (if regulated data involved)
  • Defined scope: allowed prefix, allowed operations, logging requirements, expiration date
  • Approval record location (where the decision is stored)
  • Review / expiration date (when the approval must be re-evaluated)
  • Incident escalation contact

For regulated workloads, do not use Instance Profile + boto3 for:

  • Patient-facing responses or clinical decision support
  • Financial decision automation
  • Unreviewed access to regulated datasets
  • Writeback to source-controlled data locations
  • Workloads requiring Unity Catalog lineage

Decision Matrix

| Requirement | Recommended path today | Notes | Next validation action |
|---|---|---|---|
| SQL query on structured files | Athena + FSx for ONTAP S3 AP (Part 1) | Verified, simple, governed | Scale test with production data sizes |
| RAG / GenAI over NAS documents | Bedrock Knowledge Bases + FSx for ONTAP S3 AP | AWS-documented tutorial | Validate retrieval accuracy, permission-aware filtering, and sync freshness |
| ETL pipeline on NAS data | Glue or EMR Serverless + FSx for ONTAP S3 AP | Validated in this broader series where verification-pack evidence is available | Validate throughput impact and production write-back semantics |
| Serverless file processing | Lambda + FSx for ONTAP S3 AP | AWS-documented tutorial | Validate concurrency and throughput for your workload |
| Databricks governance with Unity Catalog | Wait for platform support | UC session policy currently blocks S3 AP ARN in my validation | Monitor Databricks support case response |
| Databricks unstructured data PoC | Dedicated cluster + Instance Profile + boto3 | Works, but bypasses UC governance | Validate executor-scale behavior separately |
| Production Databricks lakehouse tables | Use supported cloud storage (S3 bucket) | Required for Delta write semantics | N/A (use standard pattern) |
| Databricks distributed processing over FSx for ONTAP S3 AP | Not validated yet | Driver-only boto3 success does not prove executor-scale behavior | Test with multi-node cluster and Spark mapPartitions |
| Enterprise read-only analytics | Athena / Glue / EMR Serverless + FSx for ONTAP S3 AP | Best current fit for AWS-native path | Production workload isolation test |
| Video streaming from NAS | CloudFront + FSx for ONTAP S3 AP | AWS-documented tutorial | Validate caching and latency for your content |

This article does not recommend bypassing Unity Catalog for production governed lakehouse workloads. The Instance Profile + boto3 path is documented because it worked in a controlled validation environment, not because it is the preferred governance model.


Architecture Decision Guidance

Databricks remains the recommended platform for curated lakehouse workloads, governed Delta tables, ML pipelines, and multi-step data engineering. FSx for ONTAP S3 AP should be treated as a source integration boundary that may require staging, validation, or an alternate read path depending on governance requirements.

Use Databricks when:

  • Data is already in supported object storage (S3 bucket)
  • Delta Lake write semantics are required (INSERT, MERGE, OPTIMIZE, VACUUM)
  • Unity Catalog lineage and fine-grained governance are mandatory
  • Large-scale Spark processing is required
  • ML/AI workloads need integrated compute

Use AWS-native services + FSx for ONTAP S3 AP when:

  • The primary requirement is read-only SQL analytics over NAS data → Athena (validated in Part 1)
  • RAG / GenAI over enterprise documents → Bedrock Knowledge Bases (AWS-documented path)
  • ETL pipelines reading/transforming NAS data → Glue (validated in this broader series where verification-pack evidence is available)
  • Spark-scale processing without persistent clusters → EMR Serverless (validated in this broader series where verification-pack evidence is available)
  • Serverless file processing (thumbnails, text extraction, transcription) → Lambda (AWS-documented path)
  • Video streaming from NAS → CloudFront (AWS-documented path)
  • External partner file exchange → Transfer Family (AWS-documented path)
  • BI and AI-assisted analytics → QuickSight candidate path, typically via Athena or Glue Catalog
  • Source data copy should be minimized
  • Workload isolation and governance can be validated with AWS-side controls
  • Serverless, pay-per-query or pay-per-invocation cost model is preferred

Use controlled boto3 PoC only when:

  • The workload is exploratory and time-limited
  • Unity Catalog lineage is not required for the PoC scope
  • Explicit approval is obtained from data owner, security owner, and platform owner
  • Compensating controls are defined and documented

FSx for ONTAP Sizing Considerations

Before selecting an analytics engine, validate FSx for ONTAP-side capacity:

  • Throughput capacity — S3 API throughput is bounded by the FSx for ONTAP file system's provisioned throughput
  • Expected S3 API request rate — high-frequency small object reads may hit IOPS limits
  • File count and average object size — large directories with many small files may increase listing latency
  • Prefix layout — flat vs hierarchical prefix design affects listing performance
  • NFS/SMB production workload window — analytics queries share throughput with existing file workloads
  • Snapshot / backup / replication schedule — SnapMirror and backup operations consume throughput
  • Isolation strategy — consider a dedicated volume or SVM for analytics access to avoid contention

Delta Lake production workloads require more than object read access. They require validated behavior for transaction log writes, atomic commit assumptions, concurrent writers, checkpointing, recovery, and lifecycle operations. This article does not validate FSx for ONTAP S3 AP for Delta write-path semantics.


Compensating Controls for Controlled boto3 PoC

If Instance Profile + boto3 is approved for a controlled PoC, define:

  • Dedicated cluster only (no shared compute)
  • Single-purpose instance profile (not reused across workloads)
  • Least-privilege S3 Access Point policy (specific prefix only)
  • Read-only permissions by default
  • Allowed prefix list (explicitly documented)
  • CloudTrail data event coverage where enabled and applicable
  • Notebook/job owner (named individual)
  • Approval expiration date
  • No production writeback
  • No regulated data unless separately approved with compensating controls
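To make the "least-privilege, specific prefix only, read-only" controls concrete, here is a minimal sketch that builds an IAM identity policy for one access point and one prefix. It assumes the standard S3 access point resource format (`<access-point-arn>/object/<key>`) and the `s3:prefix` condition key; the function name, the example ARN, and the prefix are placeholders. FSx for ONTAP-side authorization (the file system user attached to the access point) still applies separately and is not modeled here.

```python
import json


def build_readonly_policy(access_point_arn: str, prefix: str) -> dict:
    """Identity policy allowing list + get under one prefix of one access point."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListApprovedPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": access_point_arn,
                # Listing is limited to keys under the approved prefix
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                "Sid": "ReadApprovedPrefix",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"{access_point_arn}/object/{prefix}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder account ID and access point name
    arn = "arn:aws:s3:ap-northeast-1:111122223333:accesspoint/example-ap"
    print(json.dumps(build_readonly_policy(arn, "poc-data/"), indent=2))
```

No write or delete actions are granted, which matches the "read-only by default" and "no production writeback" controls. The generated document is a starting point for review, not a validated policy for any particular environment.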

Recommended Databricks-side controls:

  • Restrict instance profile usage to an approved group via workspace admin settings
  • Enforce dedicated access mode through cluster policy
  • Restrict cluster creation permissions to approved users
  • Tag PoC clusters with owner, approval ID, and expiration date
  • Disable or terminate clusters after approval expiration
  • Review workspace audit logs for cluster and instance profile usage
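The tagging and expiration controls above can be checked mechanically. The sketch below reviews a cluster's custom tags and reports missing tags or an expired approval. The tag keys (`owner`, `approval_id`, `expires_on`) are hypothetical names chosen for illustration, and fetching the tags from the workspace is left out on purpose.

```python
from datetime import date

REQUIRED_TAGS = ("owner", "approval_id", "expires_on")  # hypothetical tag keys


def review_cluster_tags(tags: dict, today: date) -> list:
    """Return a list of findings; an empty list means the cluster passes."""
    findings = [f"missing tag: {k}" for k in REQUIRED_TAGS if not tags.get(k)]
    expires = tags.get("expires_on")
    if expires:
        try:
            if date.fromisoformat(expires) < today:
                findings.append(f"approval expired on {expires}")
        except ValueError:
            findings.append(f"expires_on is not an ISO date: {expires}")
    return findings
```

A scheduled job could run this check against every PoC cluster and terminate, or at least flag, any cluster that returns findings.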

Data Protection Considerations

FSx for ONTAP S3 AP exposes access to file data; it does not replace ONTAP volume-level protection. When analytics workloads access source data via S3 AP, validate:

  • Snapshot schedule impact — analytics reads do not conflict with scheduled snapshots, but heavy write-back could
  • SnapMirror replication policy — source volume replication continues regardless of S3 AP access
  • Backup window vs analytics query window — concurrent backup and analytics may compete for throughput
  • Write-back isolation — analytics results should be written to a separate volume or prefix, not the source-of-record volume
  • Recovery behavior — if analytics workload reads during a failover event, understand the RPO/RTO implications

ONTAP S3 NAS bucket data is protected by volume-level SnapMirror asynchronous replication, not by S3-level replication. Plan DR at the volume level.


Discovery Questions for Partners

When a customer asks about Databricks + FSx for ONTAP S3 Access Points:

  1. Are the target files currently stored on NFS, SMB, or both?
  2. Is the workload read-only analytics, unstructured object processing, or Delta write?
  3. Is Unity Catalog lineage mandatory for this use case?
  4. Is this a regulated dataset (PHI, PII, financial)?
  5. Can the PoC run with a dedicated instance profile and limited prefix?
  6. What is the required concurrency and data size?
  7. Is executor-scale Spark processing required, or is driver-only sufficient?
  8. What rollback action is acceptable if FSx for ONTAP throughput impact is observed?
  9. Who approves non-Unity Catalog access paths?
  10. What evidence is required for security review?

Troubleshooting Playbook

When Databricks access to FSx for ONTAP S3 AP fails, isolate one layer at a time:

  1. IAM — Can the instance profile call s3:ListBucket on the S3 AP ARN? Can it call s3:GetObject?
  2. Unity Catalog — Does the same role work for a standard S3 bucket? Does it fail only for the FSx for ONTAP S3 AP ARN?
  3. Network — Is the workspace customer-managed or Databricks-managed? Can the cluster reach NFS TCP 2049? Are route tables and security groups correct?
  4. NFS server — Does showmount -e work? Does the ONTAP export policy allow the client?
  5. Local runtime — Does strace show mount() returning EACCES? Does tmpfs mount succeed? Does user-space NFS RPC succeed?
  6. Workaround — Does Dedicated + Instance Profile + boto3 work? Is bypassing Unity Catalog acceptable for this PoC?
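The "one layer at a time" approach can be sketched as an ordered list of checks that stops at the first failure. This is an illustrative helper, not the tooling used in the validation; `first_failing_layer` is a hypothetical name, and only the TCP reachability check is implemented because the other layers depend on your environment.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


def first_failing_layer(checks):
    """Run (layer_name, check_fn) pairs in order; return the first layer that fails.

    A check that raises is treated as a failure of that layer. Returns None
    when every layer passes.
    """
    for name, check in checks:
        try:
            passed = bool(check())
        except Exception:
            passed = False
        if not passed:
            return name
    return None


# Example wiring (nfs_host is a placeholder for the SVM's NFS endpoint):
# first_failing_layer([("network", lambda: tcp_reachable(nfs_host, 2049))])
```

Stopping at the first failing layer keeps later symptoms, such as a mount error, from being misread when an earlier layer, such as the network, is the actual cause.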

Known Failure Signatures

| Symptom | Likely layer | Next step |
|---|---|---|
| no session policy allows s3:ListBucket | Unity Catalog session policy | Compare regular S3 bucket vs FSx for ONTAP S3 AP with the same role |
| TCP 2049 unreachable | Network / managed VPC boundary | Test from customer-managed VPC |
| mount.nfs: access denied by server with mount() EACCES in strace | Local runtime restriction | Capture strace and /proc/self/status seccomp output |
| boto3 NoCredentialsError | Instance profile / IMDS blocked | Verify cluster mode is Dedicated and instance profile is registered |
| boto3 ReadTimeoutError on S3 AP | FSx for ONTAP backend or VPC endpoint routing | Test with a fresh SVM/volume to isolate; check FSx for ONTAP CPU utilization |
| boto3 ReadTimeoutError on S3 AP from Managed VPC (IMDS works) | Managed VPC egress restriction blocking FSx for ONTAP backend | Deploy in Customer-managed VPC (same VPC as FSx for ONTAP); VPC Peering does not resolve this |
| Driver-only boto3 works, but Spark job fails | Executor credential/network path | Validate credentials, routing, and concurrency from executors separately |
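The signatures above lend themselves to a simple lookup when triaging logs. The sketch below matches an error message against substrings taken from the table and returns the likely layer and next step. The matching rules are a simplification of my own: a real log may need more context, for example confirming mount() EACCES in strace before blaming the local runtime.

```python
SIGNATURES = [
    # (substring to look for, likely layer, next step), condensed from the table
    ("no session policy allows", "Unity Catalog session policy",
     "Compare a regular S3 bucket with the S3 AP using the same role"),
    ("access denied by server", "Local runtime restriction (if strace shows mount() EACCES)",
     "Capture strace and /proc/self/status seccomp output"),
    ("NoCredentialsError", "Instance profile / IMDS blocked",
     "Verify cluster mode is Dedicated and instance profile is registered"),
    ("ReadTimeoutError", "FSx for ONTAP backend, VPC endpoint routing, or managed VPC egress",
     "Test with a fresh SVM/volume; test from a customer-managed VPC"),
]


def classify(error_text: str):
    """Return (likely_layer, next_step) for the first matching signature, else None."""
    lowered = error_text.lower()
    for needle, layer, next_step in SIGNATURES:
        if needle.lower() in lowered:
            return layer, next_step
    return None
```

An unmatched message returns `None` rather than a guess, so new failure modes are surfaced for manual review instead of being filed under an existing layer.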

What This Article Does Not Conclude

This article does not conclude that Databricks cannot ever support FSx for ONTAP S3 AP. It documents the behavior observed in one validated environment and identifies the platform boundaries that need vendor confirmation or additional support.


What to Tell Stakeholders

Current recommendation:

  • Use AWS-documented native service paths where they match the workload: Athena for SQL, Bedrock Knowledge Bases for RAG/GenAI, Glue or EMR Serverless for ETL/Spark, Lambda for serverless file processing, CloudFront for streaming, and Transfer Family for partner file exchange
  • Treat Athena as the validated read-oriented SQL path in Part 1. Treat Glue / EMR Serverless as validated ETL / Spark paths only where corresponding verification-pack evidence is available.
  • Treat Bedrock Knowledge Bases, Lambda (file processing), CloudFront, and Transfer Family as AWS-documented candidate paths that still require workload-specific validation
  • Use Databricks + Instance Profile + boto3 only for controlled PoC or unstructured data experiments
  • Do not position Unity Catalog + FSx for ONTAP S3 AP as production-ready until the session policy supports S3 Access Point ARN patterns
  • Do not rely on kernel NFS mounts inside Databricks until the platform explicitly supports this path
  • For Delta Lake production tables, continue to use supported object storage patterns

This validation should be used to guide architecture selection, not to disqualify Databricks from lakehouse workloads.

This validation should not be used to compare AWS-native services and Databricks as competing platforms. AWS-native services (Athena, Bedrock, Glue, EMR Serverless, Lambda) each have AWS-documented integration paths with FSx for ONTAP S3 AP — some validated in this series, others requiring workload-specific validation. Databricks is strong for governed lakehouse, Delta, ML, and production-scale data engineering workloads. The right choice depends on the access pattern, governance requirement, and workload type.


Lessons Learned

1. "S3-compatible" ≠ "works everywhere S3 works"

FSx for ONTAP S3 AP is S3-compatible at the API level, but platform security layers (session policies, VPC restrictions) may not recognize the ARN format. S3 API compatibility and platform-integrated S3 governance are different things.

2. Error messages can be misleading

mount.nfs: access denied by server made me spend hours checking ONTAP export policies. The real issue was a local runtime restriction. Always use strace when mount fails unexpectedly.

3. Platform security boundaries are not always documented

You discover these boundaries by hitting them. The troubleshooting playbook above can save you time.

4. Customer-managed VPC is essential for storage integration

If you need to connect Databricks to anything beyond standard S3 buckets, deploy in a Customer-managed VPC. Databricks-managed VPC provides limited customer control over cluster networking compared with a customer-managed VPC.

This was further confirmed by testing S3 AP access from a Databricks-managed VPC with VPC Peering: even with VPC Peering active, routes configured, security groups permissive, and an S3 Gateway Endpoint present, S3 AP requests to FSx for ONTAP timed out. The Databricks-managed VPC egress restrictions block not only direct IP communication but also S3 AP backend connectivity.

S3 AP routing note: S3 AP requests are routed through the S3 service endpoint, not directly to the FSx for ONTAP IP. VPC Peering between the requester VPC and the FSx for ONTAP VPC does not help because the S3 service needs internal connectivity to the FSx for ONTAP file system. Customer-managed VPC (same VPC as FSx for ONTAP) is the only validated path.

Databricks Control Plane (SaaS)
        ^
        | NAT Gateway (required outbound)
        |
Databricks Cluster ENI (Customer VPC, private subnet)
        |
        | Private VPC routing (no internet required)
        v
FSx for ONTAP ENI / SVM (same VPC, private subnet)

For the Databricks Support Case Packet, include network evidence: cluster subnet ID, FSx for ONTAP subnet ID, route table IDs, security group rules, and DNS resolution for FSx for ONTAP endpoint.

5. Instance Profile is a pragmatic PoC workaround

Use Instance Profile + boto3 as a controlled PoC workaround. Do not use it as a substitute for Unity Catalog governance without a formal security review.

6. Always isolate variables when troubleshooting

When FSx for ONTAP S3 AP wasn't responding, I created a new SVM and volume to isolate the issue. This confirmed the problem was SVM-specific rather than a platform-wide limitation.

7. Negative validation creates value

A failed integration path can still create value when it prevents the wrong production architecture. This validation helps teams avoid assuming S3 API compatibility equals platform governance compatibility, choose the right engine for the right access pattern, and reduce time spent on ambiguous troubleshooting.


Databricks Support Case Packet

If you open a support case with Databricks, include:

  • Workspace type: Databricks-managed VPC or customer-managed VPC
  • Cluster access mode and DBR version
  • IAM role / instance profile configuration
  • Unity Catalog storage credential and external location configuration
  • Full AccessDenied error message (including the ARN and "no session policy" text)
  • S3 AP ARN and alias format
  • Network test results for NFS ports (TCP 2049, TCP 111, TCP 635)
  • strace output showing mount() EACCES
  • /proc/self/status showing seccomp mode
  • User-space NFS RPC success evidence (if applicable)
  • Instance Profile boto3 success evidence (if applicable)
  • showmount -e output (confirms export visibility)
  • tmpfs mount success evidence (proves mount syscall itself is allowed)
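For the seccomp evidence item, the relevant field is `Seccomp:` in `/proc/self/status`, where 0 means disabled, 1 means strict mode, and 2 means filter mode. Here is a minimal sketch for capturing it from a notebook cell; the function names are my own, and the parsing is separated from the file read so it can be checked against saved output.

```python
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def seccomp_mode(status_text: str):
    """Parse the 'Seccomp:' field from /proc/<pid>/status content."""
    for line in status_text.splitlines():
        if line.startswith("Seccomp:"):
            value = int(line.split(":", 1)[1].strip())
            return SECCOMP_MODES.get(value, f"unknown ({value})")
    return None  # field absent (for example, a kernel built without seccomp)


def read_own_seccomp_mode(path: str = "/proc/self/status"):
    """Read the seccomp mode of the current process (Linux only)."""
    with open(path) as fh:
        return seccomp_mode(fh.read())
```

A result of `filter` shows only that a seccomp filter is active for the process. It does not by itself prove which syscalls are blocked, which is why the strace output and the tmpfs mount result belong in the same packet.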

Use Case Fit Matrix

When this article says "validated in this broader series," it refers to evidence captured in the linked verification-pack or related articles, not to Databricks-specific validation in this Part 2 article.

| Use case | Best current path | Why |
|---|---|---|
| SQL analytics on structured NAS files | Athena + FSx for ONTAP S3 AP | Verified read-oriented path with AWS-side governance controls, serverless |
| Enterprise IT RAG over documents | Bedrock Knowledge Bases + FSx for ONTAP S3 AP | AWS-documented tutorial; also validated in related series with permission-aware retrieval |
| ETL / data transformation | Glue or EMR Serverless + FSx for ONTAP S3 AP | Validated in this broader series where verification-pack evidence is available; validate production write-back semantics separately |
| Serverless file processing (thumbnails, OCR, transcription) | Lambda + FSx for ONTAP S3 AP | AWS-documented tutorial; validate for your workload |
| Large-scale Spark ETL | EMR Serverless + FSx for ONTAP S3 AP or standard S3 bucket | Validated in this series; Databricks executor-scale not validated on S3 AP |
| Production Delta Lake tables | Supported object storage (S3 bucket) | Required for Delta write semantics and UC governance |
| Unstructured data experimentation (Databricks) | Instance Profile + boto3 PoC | Works in driver-only pattern, needs governance review |
| Video streaming from NAS | CloudFront + FSx for ONTAP S3 AP | AWS-documented tutorial; validate caching, latency, and file size for your content |
| External partner file exchange | Transfer Family + FSx for ONTAP S3 AP | AWS-documented path; also validated in related series; validate file operation limitations (rename, append, upload size) |
| Lightweight serverless analytics | DuckDB Lambda + FSx for ONTAP S3 AP | Planned Part 4 validation; candidate for lightweight, low-idle-cost analytics |
| BI / dashboarding over NAS data | Candidate: QuickSight via Athena or Glue Catalog | AWS positions BI as a candidate use case; validate whether access path is Athena-backed or catalog-mediated |

Cost Model Considerations

| Engine | Primary cost driver | Best fit |
|---|---|---|
| Athena | Data scanned (per TB) | Occasional SQL queries, serverless |
| Bedrock Knowledge Bases | Model invocation + embedding + retrieval | RAG / GenAI over enterprise documents |
| Glue | DPU-hours | ETL pipelines, data transformation |
| Databricks | DBU + cloud compute instance hours | Lakehouse pipelines, ML, Delta workloads |
| EMR Serverless | vCPU / memory × runtime duration | Spark ETL without persistent clusters |
| Lambda + DuckDB | Invocation duration × memory | Lightweight serverless analytics, event-driven |
| CloudFront | Data transfer + requests | Video/media streaming from NAS |

Cost comparison is not the focus of this article. Each engine has a fundamentally different pricing model. Databricks provides compute policies to control cluster creation, instance types, auto-termination, and cost-related attributes. For cost optimization, evaluate based on workload pattern (interactive vs batch, frequency, data volume) rather than unit price alone.


Partner / Customer Conversation Guide

If a customer asks whether Databricks can directly process FSx for ONTAP S3 Access Point data:

  • AWS-native service paths such as Athena, Bedrock Knowledge Bases, Glue, EMR Serverless, Lambda, CloudFront, and Transfer Family have AWS-documented integration patterns with FSx for ONTAP S3 AP. In this series, Athena (Part 1), Glue, and EMR Serverless have been validated; the other paths should be validated per workload, Region, IAM model, FSx for ONTAP-side authorization, and governance requirement.
  • Databricks Unity Catalog integration requires vendor confirmation for S3 Access Point ARN handling
  • Instance Profile + boto3 can be used for controlled PoC experiments, but it bypasses Unity Catalog governance and is classified as a legacy data access pattern by Databricks
  • Production Delta Lake workloads should continue to use supported object storage patterns
  • Any Databricks integration should be validated per workspace type, cluster mode, runtime version, IAM path, and governance requirement

Next Validation Metrics

Current blocker: Executor-scale validation requires a Customer-managed VPC workspace (same VPC as FSx for ONTAP). The Databricks-managed VPC workspace was tested with VPC Peering and Instance Profile (2026-05-24); S3 AP access timed out due to managed VPC egress restrictions. Creation of a Customer-managed VPC workspace is pending resolution of a Databricks support ticket.

For executor-scale validation (not yet performed):

  • Object listing latency per executor
  • Total objects processed across cluster
  • Per-executor success/failure rate
  • Throughput per executor
  • Retry count and S3 API error rate
  • FSx for ONTAP throughput utilization during distributed access
  • Cost per processed GB

Driver-only boto3 success is not sufficient for Spark workloads. The next validation should run boto3 calls from executors using mapPartitions and compare credential, routing, latency, and error behavior across workers.

Executor-scale validation should not only test success/failure. It should capture per-executor latency, retry count, error code, and object count so that routing and concurrency behavior can be reviewed.

Benchmark run guidance:

  • Cold run: at least 1 (first access after cluster start, no metadata cache)
  • Warm metadata run: at least 1 (after initial listing populates metadata cache)
  • Repeated run: at least 3 (steady-state measurement)
  • Report: p50, p90, p95, p99 latency, plus average, min, max, and outliers
  • Include: object count, average object size, prefix depth, concurrent executor count
  • Include: FSx for ONTAP throughput utilization during test window
  • Note: S3 AP via FSx for ONTAP may exhibit metadata warm-up effects and prefix layout sensitivity. Cold vs warm differences should be documented explicitly.
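The reporting item in the list above (p50, p90, p95, p99, plus average, min, max) can be computed with a few lines. This sketch uses the nearest-rank percentile definition, which is one of several common definitions and is my choice here, not something the validation prescribes. With very few runs the high percentiles simply collapse to the maximum.

```python
def percentile(sorted_values, p: int):
    """Nearest-rank percentile of an ascending list, for integer p in 1..100."""
    if not sorted_values:
        raise ValueError("no samples")
    n = len(sorted_values)
    rank = max(1, (p * n + 99) // 100)  # integer ceiling of p * n / 100
    return sorted_values[rank - 1]


def summarize(latencies_ms):
    """Summary statistics for one benchmark run (cold, warm, or repeated)."""
    values = sorted(latencies_ms)
    summary = {f"p{p}": percentile(values, p) for p in (50, 90, 95, 99)}
    summary.update(
        min=values[0],
        max=values[-1],
        avg=sum(values) / len(values),
        count=len(values),
    )
    return summary
```

Summarizing cold, warm, and repeated runs separately, rather than pooling them, keeps the metadata warm-up effect visible in the report.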

Additional FSx for ONTAP metrics to capture:

  • FSx for ONTAP throughput utilization (% of provisioned capacity)
  • FSx for ONTAP CPU utilization
  • Network throughput (inbound/outbound)
  • S3 API request count by operation (List, Get, Head)
  • File count per prefix
  • Average object size
  • NFS/SMB latency during concurrent S3 API reads (contention indicator)

Expected output format (JSONL per executor):

{"executor_host": "ip-10-0-xx-yy", "partition_id": 3, "operation": "list_objects_v2", "status": "success", "latency_ms": 183, "objects_seen": 100, "error_code": null}
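A record in the format above could be produced by a per-partition probe like the following. This is a sketch of the planned executor-scale test, which has not been run. `probe_partition` and the injected `list_fn` are hypothetical names. On a real cluster `list_fn` would wrap a boto3 `list_objects_v2` call made with a client created inside the partition, since boto3 clients generally cannot be pickled and shipped from the driver.

```python
import json
import socket
import time


def probe_partition(partition_id, prefixes, list_fn):
    """Time one listing call per prefix and yield one JSONL line per call.

    list_fn(prefix) must return the number of objects seen.
    """
    host = socket.gethostname()
    for prefix in prefixes:
        record = {"executor_host": host, "partition_id": partition_id,
                  "operation": "list_objects_v2", "status": "success",
                  "latency_ms": None, "objects_seen": 0, "error_code": None}
        start = time.perf_counter()
        try:
            record["objects_seen"] = list_fn(prefix)
        except Exception as exc:
            record["status"] = "failure"
            # botocore ClientError exposes the service error code via .response;
            # for any other exception, fall back to the exception class name
            response = getattr(exc, "response", None) or {}
            record["error_code"] = response.get("Error", {}).get("Code", type(exc).__name__)
        record["latency_ms"] = round((time.perf_counter() - start) * 1000)
        yield json.dumps(record)


# Possible Spark wiring (untested; make_list_fn is a hypothetical factory that
# builds the boto3 client on the executor):
# rdd.mapPartitionsWithIndex(lambda i, it: probe_partition(i, it, make_list_fn()))
```

Because failures are recorded rather than raised, one misbehaving executor does not abort the job, and the per-executor success/failure rate can be computed from the collected lines.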

Adoption Success Metrics

For a controlled Databricks + FSx for ONTAP S3 AP PoC, define success criteria beyond technical pass/fail:

Baseline metrics (capture before validation):

  • Average search/access time (minutes) for target documents
  • Monthly document access count via current path
  • Current copy pipeline runtime (if applicable)
  • Current data freshness lag (hours)
  • Current support ticket count related to data access

PoC outcome metrics:

  • Number of target datasets evaluated
  • Number of successful read operations
  • Number of governance exceptions required
  • Time to first successful access
  • Number of support issues raised
  • Whether the customer selected Athena, Databricks, or another engine after validation
  • Decision outcome: proceed / adjust / stop
  • Time saved by early boundary identification (vs discovering in production)

Stop criteria:

  • No measurable business value after validation period
  • Governance exception required for production path with no compensating control available
  • Executor-scale validation fails with unacceptable error rate (define threshold before PoC)
  • FSx for ONTAP workload impact exceeds approved threshold (e.g., throughput utilization > 80%)
  • Vendor confirmation indicates unsupported path with no roadmap commitment
  • Security review rejects the access path without remediation option

Series Evaluation Criteria

Across this series, each engine is evaluated by:

  • Read-path compatibility
  • Write-path compatibility
  • Governance model
  • Operational impact
  • Performance evidence
  • Production readiness gap
  • Best-fit use case

Well-Architected Mapping

These criteria align with the AWS Well-Architected Data Analytics Lens:

| Pillar | Evaluation focus in this series |
|---|---|
| Security | Governance model, IAM/AP policy, audit evidence, session policy behavior |
| Reliability | Failure modes, rollback path, support case evidence, DR considerations |
| Performance Efficiency | Throughput, executor-scale behavior, FSx for ONTAP utilization, latency |
| Cost Optimization | Engine-specific cost model, idle cost, cost per processed GB |
| Operational Excellence | Runbook, evidence template, support packet, monitoring |

Business Value of Negative Validation

Negative validation is not failure. It is risk reduction.

A failed integration path can still create value when it prevents the wrong production architecture. This validation helps teams:

  • Avoid assuming S3 API compatibility equals platform governance compatibility
  • Choose the right engine for the right access pattern (Athena for read-only SQL, Databricks for lakehouse/ML)
  • Identify early when vendor confirmation is required before committing architecture
  • Reduce time spent on ambiguous troubleshooting by providing reproducible evidence
  • Prevent wasted PoC investment by documenting boundaries before production design
  • Enable informed conversations with vendors, partners, and security reviewers

For enterprise customers, early boundary identification can save weeks of engineering time and prevent costly architecture rework after production deployment.


What's Next

Series index:

  • Part 1: Athena — Query NAS Data In Place (validated read-oriented path, 9/9 negative tests pass)
  • Part 2: Databricks (this article) — session policy deep dive
  • Part 3: Snowflake — LIST Works, SELECT Doesn't (same session policy pattern)
  • Part 4: DuckDB Lambda — lightweight serverless analytics validation
  • Part 5: EMR Spark — read-write ETL pipeline (coming soon)
  • Part 6: Redshift Spectrum — DWH meets NAS data (coming soon)
  • Part 7: Trino — open-source SQL on NAS data (coming soon)

Open items:

  • Support cases: Waiting for Databricks response on session policy and NFS mount questions
  • FUSE NFS client: Investigating whether a user-space NFS client can bypass the runtime restriction

Caution on FUSE/user-space NFS: FUSE or user-space NFS clients should be treated as experimental only. They require separate validation for POSIX semantics, caching behavior, consistency, performance, failure recovery, and vendor supportability. Do not treat user-space NFS RPC success as a production workaround.


References

Related series by the same author (FSx for ONTAP S3 Access Points with other AWS services):

ONTAP S3 Multiprotocol vs FSx for ONTAP S3 Access Points:

  • ONTAP S3 multiprotocol (ONTAP 9.12.1+): S3 NAS bucket model on ONTAP SVM, enabling S3 clients to access NAS data directly on the ONTAP cluster
  • FSx for ONTAP S3 Access Points: AWS-managed S3 Access Point endpoint attached to FSx for ONTAP volume, integrating with AWS IAM, VPC, and S3-compatible services
  • Both expose NAS data via S3-style access, but the authorization path, service integration, and operational model differ. This article focuses on FSx for ONTAP S3 Access Points.


This article is part of the "FSx for ONTAP S3 Access Points × Lakehouse Deep Dive" series. All tests were performed on a real AWS environment with FSx for ONTAP (ONTAP 9.17.1, ap-northeast-1) and Databricks (DBR 17.3 LTS, Premium tier) in May 2026.


Scope reminder: This article documents observed behavior in one validated environment. It does not validate production readiness, distributed executor-scale processing, or all Databricks runtime versions. Terminology uses "observed in this environment" rather than "unsupported" or "incompatible" — platform behavior may change with future updates.

Future updates: If Databricks platform behavior changes or vendor confirmation becomes available, this article should be updated with the new validation result rather than treated as a permanent compatibility statement.

Disclaimer: This article is an independent validation report and does not represent Databricks, AWS, or NetApp official guidance. Product behavior, support status, and platform capabilities may change. Always validate in your own environment and consult vendor documentation and support channels.
