TL;DR
You can query files stored on Amazon FSx for NetApp ONTAP directly from Amazon Athena through an FSx-attached S3 Access Point — without copying the source data to an S3 bucket. The source files remain on the FSx for ONTAP volume and are accessed through S3 object APIs.
I verified this end-to-end: Parquet files written via NFS are immediately queryable from Athena using the official AWS tutorial pattern.
This is Part 1 of a series exploring how FSx for ONTAP S3 Access Points integrate with various Lakehouse platforms. Part 2 covers Databricks — where platform security boundaries make things significantly more complex.
GitHub Repository: fsxn-lakehouse-integrations
If you want to reproduce this validation, start from the repository's integrations/athena/ directory, which contains CloudFormation templates, sample data generators, and query scripts.
What Is Verified in This Article
Verified:
- NFS-written Parquet file is visible via FSx S3 AP (ListObjectsV2, StorageClass: FSX_ONTAP)
- Athena can query the file through Glue Data Catalog
- Standard S3 bucket result location works as the documented pattern
- Experimental FSx S3 AP result output worked in my environment
Not verified:
- Delta / Hudi / Iceberg writes
- CTAS to FSx S3 AP as a production pattern (the CTAS attempt in this validation failed; see Compatibility Matrix)
- S3 bucket event notification semantics
- Large-scale performance limits
- CloudTrail data event coverage (audit evidence approach should be validated per environment)
Why This Matters
Enterprise file servers hold massive amounts of data — design files, inspection images, research documents, log archives. Traditionally, to analyze this data with cloud-native tools like Athena, you had to:
- Copy data from NFS/SMB to S3 (DataSync, scripts, etc.)
- Maintain sync pipelines
- Pay for duplicate storage
- Deal with stale data
FSx for ONTAP S3 Access Points (launched December 2025) change this. The same volume that serves NFS/SMB clients now exposes an S3-compatible API. Athena queries hit the same bytes that your NFS clients read — no copy required for the source dataset.
  Users (NFS/SMB)             Athena (S3 API)
         │                           │
         ▼                           ▼
┌─────────────────────────────────────────────┐
│            FSx for ONTAP Volume             │
│   /analytics/sensor_data.parquet            │
│   /analytics/logs/*.json                    │
└─────────────────────────────────────────────┘
Use Cases This Unlocks
This pattern is useful when enterprise data already lives on NFS/SMB file shares and analytics teams want to query it without building a copy pipeline to S3.
Examples:
- Manufacturing: Sensor logs, inspection results, quality reports produced by factory systems
- SAP / ERP: Batch export files, operational reports, reconciliation extracts, and analytics copies (not a direct replacement for application-native persistence or HA design)
- Financial services: Reconciliation files, transaction logs, regulatory extracts
- Healthcare research: De-identified datasets, imaging metadata, study outputs
- EDA / Semiconductor: Design artifacts, simulation outputs, verification logs
- Enterprise file services: Archives for compliance analysis, audit evidence
Mission-critical workload note
This pattern provides an analytics read-access layer for existing file data. It does not replace workload-specific HA, backup, Snapshot, SnapMirror, or DR designs. For SAP, databases, VDI, and enterprise file services, treat Athena-on-FSx as an analytics and evidence layer, not as the primary resilience architecture.
Workload Isolation Guidance
For mission-critical workloads, do not point exploratory analytics directly at the same directory used by latency-sensitive application writes unless the operational impact has been tested.
Recommended pattern:
- Application-owned path: /prod/app-output/
- Analytics landing path: /analytics/curated/
- Athena query result path: Standard S3 bucket (conservative), or a separately validated output path
- Snapshot / backup policy: Owned by the workload team
- Glue/Athena access: Owned by the analytics platform team
For SAP, database exports, or ERP file drops, treat this pattern as a read-access analytics layer. Do not change application HA, backup, restore, or DR design just because the files are queryable through S3 APIs.
In this context, an analytics copy means an application-produced or batch-exported file that is safe for downstream analytics, not the primary application persistence path.
Operational Impact Validation
Before production use, validate operational impact:
- Baseline NFS/SMB workload latency and throughput before enabling analytics queries
- Athena query behavior during normal application write activity
- FSx provisioned throughput utilization during scans (analytics and application workloads share the same backend throughput)
- Query concurrency limits for the analytics team
- Rollback plan if analytics workload affects application workload
Recommended metrics include FSx throughput utilization, client-side NFS/SMB latency, Athena query runtime, bytes scanned, and application-side error or timeout rates during query execution.
Rollback plan examples include disabling the Athena workgroup, revoking the S3 Access Point policy for analytics roles, reducing analytics query concurrency, or moving analytics to an isolated curated path.
What This Means for Production
For production, treat this as a shared-storage analytics access pattern. The value is eliminating source data copy; the responsibility is validating workload isolation, throughput impact, governance, and rollback.
This article is not a production certification. It is intended to start a production readiness discussion around workload isolation, governance, and rollback.
Architecture
┌─────────────────────────────────────────────────────────────────┐
│                           AWS Account                           │
│                                                                 │
│  ┌──────────────┐     ┌──────────────┐     ┌────────────────┐   │
│  │ FSx for ONTAP│     │  S3 Access   │     │     Athena     │   │
│  │    Volume    │◄────│    Point     │◄────│  (Serverless)  │   │
│  │              │     │  (Internet   │     │                │   │
│  │ /analytics/  │     │   origin)    │     │  SELECT ...    │   │
│  └──────────────┘     └──────────────┘     │  FROM table    │   │
│         ▲                    ▲             └────────────────┘   │
│         │                    │                      │           │
│  NFS/SMB clients       Glue Crawler           Query results     │
│   (write data)      (schema discovery)        (→ S3 bucket)     │
└─────────────────────────────────────────────────────────────────┘
Key points:
- The access point must use Internet network origin. Athena accesses S3 from managed infrastructure outside your VPC. The AWS tutorial requires internet network origin for this path. VPC-origin access points deny requests from Athena.
- Glue Data Catalog provides the schema layer between Athena and the S3 AP
- Query results are written to an S3 bucket (the standard Athena pattern), not back to the FSx volume. See Observed Behavior for an experimental alternative.
Prerequisites
- FSx for ONTAP file system (ONTAP 9.17.1+)
- A volume with data (Parquet, CSV, JSON, etc.)
- S3 Access Point created with Internet network origin
- An Athena workgroup with a query results location (standard S3 bucket)
- IAM permissions for Athena, Glue, and S3 AP access
Step 1: Create the S3 Access Point
aws fsx create-and-attach-s3-access-point \
--name my-analytics-ap \
--type ONTAP \
--ontap-configuration '{
"VolumeId": "<YOUR_VOLUME_ID>",
"FileSystemIdentity": {
"Type": "UNIX",
"UnixUser": {"Name": "fsxn_athena_reader"}
}
}' \
--region <YOUR_REGION>
Wait for the lifecycle to become AVAILABLE:
aws fsx describe-s3-access-point-attachments \
--filters Name=volume-id,Values=<YOUR_VOLUME_ID> \
--region <YOUR_REGION> \
--query 'S3AccessPointAttachments[].{Name:Name,Lifecycle:Lifecycle,Alias:S3AccessPoint.Alias}'
Output:
[{
"Name": "my-analytics-ap",
"Lifecycle": "AVAILABLE",
"Alias": "my-analytics-ap-xxxxxxxxxxxxxxxxxxxxxxxxxxxx-ext-s3alias"
}]
Note: The alias ending in -ext-s3alias identifies this as an FSx for ONTAP S3 Access Point (as opposed to regular S3 Access Points, which end in -s3alias).
Security note for file-system identity
This walkthrough uses a dedicated read-only identity (fsxn_athena_reader). Make sure the corresponding UNIX/Windows permissions allow read access to the analytics path. Avoid using root in production; scope the identity to the minimum permissions required.
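If you script the lifecycle wait, a small filter over the describe output can pick out the aliases that are ready to use. The following is a minimal sketch, assuming the output has the shape produced by the --query expression above. The function name and the second sample entry (including its CREATING lifecycle value) are hypothetical and only there for illustration.

```python
import json

# Suffix the article reports for FSx for ONTAP attached access points.
FSX_ALIAS_SUFFIX = "-ext-s3alias"


def available_fsx_aliases(attachments):
    """Return aliases of attachments that are AVAILABLE and carry the FSx suffix."""
    return [
        a["Alias"]
        for a in attachments
        if a.get("Lifecycle") == "AVAILABLE"
        and a.get("Alias", "").endswith(FSX_ALIAS_SUFFIX)
    ]


# Sample shaped like the describe output above; the second entry is made up.
sample = json.loads("""
[{"Name": "my-analytics-ap",
  "Lifecycle": "AVAILABLE",
  "Alias": "my-analytics-ap-xxxxxxxxxxxxxxxxxxxxxxxxxxxx-ext-s3alias"},
 {"Name": "another-ap",
  "Lifecycle": "CREATING",
  "Alias": "another-ap-xxxxxxxxxxxxxxxxxxxxxxxxxxxx-ext-s3alias"}]
""")

print(available_fsx_aliases(sample))
```

Only the first entry is returned, because the second one is not yet AVAILABLE.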
Step 2: Set the Access Point Policy
This walkthrough uses role-based principals for Athena and Glue. Replace the placeholder role ARNs with the IAM roles used by your Athena workgroup and Glue crawler. Avoid account-wide principals in production.
aws s3control put-access-point-policy \
--account-id <YOUR_ACCOUNT_ID> \
--name my-analytics-ap \
--policy '{
"Version": "2012-10-17",
"Statement": [{
"Sid": "AllowAnalyticsRead",
"Effect": "Allow",
"Principal": {"AWS": [
"arn:aws:iam::<YOUR_ACCOUNT_ID>:role/<ATHENA_QUERY_ROLE>",
"arn:aws:iam::<YOUR_ACCOUNT_ID>:role/<GLUE_CRAWLER_ROLE>"
]},
"Action": [
"s3:GetObject",
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:<YOUR_REGION>:<YOUR_ACCOUNT_ID>:accesspoint/my-analytics-ap",
"arn:aws:s3:<YOUR_REGION>:<YOUR_ACCOUNT_ID>:accesspoint/my-analytics-ap/object/*"
]
}]
}' \
--region <YOUR_REGION>
The policy above is the conservative read-only analytics policy. If you intentionally test query result output to the FSx S3 Access Point (see Observed Behavior), add s3:PutObject scoped to the experimental output prefix only:
{
"Sid": "AllowExperimentalResultWrite",
"Effect": "Allow",
"Principal": {"AWS": "arn:aws:iam::<YOUR_ACCOUNT_ID>:role/<ATHENA_QUERY_ROLE>"},
"Action": "s3:PutObject",
"Resource": "arn:aws:s3:<YOUR_REGION>:<YOUR_ACCOUNT_ID>:accesspoint/my-analytics-ap/object/athena-results/*"
}
Security note: FSx for ONTAP S3 Access Points enforce S3 Block Public Access by default — this cannot be disabled. All requests require valid IAM credentials. Additionally, the file system user associated with the access point must have read permission on the files being queried.
Policy note: The policy above is the minimum that worked in my validation. If your Glue crawler or Athena workgroup reports location-related access errors, compare the policy with the official tutorial and CloudTrail events, and add only the required actions.
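Hand-editing policy JSON inside a shell string is easy to get wrong, so it can help to generate the document instead. The sketch below rebuilds the same read-only statement and optionally appends the experimental write statement. The function name and parameters are my own illustration, not part of any AWS SDK; the output is meant to be passed to the --policy argument shown above.

```python
import json


def build_ap_policy(region, account_id, ap_name, reader_roles,
                    writer_role=None, write_prefix="athena-results/"):
    """Build the access point policy from this step as a Python dict."""
    ap_arn = f"arn:aws:s3:{region}:{account_id}:accesspoint/{ap_name}"
    statements = [{
        "Sid": "AllowAnalyticsRead",
        "Effect": "Allow",
        "Principal": {"AWS": [
            f"arn:aws:iam::{account_id}:role/{role}" for role in reader_roles
        ]},
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [ap_arn, f"{ap_arn}/object/*"],
    }]
    if writer_role is not None:
        # Experimental: write access limited to one output prefix.
        statements.append({
            "Sid": "AllowExperimentalResultWrite",
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{account_id}:role/{writer_role}"},
            "Action": "s3:PutObject",
            "Resource": f"{ap_arn}/object/{write_prefix}*",
        })
    return {"Version": "2012-10-17", "Statement": statements}


# Placeholder values; replace with your own region, account, and role names.
policy = build_ap_policy(
    "ap-northeast-1", "111122223333", "my-analytics-ap",
    reader_roles=["athena-query-role", "glue-crawler-role"],
)
print(json.dumps(policy, indent=2))
```

Keeping the write statement behind an explicit argument makes the conservative read-only policy the default.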
Step 3: Upload Test Data via NFS
On a machine with NFS access to the FSx volume:
import pandas as pd
import numpy as np
# Generate 10,000 rows of sensor data
np.random.seed(42)
n_rows = 10000
df = pd.DataFrame({
'timestamp': pd.date_range('2026-01-01', periods=n_rows, freq='1min'),
'sensor_id': np.random.choice(['sensor_A', 'sensor_B', 'sensor_C',
'sensor_D', 'sensor_E'], n_rows),
'temperature': np.round(np.random.normal(25, 5, n_rows), 2),
'humidity': np.round(np.random.uniform(30, 90, n_rows), 2),
'pressure': np.round(np.random.normal(1013, 10, n_rows), 2),
'status': np.random.choice(['normal', 'warning', 'critical'], n_rows,
p=[0.85, 0.12, 0.03])
})
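# Write as Parquet to the NFS-mounted volume (to_parquet needs pyarrow or fastparquet installed)
df.to_parquet('/mnt/fsxn/analytics/sensor-data/sensor_data.parquet', index=False)
# memory_usage reports the in-memory DataFrame size, not the Parquet file size on disk
print(f"Wrote {len(df)} rows ({df.memory_usage(deep=True).sum()/1024:.0f} KB in memory)")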
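The same file is now accessible via both NFS (/mnt/fsxn/analytics/sensor-data/sensor_data.parquet) and the S3 API (s3://<AP_ALIAS>/sensor-data/sensor_data.parquet). The S3 key is relative to the root exposed by the access point, which corresponds to /mnt/fsxn/analytics on the NFS client in this walkthrough.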
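The NFS path and the S3 key differ only by the mount prefix, so the mapping can be sketched in a few lines. This assumes the directory exposed by the access point is mounted at a known location on the NFS client (here /mnt/fsxn/analytics, inferred from the two paths in this step). The function name is hypothetical.

```python
from pathlib import PurePosixPath


def nfs_path_to_s3_uri(nfs_path, mount_point, ap_alias):
    """Map a file path under the NFS mount to its S3 URI via the access point.

    Raises ValueError if nfs_path is not located under mount_point.
    """
    relative = PurePosixPath(nfs_path).relative_to(PurePosixPath(mount_point))
    return f"s3://{ap_alias}/{relative.as_posix()}"


print(nfs_path_to_s3_uri(
    "/mnt/fsxn/analytics/sensor-data/sensor_data.parquet",
    "/mnt/fsxn/analytics",
    "<AP_ALIAS>",
))
```

A helper like this is useful when a writer job on NFS needs to tell a downstream consumer which S3 URI to read.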
Step 4: Verify S3 AP Access
aws s3api list-objects-v2 \
--bucket "<AP_ALIAS>" \
--prefix "sensor-data/" \
--region <YOUR_REGION>
Output:
{
"Contents": [{
"Key": "sensor-data/sensor_data.parquet",
"Size": 252858,
"StorageClass": "FSX_ONTAP"
}]
}
Note the StorageClass: FSX_ONTAP — this confirms the data lives on FSx, not S3.
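For a scripted check, the same confirmation can be applied to every object in the listing rather than read by eye. This is a minimal sketch that works on the parsed JSON response shown above. The function name is hypothetical, and the second sample object is made up to show a mismatch.

```python
def keys_not_on_fsx(list_response, expected_class="FSX_ONTAP"):
    """Return keys whose StorageClass differs from the expected FSx class."""
    return [
        obj["Key"]
        for obj in list_response.get("Contents", [])
        if obj.get("StorageClass") != expected_class
    ]


# First object mirrors the output above; the second is a made-up counterexample.
response = {
    "Contents": [
        {"Key": "sensor-data/sensor_data.parquet", "Size": 252858,
         "StorageClass": "FSX_ONTAP"},
        {"Key": "sensor-data/unexpected.parquet", "Size": 10,
         "StorageClass": "STANDARD"},
    ]
}

print(keys_not_on_fsx(response))
```

An empty list means every listed object reports the FSX_ONTAP storage class.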
Step 5: Create Glue Database and Table
aws glue create-database \
--database-input '{"Name": "fsxn_analytics"}' \
--region <YOUR_REGION>
You can either run a Glue Crawler for automatic schema discovery (recommended by the AWS tutorial), or create the table manually via Athena:
CREATE EXTERNAL TABLE fsxn_analytics.sensor_data (
`timestamp` TIMESTAMP, -- TIMESTAMP is a reserved word in Athena DDL, so the column name is escaped with backticks
sensor_id STRING,
temperature DOUBLE,
humidity DOUBLE,
pressure DOUBLE,
status STRING
)
STORED AS PARQUET
LOCATION 's3://<AP_ALIAS>/sensor-data/'
TBLPROPERTIES ('parquet.compression'='SNAPPY');
Step 6: Query with Athena
Basic aggregation
SELECT
sensor_id,
COUNT(*) AS readings,
ROUND(AVG(temperature), 2) AS avg_temp,
ROUND(AVG(humidity), 2) AS avg_humidity,
SUM(CASE WHEN status = 'critical' THEN 1 ELSE 0 END) AS critical_count
FROM fsxn_analytics.sensor_data
GROUP BY sensor_id
ORDER BY critical_count DESC;
Verified result
sensor_id | readings | avg_temp | avg_humidity | critical_count
----------|----------|----------|--------------|---------------
sensor_A | 2027 | 24.89 | 59.84 | 68
sensor_B | 1986 | 25.11 | 60.23 | 62
sensor_C | 2013 | 24.95 | 59.91 | 59
sensor_E | 2000 | 24.98 | 60.02 | 56
sensor_D | 1974 | 25.03 | 60.15 | 55
Query time: 1.46 seconds | Data scanned: 67 KB | Engine: Athena v3
Observed Behavior: Query Results Written to the FSx S3 Access Point
The AWS tutorial states:
"Athena reads data from your FSx for ONTAP volume through the access point. Athena query results are written to the Amazon S3 results bucket, not back to the FSx for ONTAP volume."
In my validation, however, setting OutputLocation to the FSx for ONTAP S3 Access Point alias succeeded and wrote the .csv and .metadata files back to the FSx volume:
aws athena start-query-execution \
--query-string "SELECT 1 AS test" \
--result-configuration \
"OutputLocation=s3://<AP_ALIAS>/athena-results/" \
--work-group primary \
--region <YOUR_REGION>
Result: SUCCEEDED in 584ms
The result files appeared on the FSx volume and were immediately accessible via NFS.
Treat this as observed behavior from my environment, not a general production recommendation. The conservative production pattern is:
- Source data: FSx for ONTAP S3 Access Point
- Athena query results: Standard S3 bucket (as documented)
The experimental pattern validated in this post:
- Source data: FSx for ONTAP S3 Access Point
- Athena query results: FSx for ONTAP S3 Access Point (observed to work, not documented)
Validate this in your own environment before relying on it.
Governance warning: Do not enable experimental query result output to FSx S3 AP for sensitive datasets unless query result retention, encryption, audit evidence, and file-system permissions are reviewed. Query results may contain derived sensitive information. For sensitive datasets, experimental result output should require approval from the data owner, security owner, and workload owner.
Performance Characteristics
| Metric | Observed | Notes |
|---|---|---|
| Simple SELECT query | 584 ms | Includes result write |
| Aggregation (10K rows, 67 KB scanned) | 1.46 s | GROUP BY over 5 sensors with 4 aggregate expressions |
| Data scan cost | Standard Athena pricing | $5 per TB scanned |
| Storage class | FSX_ONTAP | Confirmed in ListObjects |
Performance note
These numbers validate functional compatibility, not performance limits. The dataset is intentionally small (10K rows in a 252,858-byte Parquet file, of which the aggregation scanned 67 KB). For real analytics workloads, test with realistic file sizes, object counts, partition layouts, concurrent queries, and FSx provisioned throughput. The throughput available through the S3 API depends on the FSx file system's provisioned throughput capacity (per the AWS documentation).
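To put the scan cost in the table into numbers, here is a rough estimator. The $5 per TB price comes from the table above. The 10 MB per-query minimum and the use of binary units are my assumptions about how Athena bills scans, so check current pricing for your region before relying on the result.

```python
PRICE_PER_TB_USD = 5.0            # from the table above
MIN_BILLED_BYTES = 10 * 1024**2   # assumption: 10 MB minimum billed per query
BYTES_PER_TB = 1024**4            # assumption: binary terabyte


def estimated_scan_cost_usd(bytes_scanned):
    """Estimate the per-query scan charge under the assumptions above."""
    billed = max(bytes_scanned, MIN_BILLED_BYTES)
    return billed / BYTES_PER_TB * PRICE_PER_TB_USD


# The 67 KB aggregation from this article falls under the assumed minimum.
print(f"{estimated_scan_cost_usd(67 * 1024):.8f} USD")
print(f"{estimated_scan_cost_usd(1024**4):.2f} USD for a full 1 TB scan")
```

At this dataset size the charge is dominated by the per-query minimum, which is another reason the numbers here say little about cost at scale.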
S3 API Compatibility Boundary
FSx for ONTAP S3 Access Points expose file data through S3 object APIs, but they should not be treated as standard S3 buckets.
The safe mental model is:
- Use S3 APIs for object read/write access to files on FSx
- Use Glue and Athena for read-oriented analytics
- Do not assume S3 bucket-level features exist (event notifications, versioning, lifecycle policies)
- Do not assume lakehouse commit semantics (rename, conditional writes)
- Validate every platform integration separately
In this article, the verified pattern is read-oriented analytics over Parquet/CSV/JSON files. Transactional table formats and commit protocols are outside the safe default boundary.
Compatibility Matrix
Legend for the "Validated by" column:
- This validation: Actually executed commands or queries in this environment and confirmed the result
- Supported operations review: Confirmed based on the supported operations documentation or official tutorial
- Supported operations review required: Not yet confirmed; additional validation needed before use
| Capability | Status | Validated by | Notes |
|---|---|---|---|
| ListObjectsV2 | ✅ Verified | This validation | S3 AP alias worked |
| GetObject (Parquet scan) | ✅ Verified | This validation | Athena v3 |
| PutObject (small result file) | ⚠️ Observed | This validation | Not documented as Athena result pattern |
| Glue table over S3 AP | ✅ Verified | This validation | Manual DDL and Crawler |
| CTAS to S3 AP | ❌ Failed in validation | This validation | Not part of the documented tutorial pattern; use standard S3 output |
| Delta Lake writes | ❌ Not recommended | Supported operations review | Commit protocol depends on rename/atomic semantics not available |
| Hudi/Iceberg writes | ❌ Not recommended | Supported operations review | Requires commit semantics beyond simple object read |
| S3 bucket event notifications | ❌ Not part of verified pattern | Supported operations review required | Do not assume bucket-level eventing; validate against supported operations |
CTAS is a write-path pattern, not just a read query. Treat CTAS separately from read-oriented SELECT validation because it writes new table data to a target S3 location and may leave partial/orphaned files on failure. CTAS should not be included in the initial read-oriented validation scope.
Transactional lakehouse formats may require semantics beyond simple object read/write, such as:
- Atomic commit behavior
- Rename or move-like commit operations
- Conditional writes (If-None-Match)
- Manifest consistency
- Concurrent writer coordination
- Cleanup of partial/orphaned files
This article does not validate those semantics. It validates read-oriented analytics over existing files.
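To make the conditional-write item in the list above concrete, the toy model below contrasts a plain put with a put-if-absent. It is an in-memory illustration only. It is not FSx, S3, Delta Lake, Hudi, or Iceberg code, and the class name and commit key are hypothetical.

```python
class ToyObjectStore:
    """In-memory stand-in for an object store, for illustration only."""

    def __init__(self):
        self._objects = {}

    def put(self, key, body):
        # Plain put: the last writer silently overwrites earlier data.
        self._objects[key] = body

    def put_if_absent(self, key, body):
        # Conditional write: succeeds only if the key does not exist yet.
        if key in self._objects:
            return False
        self._objects[key] = body
        return True

    def get(self, key):
        return self._objects[key]


store = ToyObjectStore()
commit_key = "commits/v1.json"  # hypothetical commit marker for table version 1

# Two writers race to publish the same table version.
a_won = store.put_if_absent(commit_key, "writer-A")
b_won = store.put_if_absent(commit_key, "writer-B")
print(a_won, b_won, store.get(commit_key))  # True False writer-A
```

With only the plain put, writer B would overwrite writer A's commit without either side noticing, which is the failure a table format's commit protocol has to rule out.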
Governance and Compliance Considerations
This pattern keeps the source files on FSx for ONTAP, but it does not remove the need for data governance.
Before using this pattern with regulated or sensitive datasets, review:
- Data classification of source files
- IAM and S3 Access Point policy scope (least privilege)
- File system identity mapped to the access point (UNIX/Windows user permissions apply)
- Glue Data Catalog permissions (who can see the table metadata)
- Athena workgroup controls (query limits, result encryption)
- Query result location and retention (results may contain derived sensitive data)
- CloudTrail / audit evidence requirements
- Snapshot, backup, retention, and deletion policy
Query results can be more sensitive than the original dataset because they may aggregate, filter, or derive new information. Apply encryption, retention, and access controls to the Athena result location as carefully as the source dataset.
This article is a technical validation, not a compliance attestation.
Production Controls Checklist
For regulated or sensitive datasets, define the following before production use:
- [ ] Athena workgroup result location (standard S3 bucket)
- [ ] Whether workgroup settings override client-side result settings
- [ ] Query result encryption mode and KMS key ownership
- [ ] Query result retention and deletion policy
- [ ] IAM principals allowed to query the Glue table
- [ ] File-system identity mapped to the S3 Access Point (dedicated, not root)
- [ ] Audit evidence approach defined and validated (e.g., CloudTrail coverage for the S3 Access Point where applicable, with sample events captured as PoC evidence)
- [ ] Approval process for enabling experimental result output to FSx S3 AP
For regulated workloads, consider enabling Athena workgroup override so that query result location and encryption cannot be changed by client-side settings. This prevents individual clients from changing where query results are written or how they are encrypted.
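As a sketch of what that override looks like in configuration terms, the function below builds a workgroup configuration with enforcement switched on. The field names follow the Athena CreateWorkGroup Configuration structure as I understand it, so verify them against the API reference before use. The function name, bucket name, and KMS key ARN are hypothetical placeholders.

```python
def enforced_workgroup_config(output_location, kms_key_arn=None,
                              scan_cutoff_bytes=None):
    """Build a workgroup configuration that overrides client-side settings."""
    if kms_key_arn:
        encryption = {"EncryptionOption": "SSE_KMS", "KmsKey": kms_key_arn}
    else:
        encryption = {"EncryptionOption": "SSE_S3"}
    config = {
        "ResultConfiguration": {
            "OutputLocation": output_location,
            "EncryptionConfiguration": encryption,
        },
        # Clients can no longer change result location or encryption.
        "EnforceWorkGroupConfiguration": True,
    }
    if scan_cutoff_bytes is not None:
        # Optional per-query scan limit.
        config["BytesScannedCutoffPerQuery"] = scan_cutoff_bytes
    return config


config = enforced_workgroup_config(
    "s3://example-athena-results-bucket/fsxn-analytics/",
    kms_key_arn="arn:aws:kms:ap-northeast-1:111122223333:key/EXAMPLE",
    scan_cutoff_bytes=10 * 1024**3,
)
print(config)
```

The resulting dict is intended as the Configuration argument when creating or updating the workgroup with an SDK.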
For regulated workloads, experimental writeback should be disabled by default and enabled only after explicit approval from the data owner, security owner, and workload owner.
Experimental writeback may be enabled only when:
- Approval scope is documented
- Output path is isolated from source data
- Encryption and retention are defined for the output path
- Cleanup and rollback procedures are documented
- Review expiration date is set
Minimum audit evidence artifacts for PoC completion:
- Scope statement: what the audit evidence demonstrates and what it does not (e.g., "validates access path and query result control for PoC scope; does not demonstrate full production compliance")
- Access path description (IAM → AP policy → file-system identity)
- Sample successful read event
- Sample denied access event (if applicable)
- Query result location configuration
- Encryption configuration
- Workgroup override setting (if used)
- Reviewer sign-off (name, role, date, decision)
30-Minute Validation Flow
- Create or verify the FSx S3 Access Point (AVAILABLE lifecycle)
- Write one Parquet file through NFS to the analytics path
- Confirm StorageClass: FSX_ONTAP with list-objects-v2
- Create the Glue table (manual DDL or crawler)
- Run one Athena query
- Capture the validation artifacts (see below)
- Decide Go / No-Go using the PoC Success Criteria
First Success Path
If you are validating this for the first time, keep the scope small.
Expected outcome:
- One Parquet file written through NFS is visible through the S3 Access Point
- Glue table creation or crawler schema discovery succeeds
- Athena can query the file in place
- Query result location behavior is validated and documented
- NFS/SMB clients can still access the original file
- IAM and file-system identity boundaries are understood
Do not start with Delta Lake, Hudi, Iceberg writes, large scans, or concurrent workloads. Prove the read path first.
PoC Success Criteria
Minimum success:
- S3 Access Point attachment is AVAILABLE
- ListObjectsV2 returns the expected test file
- Glue table points to the S3 AP alias
- Athena query succeeds and returns correct results
- Results are reproducible from a clean workgroup/session
Operational success:
- IAM role and S3 AP policy are scoped to the analytics roles
- Athena workgroup controls are defined
- Query result location and retention are documented
- Dataset size and scan cost are measured
- FSx throughput impact is measured during query
- Existing NFS/SMB application workload impact is measured during Athena queries
Go / No-Go criteria:
- Go: Read-only analytics on Parquet/CSV/JSON works with acceptable latency and cost
- No-Go: Workload requires Delta/Hudi/Iceberg write commits through the S3 AP
- No-Go: Platform governance requires Unity Catalog external locations and the platform cannot yet authorize the S3 AP (see Part 2)
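The criteria above can be recorded as a small decision helper so that the PoC outcome is explicit and repeatable. This is a sketch under my own naming. The check keys and function signature are hypothetical and simply mirror the minimum success list and the Go / No-Go bullets.

```python
MINIMUM_CHECKS = (
    "attachment_available",
    "list_returns_test_file",
    "glue_table_points_to_alias",
    "athena_query_correct",
    "reproducible_from_clean_session",
)


def go_no_go(checks, latency_and_cost_acceptable,
             needs_table_format_writes=False,
             unity_catalog_blocked=False):
    """Return ("Go" or "No-Go", list of reasons) for a PoC run."""
    reasons = [
        f"minimum check not met: {name}"
        for name in MINIMUM_CHECKS
        if not checks.get(name, False)
    ]
    if not latency_and_cost_acceptable:
        reasons.append("latency or cost not acceptable")
    if needs_table_format_writes:
        reasons.append("workload needs Delta/Hudi/Iceberg write commits")
    if unity_catalog_blocked:
        reasons.append("platform cannot yet authorize the S3 AP")
    return ("No-Go" if reasons else "Go"), reasons


all_passed = {name: True for name in MINIMUM_CHECKS}
print(go_no_go(all_passed, latency_and_cost_acceptable=True))
print(go_no_go(all_passed, True, needs_table_format_writes=True))
```

Returning the reasons alongside the decision gives reviewers something concrete to sign off on.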
Performance Test Plan
Note: This section defines the performance test plan and metrics to collect. It does not present benchmark results. Actual benchmark outputs will be added under verification-pack/ after validation runs are completed.
The next validation should include:
- 1 GB / 10 GB / 100 GB datasets
- Many small files vs fewer large Parquet files
- Partitioned layout (date=YYYY-MM-DD/sensor_id=...)
- Concurrent Athena queries
- Different FSx throughput capacity settings (128 / 256 / 512+ MBps)
- NFS writer activity during Athena scans
- Standard S3 result bucket vs observed FSx S3 AP result output
The goal is to separate Athena scan behavior, Glue metadata behavior, and FSx provisioned-throughput impact.
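For the partitioned layout in the list above, object keys would follow the Hive-style key=value convention. The sketch below builds such a key for one file. The function name and the file name are hypothetical, and the prefix matches the sensor-data path used earlier in this article.

```python
from datetime import date


def partitioned_key(prefix, day, sensor_id, filename):
    """Build a Hive-style partitioned key: date=YYYY-MM-DD/sensor_id=.../file."""
    return (
        f"{prefix.strip('/')}/date={day.isoformat()}"
        f"/sensor_id={sensor_id}/{filename}"
    )


print(partitioned_key("sensor-data", date(2026, 1, 1), "sensor_A",
                      "part-0000.parquet"))
# sensor-data/date=2026-01-01/sensor_id=sensor_A/part-0000.parquet
```

On the NFS side the same structure is just nested directories, so a writer can create the layout with ordinary file operations.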
Additional request pattern considerations:
- Sequential vs parallel S3 API reads
- Prefix layout impact on listing performance
- Small object listing overhead
- Repeated query behavior with warm Glue/Athena metadata
Metrics collection sources:
- FSx metrics: CloudWatch (FSx namespace)
- Athena query metrics: get-query-execution API (EngineExecutionTimeInMillis, DataScannedInBytes)
- Client-side latency: CLI timing or SDK instrumentation
- Error/timeout sources: Athena query execution status and failure reason, client-side logs, application-side timeout logs, CloudTrail events where applicable
Record results separately for cold runs (at least 1), warm-metadata runs (at least 1), and repeated runs (at least 3 executions). Report the average, min, max, and any notable outliers.
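The reporting step can be sketched as two small helpers: one pulls the two statistics out of a get-query-execution response, the other summarizes a list of run times. The outlier rule (more than twice the median) is my own assumption, not a standard, and the sample numbers below are made up rather than measured.

```python
import statistics


def extract_stats(query_execution):
    """Return (engine time in ms, bytes scanned) from a get-query-execution response."""
    stats = query_execution["QueryExecution"]["Statistics"]
    return stats["EngineExecutionTimeInMillis"], stats["DataScannedInBytes"]


def summarize_runs(millis, outlier_factor=2.0):
    """Summarize run times; flags runs above outlier_factor times the median."""
    median = statistics.median(millis)
    return {
        "runs": len(millis),
        "avg_ms": statistics.fmean(millis),
        "min_ms": min(millis),
        "max_ms": max(millis),
        "outliers_ms": [m for m in millis if m > outlier_factor * median],
    }


# Made-up response fragment and run times, for illustration only.
sample_response = {"QueryExecution": {"Statistics": {
    "EngineExecutionTimeInMillis": 1200, "DataScannedInBytes": 68608}}}
print(extract_stats(sample_response))
print(summarize_runs([1000, 1100, 900, 5000]))
```

Keep one summary per run type (cold, warm metadata, repeated) so the averages are not mixed.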
Validation Artifacts
For reproducibility, capture the following artifacts in your PoC:
- S3 Access Point attachment lifecycle output (describe-s3-access-point-attachments)
- list-objects-v2 output showing StorageClass: FSX_ONTAP
- Glue table DDL or crawler output
- Athena query execution ID
- Athena query runtime and scanned bytes
- Query result location and file listing
- NFS listing showing the original source file is unchanged
- IAM policy and access point policy used for the test
What's Next
In Part 2, I'll cover what happens when you try to connect Databricks to FSx for ONTAP S3 Access Points — where Unity Catalog's session policy, seccomp filters, and platform security boundaries create a significantly more complex picture.
References
- AWS What's New: Amazon FSx for NetApp ONTAP now supports Amazon S3 access (Dec 2, 2025)
- AWS Tutorial: Query files with Athena
- FSx for ONTAP S3 Access Points documentation
- Supported S3 operations for FSx S3 AP
- GitHub: fsxn-lakehouse-integrations
This article is part of the "FSx for ONTAP S3 Access Points × Lakehouse Deep Dive" series. All tests were performed on a real AWS environment with FSx for ONTAP (ONTAP 9.17.1, ap-northeast-1) in May 2026.
Scope reminder: This article verifies a limited read-oriented scenario. It does not validate production readiness, write-path behavior, distributed executor-scale processing, or all third-party analytics engines.
Article update plan: v1.0 (current) — Scope, observed behavior, validation plan. Future updates: v1.1 — Benchmark results with realistic datasets. v1.2 — Security Verified candidate review. v1.3 — Production workload isolation test results.