TL;DR
In Part 1, Athena worked cleanly. In Part 2, Databricks hit session policy boundaries. This Part 3 validates Snowflake's path — and it works.
Snowflake can query FSx for ONTAP S3 Access Point data — but only with the correct stage configuration. Without the AWS_ACCESS_POINT_ARN parameter, SELECT fails with "access denied" while LIST works. With it, the tested read, governance, and AI paths work: SELECT, External Tables, COPY INTO load, Directory Tables, governance tags, and 8 out of 10 Cortex AI functions work on FSx data (7 directly, 1 via COPY INTO for Cortex Search).
| Configuration | LIST | SELECT | External Table | Cortex AI (text) | Vision AI |
|---|---|---|---|---|---|
| AP alias only (no ARN) | ✅ | ❌ Access Denied | ❌ | ❌ | ❌ |
| AP alias + AWS_ACCESS_POINT_ARN | ✅ | ✅ | ✅ | ✅ Direct | ✅ Via staging |
This appears to be a recurring integration pattern in this series: platforms that generate restrictive session policies need an explicit S3 Access Point ARN parameter so the generated policy includes the regional access point ARN.
Quick Decision Guide:
- Zero-copy governed read on NAS data → External Table with AWS_ACCESS_POINT_ARN
- Full AI + maximum query performance → COPY INTO internal table
- RAG / semantic search over NAS documents → COPY INTO → Cortex Search Service (198ms)
GitHub Repository: fsxn-lakehouse-integrations
How to Read This Article
This article is:
- A reproduction-focused validation report
- Evidence from one environment (Snowflake Standard, ap-northeast-1)
- A configuration guide for Snowflake + FSx for ONTAP S3 AP
Read by role:
- Snowflake admin: Stage configuration → Working setup
- Storage engineer: Evidence matrix → Root cause analysis
- Data engineer: What works today → External Table setup
- Partner / SA: Partner Decision Card → Architecture guidance
- Security / governance reviewer: Governance Impact Summary → Regulated Workload Checklist
- AI/ML engineer: AI / ML Integration Path → MLOps Boundary
Prerequisite Concepts
Before reading this article, it helps to understand:
- Snowflake Storage Integration — an object that stores a reference to an IAM role for accessing external cloud storage
- Snowflake External Stage — maps a cloud storage URL to a storage integration for data access
- External Table — a Snowflake table that reads data directly from files on an external stage (no data copy)
- AWS_ACCESS_POINT_ARN — a stage parameter that tells Snowflake to include the S3 Access Point ARN in its generated session policy
- S3 Access Point ARN vs S3 bucket ARN — S3 AP uses arn:aws:s3:<region>:<account>:accesspoint/<name>, not arn:aws:s3:::<bucket>
- Directory Table — a Snowflake feature that exposes file metadata (path, size, date) from a stage as a queryable table
Important premise: Snowflake does NOT officially document FSx for ONTAP S3 Access Points as a supported External Stage storage backend. The AWS_ACCESS_POINT_ARN parameter exists in Snowflake's CREATE STAGE documentation for S3 Access Points generally, but FSx for ONTAP S3 AP is not listed as a validated target. Our validation confirms that read and governance operations work when configured correctly, but this should not be interpreted as an officially supported configuration by Snowflake. Consult Snowflake Support before production use.
The Goal
Query structured and unstructured data stored on FSx for ONTAP from Snowflake — without copying data to a native S3 bucket. FSx for ONTAP S3 Access Points should make this possible by exposing NFS/SMB file data via S3 API.
In Part 1, Athena worked cleanly. In Part 2, Databricks required the access_point field and still has limitations. This article validates Snowflake's path.
Test Environment
Snowflake Account: Standard edition, AWS ap-northeast-1
Warehouse: COMPUTE_WH (X-Small)
Role: ACCOUNTADMIN
FSx for ONTAP: <FILE_SYSTEM_ID> (ONTAP 9.17.1)
SVM: <SVM_NAME>
S3 Access Point: Internet-origin, UNIX file system user
Scope: This article validates Snowflake Standard edition. Enterprise features (e.g., advanced governance, private connectivity) may provide additional capabilities not tested here.
The Setup
Snowflake accesses external data through a three-layer configuration:
Storage Integration (IAM Role ARN + trust)
│
└── External Stage (S3 URL + AWS_ACCESS_POINT_ARN + file format)
│
└── External Table / SELECT @stage (data access)
Visual Story: Before and After
❌ Before: SELECT Fails Without AWS_ACCESS_POINT_ARN
CREATE OR REPLACE STAGE fsxn_stage_without_arn
STORAGE_INTEGRATION = fsxn_verification_integration
URL = 's3://<ap-alias>/'
FILE_FORMAT = (TYPE = PARQUET);
LIST @fsxn_stage_without_arn/sensor-data/; -- ✅ Works
SELECT $1 FROM @fsxn_stage_without_arn/sensor-data/sensor_data.parquet LIMIT 3; -- ❌ Access Denied
"Failed to access remote file: access denied. Please check your credentials." — The same file that LIST found cannot be read.
✅ After: SELECT Succeeds With AWS_ACCESS_POINT_ARN
CREATE OR REPLACE STAGE fsxn_stage_with_arn
STORAGE_INTEGRATION = fsxn_verification_integration
URL = 's3://<ap-alias>/'
AWS_ACCESS_POINT_ARN = 'arn:aws:s3:<region>:<account>:accesspoint/<ap-name>'
FILE_FORMAT = (TYPE = PARQUET);
SELECT $1 FROM @fsxn_stage_with_arn/sensor-data/sensor_data.parquet LIMIT 3; -- ✅ SUCCESS
Result: 3 rows of sensor data returned successfully.
{"humidity": 32.2, "id": 1, "pressure": 1002.1, "sensor_id": "S004", "status": "normal", "temperature": 21.13}
{"humidity": 45.63, "id": 2, "pressure": 1004.13, "sensor_id": "S005", "status": "normal", "temperature": 23.07}
{"humidity": 42.79, "id": 3, "pressure": 1000.18, "sensor_id": "S003", "status": "normal", "temperature": 36.96}
✅ External Table Also Works
CREATE OR REPLACE EXTERNAL TABLE fsxn_sensor_ext_table
LOCATION = @fsxn_stage_with_arn/sensor-data/
FILE_FORMAT = (TYPE = PARQUET)
AUTO_REFRESH = FALSE;
SELECT * FROM fsxn_sensor_ext_table LIMIT 3; -- ✅ SUCCESS (3 rows)
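The external table above exposes each Parquet row through the single VALUE variant column. A minimal sketch of a typed variant follows, assuming the field names seen in the sample rows (id, sensor_id, status, temperature, humidity, pressure). The table name fsxn_sensor_ext_typed and the column types are illustrative assumptions and were not part of the validation.

```sql
-- Hypothetical typed variant of the external table above.
-- Column names come from the sample rows; the table name and types are assumptions.
CREATE OR REPLACE EXTERNAL TABLE fsxn_sensor_ext_typed (
  id          NUMBER  AS (VALUE:id::NUMBER),
  sensor_id   VARCHAR AS (VALUE:sensor_id::VARCHAR),
  status      VARCHAR AS (VALUE:status::VARCHAR),
  temperature FLOAT   AS (VALUE:temperature::FLOAT),
  humidity    FLOAT   AS (VALUE:humidity::FLOAT),
  pressure    FLOAT   AS (VALUE:pressure::FLOAT)
)
  LOCATION = @fsxn_stage_with_arn/sensor-data/
  FILE_FORMAT = (TYPE = PARQUET)
  AUTO_REFRESH = FALSE;  -- event notifications are unavailable on FSx S3 AP

-- Typed columns can then be filtered and sorted without casting VALUE in every query.
SELECT sensor_id, temperature, status
FROM fsxn_sensor_ext_typed
WHERE temperature > 30
ORDER BY temperature DESC
LIMIT 10;
```

Typed virtual columns keep downstream queries readable and give tags or policies a concrete column to attach to, while the data itself stays on FSx for ONTAP.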
Complete Capability Matrix
| Capability | Status | Notes |
|---|---|---|
| Read operations | | |
| SELECT from @stage (Parquet) | ✅ Verified | GetObject with AWS_ACCESS_POINT_ARN |
| SELECT from @stage (CSV) | ✅ Verified | CSV with SKIP_HEADER works |
| SELECT from @stage (JSON) | ✅ Expected | Same GetObject path (no JSON files in test data) |
| External Table (read) | ✅ Verified | CREATE + SELECT both succeed |
| LIST @stage (all prefixes) | ✅ Verified | Subdirectories included |
| GET_PRESIGNED_URL | ✅ Observed | Works but not officially supported |
| Load operations | | |
| COPY INTO (stage → table) | ✅ Verified | 4.9s for Parquet load |
| Governance | | |
| Governance Tags on External Table | ✅ Verified | CREATE TAG + ALTER TABLE SET TAG |
| SYSTEM$GET_TAG | ✅ Verified | Tag retrieval works |
| Row Access Policy | ✅ Expected | Standard Snowflake feature on tables |
| Column Masking | ✅ Expected | Standard Snowflake feature on tables |
| Write operations | | |
| PutObject (via COPY INTO unload) | ⚠️ TBD | FSx S3 AP supports PutObject ≤5GB |
| Event-driven | | |
| Snowpipe (auto-ingest) | ❌ Not possible | S3 Event Notifications not supported on FSx S3 AP |
| AUTO_REFRESH on External Table | ❌ Not possible | Requires S3 Event Notifications |
| Transactional table formats | | |
| Iceberg Table read (pre-existing metadata) | ⚠️ TBD | Requires separate validation |
| Iceberg Table write-back | ❌ Not suitable | Conditional writes not supported on FSx S3 AP |
| Delta / Hudi write | ❌ Not suitable | Conditional writes not supported |
| Supported file formats | | |
| Parquet | ✅ Verified | Primary format for analytics |
| CSV | ✅ Verified | With header skip, delimiter options |
| JSON | ✅ Expected | Same read path as Parquet/CSV |
| Avro | ✅ Expected | Snowflake-supported format, same read path |
| ORC | ✅ Expected | Snowflake-supported format, same read path |
Key insight: With AWS_ACCESS_POINT_ARN, Snowflake achieves broad read and governance integration for the tested paths. The only limitations in this matrix are event-driven features (Snowpipe, AUTO_REFRESH) and transactional write formats (Iceberg, Delta). Both stem from FSx S3 AP API limitations, not Snowflake limitations.
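The COPY INTO load path in the matrix can be sketched as follows. This is a minimal example under stated assumptions: the target table name sensor_readings and its column types are hypothetical and inferred from the sample rows shown earlier, and MATCH_BY_COLUMN_NAME is used so that Parquet fields map to table columns by name.

```sql
-- Hypothetical target table; column names follow the sample rows, types are assumptions.
CREATE OR REPLACE TABLE sensor_readings (
  id          NUMBER,
  sensor_id   VARCHAR,
  status      VARCHAR,
  temperature FLOAT,
  humidity    FLOAT,
  pressure    FLOAT
);

-- Load Parquet files from the FSx S3 AP stage, mapping fields to columns by name.
COPY INTO sensor_readings
FROM @fsxn_stage_with_arn/sensor-data/
FILE_FORMAT = (TYPE = PARQUET)
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;

-- Sanity check on the loaded row count.
SELECT COUNT(*) AS loaded_rows FROM sensor_readings;
```

Once loaded, the data lives in Snowflake-managed storage, so this path trades zero-copy access for full table functionality.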
The Root Cause: Session Policy ARN Mismatch
When Snowflake performs sts:AssumeRole, it applies a session policy. Without AWS_ACCESS_POINT_ARN, this session policy uses standard S3 bucket ARN patterns that don't match the FSx S3 AP regional ARN format:
Without AWS_ACCESS_POINT_ARN:
Session policy allows GetObject on: arn:aws:s3:::*/*
FSx S3 AP actual ARN: arn:aws:s3:<region>:<account>:accesspoint/<name>/object/*
→ NO MATCH → AccessDenied
With AWS_ACCESS_POINT_ARN:
Session policy includes: arn:aws:s3:<region>:<account>:accesspoint/<name>/*
→ MATCH → GetObject succeeds
This is the same pattern as Databricks Unity Catalog's access_point field — both platforms need the S3 AP ARN explicitly specified to include it in the generated session policy.
Evidence Matrix
| Layer | Evidence | Result | Interpretation |
|---|---|---|---|
| Snowflake integration | DESCRIBE INTEGRATION | ✅ Pass | Trust established |
| Stage metadata | LIST @stage | ✅ Pass | ListBucket path works (bucket-level ARN matches) |
| Object read (no ARN) | SELECT @stage | ❌ Fail | GetObject blocked by session policy |
| Object read (with ARN) | SELECT @stage | ✅ Pass | AWS_ACCESS_POINT_ARN resolves session policy |
| External Table | CREATE + SELECT | ✅ Pass | Governed table access works with ARN |
| Same role direct | AWS CLI List/Get/Head | ✅ Pass | IAM/AP/FSx permissions are correct |
| FSx authorization | File system user permissions | ✅ Pass | FSx-side permission permits access |
| Operational health | SVM DNS check | ✅ Pass | Distinguish ReadTimeout from AccessDenied |
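The Snowflake-side rows of this matrix can be reproduced with a short sequence of statements. The sketch below reuses the integration and stage names from the earlier examples; the AWS CLI, FSx permission, and DNS checks happen outside Snowflake and are not shown.

```sql
-- 1. Integration layer: inspect the IAM role ARN, Snowflake IAM user ARN, and external ID
--    that the AWS-side trust policy must match.
DESC STORAGE INTEGRATION fsxn_verification_integration;

-- 2. Stage layer: inspect the stage properties (URL, integration, file format).
DESC STAGE fsxn_stage_with_arn;

-- 3. Stage metadata: exercises the list path.
LIST @fsxn_stage_with_arn/sensor-data/;

-- 4. Object read: exercises the GetObject path that fails without AWS_ACCESS_POINT_ARN.
SELECT $1
FROM @fsxn_stage_with_arn/sensor-data/sensor_data.parquet
LIMIT 1;
```

Running steps 3 and 4 against both stages (with and without the ARN) gives a side-by-side record of where the failure occurs.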
FSx for ONTAP S3 AP Authorization Path
FSx for ONTAP S3 Access Points use a dual-layer authorization model:
Layer 1 — S3-side authorization:
- IAM identity-based policy (Snowflake's assumed role session)
- S3 Access Point resource policy
- Session policy generated by Snowflake (requires AWS_ACCESS_POINT_ARN to include AP ARN)
Layer 2 — FSx for ONTAP-side authorization:
- File system user associated with the access point
- UNIX mode-bits / NFSv4 ACLs (for UNIX security style volumes)
In the Snowflake validation, the initial failure occurred at Layer 1 — Snowflake's generated session policy did not include the S3 AP ARN pattern. Setting AWS_ACCESS_POINT_ARN resolves this by instructing Snowflake to include the AP ARN in the session policy, allowing both layers to be evaluated normally.
S3 API Compatibility and Snowflake Operations
| Snowflake operation | Likely S3 operation | FSx S3 AP support | Observed result (with ARN) |
|---|---|---|---|
| LIST @stage | ListObjectsV2 | ✅ Supported | ✅ Success |
| SELECT @stage | GetObject / HeadObject | ✅ Supported | ✅ Success |
| GET_PRESIGNED_URL | Presign / signed GetObject URL | Presign not supported in FSx S3 AP docs | Observed working; not a supported production path |
| External Table read | GetObject | ✅ Supported | ✅ Success |
| Iceberg metadata read | Head/Get + conditional | Partial (conditional writes not supported) | TBD |
Comparison: Snowflake vs Databricks
| Aspect | Snowflake | Databricks |
|---|---|---|
| Parameter name | AWS_ACCESS_POINT_ARN (on stage) | access_point (on External Location) |
| LIST without parameter | ✅ Works | ❌ Blocked (before access_point) |
| SELECT without parameter | ❌ Fails | ❌ Fails |
| SELECT with parameter | ✅ Works | ✅ Works (explicit path only) |
| External Table / UC Table | ✅ Works | ❌ CREATE TABLE still fails |
| Subdirectory listing | ✅ Works | ❌ Blocked |
| Documentation | CREATE STAGE docs | Databricks Support (May 2026) |
Key difference: Snowflake's AWS_ACCESS_POINT_ARN resolves the issue more completely than Databricks' access_point field. Snowflake achieves full External Table support, while Databricks still cannot create UC tables.
Partner Decision Card
| Customer requirement | Snowflake + FSx S3 AP today | Recommended path |
|---|---|---|
| File discovery only | ✅ Works (LIST / Directory Table) | Use directly |
| Query file contents in Snowflake | ✅ Works with AWS_ACCESS_POINT_ARN | Configure stage with ARN |
| Governed Snowflake external tables | ✅ Works with AWS_ACCESS_POINT_ARN | Configure stage with ARN |
| Zero-copy SQL on NAS data | ✅ Snowflake or Athena | Both work; choose by workload |
| Snowflake ML / Snowpark on NAS data | ✅ Possible via External Table | Configure stage with ARN, validate Snowpark path |
| Iceberg Table on FSx S3 AP | TBD (conditional writes not supported) | Validate separately |
Choose Snowflake when governed external tables, tags, Directory Tables, or Snowpark integration are required. Choose Athena when lightweight AWS-native serverless SQL over NAS data is sufficient.
Discovery Questions for Partners
When a customer asks about Snowflake + FSx for ONTAP S3 Access Points:
- Is the workload read-only analytics, or does it require write-back?
- Is Snowflake governance (tags, row access policy, masking) required?
- Does the workload need real-time file detection (Snowpipe), or is scheduled refresh acceptable?
- Are the target files structured (Parquet/CSV/JSON) or unstructured (images/documents)?
- Is the data regulated (PHI, PII, financial)? If so, review presigned URL governance.
- Does the customer need Iceberg table format? (Write-back not supported on FSx S3 AP)
- What is the expected file count and average file size? (Impacts LIST/REFRESH latency)
- Is the Snowflake account in the same AWS region as FSx for ONTAP?
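For the file count and file size question, a Directory Table gives a quick answer once the stage is reachable. The following is a minimal sketch, assuming the stage fsxn_stage_with_arn from the earlier examples and that grouping by the first path segment is a useful breakdown for the customer's layout.

```sql
-- Enable and populate the directory table on the stage (manual refresh; no auto-refresh on FSx S3 AP).
ALTER STAGE fsxn_stage_with_arn SET DIRECTORY = (ENABLE = TRUE);
ALTER STAGE fsxn_stage_with_arn REFRESH;

-- File count and size profile per top-level prefix.
SELECT
  SPLIT_PART(RELATIVE_PATH, '/', 1) AS top_level_prefix,
  COUNT(*)                          AS file_count,
  ROUND(AVG(SIZE))                  AS avg_size_bytes,
  SUM(SIZE)                         AS total_size_bytes,
  MAX(LAST_MODIFIED)                AS newest_file
FROM DIRECTORY(@fsxn_stage_with_arn)
GROUP BY top_level_prefix
ORDER BY file_count DESC;
```

Timing the REFRESH statement itself on a representative prefix also gives a first estimate of listing latency for that file count.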
Governance Impact
| Capability | Status | Governance impact |
|---|---|---|
| LIST @stage | ✅ Works | File inventory; not data access governance |
| SELECT @stage | ✅ Works (with ARN) | Query-level access via Snowflake governance |
| External Table | ✅ Works (with ARN) | Governed schema/table abstraction available |
| Iceberg Table | ❌ Write not suitable | Conditional writes not supported; read of pre-existing tables TBD |
| GET_PRESIGNED_URL | ⚠️ Observed only | Risk of bypassing Snowflake query governance if misused |
For regulated workloads, do not use GET_PRESIGNED_URL as a workaround for query access. Even if URL generation is observed to work, it is not a governed Snowflake query path and should be reviewed separately for auditability, expiration, data classification, and access logging.
Governance Impact Summary
Important premise: FSx for ONTAP S3 Access Points are NOT officially documented by Snowflake as a supported External Stage storage backend. The governance paths described below are validated in this environment but should not be treated as officially supported configurations without Snowflake Support confirmation.
| Access path | Governance model | Auditability | Production suitability |
|---|---|---|---|
| External Table (with AWS_ACCESS_POINT_ARN) | Snowflake RBAC + Tags + Row Access Policy | High (Snowflake Access History, query logs) | Recommended governed read path |
| COPY INTO (load to Snowflake table) | Full Snowflake governance on loaded data | High (standard Snowflake table governance) | Recommended for ML/AI workloads requiring full governance |
| Directory Table + GET_PRESIGNED_URL | File catalog governed; URL access is external | Medium (catalog queries logged; URL access not logged by Snowflake) | File discovery governed; downstream access requires separate audit |
| BUILD_SCOPED_FILE_URL | Snowflake-mediated access | High (access mediated through Snowflake privileges) | Preferred for governed unstructured data access |
| GET_PRESIGNED_URL (direct) | External access path | Low (Snowflake does not log URL usage after generation) | PoC / non-regulated only; requires separate access logging |
Snowflake Access History captures query-level access to External Tables. However, presigned URL usage after generation is not tracked by Snowflake — use CloudTrail S3 data events for downstream audit if required.
MLOps Boundary
Reading data from FSx for ONTAP S3 AP via Snowflake External Table does not automatically make the downstream ML workflow governed.
If the data accessed via External Table or COPY INTO is used for ML or GenAI:
- Register derived datasets in governed Snowflake tables
- Track experiments with Snowflake ML lineage or external experiment tracking
- Document source data access path (stage name, S3 AP alias, prefix, timestamp)
- Record whether training data lineage is captured within Snowflake or externalized
- Ensure Snowpark ML workloads use appropriate role privileges
- If using Cortex functions, validate that input data classification is appropriate for the model
Snowflake's ML Lineage tracks feature-to-model relationships. If the source data path is an External Table on FSx S3 AP, document this as the lineage origin.
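One way to record the source data access path alongside a derived dataset is to capture file-level metadata at load time. The sketch below is an illustration rather than part of the validation: the table name sensor_training_snapshot and its columns are hypothetical, and it relies on Snowflake's staged-file metadata columns (METADATA$FILENAME, METADATA$FILE_LAST_MODIFIED, METADATA$START_SCAN_TIME).

```sql
-- Hypothetical snapshot table that keeps the origin of every record.
CREATE OR REPLACE TABLE sensor_training_snapshot (
  source_file          VARCHAR,        -- path of the file on the FSx S3 AP stage
  source_last_modified TIMESTAMP_NTZ,  -- last-modified time reported for the file
  loaded_at            TIMESTAMP_LTZ,  -- when Snowflake scanned the file
  record               VARIANT         -- the Parquet row as a variant
);

-- Load the data together with its staged-file metadata.
COPY INTO sensor_training_snapshot (source_file, source_last_modified, loaded_at, record)
FROM (
  SELECT
    METADATA$FILENAME,
    METADATA$FILE_LAST_MODIFIED,
    METADATA$START_SCAN_TIME,
    $1
  FROM @fsxn_stage_with_arn/sensor-data/
)
FILE_FORMAT = (TYPE = PARQUET);
```

The stage name and S3 AP alias are not captured automatically, so they still need to be documented next to the snapshot table, for example in a table comment or a tag.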
AI / RAG Data Readiness Checklist
If the FSx for ONTAP S3 AP data is intended for AI, RAG, or GenAI pipelines via Snowflake:
- [ ] Are documents classified by sensitivity (PHI, PII, financial, internal, public)?
- [ ] Are file-level permissions preserved or re-modeled for the AI pipeline?
- [ ] Is metadata available for filtering and retrieval (file type, date, owner)? → Use Directory Table
- [ ] Is freshness requirement defined (real-time, daily, weekly)? → Define REFRESH schedule
- [ ] Is read-only access sufficient, or does the pipeline need write-back?
- [ ] Is human review required for generated output before downstream use?
- [ ] Is permission-aware retrieval required (user A sees only their authorized documents)?
If permission-aware retrieval is required, define one of:
- Enforce at source access path — use per-user or per-group S3 Access Points with scoped file system users
- Re-model permissions in metadata index — extract file-level ACLs into Directory Table metadata and filter at query time
- Filter retrieval results by user/group claims — apply Snowflake Row Access Policy on External Table based on authenticated user identity
- Do not proceed until authorization model is validated and approved by security owner
Snowflake + FSx S3 AP approval requirements (for regulated workloads):
- Data owner approval for External Table / stage access
- Security owner approval for presigned URL generation policy
- Platform owner approval for COPY INTO (data leaves FSx, enters Snowflake)
- Defined: allowed prefix, allowed operations, refresh schedule, expiration date
- Approval record location (where the decision is stored)
- Review / expiration date (when the approval must be re-evaluated)
For regulated workloads, exercise caution with:
- GET_PRESIGNED_URL for patient-facing or financial data (bypasses Snowflake query governance)
- COPY INTO without data classification review (data moves from FSx to Snowflake storage)
- Cortex LLM functions on sensitive data without human review gate
- Unreviewed access to regulated datasets via scoped URLs
Unstructured Data Support
| Format | Support | Access Method | Use Case |
|---|---|---|---|
| Images (JPEG, PNG, TIFF) | ✅ | GET_PRESIGNED_URL / BUILD_SCOPED_FILE_URL | Thumbnail generation, ML inference, quality inspection |
| Video (MP4, MOV) | ✅ | GET_PRESIGNED_URL | Streaming, frame extraction |
| Documents (PDF, DOCX) | ✅ | GET_PRESIGNED_URL / Snowpark File Access | Text extraction, RAG, document processing |
| Audio (WAV, MP3) | ✅ | GET_PRESIGNED_URL | Transcription, speech analytics |
| Binary / Archives | ✅ | GET_PRESIGNED_URL | Download, transfer |
How to manage unstructured data as a library:
-- Enable Directory Table for file catalog
ALTER STAGE fsxn_stage SET DIRECTORY = (ENABLE = TRUE);
ALTER STAGE fsxn_stage REFRESH;
-- Query file catalog (search by path, size, date)
SELECT RELATIVE_PATH, SIZE, LAST_MODIFIED
FROM DIRECTORY(@fsxn_stage)
WHERE RELATIVE_PATH LIKE '%images/%'
ORDER BY LAST_MODIFIED DESC;
-- Generate download URL for applications (valid 1 hour)
SELECT GET_PRESIGNED_URL(@fsxn_stage, 'images/photo001.jpg', 3600);
-- Generate Snowflake-proxied secure URL
SELECT BUILD_SCOPED_FILE_URL(@fsxn_stage, 'documents/report.pdf');
Note: AUTO_REFRESH is not available because FSx S3 AP does not support S3 Event Notifications (GetBucketNotificationConfiguration is not supported). Use ALTER STAGE REFRESH manually or via Snowflake Task on a schedule.
URL type guidance: Use BUILD_SCOPED_FILE_URL when you want access mediated through Snowflake role privileges (governed path). Treat GET_PRESIGNED_URL as an external object access path that bypasses Snowflake query governance and requires separate review for regulated workloads.
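A scheduled refresh can stand in for AUTO_REFRESH. The following is a minimal sketch, assuming an hourly cadence is acceptable; the task name fsxn_stage_refresh_task and the schedule are illustrative, and the warehouse is the one from the test environment.

```sql
-- Hypothetical hourly task that refreshes the directory table on the stage.
CREATE OR REPLACE TASK fsxn_stage_refresh_task
  WAREHOUSE = COMPUTE_WH
  SCHEDULE = 'USING CRON 0 * * * * UTC'
AS
  ALTER STAGE fsxn_stage REFRESH;

-- Tasks are created suspended; resume to start the schedule.
ALTER TASK fsxn_stage_refresh_task RESUME;

-- External table metadata is refreshed separately and can be scheduled the same way.
ALTER EXTERNAL TABLE fsxn_sensor_ext_table REFRESH;
```

Each refresh lists the stage prefix, so the schedule should reflect both the freshness requirement and the file count on the FSx for ONTAP volume.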
AI / ML Integration Path
Snowflake provides AI/ML capabilities that can leverage FSx for ONTAP data via S3 AP. 7 out of 9 tested Cortex AI functions work directly on FSx S3 AP data without copying.
| Snowflake AI/ML Feature | FSx S3 AP Compatibility | Access Path | Duration | Use Case |
|---|---|---|---|---|
| CORTEX.SUMMARIZE | ✅ Direct | External Table → Cortex | 3.3s | Text summarization on NAS documents |
| CORTEX.TRANSLATE | ✅ Direct | External Table → Cortex | 5.1s | Multi-language support |
| CORTEX.SENTIMENT | ✅ Direct | External Table → Cortex | 2.5s | Sentiment analysis |
| CORTEX.COMPLETE (text) | ✅ Direct | External Table → Cortex | 16s | AI analysis, anomaly detection |
| CORTEX.EXTRACT_ANSWER | ✅ Direct | External Table → Cortex | 2.7s | Information extraction |
| PARSE_DOCUMENT (OCR) | ✅ Direct | Stage path → OCR | ~8s | Invoice/report text extraction |
| COMPLETE (Vision/Multimodal) | ✅ Workaround | COPY FILES → internal stage → TO_FILE | 41s | Image analysis, defect detection |
| TO_FILE on FSx S3 AP | ❌ Blocked | — | — | "Remote file not found" |
| Cortex Search (RAG) | ✅ Verified | External Table → COPY INTO → Cortex Search Service | 198ms query | Semantic search over NAS documents |
Key finding: Text-based Cortex functions, PARSE_DOCUMENT, and Cortex Search all work on FSx S3 AP data (Cortex Search requires COPY INTO as a staging step). Vision AI (multimodal COMPLETE) requires a staging step because TO_FILE() cannot resolve files on S3 AP external stages.
Validated AI/ML paths:
- ✅ Cortex LLM SUMMARIZE on External Table data — AI-generated summary in 3.3s
- ✅ Cortex TRANSLATE on External Table data — English to Japanese in 5.1s
- ✅ Cortex SENTIMENT on External Table data — sentiment scores in 2.5s
- ✅ Cortex COMPLETE (text) on External Table data — AI anomaly analysis in 16s
- ✅ Cortex EXTRACT_ANSWER on External Table data — information extraction in 2.7s
- ✅ PARSE_DOCUMENT (OCR) on FSx S3 AP stage file — text extraction from images in ~8s
- ✅ COMPLETE (Vision AI) via COPY FILES workaround — image analysis in 41s (pixtral-large)
- ✅ Cortex Search (RAG) — External Table → COPY INTO → Cortex Search Service → semantic query in 198ms
- ✅ COPY INTO loads NAS data into Snowflake tables → available for all Cortex/ML functions
- ✅ Directory Table catalogs unstructured files → enables file discovery for processing pipelines
- ✅ GET_PRESIGNED_URL generates download URLs → enables external ML services to access files
Vision AI Workaround (Validated)
Direct TO_FILE() on FSx S3 AP external stage returns "Remote file not found." The workaround:
-- 1. Create an internal stage with server-side encryption (SNOWFLAKE_SSE); the default client-side encryption blocks TO_FILE
CREATE OR REPLACE STAGE fsxn_ai_stage ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE');
-- 2. Copy image from FSx S3 AP to internal stage
COPY FILES INTO @fsxn_ai_stage FROM @fsxn_ap_arn_test_stage/media/documents/invoice_sample.png;
-- 3. Enable Cross-Region Inference (required for vision models in ap-northeast-1)
ALTER ACCOUNT SET CORTEX_ENABLED_CROSS_REGION = 'ANY_REGION';
-- 4. Run Vision AI
ALTER STAGE fsxn_ai_stage SET DIRECTORY = (ENABLE = TRUE);
ALTER STAGE fsxn_ai_stage REFRESH;
SELECT SNOWFLAKE.CORTEX.COMPLETE('pixtral-large',
'Describe this invoice. What is the invoice number, customer, and amount?', FILE
) AS vision_result
FROM (SELECT TO_FILE(BUILD_SCOPED_FILE_URL(@fsxn_ai_stage, RELATIVE_PATH)) AS FILE
FROM DIRECTORY(@fsxn_ai_stage) WHERE RELATIVE_PATH LIKE '%.png' LIMIT 1);
Result: Vision AI correctly identified Invoice #INV-2026-0524, Customer: Acme Corp, Amount: USD 1,234.56.
Data residency note: The COPY FILES step moves image data from FSx for ONTAP to Snowflake-managed internal storage. Cross-Region Inference may route data to US/EU regions for model processing. Verify compliance with your data residency requirements before enabling for regulated workloads.
Cortex Search (RAG) — Validated
Cortex Search provides semantic search over text data — the Snowflake-native RAG building block. The validated path uses External Table → COPY INTO → Cortex Search Service:
-- 1. Load FSx S3 AP data into internal table (required for Cortex Search)
COPY INTO sensor_documents FROM @fsxn_stage_with_arn/sensor-data/
FILE_FORMAT = (TYPE = PARQUET);
-- 2. Create Cortex Search Service on the loaded data
CREATE OR REPLACE CORTEX SEARCH SERVICE sensor_search_service
ON text_column
WAREHOUSE = COMPUTE_WH
TARGET_LAG = '1 hour'
AS (SELECT * FROM sensor_documents);
-- 3. Semantic search query
SELECT PARSE_JSON(
SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
'sensor_search_service',
'{"query": "high temperature anomaly", "columns": ["text_column"], "limit": 5}'
)
);
-- Result: Relevant documents returned in 198ms
Dataset context: This validation used the sensor data loaded via COPY INTO from FSx S3 AP (1000 rows of IoT sensor readings). Cortex Search performance at scale (millions of documents, large text corpora) should be validated separately — 198ms is a sizing reference for this dataset size, not a service-level guarantee.
GA status: Verify that Cortex Search Service and its query functions are Generally Available (GA) in your Snowflake edition and region before production use. Preview features may not be covered by Snowflake SLA and should not be used for regulated workloads without explicit vendor confirmation.
Cortex Search Service created on data loaded from FSx for ONTAP via COPY INTO.
Semantic search query returns relevant results in 198ms — RAG-style retrieval over NAS-originated data.
Key insight: Cortex Search requires COPY INTO (data must be in a Snowflake internal table), but the end-to-end path from FSx for ONTAP → External Stage → COPY INTO → Cortex Search Service → semantic query is validated. This provides a Snowflake-native RAG path for NAS documents.
Data residency change: COPY INTO moves data from FSx for ONTAP to Snowflake-managed storage. Once loaded, the data is subject to Snowflake's storage lifecycle, not ONTAP's. For regulated workloads, obtain data owner approval before COPY INTO and document the residency change in your compliance records. Cortex Search Service indexes are stored in the same region as the Snowflake account — no cross-region data movement occurs for the index itself.
Comparison with Bedrock Knowledge Bases: Cortex Search requires a COPY INTO step (data moves to Snowflake storage). Bedrock Knowledge Bases can read directly from FSx S3 AP without copying. Choose Cortex Search when the RAG pipeline must stay within Snowflake governance. Choose Bedrock KB when data residency on FSx is mandatory and AWS-native RAG is preferred.
PoC Quick Start — Validate Cortex Search on your NAS data in 3 steps (estimated: 30 minutes with pre-configured stage):
- Configure External Stage with AWS_ACCESS_POINT_ARN (see Configuration Guide below)
- Run COPY INTO <target_table> FROM @fsxn_stage/<your-documents-prefix>/ to load text data
- Create Cortex Search Service on the loaded table and run a semantic query to validate retrieval quality
Manufacturing Use Case: OCR + AI on NAS Data
-- OCR: Extract text from inspection report image stored on FSx for ONTAP
SELECT SNOWFLAKE.CORTEX.PARSE_DOCUMENT(
@fsxn_stage,
'media/documents/invoice_sample.png',
{'mode': 'OCR'}
) AS ocr_result;
-- Result: "INVOICE #INV-2026-0524", "Customer: Acme Corp", "Amount: USD 1,234.56"
-- AI Analysis: Analyze sensor data for anomalies
SELECT SNOWFLAKE.CORTEX.COMPLETE('mistral-large2',
'Analyze this IoT sensor reading and identify anomalies: ' || VALUE::VARCHAR
) AS ai_analysis FROM fsxn_sensor_ext_table LIMIT 1;
PARSE_DOCUMENT (OCR mode) extracts text from an image on FSx for ONTAP via S3 AP — works directly without copying.
Cortex COMPLETE (mistral-large2) generates AI anomaly analysis of IoT sensor data on FSx for ONTAP — works directly on External Table data.
Vision AI (pixtral-large) correctly extracts invoice details from an image originally on FSx for ONTAP — requires COPY FILES to internal stage.
Not validated in this article:
- Snowpark File Access (SnowflakeFile.open()) for direct binary file processing in UDFs
- AI_TRANSCRIBE for audio files on FSx S3 AP
Comparison with Databricks AI/ML path:
| AI/ML Capability | Snowflake + FSx S3 AP | Databricks + FSx S3 AP |
|---|---|---|
| Governed table as ML input | ✅ External Table | ❌ UC Table creation blocked |
| Text AI (LLM) on NAS data | ✅ 6 Cortex functions direct | ⚠️ boto3 + external LLM (bypasses UC) |
| Vision AI on NAS images | ✅ Via staging workaround (41s) | ⚠️ boto3 driver-only (bypasses UC) |
| OCR / Document extraction | ✅ PARSE_DOCUMENT direct (8s) | ⚠️ boto3 + external OCR |
| Feature engineering | ✅ Snowpark DataFrame on External Table | ⚠️ spark.read with explicit path only |
| File catalog for ML pipeline | ✅ Directory Table | ⚠️ dbutils.fs.ls (top-level only) |
| RAG over NAS documents | ✅ Cortex Search (via COPY INTO, 198ms) | ⚠️ boto3 + external RAG (bypasses UC) |
Key insight: Snowflake's AI/ML path benefits from governed External Tables and direct Cortex function access — 8 out of 10 tested functions work on FSx data (7 directly without copying, 1 via COPY INTO for Cortex Search). Databricks' AI/ML path is limited by UC table creation failure, forcing boto3 workarounds that bypass governance.
For end-to-end RAG on NAS documents: Use Snowflake Cortex Search (validated: External Table → COPY INTO → Cortex Search Service, 198ms query latency) or Amazon Bedrock Knowledge Bases as the AWS-documented path (no copy needed).
Decision guidance: Use Snowflake when the customer already needs Snowflake governance, Cortex/Snowpark processing, or table-based feature engineering. Use Bedrock Knowledge Bases when the primary requirement is AWS-native permission-aware RAG over NAS documents.
Comparison: Snowflake vs Databricks (Governance)
| Governance Capability | Snowflake + FSx S3 AP | Databricks + FSx S3 AP |
|---|---|---|
| Table creation | ✅ External Table | ❌ CREATE TABLE fails |
| Data classification tags | ✅ Governance Tags | ❌ UC Table not creatable |
| Access control | ✅ Row Access Policy | ❌ UC governance not applicable |
| File catalog | ✅ Directory Table | ⚠️ dbutils.fs.ls (top-level only) |
| Secure URL generation | ✅ BUILD_SCOPED_FILE_URL | ❌ |
| Column masking | ✅ Available | ❌ |
| COPY INTO (data load) | ✅ | ❌ |
| Unstructured data catalog | ✅ Directory Table + Presigned URL | ⚠️ boto3 only (bypasses governance) |
Key takeaway: In this validation, Snowflake with AWS_ACCESS_POINT_ARN achieved a more complete governed read path than the Databricks path tested in Part 2. Snowflake can create governed tables, apply tags, and manage unstructured data catalogs — capabilities that remain blocked in Databricks due to UC table creation failure.
For regulated workloads: Snowflake provides a more complete governed path today (External Table + Tags + Row Access Policy + audit trail). Databricks requires staged ingestion to S3 for equivalent governance. If your compliance framework requires governed table-level access control on the data, Snowflake is the validated path for FSx S3 AP integration.
Business Impact
| Requirement | Observed result | Business impact | Recommended decision |
|---|---|---|---|
| Zero-copy Snowflake query over NAS | ✅ Works (with ARN) | Eliminates copy pipeline | Use AWS_ACCESS_POINT_ARN stage |
| Snowflake governance on FSx data | ✅ External Table works | Governed table abstraction available | Create External Tables |
| File inventory from Snowflake | ✅ Works | Metadata cataloging possible | Use LIST / Directory Tables |
| RAG / AI over NAS documents | ✅ Cortex Search validated (198ms) | Snowflake-native RAG path available | COPY INTO → Cortex Search Service |
| Text AI on NAS data (no copy) | ✅ 7 functions direct | AI processing without data movement | Use Cortex functions on External Table |
Detailed validation metrics (refresh duration, file count, query latency, COPY INTO duration, URL generation success rate) should be recorded in the verification-pack evidence files rather than treated as universal benchmark numbers.
Use Case Fit Matrix
| Use case | Best current path | Why |
|---|---|---|
| SQL analytics on structured NAS files | Snowflake External Table or Athena | Both validated; Snowflake adds governance tags |
| Unstructured data catalog | Snowflake Directory Table | File metadata queryable with governance |
| Data load from NAS to Snowflake | COPY INTO from FSx S3 AP stage | Validated (4.9s for Parquet) |
| RAG over NAS documents | Cortex Search (via COPY INTO, validated 198ms) or Bedrock KB (AWS-native) | Cortex Search validated; Bedrock KB is AWS-documented path |
| ML feature engineering | Snowpark DataFrame on External Table | Governed read path available |
| Real-time ingestion | Not FSx S3 AP path | Use native S3 + Snowpipe |
| Iceberg / transactional tables | Not FSx S3 AP path | Use native S3 for write-back |
Cost Model Considerations
| Component | Cost driver | Notes |
|---|---|---|
| Snowflake warehouse | Credit consumption during queries | X-Small sufficient for validation; scale per workload |
| FSx for ONTAP | Throughput capacity + storage | S3 AP queries share throughput with NFS/SMB workloads |
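| S3 AP requests | S3 request pricing, if applicable | This article assumes no separate S3 API fees; confirm against the current Amazon FSx and Amazon S3 pricing pages, since request and data transfer charges can apply to access point traffic |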
| Data transfer | Standard AWS data transfer | Snowflake SaaS in same region minimizes transfer |
Cost comparison across engines is not the focus of this article. Snowflake's credit-based model differs fundamentally from Athena's per-TB-scanned model. Evaluate based on workload pattern, governance requirements, and existing Snowflake investment.
Configuration Guide
Step 1: Create Storage Integration
CREATE OR REPLACE STORAGE INTEGRATION fsxn_integration
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = 'S3'
ENABLED = TRUE
STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::<account>:role/<role-name>'
STORAGE_ALLOWED_LOCATIONS = ('s3://<ap-alias>/');
Step 2: Create Stage WITH AWS_ACCESS_POINT_ARN
CREATE OR REPLACE STAGE fsxn_stage
STORAGE_INTEGRATION = fsxn_integration
URL = 's3://<ap-alias>/'
AWS_ACCESS_POINT_ARN = 'arn:aws:s3:<region>:<account>:accesspoint/<ap-name>'
FILE_FORMAT = (TYPE = PARQUET);
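Because a missing ARN is the failure mode this article documents, it can help to generate the Step 2 statement from a helper that refuses to emit a stage without a well-formed access point ARN. The following is a minimal Python sketch, not part of the tested scripts. `render_stage_ddl` and `AP_ARN_RE` are hypothetical names, and the regular expression is an assumed approximation of the ARN shape shown above (partition, region, 12-digit account, access point name).

```python
import re

# Assumed shape: arn:<partition>:s3:<region>:<12-digit account>:accesspoint/<name>
AP_ARN_RE = re.compile(
    r"^arn:aws[a-z-]*:s3:[a-z0-9-]+:\d{12}:accesspoint/[a-z0-9][a-z0-9-]*$"
)


def render_stage_ddl(stage: str, integration: str, ap_alias: str, ap_arn: str) -> str:
    """Return a CREATE STAGE statement; raise if the AP ARN is missing or malformed."""
    if not AP_ARN_RE.match(ap_arn):
        raise ValueError(f"not an S3 Access Point ARN: {ap_arn!r}")
    if ap_alias.startswith("s3://") or "/" in ap_alias:
        raise ValueError("pass the bare access point alias, without scheme or path")
    return (
        f"CREATE OR REPLACE STAGE {stage}\n"
        f"  STORAGE_INTEGRATION = {integration}\n"
        f"  URL = 's3://{ap_alias}/'\n"
        f"  AWS_ACCESS_POINT_ARN = '{ap_arn}'\n"
        f"  FILE_FORMAT = (TYPE = PARQUET);"
    )
```

Passing a bucket-style ARN such as `arn:aws:s3:::my-bucket` raises `ValueError`, which surfaces the misconfiguration before the stage is created rather than at the first SELECT.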
Step 3: Verify
LIST @fsxn_stage/; -- File discovery
SELECT $1 FROM @fsxn_stage/path/to/file.parquet LIMIT 5; -- Data read
Step 4: Create External Table (optional)
The following DDL is simplified for readability. See the GitHub SQL scripts for the exact tested definition.
CREATE OR REPLACE EXTERNAL TABLE my_ext_table
LOCATION = @fsxn_stage/sensor-data/
FILE_FORMAT = (TYPE = PARQUET)
AUTO_REFRESH = FALSE;
Internal Table vs External Table — Design Guide
Understanding the difference between internal (managed) tables and external tables is critical for architecture decisions when integrating FSx for ONTAP with Snowflake.
Comparison Matrix
| Aspect | External Table (on FSx S3 AP) | Internal Table (COPY INTO) |
|---|---|---|
| Data location | Remains on FSx for ONTAP (zero-copy) | Copied into Snowflake-managed storage |
| Multi-protocol access | Same data via NFS/SMB/S3 AP simultaneously | Only accessible via Snowflake |
| Data freshness | Real-time (reads current file state) | Stale until next COPY INTO |
| Query performance | Slower (estimated ~2-5s for small queries based on observed S3 AP GetObject latency) | Faster (sub-second with micro-partitions, pruning) |
| Governance (Tags, Masking) | ✅ Full support | ✅ Full support |
| Time Travel | ❌ Not available | ✅ Available (up to 90 days on Enterprise Edition or higher; 1 day on Standard) |
| Cortex AI (text functions) | ✅ Direct (SUMMARIZE, TRANSLATE, etc.) | ✅ Direct |
| Cortex AI (Vision/TO_FILE) | ❌ TO_FILE blocked on FSx S3 AP | ✅ Works on internal stage |
| Cortex Search (RAG) | ❌ Requires COPY INTO first | ✅ Direct |
| ONTAP features preserved | ✅ Snapshot, FlexClone, Dedup, FPolicy | ❌ Data is outside ONTAP |
| Storage cost | FSx for ONTAP only (no Snowflake storage) | FSx + Snowflake storage (duplicate) |
Decision Flowchart
Q: Does the data need to stay on FSx for ONTAP?
├── YES → External Table
│ Q: Do you need Vision AI or Cortex Search?
│ ├── YES → Hybrid: External Table + selective COPY INTO
│ └── NO → External Table is sufficient (text AI works directly)
│
└── NO → COPY INTO internal table
Q: Do you need real-time freshness?
├── YES → Scheduled COPY INTO (Task) or FPolicy → Lambda → Snowpipe
└── NO → Batch COPY INTO on schedule
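The flowchart above can be written as a small function, which is handy when the same decision has to be applied to many datasets in an inventory. This is an illustrative Python sketch; `choose_pattern` and its parameter names are hypothetical, and the returned labels simply mirror the flowchart branches.

```python
def choose_pattern(stay_on_fsx: bool,
                   needs_vision_or_search: bool = False,
                   needs_realtime: bool = False) -> str:
    """Walk the decision flowchart and return the recommended pattern."""
    if stay_on_fsx:
        # Data must remain on FSx for ONTAP: External Table is the base pattern.
        if needs_vision_or_search:
            return "Hybrid: External Table + selective COPY INTO"
        return "External Table"
    # Data may be copied into Snowflake-managed storage.
    if needs_realtime:
        return "Scheduled COPY INTO (Task) or FPolicy -> Lambda -> Snowpipe"
    return "Batch COPY INTO on schedule"
```

For example, `choose_pattern(True, needs_vision_or_search=True)` selects the hybrid pattern, while `choose_pattern(False)` selects batch COPY INTO.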
Cost Comparison
| Pattern | FSx Storage | Snowflake Storage | Best For |
|---|---|---|---|
| External Table only | ✅ (existing) | None | Read-heavy, compliance, multi-protocol |
| COPY INTO (full) | ✅ (existing) | + full copy | Max performance, Time Travel, full AI |
| Hybrid (External + selective COPY) | ✅ (existing) | + images/RAG data only | AI workloads with data residency needs |
Industry-Specific Recommendations
| Industry | Recommended Pattern | Rationale | PoC Success Criteria |
|---|---|---|---|
| Manufacturing | External Table + PARSE_DOCUMENT (OCR) | Data stays on FSx; inspection images processed in place | OCR extracts text from 10+ inspection images in <10s each |
| Financial Services | Hybrid (External Table + COPY INTO for Cortex Search) | Compliance requires data on FSx; RAG needs internal table | Cortex Search returns relevant compliance docs in <500ms |
| Healthcare | External Table + SnapLock | PHI must not leave controlled storage; immutable audit | SELECT on External Table succeeds with governance tags applied |
| Media / Entertainment | External Table + COPY FILES (Vision AI) | Large media files stay on FSx; selective staging for AI | Vision AI describes image content correctly via staging path |
| Cross-Industry Analytics | COPY INTO (full) | Maximum query performance; data duplication acceptable | COPY INTO completes in <10s for representative dataset |
Snowpipe Alternatives for FSx for ONTAP
Since FSx S3 AP does not support S3 Event Notifications, standard Snowpipe auto-ingest is not available. Use these alternatives:
Option 1: FPolicy → Lambda → SNS → Snowpipe REST API (Recommended)
FSx for ONTAP ──FPolicy──▶ Lambda ──▶ SNS ──▶ Snowpipe REST API ──▶ COPY INTO target table
│ │
└── NFS/SMB users access same data └── Snowflake governance on loaded data
- Latency: Seconds (<30s from file write to Snowflake availability)
- Complexity: Medium (requires FPolicy configuration + Lambda function)
- Best for: Near-real-time ingestion requirements
FPolicy throughput note: FPolicy introduces minimal latency on the NFS/SMB I/O path (typically <1ms per operation for passthrough mode). However, under high-frequency file write workloads (thousands of files/second), validate throughput impact on the FSx for ONTAP file system before production deployment.
Option 2: Snowflake Task + COPY INTO (Simple)
-- Create a task that runs COPY INTO every 5 minutes
CREATE OR REPLACE TASK fsxn_ingest_task
WAREHOUSE = COMPUTE_WH
SCHEDULE = '5 MINUTE'
AS
COPY INTO target_table FROM @fsxn_stage_with_arn/incoming/
FILE_FORMAT = (TYPE = PARQUET)
PATTERN = '.*[.]parquet';
ALTER TASK fsxn_ingest_task RESUME;
- Latency: Minutes (configurable schedule interval)
- Complexity: Low (pure Snowflake SQL)
- Best for: Batch ingestion where minutes-level latency is acceptable
Option 3: Snowpipe REST API (Manual Trigger)
Applications call the Snowpipe REST API with a file list when new files are known:
- Latency: Seconds (triggered by application)
- Complexity: Low (API call from any application)
- Best for: Application-controlled ingestion workflows
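As a rough illustration of Option 3, the sketch below builds the URL and JSON body for a Snowpipe `insertFiles` call without sending it. It was not part of this validation. The account identifier, pipe name, and function name are placeholders, the file paths are assumed to be relative to the stage location in the pipe definition, and key-pair (JWT) authentication is deliberately left out.

```python
import json
import uuid
from urllib.parse import quote

MAX_FILES_PER_REQUEST = 5000  # per-request file limit for insertFiles


def build_insert_files_request(account: str, pipe_fqn: str,
                               paths: list[str]) -> tuple[str, str]:
    """Return (url, json_body) for a Snowpipe insertFiles POST.

    pipe_fqn is the fully qualified pipe name, e.g. "db.schema.pipe".
    """
    if not paths:
        raise ValueError("no files to ingest")
    if len(paths) > MAX_FILES_PER_REQUEST:
        raise ValueError("split the file list into batches of 5000 or fewer")
    request_id = uuid.uuid4()  # lets the caller correlate retries and responses
    url = (
        f"https://{account}.snowflakecomputing.com/v1/data/pipes/"
        f"{quote(pipe_fqn, safe='.')}/insertFiles?requestId={request_id}"
    )
    body = json.dumps({"files": [{"path": p} for p in paths]})
    return url, body
```

The caller would POST `body` to `url` with an `Authorization: Bearer <jwt>` header. Keeping request construction separate from the HTTP call makes the batching and path handling easy to test without a Snowflake account.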
Snowpipe / COPY INTO Supported Formats
| Format | Snowpipe | COPY INTO | External Table | Notes |
|---|---|---|---|---|
| CSV | ✅ | ✅ | ✅ | Delimiter, header, encoding options |
| JSON | ✅ | ✅ | ✅ | Nested, semi-structured |
| Parquet | ✅ | ✅ | ✅ | Column pruning, predicate pushdown |
| Avro | ✅ | ✅ | ✅ | Schema evolution supported |
| ORC | ✅ | ✅ | ✅ | Load and query only; not available as an unload format |
| XML | ✅ | ✅ | ❌ | Loadable via COPY INTO; not a supported External Table file format |
Stop Criteria
Stop the Snowflake direct-access PoC when:
- SELECT from stage fails with AccessDenied after AWS_ACCESS_POINT_ARN is configured and IAM/AP/FSx permissions are proven correct
- The workload requires Iceberg Table write-back (conditional writes not supported on FSx S3 AP)
- Data owner does not approve the access path
- ReadTimeout occurs (check SVM DNS/AD configuration — see Networking Troubleshooting)
Regulated Workload Checklist
Before using Snowflake + FSx S3 AP for regulated data:
- [ ] Confirm the S3 Access Point file-system user identity and least-privilege permissions
- [ ] Confirm Snowflake role privileges for stage, external table, and tag access
- [ ] Define whether users may generate presigned or scoped URLs (prefer BUILD_SCOPED_FILE_URL for governed access)
- [ ] Record derived data locations if COPY INTO loads data into Snowflake tables
- [ ] Define manual refresh schedule and evidence retention
- [ ] Store approval owner, review date, and expiration date
- [ ] Validate that GET_PRESIGNED_URL is not used as a bypass for query-level governance
- [ ] If Vision AI is required: Approve COPY FILES to internal stage (data moves to Snowflake-managed storage)
- [ ] If Cross-Region Inference is enabled: Verify that image/document data may be processed in US/EU regions
- [ ] If Cortex Search is used: Approve COPY INTO (data moves to Snowflake storage) AND Cortex Search Service index creation (data residency changes twice — once for table load, once for search index). Cortex Search Service index is stored in the Snowflake account region.
Store the checklist result with an approval ID, owner, review date, expiration date, and evidence location so the PoC decision can be audited later.
Cross-Region Inference — Data Residency Warning
When CORTEX_ENABLED_CROSS_REGION = 'ANY_REGION' is set, Cortex AI functions may route data to model endpoints in other AWS regions (US, EU) for processing. For regulated workloads:
- Verify: Does your compliance framework allow data processing outside the home region?
- Alternatives: Use AWS_US or AWS_EU instead of ANY_REGION to limit routing scope
- Mitigation: Process only non-regulated images via Vision AI; keep PHI/PII in text-only Cortex functions (which run in-region)
- Documentation: Record which Cross-Region setting is used and which data types are processed
Compliance Framework Mapping
| Framework | Recommended Pattern | Key Controls |
|---|---|---|
| HIPAA (PHI) | External Table + SnapLock + FPolicy audit | Data never leaves FSx; file access audited; admin cannot delete during retention |
| SOX (Financial) | COPY INTO + Time Travel + audit trail | Full change history; point-in-time queries for audit |
| GDPR (PII) | External Table + Row Access Policy + Tag-based Masking | Data minimization at query time; PII masked for non-authorized roles |
| FINRA (Records) | External Table + SnapLock Compliance | Non-erasable, non-writable records for retention period |
Approval Evidence Example
approval_id: "FSXN-SF-POC-001"
data_owner: "<name/group>"
security_owner: "<name/group>"
platform_owner: "<name/group>"
allowed_prefixes:
- "s3://<ap-alias>/sensor-data/"
- "s3://<ap-alias>/bronze/"
allowed_operations:
- LIST
- SELECT (External Table)
- COPY INTO (load only)
- Directory Table
- BUILD_SCOPED_FILE_URL
- Cortex text functions (SUMMARIZE, TRANSLATE, SENTIMENT)
- COPY FILES to internal stage (for Vision AI only)
disallowed_operations:
- GET_PRESIGNED_URL for regulated data
- COPY INTO unload (write-back)
- Cortex LLM on PHI/PII without human review
- Cross-Region Inference on regulated images (unless approved)
cross_region_inference: "ANY_REGION" # or "DISABLED" for regulated data
review_date: "<YYYY-MM-DD>"
expiration_date: "<YYYY-MM-DD>"
evidence_location: "verification-pack/snowflake/evidence/<date>/evidence-record.yaml"
COPY INTO unload (write-back to FSx S3 AP) was not validated in this article. Although FSx S3 AP supports PutObject, Snowflake unload behavior should be tested separately before positioning write-back as supported.
Data residency note: COPY INTO (load) and COPY FILES change the data residency model — source files remain on FSx, but a derived copy is created in Snowflake-managed storage. Cross-Region Inference may further route data to other regions. Treat loaded tables and staged files as derived regulated data and apply retention, classification, and deletion controls separately.
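An approval record like the one above is most useful when something checks requests against it. The following Python sketch shows one way to do that, assuming the YAML has already been loaded into a dict. `check_request` is a hypothetical helper, and matching operations by exact string is a simplifying assumption, so entries such as "SELECT (External Table)" must be spelled the same way by the caller.

```python
from datetime import date

REQUIRED = ("approval_id", "data_owner", "allowed_prefixes",
            "allowed_operations", "expiration_date")


def check_request(record: dict, operation: str, s3_url: str, today: date) -> list[str]:
    """Return reasons to refuse the request; an empty list means it is covered."""
    missing = [f"missing field: {name}" for name in REQUIRED if not record.get(name)]
    if missing:
        return missing  # an incomplete record cannot be evaluated further

    problems = []
    if date.fromisoformat(record["expiration_date"]) < today:
        problems.append("approval expired")
    if operation in record.get("disallowed_operations", []):
        problems.append(f"operation explicitly disallowed: {operation}")
    elif operation not in record["allowed_operations"]:
        problems.append(f"operation not approved: {operation}")
    if not any(s3_url.startswith(prefix) for prefix in record["allowed_prefixes"]):
        problems.append(f"path outside approved prefixes: {s3_url}")
    return problems
```

Returning a list of reasons rather than a boolean keeps the output suitable for the evidence file: each refusal states which control it violated.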
Troubleshooting Playbook
When Snowflake access to FSx for ONTAP S3 AP fails, isolate one layer at a time:
- Stage configuration: Is AWS_ACCESS_POINT_ARN set? Without it, GetObject will fail.
- IAM: Does the Storage Integration role have s3:GetObject and s3:ListBucket on the S3 AP ARN?
- S3 AP policy: Does the Access Point resource policy allow the Snowflake IAM user ARN?
- FSx file system: Is the file system user (e.g., root) permitted to read the target files?
- Network: Is the AP internet-origin? (Snowflake SaaS cannot use VPC-origin APs)
- Operational: Does vserver services dns check show healthy DNS? (ReadTimeout = DNS/AD issue)
Known Failure Signatures
| Symptom | Likely layer | Next step |
|---|---|---|
| LIST works, SELECT fails with "access denied" | Missing AWS_ACCESS_POINT_ARN | Add ARN parameter to stage |
| LIST and SELECT both fail with "access denied" | IAM role or S3 AP policy | Check DESCRIBE INTEGRATION, verify trust policy |
| ReadTimeout (no response) | SVM DNS/AD or FSx backend | Check vserver services dns check; verify S3 AP lifecycle |
| Stage creation fails | Storage Integration config | Verify STORAGE_ALLOWED_LOCATIONS includes the AP alias |
| External Table creation fails | Stage or file format issue | Verify LIST works first, then check FILE_FORMAT |
| COPY INTO fails | File format mismatch or permissions | Verify SELECT works first |
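The first three rows of the table can be turned into a quick triage helper for runbooks or smoke tests. The Python sketch below is illustrative only: `likely_layer` is a hypothetical name, and classifying by substrings of the error message ("denied", "timeout") is an assumption about how the errors are surfaced in your client.

```python
def likely_layer(list_ok: bool, select_ok: bool, error: str = "") -> str:
    """Map LIST/SELECT outcomes and an error message to the likely failing layer."""
    err = error.lower()
    if "timeout" in err:
        # No response at all points at the storage side, not at policies.
        return "SVM DNS/AD or FSx backend"
    if list_ok and select_ok:
        return "no stage-level fault"
    if list_ok and "denied" in err:
        # LIST succeeds but reads are denied: the stage lacks the AP ARN.
        return "Missing AWS_ACCESS_POINT_ARN"
    if not list_ok and "denied" in err:
        return "IAM role or S3 AP policy"
    return "unclassified"
```

Checking for a timeout first reflects the distinction drawn in this article: a ReadTimeout is an operational issue regardless of what LIST did, whereas an immediate denial is a configuration issue.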
What This Article Does Not Conclude
This article does not conclude that Snowflake + FSx for ONTAP S3 AP is production-certified for all workloads. It documents the behavior observed in one validated environment and identifies the configuration required for successful integration.
Specifically, this article does not validate:
- Snowpipe auto-ingest (requires S3 Event Notifications)
- Iceberg Table write-back (requires conditional writes)
- COPY INTO unload / write-back to FSx S3 AP
- Snowpark File Access (SnowflakeFile.open) for binary processing
- Performance at scale (large file counts, concurrent queries, large directory refreshes, or mixed NFS/SMB/S3 workload contention on the FSx file system)
- Private connectivity (PrivateLink) path
Operational Note: ReadTimeout vs AccessDenied
During this validation series, all S3 APs on one SVM became unresponsive for 7+ days due to orphaned DNS/AD configuration.
Important distinction:
- ReadTimeout (no response) → Check SVM DNS/AD configuration
- AccessDenied (immediate error) → Check AWS_ACCESS_POINT_ARN stage parameter
See FSx S3 AP Networking — DNS/AD Troubleshooting for details.
Lessons Learned
1. Platform documentation holds the answer
The AWS_ACCESS_POINT_ARN parameter exists in Snowflake's CREATE STAGE documentation. The initial "no workaround" conclusion was premature — always check platform docs for S3 AP-specific parameters before concluding incompatibility.
2. The same pattern recurs across platforms
Both Snowflake (AWS_ACCESS_POINT_ARN) and Databricks (access_point field) require explicit S3 AP ARN configuration. This appears to be a recurring integration pattern: platforms that generate restrictive session policies need an explicit parameter so the generated policy includes the regional access point ARN format.
3. LIST ≠ READ (but the fix is simple)
The partial success (LIST works, SELECT doesn't) is confusing but has a clear fix. The root cause is that ListBucket uses bucket-level ARN matching while GetObject requires object-level ARN matching — and the AP ARN parameter resolves both.
4. SVM DNS/AD configuration can silently break S3 AP
ReadTimeout (not AccessDenied) indicates an operational issue, not a session policy issue. Check vserver services dns check on the SVM.
5. Pre-signed URLs work but are not a governed path
GET_PRESIGNED_URL() generates valid URLs for FSx S3 AP objects. However, this bypasses Snowflake query governance and should not be used as a production workaround for regulated workloads.
What to Tell Stakeholders
Current recommendation (8 out of 10 tested AI functions validated on FSx data):
- Use Snowflake External Stage with AWS_ACCESS_POINT_ARN for governed read access to FSx for ONTAP data
- Use External Tables for governed schema abstraction with tags and access policies
- Use COPY INTO when data needs to be loaded into Snowflake for ML/AI processing
- Use Directory Table for unstructured data cataloging
- Do not rely on AUTO_REFRESH or Snowpipe auto-ingest (both depend on S3 Event Notifications); schedule ALTER EXTERNAL TABLE ... REFRESH and ALTER STAGE ... REFRESH instead
- Do not position Iceberg write-back on FSx S3 AP as supported
- For end-to-end RAG, use Cortex Search (validated: External Table → COPY INTO → Cortex Search Service, 198ms query) or Bedrock Knowledge Bases (AWS-documented path, no copy needed)
This validation should be used to guide architecture selection and stage configuration, not as a production certification.
What's Next
- Part 1: Athena — Query NAS Data In Place (validated read-oriented SQL path)
- Part 2: Databricks — A Layer-by-Layer Validation of Observed Boundaries (session policy + access_point field)
- Part 4: DuckDB Lambda — Serverless analytics at $0.00001/query (for teams that need lightweight, zero-idle-cost SQL without warehouse management)
- Part 5: EMR Spark — Read-Write ETL Pipeline (for teams that need distributed Spark processing with write-back to S3 for downstream lakehouse consumption)
References
- Snowflake CREATE STAGE — AWS_ACCESS_POINT_ARN parameter
- FSx for ONTAP S3 Access Points documentation
- FSx S3 AP API compatibility
- FSx S3 AP dual-layer authorization
- GitHub: fsxn-lakehouse-integrations
Key achievement: This validation established that Snowflake + FSx for ONTAP S3 AP provides a governed, AI-ready read path — 8 out of 10 tested Cortex AI functions work on NAS data, External Tables enable full governance (tags, masking, row policies), and Cortex Search delivers 198ms semantic search over NAS-originated documents. This is the most complete governed integration path validated in this series.
This article documents observed behavior in one validated environment (Snowflake Standard edition, AWS ap-northeast-1, May 2026). Platform behavior may change with future updates.
Disclaimer: This article is an independent validation report and does not represent Snowflake, AWS, or NetApp official guidance. Product behavior, support status, and platform capabilities may change. Always validate in your own environment and consult vendor documentation and support channels.





