DEV Community


Snowflake and FSx for ONTAP S3 Access Points — From 'Access Denied' to Working External Tables

TL;DR

In Part 1, Athena worked cleanly. In Part 2, Databricks hit session policy boundaries. This Part 3 validates Snowflake's path — and it works.

Snowflake can query FSx for ONTAP S3 Access Point data, but only with the correct stage configuration. Without the AWS_ACCESS_POINT_ARN parameter, SELECT fails with "access denied" while LIST works. With it, the tested read, governance, and AI paths work: SELECT, External Tables, COPY INTO load, Directory Tables, and governance tags. On the AI side, five text Cortex functions and PARSE_DOCUMENT work directly on FSx data, multimodal COMPLETE works after copying images to an internal stage, and Cortex Search works after a COPY INTO load.

| Configuration | LIST | SELECT | External Table | Cortex AI (text) | Vision AI |
|---|---|---|---|---|---|
| AP alias only (no ARN) | ✅ | ❌ Access Denied | | | |
| AP alias + AWS_ACCESS_POINT_ARN | ✅ | ✅ | ✅ | ✅ Direct | ✅ Via staging |

This appears to be a recurring integration pattern in this series: platforms that generate restrictive session policies need an explicit S3 Access Point ARN parameter so the generated policy includes the regional access point ARN.

Quick Decision Guide:

  • Zero-copy governed read on NAS data → External Table with AWS_ACCESS_POINT_ARN
  • Full AI + maximum query performance → COPY INTO internal table
  • RAG / semantic search over NAS documents → COPY INTO → Cortex Search Service (198ms)

GitHub Repository: fsxn-lakehouse-integrations


How to Read This Article

This article is:

  • A reproduction-focused validation report
  • Evidence from one environment (Snowflake Standard, ap-northeast-1)
  • A configuration guide for Snowflake + FSx for ONTAP S3 AP

Read by role:

  • Snowflake admin: The Setup → Visual Story: Before and After
  • Storage engineer: Evidence Matrix → The Root Cause
  • Data engineer: Complete Capability Matrix → External Table Also Works
  • Partner / SA: Partner Decision Card → Use Case Fit Matrix
  • Security / governance reviewer: Governance Impact Summary → AI / RAG Data Readiness Checklist
  • AI/ML engineer: AI / ML Integration Path → MLOps Boundary

Prerequisite Concepts

Before reading this article, it helps to understand:

  • Snowflake Storage Integration — an object that stores a reference to an IAM role for accessing external cloud storage
  • Snowflake External Stage — maps a cloud storage URL to a storage integration for data access
  • External Table — a Snowflake table that reads data directly from files on an external stage (no data copy)
  • AWS_ACCESS_POINT_ARN — a stage parameter that tells Snowflake to include the S3 Access Point ARN in its generated session policy
  • S3 Access Point ARN vs S3 bucket ARN — S3 AP uses arn:aws:s3:<region>:<account>:accesspoint/<name>, not arn:aws:s3:::<bucket>
  • Directory Table — a Snowflake feature that exposes file metadata (path, size, date) from a stage as a queryable table
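The two ARN shapes above are easy to confuse when you fill in AWS_ACCESS_POINT_ARN. The following Python sketch builds and sanity-checks a regional access point ARN and contrasts it with a bucket ARN. The helper names are hypothetical. The region, account ID, and access point name in the test values are placeholders. The name pattern is a simplified shape check, not the full AWS naming rules.

```python
import re

# Regional access point ARN: arn:<partition>:s3:<region>:<account>:accesspoint/<name>
AP_ARN_RE = re.compile(
    r"^arn:(?P<partition>aws[a-z-]*):s3:(?P<region>[a-z0-9-]+):"
    r"(?P<account>\d{12}):accesspoint/(?P<name>[a-z0-9-]+)$"
)


def access_point_arn(region: str, account: str, name: str, partition: str = "aws") -> str:
    """Build the ARN expected by AWS_ACCESS_POINT_ARN and validate its shape."""
    arn = f"arn:{partition}:s3:{region}:{account}:accesspoint/{name}"
    if not AP_ARN_RE.match(arn):
        raise ValueError(f"not a well-formed S3 access point ARN: {arn}")
    return arn


def object_arn(ap_arn: str, key: str) -> str:
    """Objects reached through an access point use the /object/<key> suffix."""
    return f"{ap_arn}/object/{key}"


def is_bucket_arn(arn: str) -> bool:
    """Bucket-style ARNs leave the region and account fields empty."""
    parts = arn.split(":", 5)
    return len(parts) == 6 and parts[2] == "s3" and parts[3] == "" and parts[4] == ""
```

For example, access_point_arn("ap-northeast-1", "123456789012", "fsxn-ap") returns arn:aws:s3:ap-northeast-1:123456789012:accesspoint/fsxn-ap. That is the value shape that goes into AWS_ACCESS_POINT_ARN.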

Important premise: Snowflake does NOT officially document FSx for ONTAP S3 Access Points as a supported External Stage storage backend. The AWS_ACCESS_POINT_ARN parameter exists in Snowflake's CREATE STAGE documentation for S3 Access Points generally, but FSx for ONTAP S3 AP is not listed as a validated target. Our validation confirms that read and governance operations work when configured correctly, but this should not be interpreted as an officially supported configuration by Snowflake. Consult Snowflake Support before production use.


The Goal

Query structured and unstructured data stored on FSx for ONTAP from Snowflake — without copying data to a native S3 bucket. FSx for ONTAP S3 Access Points should make this possible by exposing NFS/SMB file data via S3 API.

In Part 1, Athena worked cleanly. In Part 2, Databricks required the access_point field and still has limitations. This article validates Snowflake's path.


Test Environment

Snowflake Account: Standard edition, AWS ap-northeast-1
Warehouse: COMPUTE_WH (X-Small)
Role: ACCOUNTADMIN
FSx for ONTAP: <FILE_SYSTEM_ID> (ONTAP 9.17.1)
SVM: <SVM_NAME>
S3 Access Point: Internet-origin, UNIX file system user

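Scope: This article validates Snowflake Standard edition. Snowflake documents row access policies, masking policies, and Access History as Enterprise Edition (or higher) features, so the rows marked "Expected" for those features assume an Enterprise account. Higher-edition capabilities such as private connectivity were not tested here.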


The Setup

Snowflake accesses external data through a three-layer configuration:

Storage Integration (IAM Role ARN + trust)
    │
    └── External Stage (S3 URL + AWS_ACCESS_POINT_ARN + file format)
            │
            └── External Table / SELECT @stage (data access)

Visual Story: Before and After

❌ Before: SELECT Fails Without AWS_ACCESS_POINT_ARN

CREATE OR REPLACE STAGE fsxn_stage_without_arn
  STORAGE_INTEGRATION = fsxn_verification_integration
  URL = 's3://<ap-alias>/'
  FILE_FORMAT = (TYPE = PARQUET);

LIST @fsxn_stage_without_arn/sensor-data/;   -- ✅ Works
SELECT $1 FROM @fsxn_stage_without_arn/sensor-data/sensor_data.parquet LIMIT 3;  -- ❌ Access Denied

Screenshot: SELECT from the stage fails with an access-denied error even though LIST works.

"Failed to access remote file: access denied. Please check your credentials." The same file that LIST found cannot be read.


✅ After: SELECT Succeeds With AWS_ACCESS_POINT_ARN

CREATE OR REPLACE STAGE fsxn_stage_with_arn
  STORAGE_INTEGRATION = fsxn_verification_integration
  URL = 's3://<ap-alias>/'
  AWS_ACCESS_POINT_ARN = 'arn:aws:s3:<region>:<account>:accesspoint/<ap-name>'
  FILE_FORMAT = (TYPE = PARQUET);

SELECT $1 FROM @fsxn_stage_with_arn/sensor-data/sensor_data.parquet LIMIT 3;  -- ✅ SUCCESS

Result: 3 rows of sensor data returned successfully.

{"humidity": 32.2, "id": 1, "pressure": 1002.1, "sensor_id": "S004", "status": "normal", "temperature": 21.13}
{"humidity": 45.63, "id": 2, "pressure": 1004.13, "sensor_id": "S005", "status": "normal", "temperature": 23.07}
{"humidity": 42.79, "id": 3, "pressure": 1000.18, "sensor_id": "S003", "status": "normal", "temperature": 36.96}

✅ External Table Also Works

CREATE OR REPLACE EXTERNAL TABLE fsxn_sensor_ext_table
  LOCATION = @fsxn_stage_with_arn/sensor-data/
  FILE_FORMAT = (TYPE = PARQUET)
  AUTO_REFRESH = FALSE;

SELECT * FROM fsxn_sensor_ext_table LIMIT 3;  -- ✅ SUCCESS (3 rows)

Complete Capability Matrix

| Capability | Status | Notes |
|---|---|---|
| **Read operations** | | |
| SELECT from @stage (Parquet) | ✅ Verified | GetObject with AWS_ACCESS_POINT_ARN |
| SELECT from @stage (CSV) | ✅ Verified | CSV with SKIP_HEADER works |
| SELECT from @stage (JSON) | ✅ Expected | Same GetObject path (no JSON files in test data) |
| External Table (read) | ✅ Verified | CREATE + SELECT both succeed |
| LIST @stage (all prefixes) | ✅ Verified | Subdirectories included |
| GET_PRESIGNED_URL | ✅ Observed | Works but not officially supported |
| **Load operations** | | |
| COPY INTO (stage → table) | ✅ Verified | 4.9s for Parquet load |
| **Governance** | | |
| Governance Tags on External Table | ✅ Verified | CREATE TAG + ALTER EXTERNAL TABLE ... SET TAG |
| SYSTEM$GET_TAG | ✅ Verified | Tag retrieval works |
| Row Access Policy | ✅ Expected | Standard Snowflake feature on tables |
| Column Masking | ✅ Expected | Standard Snowflake feature on tables |
| **Write operations** | | |
| PutObject (via COPY INTO unload) | ⚠️ TBD | FSx S3 AP supports PutObject ≤5GB |
| **Event-driven** | | |
| Snowpipe (auto-ingest) | ❌ Not possible | S3 Event Notifications not supported on FSx S3 AP |
| AUTO_REFRESH on External Table | ❌ Not possible | Requires S3 Event Notifications |
| **Transactional table formats** | | |
| Iceberg Table read (pre-existing metadata) | ⚠️ TBD | Requires separate validation |
| Iceberg Table write-back | ❌ Not suitable | Conditional writes not supported on FSx S3 AP |
| Delta / Hudi write | ❌ Not suitable | Conditional writes not supported |
| **Supported file formats** | | |
| Parquet | ✅ Verified | Primary format for analytics |
| CSV | ✅ Verified | With header skip, delimiter options |
| JSON | ✅ Expected | Same read path as Parquet/CSV |
| Avro | ✅ Expected | Snowflake-supported format, same read path |
| ORC | ✅ Expected | Snowflake-supported format, same read path |

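Key insight: With AWS_ACCESS_POINT_ARN, Snowflake achieves broad read and governance integration for the tested paths. Apart from the items still marked TBD (unload via PutObject, Iceberg read), the limitations found fall into two groups: event-driven features (Snowpipe, AUTO_REFRESH) and transactional write formats (Iceberg, Delta, Hudi). Both stem from FSx S3 AP API limitations (no S3 Event Notifications, no conditional writes), not from Snowflake limitations.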


The Root Cause: Session Policy ARN Mismatch

When Snowflake performs sts:AssumeRole, it applies a session policy. Without AWS_ACCESS_POINT_ARN, this session policy uses standard S3 bucket ARN patterns that don't match the FSx S3 AP regional ARN format:

Without AWS_ACCESS_POINT_ARN:
  Session policy allows GetObject on: arn:aws:s3:::*/*
  FSx S3 AP actual ARN:              arn:aws:s3:<region>:<account>:accesspoint/<name>/object/*
  → NO MATCH → AccessDenied

With AWS_ACCESS_POINT_ARN:
  Session policy includes:            arn:aws:s3:<region>:<account>:accesspoint/<name>/*
  → MATCH → GetObject succeeds

This is the same pattern as Databricks Unity Catalog's access_point field — both platforms need the S3 AP ARN explicitly specified to include it in the generated session policy.
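The mismatch in the diagram can be reproduced with a small model of IAM resource matching. This Python sketch rests on stated assumptions:

- It compares ARN fields segment by segment with fnmatch-style * and ? wildcards.
- It ignores IAM rules such as a trailing * spanning segments, and fnmatch additionally treats [ ] as character sets.
- The session-policy patterns come from the diagram above. They are this article's inference, not a published Snowflake policy.
- The ARN values are placeholders.

```python
from fnmatch import fnmatchcase


def arn_matches(pattern: str, arn: str) -> bool:
    """Simplified IAM-style resource match: compare the six colon-separated
    ARN fields one by one, allowing * and ? wildcards inside each field."""
    p_fields = pattern.split(":", 5)
    a_fields = arn.split(":", 5)
    if len(p_fields) != 6 or len(a_fields) != 6:
        return False
    return all(fnmatchcase(a, p) for p, a in zip(p_fields, a_fields))


def policy_allows(resources: list, arn: str) -> bool:
    """True if any Resource pattern in the (session) policy matches the ARN."""
    return any(arn_matches(r, arn) for r in resources)


# Placeholder values; substitute your own region, account, and access point name.
AP = "arn:aws:s3:ap-northeast-1:123456789012:accesspoint/fsxn-ap"
REQUESTED = AP + "/object/sensor-data/sensor_data.parquet"

WITHOUT_ARN = ["arn:aws:s3:::*/*"]    # bucket-style pattern from the diagram
WITH_ARN = WITHOUT_ARN + [AP + "/*"]  # access point pattern added

print("without AWS_ACCESS_POINT_ARN:", policy_allows(WITHOUT_ARN, REQUESTED))
print("with AWS_ACCESS_POINT_ARN:   ", policy_allows(WITH_ARN, REQUESTED))
```

Running it prints False for the bucket-style pattern and True once the access point pattern is added. The empty region and account fields in arn:aws:s3:::*/* can never match a regional access point ARN.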


Evidence Matrix

| Layer | Evidence | Result | Interpretation |
|---|---|---|---|
| Snowflake integration | DESCRIBE INTEGRATION | ✅ Pass | Trust established |
| Stage metadata | LIST @stage | ✅ Pass | ListBucket path works (bucket-level ARN matches) |
| Object read (no ARN) | SELECT @stage | ❌ Fail | GetObject blocked by session policy |
| Object read (with ARN) | SELECT @stage | ✅ Pass | AWS_ACCESS_POINT_ARN resolves session policy |
| External Table | CREATE + SELECT | ✅ Pass | Governed table access works with ARN |
| Same role, direct AWS CLI | List/Get/Head | ✅ Pass | IAM/AP/FSx permissions are correct |
| FSx authorization | File system user permissions | ✅ Pass | FSx-side permission permits access |
| Operational health | SVM DNS check | ✅ Pass | Distinguishes ReadTimeout from AccessDenied |
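The "same role, direct AWS CLI" row is the control that separates a Snowflake-side problem from an IAM/AP/FSx problem. A scripted equivalent is sketched below in Python. probe_access_point and classify_error are hypothetical helpers that take any boto3-style S3 client, so they can be exercised with a stub. Timeouts are reported separately from AccessDenied, mirroring the operational-health row.

```python
from typing import Any, Callable, Dict


def classify_error(exc: Exception) -> str:
    """Coarse outcome for a failed call, without importing botocore.

    botocore's ClientError exposes response["Error"]["Code"] (e.g. "AccessDenied");
    timeout exceptions are recognized by class name (e.g. ReadTimeoutError)."""
    response = getattr(exc, "response", None) or {}
    code = response.get("Error", {}).get("Code")
    if code:
        return code
    if "Timeout" in type(exc).__name__:
        return "Timeout"
    return type(exc).__name__


def probe_access_point(client: Any, ap: str, key: str, prefix: str = "") -> Dict[str, str]:
    """Run List/Head/Get against an access point alias or ARN with a
    boto3-style S3 client and record "ok" or an error class for each call."""
    calls: Dict[str, Callable[[], Any]] = {
        "ListObjectsV2": lambda: client.list_objects_v2(Bucket=ap, Prefix=prefix, MaxKeys=1),
        "HeadObject": lambda: client.head_object(Bucket=ap, Key=key),
        # Close the streaming body right away: only authorization matters here.
        "GetObject": lambda: client.get_object(Bucket=ap, Key=key)["Body"].close(),
    }
    results: Dict[str, str] = {}
    for name, call in calls.items():
        try:
            call()
            results[name] = "ok"
        except Exception as exc:  # record the failure and keep probing
            results[name] = classify_error(exc)
    return results
```

Run it with credentials for the role Snowflake assumes, for example probe_access_point(boto3.client("s3", region_name="ap-northeast-1"), "<ap-alias>", "sensor-data/sensor_data.parquet", prefix="sensor-data/"). These credentials carry no Snowflake-generated session policy. If all three calls succeed while Snowflake's SELECT fails, the problem lies in the session policy rather than in IAM, the access point policy, or FSx permissions.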

FSx for ONTAP S3 AP Authorization Path

FSx for ONTAP S3 Access Points use a dual-layer authorization model:

Layer 1 — S3-side authorization:

  • IAM identity-based policy (Snowflake's assumed role session)
  • S3 Access Point resource policy
  • Session policy generated by Snowflake (requires AWS_ACCESS_POINT_ARN to include AP ARN)

Layer 2 — FSx for ONTAP-side authorization:

  • File system user associated with the access point
  • UNIX mode-bits / NFSv4 ACLs (for UNIX security style volumes)

In the Snowflake validation, the initial failure occurred at Layer 1 — Snowflake's generated session policy did not include the S3 AP ARN pattern. Setting AWS_ACCESS_POINT_ARN resolves this by instructing Snowflake to include the AP ARN in the session policy, allowing both layers to be evaluated normally.
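The dual-layer model reduces to a conjunction: a read succeeds only if Layer 1 allows it and the file system user passes the Layer 2 check. The Python sketch below models Layer 2 with classic UNIX owner/group/other read bits only. It does not model NFSv4 ACLs or a root override. The uid, gid, and mode values are hypothetical. In the Snowflake failure case, Layer 1 is the false input, so Layer 2 is never the deciding factor.

```python
import stat
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class FileAttrs:
    uid: int
    gid: int
    mode: int  # permission bits, e.g. 0o640


def unix_can_read(attrs: FileAttrs, uid: int, gids: FrozenSet[int]) -> bool:
    """Owner/group/other read check. Root bypass and NFSv4 ACLs are not modeled."""
    if uid == attrs.uid:
        return bool(attrs.mode & stat.S_IRUSR)
    if attrs.gid in gids:
        return bool(attrs.mode & stat.S_IRGRP)
    return bool(attrs.mode & stat.S_IROTH)


def read_allowed(s3_side_allows: bool, attrs: FileAttrs, fs_uid: int, fs_gids: FrozenSet[int]) -> bool:
    """Layer 1 (IAM, access point policy, session policy) AND Layer 2 (file system user)."""
    return s3_side_allows and unix_can_read(attrs, fs_uid, fs_gids)
```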


S3 API Compatibility and Snowflake Operations

| Snowflake operation | Likely S3 operation | FSx S3 AP support | Observed result (with ARN) |
|---|---|---|---|
| LIST @stage | ListObjectsV2 | ✅ Supported | ✅ Success |
| SELECT @stage | GetObject / HeadObject | ✅ Supported | ✅ Success |
| GET_PRESIGNED_URL | Presign / signed GetObject URL | Presign not supported in FSx S3 AP docs | Observed working; not a supported production path |
| External Table read | GetObject | ✅ Supported | ✅ Success |
| Iceberg metadata read | Head/Get + conditional | Partial (conditional writes not supported) | TBD |

Comparison: Snowflake vs Databricks

| Aspect | Snowflake | Databricks |
|---|---|---|
| Parameter name | AWS_ACCESS_POINT_ARN (on stage) | access_point (on External Location) |
| LIST without parameter | ✅ Works | ❌ Blocked (before access_point) |
| SELECT without parameter | ❌ Fails | ❌ Fails |
| SELECT with parameter | ✅ Works | ✅ Works (explicit path only) |
| External Table / UC Table | ✅ Works | ❌ CREATE TABLE still fails |
| Subdirectory listing | ✅ Works | ❌ Blocked |
| Documentation | CREATE STAGE docs | Databricks Support (May 2026) |

Key difference: Snowflake's AWS_ACCESS_POINT_ARN resolves the issue more completely than Databricks' access_point field. Snowflake achieves full External Table support, while Databricks still cannot create UC tables.


Partner Decision Card

| Customer requirement | Snowflake + FSx S3 AP today | Recommended path |
|---|---|---|
| File discovery only | ✅ Works (LIST / Directory Table) | Use directly |
| Query file contents in Snowflake | ✅ Works with AWS_ACCESS_POINT_ARN | Configure stage with ARN |
| Governed Snowflake external tables | ✅ Works with AWS_ACCESS_POINT_ARN | Configure stage with ARN |
| Zero-copy SQL on NAS data | ✅ Snowflake or Athena | Both work; choose by workload |
| Snowflake ML / Snowpark on NAS data | ✅ Possible via External Table | Configure stage with ARN, validate Snowpark path |
| Iceberg Table on FSx S3 AP | TBD (conditional writes not supported) | Validate separately |

Choose Snowflake when governed external tables, tags, Directory Tables, or Snowpark integration are required. Choose Athena when lightweight AWS-native serverless SQL over NAS data is sufficient.


Discovery Questions for Partners

When a customer asks about Snowflake + FSx for ONTAP S3 Access Points:

  1. Is the workload read-only analytics, or does it require write-back?
  2. Is Snowflake governance (tags, row access policy, masking) required?
  3. Does the workload need real-time file detection (Snowpipe), or is scheduled refresh acceptable?
  4. Are the target files structured (Parquet/CSV/JSON) or unstructured (images/documents)?
  5. Is the data regulated (PHI, PII, financial)? If so, review presigned URL governance.
  6. Does the customer need Iceberg table format? (Write-back not supported on FSx S3 AP)
  7. What is the expected file count and average file size? (Impacts LIST/REFRESH latency)
  8. Is the Snowflake account in the same AWS region as FSx for ONTAP?

Governance Impact

| Capability | Status | Governance impact |
|---|---|---|
| LIST @stage | ✅ Works | File inventory; not data access governance |
| SELECT @stage | ✅ Works (with ARN) | Query-level access via Snowflake governance |
| External Table | ✅ Works (with ARN) | Governed schema/table abstraction available |
| Iceberg Table | ❌ Write not suitable | Conditional writes not supported; read of pre-existing tables TBD |
| GET_PRESIGNED_URL | ⚠️ Observed only | Risk of bypassing Snowflake query governance if misused |

For regulated workloads, do not use GET_PRESIGNED_URL as a workaround for query access. Even if URL generation is observed to work, it is not a governed Snowflake query path and should be reviewed separately for auditability, expiration, data classification, and access logging.
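Expiration is one review item that can be checked mechanically, because a presigned URL carries its own lifetime. The Python sketch below assumes the URL uses SigV4 query-string signing (X-Amz-Date and X-Amz-Expires parameters), which is how S3 presigned URLs are normally signed. Confirm this against URLs your account actually produces. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit


def _sigv4_params(url: str) -> tuple:
    """Extract (signing time, lifetime) from a SigV4 presigned URL."""
    params = parse_qs(urlsplit(url).query)
    try:
        signed_at = datetime.strptime(params["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
        lifetime = int(params["X-Amz-Expires"][0])
    except (KeyError, ValueError) as exc:
        raise ValueError("not a SigV4 query-string presigned URL") from exc
    return signed_at.replace(tzinfo=timezone.utc), timedelta(seconds=lifetime)


def presigned_url_expiry(url: str) -> datetime:
    """UTC time after which the presigned URL stops working."""
    signed_at, lifetime = _sigv4_params(url)
    return signed_at + lifetime


def exceeds_max_lifetime(url: str, max_lifetime: timedelta) -> bool:
    """True if the URL was issued with a longer lifetime than policy allows."""
    return _sigv4_params(url)[1] > max_lifetime
```

Suppose a review policy caps URLs at 15 minutes. A URL generated with the 3600-second expiration used later in this article would be flagged by exceeds_max_lifetime.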


Governance Impact Summary

Important premise: FSx for ONTAP S3 Access Points are NOT officially documented by Snowflake as a supported External Stage storage backend. The governance paths described below are validated in this environment but should not be treated as officially supported configurations without Snowflake Support confirmation.

| Access path | Governance model | Auditability | Production suitability |
|---|---|---|---|
| External Table (with AWS_ACCESS_POINT_ARN) | Snowflake RBAC + Tags + Row Access Policy | High (Snowflake Access History, query logs) | Recommended governed read path |
| COPY INTO (load to Snowflake table) | Full Snowflake governance on loaded data | High (standard Snowflake table governance) | Recommended for ML/AI workloads requiring full governance |
| Directory Table + GET_PRESIGNED_URL | File catalog governed; URL access is external | Medium (catalog queries logged; URL access not logged by Snowflake) | File discovery governed; downstream access requires separate audit |
| BUILD_SCOPED_FILE_URL | Snowflake-mediated access | High (access mediated through Snowflake privileges) | Preferred for governed unstructured data access |
| GET_PRESIGNED_URL (direct) | External access path | Low (Snowflake does not log URL usage after generation) | PoC / non-regulated only; requires separate access logging |

Snowflake Access History captures query-level access to External Tables. However, presigned URL usage after generation is not tracked by Snowflake — use CloudTrail S3 data events for downstream audit if required.
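Once S3 data events are enabled, the downstream audit becomes a filter over CloudTrail records. This Python sketch uses field names from CloudTrail's documented record format (eventSource, eventName, resources, userIdentity, requestParameters). Before relying on it, confirm in your account that requests through an FSx for ONTAP S3 access point are delivered as S3 data events, and check which resources entries they carry. The helper names and ARN values are hypothetical.

```python
import json
from typing import Iterable, Iterator


def load_cloudtrail_file(text: str) -> list:
    """CloudTrail log files are JSON documents with a top-level "Records" list."""
    return json.loads(text).get("Records", [])


def reads_via_access_point(records: Iterable[dict], ap_arn: str) -> Iterator[dict]:
    """Yield a compact audit row for each S3 GetObject event that references
    the given access point ARN in its resources list."""
    for rec in records:
        if rec.get("eventSource") != "s3.amazonaws.com" or rec.get("eventName") != "GetObject":
            continue
        arns = [r.get("ARN", "") for r in rec.get("resources", [])]
        if not any(a == ap_arn or a.startswith(ap_arn + "/") for a in arns):
            continue
        yield {
            "time": rec.get("eventTime"),
            "principal": rec.get("userIdentity", {}).get("arn"),
            "key": rec.get("requestParameters", {}).get("key"),
            "source_ip": rec.get("sourceIPAddress"),
        }
```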

MLOps Boundary

Reading data from FSx for ONTAP S3 AP via Snowflake External Table does not automatically make the downstream ML workflow governed.

If the data accessed via External Table or COPY INTO is used for ML or GenAI:

  • Register derived datasets in governed Snowflake tables
  • Track experiments with Snowflake ML lineage or external experiment tracking
  • Document source data access path (stage name, S3 AP alias, prefix, timestamp)
  • Record whether training data lineage is captured within Snowflake or externalized
  • Ensure Snowpark ML workloads use appropriate role privileges
  • If using Cortex functions, validate that input data classification is appropriate for the model

Snowflake's ML Lineage tracks feature-to-model relationships. If the source data path is an External Table on FSx S3 AP, document this as the lineage origin.
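One way to capture the lineage origin described above is a small serializable record written alongside each load or training run. The dataclass below is a hypothetical sketch. Its field names mirror the MLOps checklist: stage name, S3 AP alias, prefix, timestamp, and whether lineage is captured in Snowflake.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

ACCESS_PATHS = ("external_table", "copy_into")


@dataclass(frozen=True)
class SourceAccessRecord:
    """Hypothetical lineage-origin entry for data read from FSx for ONTAP via an S3 AP."""
    stage: str                  # external stage created with AWS_ACCESS_POINT_ARN
    ap_alias: str               # S3 access point alias used in the stage URL
    prefix: str                 # prefix read, e.g. "sensor-data/"
    access_path: str            # "external_table" or "copy_into"
    lineage_in_snowflake: bool  # False if lineage is tracked externally
    read_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def __post_init__(self) -> None:
        if self.access_path not in ACCESS_PATHS:
            raise ValueError(f"access_path must be one of {ACCESS_PATHS}")

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)
```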

AI / RAG Data Readiness Checklist

If the FSx for ONTAP S3 AP data is intended for AI, RAG, or GenAI pipelines via Snowflake:

  • [ ] Are documents classified by sensitivity (PHI, PII, financial, internal, public)?
  • [ ] Are file-level permissions preserved or re-modeled for the AI pipeline?
  • [ ] Is metadata available for filtering and retrieval (file type, date, owner)? → Use Directory Table
  • [ ] Is freshness requirement defined (real-time, daily, weekly)? → Define REFRESH schedule
  • [ ] Is read-only access sufficient, or does the pipeline need write-back?
  • [ ] Is human review required for generated output before downstream use?
  • [ ] Is permission-aware retrieval required (user A sees only their authorized documents)?

If permission-aware retrieval is required, define one of:

  • Enforce at source access path — use per-user or per-group S3 Access Points with scoped file system users
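  • Re-model permissions in a metadata index: extract file-level ACLs into a metadata table keyed by the Directory Table's RELATIVE_PATH (Directory Tables expose fixed file columns and cannot hold custom ACL fields) and filter at query time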
  • Filter retrieval results by user/group claims — apply Snowflake Row Access Policy on External Table based on authenticated user identity
  • Do not proceed until authorization model is validated and approved by security owner
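The claims-based option can be sketched in a few lines. This is a hypothetical application-side filter, not a Snowflake API. Each retrieval hit carries the groups allowed to read its source file (re-modeled from file-level ACLs). Hits the caller cannot see are dropped before the top-k cut. Inside Snowflake, the equivalent control would be a Row Access Policy.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence


@dataclass(frozen=True)
class Hit:
    path: str
    score: float
    allowed_groups: FrozenSet[str]  # re-modeled from file-level ACLs (assumption)


def filter_by_claims(hits: Sequence[Hit], user_groups: FrozenSet[str], k: int) -> List[Hit]:
    """Drop hits the caller is not authorized to see, then keep the top k by score.

    Filtering before truncation avoids returning fewer than k results when
    some high-scoring hits are unauthorized."""
    visible = [h for h in hits if h.allowed_groups & user_groups]
    return sorted(visible, key=lambda h: h.score, reverse=True)[:k]
```

In practice, request more than k candidates from the search service so that filtering still leaves k authorized results.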

Snowflake + FSx S3 AP approval requirements (for regulated workloads):

  • Data owner approval for External Table / stage access
  • Security owner approval for presigned URL generation policy
  • Platform owner approval for COPY INTO (data leaves FSx, enters Snowflake)
  • Defined: allowed prefix, allowed operations, refresh schedule, expiration date
  • Approval record location (where the decision is stored)
  • Review / expiration date (when the approval must be re-evaluated)
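The approval fields above map onto a small record that tooling can check before running a stage operation. The sketch below is hypothetical; the names, operation labels, and prefix rule are illustrative. It returns the first reason a request falls outside an approval, or "allowed".

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet


@dataclass(frozen=True)
class Approval:
    """Hypothetical approval record for one stage; fields mirror the list above."""
    stage: str
    allowed_prefixes: FrozenSet[str]
    allowed_operations: FrozenSet[str]  # e.g. {"SELECT", "COPY_INTO", "PRESIGN"}
    refresh_schedule: str               # e.g. "daily 02:00 UTC"
    record_location: str                # where the approval decision is stored
    review_by: date                     # approval must be re-evaluated after this date


def check_request(approval: Approval, stage: str, path: str, operation: str, today: date) -> str:
    """Return "allowed" or the first reason the request is outside the approval."""
    if today > approval.review_by:
        return "approval expired; re-review required"
    if stage != approval.stage:
        return "stage not covered"
    if not any(path.startswith(p) for p in approval.allowed_prefixes):
        return "prefix not approved"
    if operation not in approval.allowed_operations:
        return "operation not approved"
    return "allowed"
```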

For regulated workloads, exercise caution with:

  • GET_PRESIGNED_URL for patient-facing or financial data (bypasses Snowflake query governance)
  • COPY INTO without data classification review (data moves from FSx to Snowflake storage)
  • Cortex LLM functions on sensitive data without human review gate
  • Unreviewed access to regulated datasets via scoped URLs

Unstructured Data Support

| Format | Access method | Use case |
|---|---|---|
| Images (JPEG, PNG, TIFF) | GET_PRESIGNED_URL / BUILD_SCOPED_FILE_URL | Thumbnail generation, ML inference, quality inspection |
| Video (MP4, MOV) | GET_PRESIGNED_URL | Streaming, frame extraction |
| Documents (PDF, DOCX) | GET_PRESIGNED_URL / Snowpark File Access | Text extraction, RAG, document processing |
| Audio (WAV, MP3) | GET_PRESIGNED_URL | Transcription, speech analytics |
| Binary / Archives | GET_PRESIGNED_URL | Download, transfer |

How to manage unstructured data as a library:

-- fsxn_stage: an external stage created with AWS_ACCESS_POINT_ARN, as in the setup above
-- Enable Directory Table for file catalog
ALTER STAGE fsxn_stage SET DIRECTORY = (ENABLE = TRUE);
ALTER STAGE fsxn_stage REFRESH;

-- Query file catalog (search by path, size, date)
SELECT RELATIVE_PATH, SIZE, LAST_MODIFIED
FROM DIRECTORY(@fsxn_stage)
WHERE RELATIVE_PATH LIKE '%images/%'
ORDER BY LAST_MODIFIED DESC;

-- Generate download URL for applications (valid 1 hour)
SELECT GET_PRESIGNED_URL(@fsxn_stage, 'images/photo001.jpg', 3600);

-- Generate Snowflake-proxied secure URL
SELECT BUILD_SCOPED_FILE_URL(@fsxn_stage, 'documents/report.pdf');

Note: AUTO_REFRESH is not available because FSx S3 AP does not support S3 Event Notifications (GetBucketNotificationConfiguration is not supported). Use ALTER STAGE REFRESH manually or via Snowflake Task on a schedule.
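Without event notifications, each scheduled REFRESH is effectively polling, and a downstream pipeline still has to work out what changed. One approach is to compare two Directory Table listings keyed by RELATIVE_PATH, using the SIZE, LAST_MODIFIED, and ETAG columns. The Python sketch below assumes the listings were already fetched, for example with SELECT RELATIVE_PATH, SIZE, LAST_MODIFIED, ETAG FROM DIRECTORY(@fsxn_stage). The function and type names are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class FileEntry(NamedTuple):
    size: int
    last_modified: str
    etag: str


Snapshot = Dict[str, FileEntry]  # keyed by RELATIVE_PATH


def diff_snapshots(before: Snapshot, after: Snapshot) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, changed, removed) relative paths between two listings."""
    added = sorted(p for p in after if p not in before)
    removed = sorted(p for p in before if p not in after)
    changed = sorted(p for p in after if p in before and after[p] != before[p])
    return added, changed, removed
```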

URL type guidance: Use BUILD_SCOPED_FILE_URL when you want access mediated through Snowflake role privileges (governed path). Treat GET_PRESIGNED_URL as an external object access path that bypasses Snowflake query governance and requires separate review for regulated workloads.


AI / ML Integration Path

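Snowflake provides AI/ML capabilities that can use FSx for ONTAP data via S3 AP. Of the Cortex features tested, six (five text functions plus PARSE_DOCUMENT) work directly on FSx S3 AP data without copying. Multimodal COMPLETE and Cortex Search work after a staging or COPY INTO step, and TO_FILE on the external stage fails.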

| Snowflake AI/ML feature | FSx S3 AP compatibility | Access path | Duration | Use case |
|---|---|---|---|---|
| CORTEX.SUMMARIZE | ✅ Direct | External Table → Cortex | 3.3s | Text summarization on NAS documents |
| CORTEX.TRANSLATE | ✅ Direct | External Table → Cortex | 5.1s | Multi-language support |
| CORTEX.SENTIMENT | ✅ Direct | External Table → Cortex | 2.5s | Sentiment analysis |
| CORTEX.COMPLETE (text) | ✅ Direct | External Table → Cortex | 16s | AI analysis, anomaly detection |
| CORTEX.EXTRACT_ANSWER | ✅ Direct | External Table → Cortex | 2.7s | Information extraction |
| PARSE_DOCUMENT (OCR) | ✅ Direct | Stage path → OCR | ~8s | Invoice/report text extraction |
| COMPLETE (Vision/Multimodal) | ✅ Workaround | COPY FILES → internal stage → TO_FILE | 41s | Image analysis, defect detection |
| TO_FILE on FSx S3 AP | ❌ Blocked | Error: "Remote file not found" | | |
| Cortex Search (RAG) | ✅ Verified | External Table → COPY INTO → Cortex Search Service | 198ms query | Semantic search over NAS documents |

Key finding: Text-based Cortex functions, PARSE_DOCUMENT, and Cortex Search all work on FSx S3 AP data (Cortex Search requires COPY INTO as a staging step). Vision AI (multimodal COMPLETE) requires a staging step because TO_FILE() cannot resolve files on S3 AP external stages.

Validated AI/ML paths:

  • ✅ Cortex LLM SUMMARIZE on External Table data — AI-generated summary in 3.3s
  • ✅ Cortex TRANSLATE on External Table data — English to Japanese in 5.1s
  • ✅ Cortex SENTIMENT on External Table data — sentiment scores in 2.5s
  • ✅ Cortex COMPLETE (text) on External Table data — AI anomaly analysis in 16s
  • ✅ Cortex EXTRACT_ANSWER on External Table data — information extraction in 2.7s
  • PARSE_DOCUMENT (OCR) on FSx S3 AP stage file — text extraction from images in ~8s
  • COMPLETE (Vision AI) via COPY FILES workaround — image analysis in 41s (pixtral-large)
  • Cortex Search (RAG) — External Table → COPY INTO → Cortex Search Service → semantic query in 198ms
  • ✅ COPY INTO loads NAS data into Snowflake tables → available for all Cortex/ML functions
  • ✅ Directory Table catalogs unstructured files → enables file discovery for processing pipelines
  • ✅ GET_PRESIGNED_URL generates download URLs → enables external ML services to access files

Vision AI Workaround (Validated)

Direct TO_FILE() on FSx S3 AP external stage returns "Remote file not found." The workaround:

-- 1. Create an internal stage with server-side encryption (SNOWFLAKE_SSE);
--    the default SNOWFLAKE_FULL (client-side) encryption blocks TO_FILE in this test
CREATE OR REPLACE STAGE fsxn_ai_stage ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE');

-- 2. Copy the image from the FSx S3 AP external stage (created with AWS_ACCESS_POINT_ARN)
--    to the internal stage
COPY FILES INTO @fsxn_ai_stage FROM @fsxn_ap_arn_test_stage/media/documents/invoice_sample.png;

-- 3. Enable Cross-Region Inference (required for vision models in ap-northeast-1)
ALTER ACCOUNT SET CORTEX_ENABLED_CROSS_REGION = 'ANY_REGION';

-- 4. Run Vision AI
ALTER STAGE fsxn_ai_stage SET DIRECTORY = (ENABLE = TRUE);
ALTER STAGE fsxn_ai_stage REFRESH;
SELECT SNOWFLAKE.CORTEX.COMPLETE('pixtral-large',
  'Describe this invoice. What is the invoice number, customer, and amount?', FILE
) AS vision_result
FROM (SELECT TO_FILE(BUILD_SCOPED_FILE_URL(@fsxn_ai_stage, RELATIVE_PATH)) AS FILE
      FROM DIRECTORY(@fsxn_ai_stage) WHERE RELATIVE_PATH LIKE '%.png' LIMIT 1);

Result: Vision AI correctly identified Invoice #INV-2026-0524, Customer: Acme Corp, Amount: USD 1,234.56.

Data residency note: The COPY FILES step moves image data from FSx for ONTAP to Snowflake-managed internal storage. Cross-Region Inference may route data to US/EU regions for model processing. Verify compliance with your data residency requirements before enabling for regulated workloads.

Cortex Search (RAG) — Validated

Cortex Search provides semantic search over text data — the Snowflake-native RAG building block. The validated path uses External Table → COPY INTO → Cortex Search Service:

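-- 1. Load FSx S3 AP data into an internal table (required for Cortex Search)
-- Assumes sensor_documents already exists and has the text column indexed in step 2
-- (text_column); the raw sensor Parquet schema has no such column, so derive it when loading.
COPY INTO sensor_documents FROM @fsxn_stage_with_arn/sensor-data/
  FILE_FORMAT = (TYPE = PARQUET);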

-- 2. Create Cortex Search Service on the loaded data
CREATE OR REPLACE CORTEX SEARCH SERVICE sensor_search_service
  ON text_column
  WAREHOUSE = COMPUTE_WH
  TARGET_LAG = '1 hour'
  AS (SELECT * FROM sensor_documents);

-- 3. Semantic search query
SELECT PARSE_JSON(
  SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
    'sensor_search_service',
    '{"query": "high temperature anomaly", "columns": ["text_column"], "limit": 5}'
  )
);
-- Result: Relevant documents returned in 198ms

Dataset context: This validation used the sensor data loaded via COPY INTO from FSx S3 AP (1000 rows of IoT sensor readings). Cortex Search performance at scale (millions of documents, large text corpora) should be validated separately — 198ms is a sizing reference for this dataset size, not a service-level guarantee.
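The 198ms figure is a single observation, and a sizing reference is more useful when it summarizes repeated runs. The sketch below is a hypothetical helper. It turns a list of per-query latencies into nearest-rank percentiles; you might collect those latencies by timing each SEARCH_PREVIEW call on the client. It does not call Snowflake itself.

```python
import math
from typing import Dict, Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile, 0 < pct <= 100."""
    if not samples or not 0 < pct <= 100:
        raise ValueError("need at least one sample and 0 < pct <= 100")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def summarize_latency(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Summary suitable for a verification-pack evidence file."""
    return {
        "n": len(samples_ms),
        "p50_ms": percentile(samples_ms, 50),
        "p95_ms": percentile(samples_ms, 95),
        "max_ms": max(samples_ms),
    }
```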

GA status: Verify that Cortex Search Service and its query functions are Generally Available (GA) in your Snowflake edition and region before production use. Preview features may not be covered by Snowflake SLA and should not be used for regulated workloads without explicit vendor confirmation.

Screenshot: Cortex Search Service created on data loaded from FSx for ONTAP via COPY INTO.

Screenshot: a semantic search query returns relevant results in 198ms, showing RAG-style retrieval over NAS-originated data.

Key insight: Cortex Search requires COPY INTO (data must be in a Snowflake internal table), but the end-to-end path from FSx for ONTAP → External Stage → COPY INTO → Cortex Search Service → semantic query is validated. This provides a Snowflake-native RAG path for NAS documents.

Data residency change: COPY INTO moves data from FSx for ONTAP to Snowflake-managed storage. Once loaded, the data is subject to Snowflake's storage lifecycle, not ONTAP's. For regulated workloads, obtain data owner approval before COPY INTO and document the residency change in your compliance records. Cortex Search Service indexes are stored in the same region as the Snowflake account — no cross-region data movement occurs for the index itself.

Comparison with Bedrock Knowledge Bases: Cortex Search requires a COPY INTO step (data moves to Snowflake storage). Bedrock Knowledge Bases can read directly from FSx S3 AP without copying. Choose Cortex Search when the RAG pipeline must stay within Snowflake governance. Choose Bedrock KB when data residency on FSx is mandatory and AWS-native RAG is preferred.

PoC Quick Start — Validate Cortex Search on your NAS data in 3 steps (estimated: 30 minutes with pre-configured stage):

  1. Configure the External Stage with AWS_ACCESS_POINT_ARN (see The Setup and Visual Story: Before and After above)
  2. Run COPY INTO <target_table> FROM @fsxn_stage/<your-documents-prefix>/ to load text data
  3. Create Cortex Search Service on the loaded table and run a semantic query to validate retrieval quality

Manufacturing Use Case: OCR + AI on NAS Data

-- OCR: Extract text from a document image stored on FSx for ONTAP (an invoice sample in this test)
SELECT SNOWFLAKE.CORTEX.PARSE_DOCUMENT(
  @fsxn_stage,
  'media/documents/invoice_sample.png',
  {'mode': 'OCR'}
) AS ocr_result;
-- Result: "INVOICE #INV-2026-0524", "Customer: Acme Corp", "Amount: USD 1,234.56"

-- AI Analysis: Analyze sensor data for anomalies
SELECT SNOWFLAKE.CORTEX.COMPLETE('mistral-large2',
  'Analyze this IoT sensor reading and identify anomalies: ' || VALUE::VARCHAR
) AS ai_analysis FROM fsxn_sensor_ext_table LIMIT 1;

Screenshot: PARSE_DOCUMENT (OCR mode) extracts text from an invoice image on FSx for ONTAP via S3 AP, directly and without copying.

Screenshot: Cortex COMPLETE (mistral-large2) generates an AI anomaly analysis of IoT sensor data on FSx for ONTAP, working directly on External Table data.

Screenshot: Vision AI (pixtral-large) correctly extracts invoice details from an image originally on FSx for ONTAP; this path requires COPY FILES to an internal stage.

Not validated in this article:

  • Snowpark File Access (SnowflakeFile.open()) for direct binary file processing in UDFs
  • AI_TRANSCRIBE for audio files on FSx S3 AP

Comparison with Databricks AI/ML path:

| AI/ML capability | Snowflake + FSx S3 AP | Databricks + FSx S3 AP |
|---|---|---|
| Governed table as ML input | ✅ External Table | ❌ UC Table creation blocked |
| Text AI (LLM) on NAS data | ✅ 5 text Cortex functions direct | ⚠️ boto3 + external LLM (bypasses UC) |
| Vision AI on NAS images | ✅ Via staging workaround (41s) | ⚠️ boto3 driver-only (bypasses UC) |
| OCR / Document extraction | ✅ PARSE_DOCUMENT direct (8s) | ⚠️ boto3 + external OCR |
| Feature engineering | ✅ Snowpark DataFrame on External Table | ⚠️ spark.read with explicit path only |
| File catalog for ML pipeline | ✅ Directory Table | ⚠️ dbutils.fs.ls (top-level only) |
| RAG over NAS documents | ✅ Cortex Search (via COPY INTO, 198ms) | ⚠️ boto3 + external RAG (bypasses UC) |

Key insight: Snowflake's AI/ML path benefits from governed External Tables and direct Cortex function access. Six tested Cortex features work on FSx data without copying, Vision AI works via an internal-stage copy, and Cortex Search works via COPY INTO. Databricks' AI/ML path is limited by the UC table creation failure, which forces boto3 workarounds that bypass governance.

For end-to-end RAG on NAS documents: Use Snowflake Cortex Search (validated: External Table → COPY INTO → Cortex Search Service, 198ms query latency) or Amazon Bedrock Knowledge Bases as the AWS-documented path (no copy needed).

Decision guidance: Use Snowflake when the customer already needs Snowflake governance, Cortex/Snowpark processing, or table-based feature engineering. Use Bedrock Knowledge Bases when the primary requirement is AWS-native permission-aware RAG over NAS documents.


Comparison: Snowflake vs Databricks (Governance)

| Governance capability | Snowflake + FSx S3 AP | Databricks + FSx S3 AP |
|---|---|---|
| Table creation | ✅ External Table | ❌ CREATE TABLE fails |
| Data classification tags | ✅ Governance Tags | ❌ UC Table not creatable |
| Access control | ✅ Row Access Policy | ❌ UC governance not applicable |
| File catalog | ✅ Directory Table | ⚠️ dbutils.fs.ls (top-level only) |
| Secure URL generation | ✅ BUILD_SCOPED_FILE_URL | |
| Column masking | ✅ Available | |
| COPY INTO (data load) | ✅ Verified | |
| Unstructured data catalog | ✅ Directory Table + Presigned URL | ⚠️ boto3 only (bypasses governance) |

Key takeaway: In this validation, Snowflake with AWS_ACCESS_POINT_ARN achieved a more complete governed read path than the Databricks path tested in Part 2. Snowflake can create governed tables, apply tags, and manage unstructured data catalogs — capabilities that remain blocked in Databricks due to UC table creation failure.

For regulated workloads: Snowflake provides a more complete governed path today (External Table + Tags + Row Access Policy + audit trail). Databricks requires staged ingestion to S3 for equivalent governance. If your compliance framework requires governed table-level access control on the data, Snowflake is the validated path for FSx S3 AP integration.


Business Impact

| Requirement | Observed result | Business impact | Recommended decision |
|---|---|---|---|
| Zero-copy Snowflake query over NAS | ✅ Works (with ARN) | Eliminates copy pipeline | Use AWS_ACCESS_POINT_ARN stage |
| Snowflake governance on FSx data | ✅ External Table works | Governed table abstraction available | Create External Tables |
| File inventory from Snowflake | ✅ Works | Metadata cataloging possible | Use LIST / Directory Tables |
| RAG / AI over NAS documents | ✅ Cortex Search validated (198ms) | Snowflake-native RAG path available | COPY INTO → Cortex Search Service |
| Text AI on NAS data (no copy) | ✅ 6 functions direct (5 text + PARSE_DOCUMENT) | AI processing without data movement | Use Cortex functions on External Table |

Detailed validation metrics (refresh duration, file count, query latency, COPY INTO duration, URL generation success rate) should be recorded in the verification-pack evidence files rather than treated as universal benchmark numbers.


Use Case Fit Matrix

Use case Best current path Why
SQL analytics on structured NAS files Snowflake External Table or Athena Both validated; Snowflake adds governance tags
Unstructured data catalog Snowflake Directory Table File metadata queryable with governance
Data load from NAS to Snowflake COPY INTO from FSx S3 AP stage Validated (4.9s for Parquet)
RAG over NAS documents Cortex Search (via COPY INTO, validated 198ms) or Bedrock KB (AWS-native) Cortex Search validated; Bedrock KB is AWS-documented path
ML feature engineering Snowpark DataFrame on External Table Governed read path available
Real-time ingestion Not FSx S3 AP path Use native S3 + Snowpipe
Iceberg / transactional tables Not FSx S3 AP path Use native S3 for write-back

Cost Model Considerations

Component Cost driver Notes
Snowflake warehouse Credit consumption during queries X-Small sufficient for validation; scale per workload
FSx for ONTAP Throughput capacity + storage S3 AP queries share throughput with NFS/SMB workloads
S3 AP requests No additional S3 request charges FSx S3 AP does not incur separate S3 API fees
Data transfer Standard AWS data transfer Snowflake SaaS in same region minimizes transfer

Cost comparison across engines is not the focus of this article. Snowflake's credit-based model differs fundamentally from Athena's per-TB-scanned model. Evaluate based on workload pattern, governance requirements, and existing Snowflake investment.


Configuration Guide

Step 1: Create Storage Integration

CREATE OR REPLACE STORAGE INTEGRATION fsxn_integration
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::<account>:role/<role-name>'
  STORAGE_ALLOWED_LOCATIONS = ('s3://<ap-alias>/');

-- Copy STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID from this output
-- into the IAM role's trust policy.
DESC INTEGRATION fsxn_integration;

Step 2: Create Stage WITH AWS_ACCESS_POINT_ARN

CREATE OR REPLACE STAGE fsxn_stage
  STORAGE_INTEGRATION = fsxn_integration
  URL = 's3://<ap-alias>/'
  AWS_ACCESS_POINT_ARN = 'arn:aws:s3:<region>:<account>:accesspoint/<ap-name>'
  FILE_FORMAT = (TYPE = PARQUET);
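The stage takes two identifiers: the AP alias in URL and the regional ARN in AWS_ACCESS_POINT_ARN. They are easy to mix up. Below is a minimal Python sketch built around a hypothetical helper, `render_stage_ddl`. It refuses to render the Step 2 statement unless the ARN has the `accesspoint` form shown above. The regex only approximates that ARN shape and is not AWS's own validation rule.

```python
import re

# Approximate shape of a regional S3 access point ARN (assumption, not AWS's validator).
AP_ARN_RE = re.compile(
    r"^arn:aws:s3:(?P<region>[a-z0-9-]+):(?P<account>\d{12}):accesspoint/(?P<name>[a-z0-9-]+)$"
)


def render_stage_ddl(stage: str, integration: str, ap_alias: str, ap_arn: str) -> str:
    """Render the Step 2 CREATE STAGE statement; fail fast on a malformed AP ARN."""
    if not AP_ARN_RE.match(ap_arn):
        # Passing the alias (or nothing) here reproduces the LIST-works/SELECT-fails state.
        raise ValueError(f"not a regional S3 access point ARN: {ap_arn!r}")
    return (
        f"CREATE OR REPLACE STAGE {stage}\n"
        f"  STORAGE_INTEGRATION = {integration}\n"
        f"  URL = 's3://{ap_alias}/'\n"
        f"  AWS_ACCESS_POINT_ARN = '{ap_arn}'\n"
        f"  FILE_FORMAT = (TYPE = PARQUET);"
    )
```

Generating the DDL from one validated input means a missing or malformed ARN fails before the stage is created. Without that check, it would only show up later as "access denied" on SELECT.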

Step 3: Verify

LIST @fsxn_stage/;                                    -- File discovery
SELECT $1 FROM @fsxn_stage/path/to/file.parquet LIMIT 5;  -- Data read

Step 4: Create External Table (optional)

The following DDL is simplified for readability. See the GitHub SQL scripts for the exact tested definition.

CREATE OR REPLACE EXTERNAL TABLE my_ext_table
  LOCATION = @fsxn_stage/sensor-data/
  FILE_FORMAT = (TYPE = PARQUET)
  AUTO_REFRESH = FALSE;

-- With AUTO_REFRESH = FALSE, refresh file metadata manually or from a scheduled task
ALTER EXTERNAL TABLE my_ext_table REFRESH;

Internal Table vs External Table — Design Guide

Understanding the difference between internal (managed) tables and external tables is critical for architecture decisions when integrating FSx for ONTAP with Snowflake.

Comparison Matrix

Aspect External Table (on FSx S3 AP) Internal Table (COPY INTO)
Data location Remains on FSx for ONTAP (zero-copy) Copied into Snowflake-managed storage
Multi-protocol access Same data via NFS/SMB/S3 AP simultaneously Only accessible via Snowflake
Data freshness Real-time (reads current file state) Stale until next COPY INTO
Query performance Slower (estimated ~2-5s for small queries based on observed S3 AP GetObject latency) Faster (sub-second with micro-partitions, pruning)
Governance (Tags, Masking) ✅ Full support ✅ Full support
Time Travel ❌ Not available ✅ Available (1 day on Standard edition; up to 90 days on Enterprise edition or higher)
Cortex AI (text functions) ✅ Direct (SUMMARIZE, TRANSLATE, etc.) ✅ Direct
Cortex AI (Vision/TO_FILE) ❌ TO_FILE blocked on FSx S3 AP ✅ Works on internal stage
Cortex Search (RAG) ❌ Requires COPY INTO first ✅ Direct
ONTAP features preserved ✅ Snapshot, FlexClone, Dedup, FPolicy ❌ Data is outside ONTAP
Storage cost FSx for ONTAP only (no Snowflake storage) FSx + Snowflake storage (duplicate)

Decision Flowchart

Q: Does the data need to stay on FSx for ONTAP?
├── YES → External Table
│         Q: Do you need Vision AI or Cortex Search?
│         ├── YES → Hybrid: External Table + selective COPY INTO
│         └── NO → External Table is sufficient (text AI works directly)
│
└── NO → COPY INTO internal table
          Q: Do you need real-time freshness?
          ├── YES → Scheduled COPY INTO (Task) or FPolicy → Lambda → Snowpipe
          └── NO → Batch COPY INTO on schedule
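The decision flowchart above can be encoded as a small Python function. This is useful in PoC tooling that records why a pattern was chosen. The function name and parameters are hypothetical. The branches mirror the flowchart one-to-one.

```python
def choose_pattern(stay_on_fsx: bool, needs_vision_or_search: bool = False,
                   needs_realtime: bool = False) -> str:
    """Walk the decision flowchart and return the recommended integration pattern."""
    if stay_on_fsx:
        if needs_vision_or_search:
            # Vision AI (TO_FILE) and Cortex Search need data in Snowflake-managed storage.
            return "Hybrid: External Table + selective COPY INTO"
        return "External Table (text AI works directly)"
    if needs_realtime:
        return "Scheduled COPY INTO (Task) or FPolicy -> Lambda -> Snowpipe"
    return "Batch COPY INTO on schedule"
```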

Cost Comparison

Pattern FSx Storage Snowflake Storage Best For
External Table only ✅ (existing) None Read-heavy, compliance, multi-protocol
COPY INTO (full) ✅ (existing) + full copy Max performance, Time Travel, full AI
Hybrid (External + selective COPY) ✅ (existing) + images/RAG data only AI workloads with data residency needs

Industry-Specific Recommendations

Industry Recommended Pattern Rationale PoC Success Criteria
Manufacturing External Table + PARSE_DOCUMENT (OCR) Data stays on FSx; inspection images processed in place OCR extracts text from 10+ inspection images in <10s each
Financial Services Hybrid (External Table + COPY INTO for Cortex Search) Compliance requires data on FSx; RAG needs internal table Cortex Search returns relevant compliance docs in <500ms
Healthcare External Table + SnapLock PHI must not leave controlled storage; immutable audit SELECT on External Table succeeds with governance tags applied
Media / Entertainment External Table + COPY FILES (Vision AI) Large media files stay on FSx; selective staging for AI Vision AI describes image content correctly via staging path
Cross-Industry Analytics COPY INTO (full) Maximum query performance; data duplication acceptable COPY INTO completes in <10s for representative dataset
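The timing parts of two PoC success criteria in the table can be checked mechanically. The Python sketch below uses hypothetical function names. It reads "<500ms" strictly, as every measured query. A p95 target would be an equally reasonable reading. Whether the returned documents are *relevant* still needs human review.

```python
def ocr_poc_passes(per_image_seconds: list[float], min_images: int = 10,
                   max_seconds: float = 10.0) -> bool:
    """Manufacturing row: at least `min_images` images, each processed in under `max_seconds`."""
    return len(per_image_seconds) >= min_images and all(
        t < max_seconds for t in per_image_seconds)


def search_poc_passes(latencies_ms: list[float], max_ms: float = 500.0) -> bool:
    """Financial Services row: every measured Cortex Search query under `max_ms`."""
    return bool(latencies_ms) and max(latencies_ms) < max_ms
```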

Snowpipe Alternatives for FSx for ONTAP

Since FSx S3 AP does not support S3 Event Notifications, standard Snowpipe auto-ingest is not available. Use these alternatives:

Option 1: FPolicy → Lambda → SNS → Snowpipe REST API (Recommended)

FSx for ONTAP ──FPolicy──▶ Lambda ──▶ SNS ──▶ Snowpipe REST API ──▶ COPY INTO target table
     │                                              │
     └── NFS/SMB users access same data             └── Snowflake governance on loaded data
  • Latency: Seconds (<30s from file write to Snowflake availability)
  • Complexity: Medium (requires FPolicy configuration + Lambda function)
  • Best for: Near-real-time ingestion requirements

FPolicy throughput note: FPolicy introduces minimal latency on the NFS/SMB I/O path (typically <1ms per operation in asynchronous mode). However, under high-frequency file write workloads (thousands of files/second), validate the throughput impact on the FSx for ONTAP file system before production deployment.
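Option 1 depends on the Lambda turning file-write notifications into paths that Snowflake can resolve against the stage. This article does not cover the FPolicy event payload, so the Python sketch below takes a plain list of reported paths as hypothetical input. It also assumes the S3 access point is attached at `attach_root`, so an object key equals the path relative to that root. The function name `to_stage_paths` is likewise hypothetical.

```python
import posixpath


def to_stage_paths(fs_paths, attach_root="/", suffix=".parquet"):
    """Convert reported file-system paths into de-duplicated, stage-relative keys."""
    root = posixpath.normpath("/" + attach_root.strip("/"))
    keys, seen = [], set()
    for raw in fs_paths:
        # SMB clients report backslash-separated paths; normalize them to POSIX form.
        path = posixpath.normpath("/" + raw.replace("\\", "/").lstrip("/"))
        rel = posixpath.relpath(path, root)
        if rel == ".." or rel.startswith("../"):
            continue  # outside the directory tree the access point exposes
        if not rel.lower().endswith(suffix):
            continue  # not a file type this pipe loads
        if rel not in seen:  # repeated notifications for the same file
            seen.add(rel)
            keys.append(rel)
    return keys
```

De-duplicating in the Lambda keeps each REST call small. Snowpipe also tracks load history per pipe, so a duplicate submission should not reload a file.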

Option 2: Snowflake Task + COPY INTO (Simple)

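-- Create a task that runs COPY INTO every 5 minutes.
-- COPY INTO load metadata skips files already loaded, so re-runs do not duplicate rows.
CREATE OR REPLACE TASK fsxn_ingest_task
  WAREHOUSE = COMPUTE_WH
  SCHEDULE = '5 MINUTE'
AS
  COPY INTO target_table FROM @fsxn_stage/incoming/   -- stage created WITH AWS_ACCESS_POINT_ARN
  PATTERN = '.*[.]parquet'
  FILE_FORMAT = (TYPE = PARQUET)
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;            -- map Parquet columns to table columns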

ALTER TASK fsxn_ingest_task RESUME;
  • Latency: Minutes (configurable schedule interval)
  • Complexity: Low (pure Snowflake SQL)
  • Best for: Batch ingestion where minutes-level latency is acceptable

Option 3: Snowpipe REST API (Manual Trigger)

Applications call the Snowpipe REST API with a file list when new files are known:

  • Latency: Seconds (triggered by application)
  • Complexity: Low (API call from any application)
  • Best for: Application-controlled ingestion workflows
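For Options 1 and 3, the call itself is a POST to Snowpipe's `insertFiles` endpoint with a JSON list of paths relative to the pipe's stage location. The Python sketch below only builds the requests and performs no network I/O. It omits the key-pair JWT that must go in the Authorization header. The helper name, host, and pipe name are hypothetical. The 5,000-file batch size reflects the per-request limit in Snowflake's Snowpipe REST documentation as far as I know, so confirm it against current docs. Snowpipe REST against an FSx S3 AP stage was not validated in this article.

```python
import json
import uuid
from urllib.parse import quote, urlencode


def build_insert_files_requests(account_host, pipe, paths, batch_size=5000):
    """Return (url, json_body) pairs for Snowpipe insertFiles calls, one per batch."""
    requests = []
    for start in range(0, len(paths), batch_size):
        batch = paths[start:start + batch_size]
        # A fresh requestId per call makes individual submissions traceable.
        query = urlencode({"requestId": str(uuid.uuid4())})
        url = f"https://{account_host}/v1/data/pipes/{quote(pipe)}/insertFiles?{query}"
        body = json.dumps({"files": [{"path": p} for p in batch]})
        requests.append((url, body))
    return requests
```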

Snowpipe / COPY INTO Supported Formats

Format Snowpipe COPY INTO External Table Notes
CSV Delimiter, header, encoding options
JSON Nested, semi-structured
Parquet Column pruning, predicate pushdown
Avro Schema evolution supported
ORC Read-only
XML Native support

Stop Criteria

Stop the Snowflake direct-access PoC when:

  • SELECT from stage fails with AccessDenied after AWS_ACCESS_POINT_ARN is configured and IAM/AP/FSx permissions are proven correct
  • The workload requires Iceberg Table write-back (conditional writes not supported on FSx S3 AP)
  • Data owner does not approve the access path
  • ReadTimeout occurs (check SVM DNS/AD configuration — see Networking Troubleshooting)

Regulated Workload Checklist

Before using Snowflake + FSx S3 AP for regulated data:

  • [ ] Confirm the S3 Access Point file-system user identity and least-privilege permissions
  • [ ] Confirm Snowflake role privileges for stage, external table, and tag access
  • [ ] Define whether users may generate presigned or scoped URLs (prefer BUILD_SCOPED_FILE_URL for governed access)
  • [ ] Record derived data locations if COPY INTO loads data into Snowflake tables
  • [ ] Define manual refresh schedule and evidence retention
  • [ ] Store approval owner, review date, and expiration date
  • [ ] Validate that GET_PRESIGNED_URL is not used as a bypass for query-level governance
  • [ ] If Vision AI is required: Approve COPY FILES to internal stage (data moves to Snowflake-managed storage)
  • [ ] If Cross-Region Inference is enabled: Verify that image/document data may be processed in US/EU regions
  • [ ] If Cortex Search is used: Approve COPY INTO (data moves to Snowflake storage) AND Cortex Search Service index creation (data residency changes twice — once for table load, once for search index). Cortex Search Service index is stored in the Snowflake account region.

Store the checklist result with an approval ID, owner, review date, expiration date, and evidence location so the PoC decision can be audited later.
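Several checklist items apply only when a feature is in use. The Python sketch below, with hypothetical names, derives the items to sign off from the PoC's feature flags, paraphrasing the list above. Treating "derived data locations" as required whenever Cortex Search is used is an assumption: the checklist says Cortex Search involves a COPY INTO load.

```python
BASE_ITEMS = [
    "S3 AP file-system user identity and least-privilege permissions confirmed",
    "Snowflake role privileges for stage, external table, and tag access confirmed",
    "Presigned vs scoped URL policy defined (prefer BUILD_SCOPED_FILE_URL)",
    "Manual refresh schedule and evidence retention defined",
    "Approval owner, review date, and expiration date stored",
    "GET_PRESIGNED_URL confirmed not to bypass query-level governance",
]


def checklist_for(uses_copy_into: bool, uses_vision_ai: bool,
                  cross_region_enabled: bool, uses_cortex_search: bool) -> list[str]:
    """Return the checklist items that must be signed off for this PoC scope."""
    items = list(BASE_ITEMS)
    if uses_copy_into or uses_cortex_search:
        items.append("Derived data locations recorded for COPY INTO loads")
    if uses_vision_ai:
        items.append("COPY FILES to internal stage approved")
    if cross_region_enabled:
        items.append("Cross-region processing of image/document data verified")
    if uses_cortex_search:
        items.append("COPY INTO and Cortex Search Service index creation approved")
    return items
```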

Cross-Region Inference — Data Residency Warning

When CORTEX_ENABLED_CROSS_REGION = 'ANY_REGION' is set, Cortex AI functions may route data to model endpoints in other AWS regions (US, EU) for processing. For regulated workloads:

  • Verify: Does your compliance framework allow data processing outside the home region?
  • Alternatives: Use AWS_US or AWS_EU instead of ANY_REGION to limit routing scope
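  • Mitigation: Process only non-regulated images via Vision AI; keep PHI/PII in text-only Cortex functions, and confirm that the models behind those functions are available in your home region, because CORTEX_ENABLED_CROSS_REGION is an account-level setting and is not limited to Vision AI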
  • Documentation: Record which Cross-Region setting is used and which data types are processed
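Recording the setting is easier when the statement that applies it comes from the same reviewed input. The Python sketch below assumes the account-level form `ALTER ACCOUNT SET CORTEX_ENABLED_CROSS_REGION = '<value>'`. It accepts only the values this article mentions, which is not an exhaustive list of Snowflake's accepted values. The helper name and the rule refusing ANY_REGION for unapproved regulated data are hypothetical policy choices.

```python
# CORTEX_ENABLED_CROSS_REGION values mentioned in this article (not exhaustive).
KNOWN_SETTINGS = {"DISABLED", "AWS_US", "AWS_EU", "ANY_REGION"}


def cross_region_statement(setting: str, regulated: bool, approved: bool = False) -> str:
    """Render the account-level statement; refuse ANY_REGION for unapproved regulated data."""
    value = setting.strip().upper()
    if value not in KNOWN_SETTINGS:
        raise ValueError(f"unrecognized cross-region setting: {setting!r}")
    if regulated and value == "ANY_REGION" and not approved:
        raise ValueError("ANY_REGION on regulated data requires a recorded approval")
    return f"ALTER ACCOUNT SET CORTEX_ENABLED_CROSS_REGION = '{value}';"
```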

Compliance Framework Mapping

Framework Recommended Pattern Key Controls
HIPAA (PHI) External Table + SnapLock + FPolicy audit Data never leaves FSx; file access audited; admin cannot delete during retention
SOX (Financial) COPY INTO + Time Travel + audit trail Full change history; point-in-time queries for audit
GDPR (PII) External Table + Row Access Policy + Tag-based Masking Data minimization at query time; PII masked for non-authorized roles
FINRA (Records) External Table + SnapLock Compliance Non-erasable, non-writable records for retention period

Approval Evidence Example

approval_id: "FSXN-SF-POC-001"
data_owner: "<name/group>"
security_owner: "<name/group>"
platform_owner: "<name/group>"
allowed_prefixes:
  - "s3://<ap-alias>/sensor-data/"
  - "s3://<ap-alias>/bronze/"
allowed_operations:
  - LIST
  - SELECT (External Table)
  - COPY INTO (load only)
  - Directory Table
  - BUILD_SCOPED_FILE_URL
  - Cortex text functions (SUMMARIZE, TRANSLATE, SENTIMENT)
  - COPY FILES to internal stage (for Vision AI only)
disallowed_operations:
  - GET_PRESIGNED_URL for regulated data
  - COPY INTO unload (write-back)
  - Cortex LLM on PHI/PII without human review
  - Cross-Region Inference on regulated images (unless approved)
cross_region_inference: "ANY_REGION"  # or "DISABLED" for regulated data
review_date: "<YYYY-MM-DD>"
expiration_date: "<YYYY-MM-DD>"
evidence_location: "verification-pack/snowflake/evidence/<date>/evidence-record.yaml"
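Before the approval record above is filed with the evidence, a few consistency checks catch common mistakes. These are unfilled placeholders, an expiration date that precedes the review date, and an operation listed as both allowed and disallowed. The Python sketch below assumes the YAML has already been parsed into a dict, since YAML parsing is not in the standard library. The function name and the exact checks are hypothetical.

```python
from datetime import date

REQUIRED = ("approval_id", "data_owner", "security_owner", "platform_owner",
            "review_date", "expiration_date", "evidence_location")


def validate_approval(record: dict, today: date) -> list[str]:
    """Return a list of problems with a parsed approval record (empty list = OK)."""
    problems = []
    for key in REQUIRED:
        value = str(record.get(key, "")).strip()
        if not value or value.startswith("<"):  # "<name/group>" style placeholders
            problems.append(f"missing or placeholder: {key}")
    try:
        review = date.fromisoformat(str(record.get("review_date")))
        expires = date.fromisoformat(str(record.get("expiration_date")))
    except ValueError:
        problems.append("review_date/expiration_date must be YYYY-MM-DD")
    else:
        if expires <= review:
            problems.append("expiration_date must be after review_date")
        if expires < today:
            problems.append("approval has expired")
    overlap = set(record.get("allowed_operations", [])) & set(record.get("disallowed_operations", []))
    if overlap:
        problems.append(f"operations both allowed and disallowed: {sorted(overlap)}")
    return problems
```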

COPY INTO unload (write-back to FSx S3 AP) was not validated in this article. Although FSx S3 AP supports PutObject, Snowflake unload behavior should be tested separately before positioning write-back as supported.

Data residency note: COPY INTO (load) and COPY FILES change the data residency model — source files remain on FSx, but a derived copy is created in Snowflake-managed storage. Cross-Region Inference may further route data to other regions. Treat loaded tables and staged files as derived regulated data and apply retention, classification, and deletion controls separately.


Troubleshooting Playbook

When Snowflake access to FSx for ONTAP S3 AP fails, isolate one layer at a time:

  1. Stage configuration — Is AWS_ACCESS_POINT_ARN set? Without it, GetObject will fail.
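  2. IAM — Does the Storage Integration role have s3:ListBucket on the S3 AP ARN and s3:GetObject on its object ARN (<ap-arn>/object/*)?
  3. S3 AP policy — Does the Access Point resource policy allow the Storage Integration role (STORAGE_AWS_ROLE_ARN), which is the principal that makes the S3 requests? The Snowflake IAM user ARN belongs in that role's trust policy.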
  4. FSx file system — Is the file system user (e.g., root) permitted to read the target files?
  5. Network — Is the AP internet-origin? (Snowflake SaaS cannot use VPC-origin APs)
  6. Operational — Does vserver services dns check show healthy DNS? (ReadTimeout = DNS/AD issue)
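For layer 2, listing goes through the access point ARN itself, while object reads go through the access point's `/object/` sub-resource. The Python sketch below generates an identity policy in that shape for the Storage Integration role. The helper name and the optional prefix scoping are hypothetical. Snowflake's own storage-integration guidance may also list actions such as s3:GetObjectVersion or s3:GetBucketLocation, which are left out here to match the two actions this playbook checks.

```python
import json


def storage_role_policy(region: str, account: str, ap_name: str, prefix: str = "") -> dict:
    """Build an IAM identity policy granting list/read through one S3 access point."""
    ap_arn = f"arn:aws:s3:{region}:{account}:accesspoint/{ap_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            # ListBucket is evaluated against the access point ARN.
            {"Sid": "ListViaAccessPoint", "Effect": "Allow",
             "Action": "s3:ListBucket", "Resource": ap_arn},
            # GetObject is evaluated against object ARNs under <ap-arn>/object/.
            {"Sid": "ReadObjectsViaAccessPoint", "Effect": "Allow",
             "Action": "s3:GetObject", "Resource": f"{ap_arn}/object/{prefix}*"},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(storage_role_policy("<region>", "<account>", "<ap-name>"), indent=2))
```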

Known Failure Signatures

Symptom Likely layer Next step
LIST works, SELECT fails with "access denied" Missing AWS_ACCESS_POINT_ARN Add ARN parameter to stage
LIST and SELECT both fail with "access denied" IAM role or S3 AP policy Check DESCRIBE INTEGRATION, verify trust policy
ReadTimeout (no response) SVM DNS/AD or FSx backend Check vserver services dns check; verify S3 AP lifecycle
Stage creation fails Storage Integration config Verify STORAGE_ALLOWED_LOCATIONS includes the AP alias
External Table creation fails Stage or file format issue Verify LIST works first, then check FILE_FORMAT
COPY INTO fails File format mismatch or permissions Verify SELECT works first
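The first three rows of the table can serve as a quick triage heuristic in a runbook or notebook. The Python sketch below, with a hypothetical function name and return strings, checks for a timeout before anything else, because "no response" points at the SVM rather than IAM. Matching on error-message substrings is an approximation and should fall back to walking the playbook layers in order.

```python
def likely_layer(list_ok: bool, select_ok: bool, error: str = "") -> str:
    """Map observed LIST/SELECT outcomes and error text to the most likely failing layer."""
    err = error.lower()
    denied = "access denied" in err or "accessdenied" in err
    if "timeout" in err or "timed out" in err:
        # No response at all points at SVM DNS/AD or the FSx backend.
        return "SVM DNS/AD or FSx backend: run the DNS check on the SVM"
    if list_ok and select_ok:
        return "healthy"
    if list_ok and denied:
        return "Missing AWS_ACCESS_POINT_ARN: add it to the stage"
    if not list_ok and denied:
        return "IAM role or S3 AP policy: check DESCRIBE INTEGRATION and the trust policy"
    if list_ok:
        return "LIST works: check FILE_FORMAT and file paths"
    return "Unclassified: walk the playbook layers in order"
```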

What This Article Does Not Conclude

This article does not conclude that Snowflake + FSx for ONTAP S3 AP is production-certified for all workloads. It documents the behavior observed in one validated environment and identifies the configuration required for successful integration.

Specifically, this article does not validate:

  • Snowpipe auto-ingest (requires S3 Event Notifications)
  • Iceberg Table write-back (requires conditional writes)
  • COPY INTO unload / write-back to FSx S3 AP
  • Snowpark File Access (SnowflakeFile.open) for binary processing
  • Performance at scale (large file counts, concurrent queries, large directory refreshes, or mixed NFS/SMB/S3 workload contention on the FSx file system)
  • Private connectivity (PrivateLink) path

Operational Note: ReadTimeout vs AccessDenied

During this validation series, all S3 APs on one SVM became unresponsive for 7+ days due to orphaned DNS/AD configuration.

Important distinction:

  • ReadTimeout (no response) → Check SVM DNS/AD configuration
  • AccessDenied (immediate error) → Check AWS_ACCESS_POINT_ARN stage parameter

See FSx S3 AP Networking — DNS/AD Troubleshooting for details.


Lessons Learned

1. Platform documentation holds the answer

The AWS_ACCESS_POINT_ARN parameter exists in Snowflake's CREATE STAGE documentation. The initial "no workaround" conclusion was premature — always check platform docs for S3 AP-specific parameters before concluding incompatibility.

2. The same pattern recurs across platforms

Both Snowflake (AWS_ACCESS_POINT_ARN) and Databricks (access_point field) require explicit S3 AP ARN configuration. This appears to be a recurring integration pattern: platforms that generate restrictive session policies need an explicit parameter so the generated policy includes the regional access point ARN format.

3. LIST ≠ READ (but the fix is simple)

The partial success (LIST works, SELECT doesn't) is confusing but has a clear fix. The root cause is that ListBucket uses bucket-level ARN matching while GetObject requires object-level ARN matching — and the AP ARN parameter resolves both.

4. SVM DNS/AD configuration can silently break S3 AP

ReadTimeout (not AccessDenied) indicates an operational issue, not a session policy issue. Check vserver services dns check on the SVM.

5. Pre-signed URLs work but are not a governed path

GET_PRESIGNED_URL() generates valid URLs for FSx S3 AP objects. However, this bypasses Snowflake query governance and should not be used as a production workaround for regulated workloads.


What to Tell Stakeholders

Current recommendation (8 out of 10 tested AI functions validated on FSx data):

  • Use Snowflake External Stage with AWS_ACCESS_POINT_ARN for governed read access to FSx for ONTAP data
  • Use External Tables for governed schema abstraction with tags and access policies
  • Use COPY INTO when data needs to be loaded into Snowflake for ML/AI processing
  • Use Directory Table for unstructured data cataloging
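  • Do not rely on AUTO_REFRESH or Snowpipe auto-ingest (both depend on S3 event notifications); schedule ALTER STAGE ... REFRESH for Directory Tables and ALTER EXTERNAL TABLE ... REFRESH for External Tables instead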
  • Do not position Iceberg write-back on FSx S3 AP as supported
  • For end-to-end RAG, use Cortex Search (validated: External Table → COPY INTO → Cortex Search Service, 198ms query) or Bedrock Knowledge Bases (AWS-documented path, no copy needed)

This validation should be used to guide architecture selection and stage configuration, not as a production certification.


What's Next

  • Part 1: Athena — Query NAS Data In Place (validated read-oriented SQL path)
  • Part 2: Databricks — A Layer-by-Layer Validation of Observed Boundaries (session policy + access_point field)
  • Part 4: DuckDB Lambda — Serverless analytics at $0.00001/query (for teams that need lightweight, zero-idle-cost SQL without warehouse management)
  • Part 5: EMR Spark — Read-Write ETL Pipeline (for teams that need distributed Spark processing with write-back to S3 for downstream lakehouse consumption)


Key achievement: This validation established that Snowflake + FSx for ONTAP S3 AP provides a governed, AI-ready read path — 8 out of 10 tested Cortex AI functions work on NAS data, External Tables enable full governance (tags, masking, row policies), and Cortex Search delivers 198ms semantic search over NAS-originated documents. This is the most complete governed integration path validated in this series.

This article documents observed behavior in one validated environment (Snowflake Standard edition, AWS ap-northeast-1, May 2026). Platform behavior may change with future updates.

Disclaimer: This article is an independent validation report and does not represent Snowflake, AWS, or NetApp official guidance. Product behavior, support status, and platform capabilities may change. Always validate in your own environment and consult vendor documentation and support channels.
