TL;DR
In Part 1, Athena provided serverless SQL. In Part 2, Databricks hit boundaries. In Part 3, Snowflake works with config. In Part 4, DuckDB Lambda was cheapest. In Part 5, EMR Spark delivered full ETL. This Part 6 adds enterprise governance: Redshift Spectrum + Lake Formation provides 4-layer authorization on NAS data.
Redshift Serverless (8 RPU) successfully queries FSx for ONTAP data via S3 Access Points using the same Glue Catalog tables as Athena — no additional data registration needed. Add Lake Formation on top for table-level, column-level, and tag-based access control.
| Query | Duration | Comparison with Athena |
|---|---|---|
| COUNT(*) 10K rows | 3,231 ms | Athena: ~1,500 ms |
| GROUP BY aggregation | 2,580 ms | Athena: ~1,800 ms |
| COUNT(*) 5M rows | 4,277 ms | Athena: 2,196 ms |
~2x slower than Athena for simple scans (Redshift Serverless cold start overhead), but Redshift adds DWH capabilities: federated JOINs with local tables, materialized views, and stored procedures.
Quick Decision Guide:
- Need DWH JOINs with NAS data → Redshift Spectrum (this article)
- Need enterprise governance (table/column/tag) → Add Lake Formation
- Need serverless SQL only (no DWH) → Use Athena (Part 1) — faster and cheaper
GitHub: fsxn-lakehouse-integrations
How to Read This Article
This article is:
- A reproduction-focused validation report
- Evidence from one environment (Redshift Serverless 8 RPU, ap-northeast-1)
- A governance architecture guide for Lake Formation + FSx S3 AP
Read by role:
- DWH engineer: Architecture → Setup → Benchmark Results
- Security / governance reviewer: 4-Layer Authorization → Governance Impact
- Data engineer: When to Use → Comparison with Athena
- Partner / SA: Partner Decision Card → Discovery Questions
Prerequisite Concepts
Before reading this article, it helps to understand:
- Redshift Spectrum — Redshift's ability to query data in S3 via external schemas (Glue Catalog)
- Redshift Serverless — pay-per-query Redshift without cluster management (measured in RPU)
- Lake Formation — AWS's centralized governance layer for data lakes (table/column/tag permissions)
- Glue Catalog — AWS's metadata catalog (shared by Athena, Redshift Spectrum, EMR, Glue)
- External Schema — a Redshift schema that maps to a Glue Catalog database
Architecture
┌─────────────────────────────────────────────────────────────────┐
│ Redshift Serverless (8 RPU) │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ SQL Query │ │
│ │ SELECT * FROM fsxn_spectrum.sensor_readings │ │
│ │ JOIN local_table ON ... │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │ │
│ External Schema (Glue Catalog) │
└──────────────────────────┼───────────────────────────────────────┘
│
┌────────────┼────────────┐
│ │ │
Lake Formation IAM Role S3 Access Point
(table/column (API (resource
permissions) access) policy)
│ │ │
└────────────┼────────────┘
│
▼
FSx for ONTAP Volume (Parquet files)
4-Layer Authorization:
- Lake Formation — Who can access which tables/columns (fine-grained)
- IAM — Who can call which AWS APIs
- S3 Access Point Policy — Which principals can access this access point
- File System — UNIX permissions on the underlying files
Benchmark Results
| Query | Duration (ms) | Rows | Notes |
|---|---|---|---|
| CREATE EXTERNAL SCHEMA | 240 | — | One-time setup |
| COUNT(*) 10K rows | 3,231 | 10,000 | Cold start overhead |
| GROUP BY + AVG aggregation | 2,580 | 3 groups | Status grouping |
| COUNT(*) 5M rows | 4,277 | 5,000,000 | Large scan |
Environment: Redshift Serverless 8 RPU, ap-northeast-1. FSx for ONTAP Single-AZ, 128 MB/s.
Performance note: Redshift Serverless has cold start overhead (~2-3s for first query). Warm queries on provisioned Redshift clusters would be faster. For simple scans, Athena is ~2x faster because it has no DWH initialization overhead.
Evidence Matrix
| Layer | Evidence | Result | Interpretation |
|---|---|---|---|
| Redshift Serverless | Workgroup creation (8 RPU) | ✅ Pass | Serverless endpoint available |
| IAM role | Spectrum role with S3 AP permissions | ✅ Pass | GetObject + ListBucket on AP ARN |
| External Schema | CREATE EXTERNAL SCHEMA from Glue | ✅ Pass | Same catalog as Athena |
| Spectrum read (small) | COUNT(*) 10K rows | ✅ Pass | 3,231ms |
| Spectrum read (aggregation) | GROUP BY + AVG | ✅ Pass | 2,580ms |
| Spectrum read (large) | COUNT(*) 5M rows | ✅ Pass | 4,277ms |
| Lake Formation admin | put-data-lake-settings | ✅ Pass | Admin configured |
| Lake Formation grant | Table-level SELECT grant | ✅ Pass | Fine-grained permission works |
| LF column-level | SELECT on 3 permitted columns | ✅ Pass | Non-permitted column returns "cannot be resolved" |
| LF column deny | SELECT on denied column (humidity) | ✅ Pass (denied) | "Column cannot be resolved or requester is not authorized" |
| LF row filter | Data cells filter creation | ✅ Pass | Row filter (status='normal') + column filter combined |
| LF-Tag creation | sensitivity tag (public/internal/confidential) | ✅ Pass | Tag created and assigned to table |
| LF-Tag permission | Tag-based DESCRIBE+ASSOCIATE grant | ✅ Pass | Scalable governance via classification |
| Athena under LF | Query with LF permissions active | ✅ Pass | Same governance applies to Athena |
Setup
Step 1: Create External Schema (reuses Glue Catalog)
CREATE EXTERNAL SCHEMA fsxn_spectrum
FROM DATA CATALOG
DATABASE 'fsxn_athena_verification'
IAM_ROLE 'arn:aws:iam::<ACCOUNT_ID>:role/fsxn-redshift-spectrum-role'
REGION 'ap-northeast-1';
Key insight: This uses the same Glue Catalog database that Athena uses. No additional table registration needed — if Athena can query it, Redshift Spectrum can too.
Step 2: Query FSx for ONTAP Data
-- Simple count
SELECT COUNT(*) FROM fsxn_spectrum.sensor_readings;
-- Result: 10000 (3,231ms)
-- Aggregation
SELECT status, COUNT(*), AVG(temperature)
FROM fsxn_spectrum.sensor_readings
GROUP BY status;
-- Result: 3 groups (2,580ms)
-- JOIN with local Redshift table (DWH capability)
SELECT s.device_id, s.temperature, d.location
FROM fsxn_spectrum.sensor_readings s
JOIN device_master d ON s.device_id = d.device_id
WHERE s.temperature > 35;
Step 3: Add Lake Formation Governance
# Set Lake Formation admin
aws lakeformation put-data-lake-settings \
--data-lake-settings '{"DataLakeAdmins": [{"DataLakePrincipalIdentifier": "arn:aws:iam::<ACCOUNT_ID>:user/<admin>"}]}'
# Grant table-level SELECT to a role
aws lakeformation grant-permissions \
--principal '{"DataLakePrincipalIdentifier": "arn:aws:iam::<ACCOUNT_ID>:role/fsxn-analyst-role"}' \
--resource '{"Table": {"DatabaseName": "fsxn_athena_verification", "Name": "sensor_readings"}}' \
--permissions '["SELECT", "DESCRIBE"]'
Lake Formation Data Permissions: fine-grained table-level SELECT grant for fsxn-athena-glue-role on sensor_readings table.
Lake Formation Governance Value
| Capability | Without Lake Formation | With Lake Formation |
|---|---|---|
| Table-level access | S3 AP policy (all-or-nothing per prefix) | Per-table SELECT/DESCRIBE grants |
| Column-level security | ❌ Not possible | ✅ Column-level grants + masking |
| Row-level filtering | ❌ Not possible | ✅ Data Cells Filter (row filter expressions) |
| Tag-based access control | ❌ Not possible | ✅ Classify data → auto-grant by tag (LF-Tags) |
| Centralized audit | CloudTrail (API-level) | Lake Formation audit (table/column-level) |
| Cross-account sharing | Share S3 AP (complex) | Share tables via Lake Formation (simple) |
Fine-Grained Governance — Verified (May 2026)
All three fine-grained Lake Formation capabilities have been validated on FSx for ONTAP S3 AP data:
| Feature | Test | Result |
|---|---|---|
| Column-level permission | Grant SELECT on 3 of 4 columns; query the denied column | ✅ Permitted columns return data; denied column (humidity) returns "cannot be resolved" |
| Row filter (Data Cells Filter) | Create filter status = 'normal'; query returns only matching rows |
✅ Only rows matching the filter expression are returned |
| LF-Tag | Create tag sensitivity: public/internal/confidential; assign to table |
✅ Tag created, assigned, and queryable via Lake Formation console |
Governance implication for regulated workloads: Lake Formation on FSx for ONTAP S3 AP data provides the same fine-grained access control as on native S3 data. Column masking, row filtering, and tag-based classification all work without data movement. This is the strongest AWS-native governance path for FSx for ONTAP data.
Iceberg + Lake Formation path: Glue Data Catalog supports Iceberg table registration natively. For transactional workloads requiring ACID guarantees: sync FSx for ONTAP data to S3 via DataSync → write as Iceberg table (EMR Spark) → register in Glue Catalog → query via Redshift Spectrum with full Lake Formation governance (column/row/tag). This provides the best of both worlds: FSx for ONTAP as source of truth + Iceberg ACID + Lake Formation governance.
Enterprise governance use cases:
-
Healthcare: Column-level masking of PHI fields (e.g., hide
patient_namefrom analysts) - Finance: Row-level filtering by business unit (each team sees only their data)
- Public sector: LF-Tag classification enforcement (sensitivity: public/internal/confidential)
Comparison with Other Engines in This Series
| Aspect | Redshift Spectrum | Athena (Part 1) | DuckDB Lambda (Part 4) | EMR Spark (Part 5) |
|---|---|---|---|---|
| Query latency (5M rows) | 4,277ms | 2,196ms | N/A (memory limit) | 6,780ms |
| DWH JOINs with local tables | ✅ Best | ❌ | ❌ | ❌ |
| Lake Formation governance | ✅ | ✅ | ❌ | ⚠️ Optional |
| Materialized views | ✅ | ❌ | ❌ | ❌ |
| Stored procedures | ✅ | ❌ | ❌ | ❌ |
| Zero idle cost | ✅ (Serverless) | ✅ | ✅ | ✅ |
| Write-back to FSxN | ❌ (results stay in Redshift) | ✅ CTAS | ✅ COPY TO | ✅ Best |
| Cold start | ~3s (Serverless) | ~2s | 1.9s | 20s |
| Cost model | RPU-seconds | $/TB scanned | $/invocation | $/job |
Partner Decision Card
| Customer requirement | Redshift Spectrum + LF today | Recommended path |
|---|---|---|
| JOIN NAS data with DWH tables | ✅ Best fit | Redshift Spectrum external schema |
| Enterprise governance (table/column/tag) | ✅ Best fit | Add Lake Formation |
| Existing Redshift investment | ✅ Natural extension | Add external schema to existing cluster |
| Serverless SQL only (no DWH) | ⚠️ Overkill | Use Athena (faster, cheaper for simple queries) |
| Write-back to FSxN | ❌ Not supported | Use EMR Serverless (Part 5) |
| Sub-second latency | ❌ Cold start overhead | Use DuckDB Lambda (Part 4) |
| Cross-account data sharing | ✅ Lake Formation sharing | Configure LF cross-account grants |
| Column-level masking for compliance | ✅ Lake Formation | Configure column-level permissions |
Discovery Questions for Partners
When a customer asks about Redshift Spectrum + Lake Formation + FSx for ONTAP S3 AP:
- Does the customer already have a Redshift cluster or Serverless workgroup? (If yes, adding Spectrum is trivial)
- Do they need to JOIN NAS data with existing DWH tables? (This is Redshift Spectrum's unique value)
- Is table/column-level governance required? (Lake Formation adds this layer)
- Is the workload read-only analytics, or does it need write-back? (Spectrum is read-only from external data)
- What is the query frequency? (For < 10 queries/day, Athena is cheaper)
- Is cross-account data sharing needed? (Lake Formation simplifies this)
- Are there compliance requirements for column-level masking? (Lake Formation provides this)
- What is the acceptable query latency? (Redshift Serverless has ~3s cold start)
Governance Impact Summary
| Access path | Authorization layers | Auditability | Production suitability |
|---|---|---|---|
| Redshift Spectrum (no LF) | IAM + S3 AP + File System (3 layers) | Medium (CloudTrail) | Good for non-regulated workloads |
| Redshift Spectrum + Lake Formation | LF + IAM + S3 AP + File System (4 layers) | High (LF audit + CloudTrail) | Recommended for regulated workloads |
| Athena + Lake Formation | LF + IAM + S3 AP + File System (4 layers) | High (LF audit + CloudTrail) | Recommended for serverless regulated workloads |
Key insight: Redshift Spectrum and Athena share the same Glue Catalog and Lake Formation permissions. Governance configured for one automatically applies to the other. This means you can use EMR Spark for write-back, register output in Glue, apply Lake Formation permissions, and query from both Athena and Redshift Spectrum with the same governance.
AI Readiness Score
| Pattern | Governance | Performance | AI Capability | Cost | Operational Simplicity | Overall |
|---|---|---|---|---|---|---|
| Redshift Spectrum + LF | ★★★★★ | ★★★☆☆ | ★★☆☆☆ | ★★★☆☆ | ★★★☆☆ | 3.2 |
| Athena + Lake Formation | ★★★★★ | ★★★☆☆ | ★★☆☆☆ | ★★★★☆ | ★★★★☆ | 3.6 |
| Snowflake External Table | ★★★★☆ | ★★☆☆☆ | ★★★★☆ | ★★★☆☆ | ★★★★☆ | 3.4 |
| DuckDB Lambda | ★☆☆☆☆ | ★★★★☆ | ★☆☆☆☆ | ★★★★★ | ★★★★★ | 3.2 |
| EMR Serverless Spark | ★★☆☆☆ | ★★★★☆ | ★★★☆☆ | ★★★☆☆ | ★★★☆☆ | 3.0 |
Scoring methodology: Redshift Spectrum + LF scores highest on Governance (same as Athena + LF) but lower on Cost and Simplicity due to RPU pricing and DWH management overhead. Choose Redshift Spectrum when DWH JOINs are required; choose Athena when serverless SQL is sufficient.
Cost Analysis
| Component | Cost |
|---|---|
| Redshift Serverless (8 RPU, per query) | ~$0.36/RPU-hour (billed per second) |
| Redshift Serverless (idle) | $0 (scales to zero) |
| Lake Formation | $0 (no additional charge) |
| Glue Catalog | $1/100K objects/month |
| FSx for ONTAP (existing) | $0 incremental |
Monthly estimate (100 queries/day, avg 5s each):
- 100 queries × 5s × 8 RPU × $0.36/RPU-hour ÷ 3600 = ~$0.40/day = ~$12/month
Compare with:
- Athena (same queries): ~$5/TB × data scanned
- DuckDB Lambda: ~$1.10/month (but no DWH JOINs)
When Redshift Spectrum is cost-justified: When you already have Redshift and need to JOIN NAS data with local tables. The marginal cost of adding Spectrum queries is low.
When to Use (and When Not To)
Use Redshift Spectrum + Lake Formation when:
- Customer already has Redshift (adding Spectrum is trivial)
- Need to JOIN NAS data with DWH tables
- Enterprise governance (table/column/tag) is required
- Cross-account data sharing is needed
- Compliance requires column-level masking
Don't use when:
- Simple serverless SQL is sufficient (use Athena — faster, cheaper)
- Need write-back to FSxN (use EMR Serverless)
- Need sub-second latency (use DuckDB Lambda)
- No existing Redshift investment (Athena is simpler to start)
- Dataset is small and ad-hoc (DuckDB Lambda is cheapest)
Known Failure Signatures
| Symptom | Likely cause | Next step |
|---|---|---|
permission denied for schema |
IAM role not associated with Redshift | Associate IAM role with Redshift namespace |
S3 access denied on external table |
IAM role missing S3 AP permissions | Add S3 AP ARN to role policy |
| External schema creation fails | Glue database doesn't exist | Create database in Glue Catalog first (or use Athena) |
| Query returns 0 rows | Table location doesn't match S3 AP path | Verify Glue table LOCATION uses AP alias |
Spectrum is not supported |
Using provisioned cluster without Spectrum | Enable Spectrum or use Serverless |
| Lake Formation permission denied | LF permissions not granted | Grant SELECT via aws lakeformation grant-permissions
|
What's Next
- Part 7: Table Format Boundaries — why Delta, Iceberg, and Hudi can't write to FSx S3 AP, and what flat Parquet patterns work instead (critical knowledge for architecture decisions)
Previously in this series:
- Part 1: Athena — Query NAS Data In Place
- Part 2: Databricks — A Layer-by-Layer Validation of Observed Boundaries
- Part 3: Snowflake — From 'Access Denied' to Working External Tables
- Part 4: DuckDB Lambda — Serverless Analytics for $0.00001/Query
- Part 5: EMR Spark — Read-Write ETL on NAS Data
References
- Redshift Spectrum documentation
- AWS Lake Formation documentation
- FSx for ONTAP S3 Access Points
- Redshift Serverless documentation
- GitHub: fsxn-lakehouse-integrations
Key achievement: This validation established that Redshift Spectrum + Lake Formation provides the strongest enterprise governance path for FSx for ONTAP S3 AP data — 4-layer authorization (Lake Formation → IAM → S3 AP → File System), table/column-level access control, and seamless sharing of Glue Catalog with Athena. The same governance configuration applies to both Athena and Redshift Spectrum queries, enabling a unified governance model across query engines.
All benchmarks are from a specific test environment (Redshift Serverless 8 RPU, FSx for ONTAP Single-AZ 128 MB/s, ap-northeast-1). Performance improves with warm queries and provisioned clusters.
Disclaimer: This article is an independent validation report and does not represent AWS or NetApp official guidance. Product behavior and platform capabilities may change. Always validate in your own environment.

Top comments (0)