Yoshiki Fujiwara(藤原善基)@AWS Community Builder for AWS Community Builders

Posted on Jun 8 • Edited on Jun 15

Governance & Cross-Platform Access: Lake Formation, PII Anonymization, and Multi-Engine Reality for S3 Tables

#aws #iceberg #s3tables #amazonfsxfornetappontap

Previously...

In Part 1, we built the metadata catalog. In Part 2, we added AI classification and vector search. Now we need to answer the hard questions:

Who can see what? (governance)
What about PII? (anonymization)
Can Databricks/Snowflake access this? (cross-platform)

Lake Formation: Governance on Unstructured Data

The Problem

Unstructured data on NAS storage may be well protected at the file-system layer, but it is often not consistently classified, searchable, or governed from analytics and AI workflows:

No unified classification → you may not know what's sensitive across the entire corpus
File-system permissions exist, but analytics/AI tools can't leverage them for discovery
Audit trails may exist at the file-system layer, but they are often not unified with analytics and AI query activity

The Solution

With metadata in S3 Tables (Iceberg), Lake Formation provides:

┌───────────────────────────────────────────────────┐
│  Lake Formation                                   │
│                                                   │
│  Table-level:  SELECT, DESCRIBE                   │
│  Column exposure: controlled via Athena Views     │
│                   (hide embedding_vector, paths)  │
│  Row filtering: WHERE sensitivity_level = 'public'│
│  Audit:        CloudTrail logs metadata queries   │
└───────────────────────────────────────────────────┘

Verified: Access Control in Action

Step 1: Authorized user queries metadata
  → ✅ SUCCEEDED (3 rows returned)

Step 2: Revoke SELECT permission
  → 🔒 BLOCKED: "Column 'file_name' cannot be resolved
     or requester is not authorized"

Step 3: Restore permission
  → ✅ SUCCEEDED (access restored)

Step 4: CloudTrail audit
  → All queries logged with user identity and timestamp

In this PoC, every Athena query we ran against the metadata table was subject to Lake Formation authorization and recorded in CloudTrail. That coverage applies to metadata queries only. Raw file access remains governed separately by FSx for ONTAP file-system permissions, S3 Access Point policies, and application-specific access paths.

Lake Formation Governance Status

Capability	Status	Notes
Table-level SELECT / DESCRIBE	✅ Verified	Grant/revoke works correctly
Athena query governance	✅ Verified	Unauthorized access blocked
CloudTrail audit logging	✅ Verified	All queries logged with user identity
Column-level exclusion (ColumnWildcard)	⚠️ Failed	On tested S3 Tables federated catalog path
Row-level filtering / LF-Tags	📋 Design pattern	Taxonomy defined, needs validation
Column exposure via Athena Views	✅ Workaround	Recommended alternative to column-level grants

Observed Limitation: Column-Level Grants on This S3 Tables Federated Catalog Path

In this PoC, table-level Lake Formation SELECT grants worked as expected. However, column exclusion grants using ColumnWildcard with ExcludedColumnNames returned InvalidInputException: Permissions modification is invalid against the s3tablescatalog/... federated catalog path we tested.

AWS documentation describes table, column, and row-level permissions for S3 Tables integrated with Lake Formation. Therefore, treat this as an observed limitation in our specific validation path (CLI command, region, catalog ID, engine version), not a confirmed general product limitation. The exact error and test conditions are recorded in the verification evidence.
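
For readers who want to reproduce the test, the request shape for a column-exclusion grant can be sketched as below. This is a reconstruction, not the exact call used in the PoC: the helper name `build_column_exclusion_grant`, the principal ARN, the account ID, and the choice of excluded columns are all illustrative assumptions. Only the `TableWithColumns` / `ColumnWildcard` / `ExcludedColumnNames` structure comes from the Lake Formation API.

```python
def build_column_exclusion_grant(principal_arn, catalog_id, database, table, excluded_columns):
    """Return keyword arguments shaped for the Lake Formation GrantPermissions API."""
    if not excluded_columns:
        raise ValueError("excluded_columns must not be empty")
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "CatalogId": catalog_id,
                "DatabaseName": database,
                "Name": table,
                # Grant SELECT on every column except the ones listed here
                "ColumnWildcard": {"ExcludedColumnNames": list(excluded_columns)},
            }
        },
        "Permissions": ["SELECT"],
    }


request = build_column_exclusion_grant(
    principal_arn="arn:aws:iam::111122223333:role/analyst-role",  # hypothetical role
    catalog_id="111122223333:s3tablescatalog/fsxn-metadata-catalog",  # hypothetical account ID
    database="metadata",
    table="unstructured_files",
    excluded_columns=["embedding_vector", "raw_path"],  # illustrative column choice
)
```

With boto3 installed and credentials configured, the dictionary could be passed as `boto3.client("lakeformation").grant_permissions(**request)`. On the federated catalog path described above, expect the `InvalidInputException` rather than a successful grant.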

Workaround: Create Athena Views that expose only permitted columns:

-- View for general users (no embeddings, no PII paths)
CREATE VIEW metadata.public_files AS
SELECT file_id, file_name, file_type, classification, confidence_score
FROM "s3tablescatalog/fsxn-metadata-catalog"."metadata"."unstructured_files"
WHERE is_deleted = false AND sensitivity_level = 'public';

-- Apply Lake Formation on the view
-- Users query the view, not the base table
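
If you maintain several curated views, generating the DDL from an explicit column allow-list makes it harder to expose a sensitive column by accident. The following is a minimal sketch under that assumption. The helper `build_view_ddl` and its identifier check are hypothetical and not part of the PoC repository.

```python
import re

# Accept only plain SQL identifiers so a column name cannot smuggle in extra SQL
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_view_ddl(view_name, base_table, columns, row_filter):
    """Build a CREATE VIEW statement that exposes only the allow-listed columns."""
    if not columns:
        raise ValueError("at least one column is required")
    for column in columns:
        if not _IDENTIFIER.match(column):
            raise ValueError(f"not a plain identifier: {column!r}")
    column_list = ", ".join(columns)
    return (
        f"CREATE VIEW {view_name} AS\n"
        f"SELECT {column_list}\n"
        f"FROM {base_table}\n"
        f"WHERE {row_filter};"
    )


ddl = build_view_ddl(
    view_name="metadata.public_files",
    base_table='"s3tablescatalog/fsxn-metadata-catalog"."metadata"."unstructured_files"',
    columns=["file_id", "file_name", "file_type", "classification", "confidence_score"],
    row_filter="is_deleted = false AND sensitivity_level = 'public'",
)
print(ddl)
```

The row filter and table name are passed through unchanged, so they should come from reviewed configuration, not from user input.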

Governance model choice: For simple use cases, table/column-level permissions suffice. For dynamic, attribute-based access (e.g., "only files classified as 'public'"), use LF-Tags. For enterprise SSO integration, combine with IAM Identity Center. For enterprise governance, map sensitivity_level, path_classification, tenant_id, and pii_status to LF-Tags. See governance/lf-tag-taxonomy.yaml.

Untested alternative: Registering the S3 Tables table in a standard (non-federated) Glue Catalog may enable column-level permissions. This requires manual Iceberg metadata location configuration and has not been verified.

PII Detection: English + Japanese

The Challenge

Amazon Comprehend's detect_pii_entities API supports only English and Spanish. For Japanese PII (names, addresses, My Number), we need a different approach.

Dual-Engine Architecture

Language	Engine	Detectable PII	Latency	Cost
English	Amazon Comprehend	NAME, EMAIL, PHONE, ADDRESS, SSN, CREDIT_DEBIT_NUMBER, DATE_TIME	~200ms	$0.0001/100 chars
Japanese	Bedrock Claude	氏名, メール, 電話, 住所, マイナンバー, クレジットカード, 生年月日	~2-5s	~$0.003/request
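
The table implies a routing step that decides which engine handles each document. One simple way to do this is sketched below, assuming a character-ratio heuristic. The function name `choose_pii_engine`, the 10% threshold, and the engine labels are illustrative assumptions, and the PoC may route differently (for example, with a language-detection API).

```python
import re

# Hiragana, Katakana, and CJK Unified Ideographs (kanji).
# Kanji overlap with Chinese text, so this is a heuristic, not language detection.
_JAPANESE_CHAR = re.compile(r"[\u3040-\u309f\u30a0-\u30ff\u4e00-\u9fff]")


def choose_pii_engine(text, threshold=0.1):
    """Return 'bedrock_claude' when enough characters look Japanese, else 'comprehend'."""
    non_space = [ch for ch in text if not ch.isspace()]
    if not non_space:
        return "comprehend"
    japanese = sum(1 for ch in non_space if _JAPANESE_CHAR.match(ch))
    ratio = japanese / len(non_space)
    return "bedrock_claude" if ratio >= threshold else "comprehend"


print(choose_pii_engine("Name: Taro Yamada"))
print(choose_pii_engine("氏名: 山田太郎"))
```

Mixed-language documents are the hard case. A conservative option is to run both engines when the ratio is close to the threshold and merge the results.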

Data privacy note: When using Bedrock Claude for PII detection, document text is sent to the Bedrock API. Per AWS's data privacy policy, Bedrock does not store or use your inputs/outputs to train models. For highly sensitive workloads, consider VPC endpoints and AWS PrivateLink for Bedrock access.

Japanese PII Detection (Verified)

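import json
import boto3

bedrock = boto3.client("bedrock-runtime")

# Bedrock Claude detects Japanese PII via prompt
# japanese_text: the document text to scan (str), supplied by the caller
response = bedrock.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",  # required for Claude Messages API on Bedrock
        "max_tokens": 1024,  # required; the value here is illustrative
        "messages": [{"role": "user", "content":
            f"Detect all PII in this text. Return JSON array: "
            f'[{{"type":"...","value":"...","begin":N,"end":N}}]\n\n'
            f"Text:\n{japanese_text}"}]
    })
)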

Results on a controlled synthetic sample (not real personal data):

PII Type	Detected Value
NAME	山田太郎
EMAIL	taro.yamada@example.co.jp
PHONE	090-1234-5678
ADDRESS	〒150-0002 東京都渋谷区渋谷1-2-3
MY_NUMBER	1234 5678 9012
CREDIT_CARD	4111-1111-1111-1111
DATE_OF_BIRTH	1985年3月15日
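
Model output is free text, so the JSON array has to be extracted and validated before it feeds the results table. The following is a minimal sketch, assuming the Claude Messages response shape on Bedrock (a `content` list of text blocks). The function `parse_pii_entities` and the sample payload are hypothetical and use synthetic data only.

```python
import json


def parse_pii_entities(response_body):
    """Extract a list of PII entity dicts from a Claude Messages response body."""
    payload = json.loads(response_body)
    text = "".join(
        block.get("text", "")
        for block in payload.get("content", [])
        if block.get("type") == "text"
    )
    # The model may wrap the array in prose, so take the outermost [...] span
    start, end = text.find("["), text.rfind("]")
    if start == -1 or end < start:
        return []
    try:
        candidates = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return []
    if not isinstance(candidates, list):
        return []
    # Keep only well-formed entries that carry at least a type and a value
    return [
        item for item in candidates
        if isinstance(item, dict) and {"type", "value"} <= item.keys()
    ]


sample_body = json.dumps({
    "content": [{
        "type": "text",
        "text": 'Result:\n[{"type":"NAME","value":"山田太郎","begin":0,"end":4}]',
    }]
})
print(parse_pii_entities(sample_body))
```

In a real call the body would come from `response["body"].read()`. Returning an empty list on malformed output is a design choice here, and a production pipeline might route such documents to human review instead.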

Anonymization Pipeline

Original document
       │
       ▼
PII Detection (Comprehend or Bedrock)
       │
       ├─ No PII → has_pii = false (no action needed)
       │
       └─ PII found → has_pii = true
                          │
                          ▼
              Redaction: all PII → [REDACTED]
                          │
                          ▼
              Store anonymized version
              anonymization_status = "completed"

Before:

Name: Taro Yamada
Email: taro.yamada@example.com
Phone: 090-1234-5678
SSN: 123-45-6789

After:

Name: [REDACTED]
Email: [REDACTED]
Phone: [REDACTED]
SSN: [REDACTED]
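
The redaction step itself can be sketched as a span-replacement function. This is an illustration, not the PoC's implementation: the name `redact` is hypothetical, and the spans are located by searching for the detected values because character offsets returned by an LLM are not guaranteed to be accurate.

```python
def redact(text, spans, placeholder="[REDACTED]"):
    """Replace each (begin, end) character span with the placeholder.

    Spans must lie within the text and must not overlap.
    """
    pieces = []
    cursor = 0
    for begin, end in sorted(spans):
        if begin < cursor or end > len(text) or begin >= end:
            raise ValueError(f"invalid or overlapping span: ({begin}, {end})")
        pieces.append(text[cursor:begin])
        pieces.append(placeholder)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


document = "Name: Taro Yamada\nEmail: taro.yamada@example.com"
detected_values = ["Taro Yamada", "taro.yamada@example.com"]  # synthetic sample
spans = [(document.index(v), document.index(v) + len(v)) for v in detected_values]
print(redact(document, spans))
```

Comprehend returns `BeginOffset` and `EndOffset` for each entity, which can be used as spans directly. For model-generated offsets, checking that `text[begin:end]` equals the reported value before redacting is a cheap safeguard.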

Data Clean Room Pattern

┌─────────────────────────────────────────┐
│  Restricted Table (full metadata)       │
│  • has_pii, anonymized_path, raw paths  │
│  • Access: Security team only           │
│  • Lake Formation: strict SELECT grant  │
└─────────────────────────────────────────┘

┌─────────────────────────────────────────┐
│  Public Table (anonymized metadata)     │
│  • classification, summary (redacted)   │
│  • No PII, no raw file paths            │
│  • Access: All analysts                 │
│  • Lake Formation: broad SELECT grant   │
└─────────────────────────────────────────┘

Encryption and Data Residency

At rest: S3 Tables uses SSE-S3 encryption by default. All metadata is encrypted.
In transit: All API calls use TLS 1.2+.
Data residency: Both metadata (S3 Tables) and raw files (FSx for ONTAP) remain in the same AWS region. No cross-border data transfer occurs in the default architecture.

For detailed data sovereignty analysis, see the Architecture Document — Data Sovereignty section.

Audit Log Retention

CloudTrail: Default 90-day event history. For long-term retention, create a Trail delivering to S3 (recommended: 1+ year for regulated industries)
Lake Formation: Data access audit logs are recorded via CloudTrail
OpenSearch: Access logs can be delivered to CloudWatch Logs
Analysis: Use CloudTrail Lake (SQL queries) or Athena + S3 (cost-efficient) for audit analysis

For detailed operational monitoring setup, see the Operational Monitoring section in the architecture document.

Path Sensitivity Model

File paths can reveal sensitive context even when file contents are not exposed (e.g., /hr/layoffs/2026/ or /legal/mna/target-company/).

Recommended controls:

Store raw_path only in the restricted metadata table
Expose hashed_path or anonymized_path to general users
Use path_classification: public / internal / restricted / confidential
Apply Lake Formation grants to curated views, not the base table
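
A keyed hash is one way to derive `hashed_path` so that general users can join and deduplicate on paths without being able to guess them by hashing common directory names. The sketch below assumes HMAC-SHA256 and a prefix-based classifier. The rule table, the function names, and the key handling are all hypothetical, and the key would normally come from a secrets manager.

```python
import hashlib
import hmac

# Hypothetical prefix rules; a real deployment would load these from reviewed config
PATH_RULES = [
    ("/hr/", "confidential"),
    ("/legal/", "restricted"),
    ("/public/", "public"),
]


def hash_path(raw_path, secret_key):
    """Return a stable, keyed SHA-256 digest of the path (hex string)."""
    return hmac.new(secret_key, raw_path.encode("utf-8"), hashlib.sha256).hexdigest()


def classify_path(raw_path, default="internal"):
    """Return the path_classification for the first matching prefix rule."""
    for prefix, label in PATH_RULES:
        if raw_path.startswith(prefix):
            return label
    return default


key = b"demo-key-not-for-production"  # assumption: fetched from a secrets store in practice
path = "/hr/layoffs/2026/plan.docx"
print(classify_path(path), hash_path(path, key)[:16])
```

An unkeyed hash would be weaker here, because directory names such as `/hr/layoffs/` are easy to enumerate and compare.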

Raw Data Access Boundary

This architecture governs metadata access through S3 Tables and Lake Formation. It does not automatically replace:

ONTAP/NFS/SMB file-system permissions
S3 Access Point resource policies
IAM permissions for raw file reads
Application-level authorization
Downstream use of presigned URLs or copied files

Treat metadata governance and raw data governance as two linked but separate control planes. Both must be configured for end-to-end security.

S3 Access Point Identity Boundary

Each FSx for ONTAP S3 Access Point has an associated file-system identity (OntapFileSystemIdentity — UNIX UID/GID or Windows domain user). All file access through that AP is authorized as that identity.

For each access point, document:

IAM principals allowed to use the access point
Access point policy (allowed S3 actions)
Associated UNIX or Windows file-system identity
Allowed volume / prefix scope
Whether the identity can access files beyond what metadata governance intends
Audit evidence location

If the AI enrichment access point uses a broad UNIX identity (e.g., root or a service account with wide read access), metadata-level Lake Formation controls do not prevent raw file reads through that AP. Scope the AP identity to minimum required access.

See security/s3-access-point-identity-matrix.yaml for the template.

Permission Identity Strategy

For multiprotocol environments (NFS + SMB + S3 AP):

Record discovery_protocol: nfs / smb / s3ap
Record access_point_identity_type: unix / windows
Record effective_reader_identity
Record permission_source: nfs_mode / ntfs_acl / mixed
Do not assume metadata visibility implies raw file readability

Retention and Deletion Semantics

This PoC uses metadata records to represent file discovery and enrichment state. For regulated workloads, define:

Metadata retention period (how long to keep catalog records)
Raw file retention period (governed by storage policy, not this catalog)
Anonymized metadata retention period
Deletion request workflow (who can request, who approves, how it's executed)
Snapshot expiration impact on deletion (Iceberg time travel may expose deleted metadata until snapshots expire)
Audit evidence retention (keep deletion evidence longer than the data itself)

Important: Iceberg time travel is useful for recovery, but it means deleted metadata may still be queryable during the snapshot retention window. Align snapshot expiration with your data deletion SLA.

Snowflake-side retention: If redacted metadata is synced into Snowflake-managed tables, define Snowflake-side retention, Time Travel (default 1 day, up to 90 days), and Fail-safe (7 days, non-configurable) separately from Iceberg snapshot retention. Deletion from the Snowflake copy does not delete from the Iceberg source, and vice versa.
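
The retention windows above can be combined into a simple check against a deletion SLA. The sketch below treats the worst case as the longest window in which a deleted record may still be recoverable from any copy. The function names are hypothetical, and whether Fail-safe counts toward your SLA is a policy decision, since Fail-safe data is recoverable only by Snowflake and not queryable by users.

```python
def worst_case_exposure_days(
    iceberg_snapshot_retention_days,
    synced_to_snowflake=False,
    snowflake_time_travel_days=1,
    snowflake_fail_safe_days=7,
):
    """Return the longest window (days) in which deleted metadata may still be recoverable."""
    snowflake_window = (
        snowflake_time_travel_days + snowflake_fail_safe_days
        if synced_to_snowflake
        else 0
    )
    return max(iceberg_snapshot_retention_days, snowflake_window)


def meets_deletion_sla(sla_days, **retention):
    """True when the worst-case exposure fits inside the deletion SLA."""
    return worst_case_exposure_days(**retention) <= sla_days


# Example: 5-day Iceberg snapshots, Snowflake copy with default Time Travel
print(worst_case_exposure_days(5, synced_to_snowflake=True))
print(meets_deletion_sla(7, iceberg_snapshot_retention_days=5, synced_to_snowflake=True))
```

In this example the Snowflake copy, not the Iceberg source, determines the exposure window (1 + 7 = 8 days), which is why the two systems need separate retention settings.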

Approval Evidence Template (for Regulated Industries)

For organizations requiring formal access approval documentation:

Approval ID: <unique-id>
Data owner: <name/group>
Security owner: <name/group>
Platform owner: <name/group>
Allowed metadata columns: <columns>
Allowed raw file prefixes: <prefixes>
Allowed operations: metadata query only / raw file read / anonymized export
Review date: <date>
Expiration date: <date>
Evidence location: verification-evidence/<path>

Regulated Workload Readiness

For public sector, healthcare, financial services, and other regulated industries, validate the following before production deployment:

Area	Requirement	Status in this PoC
Data residency	Metadata and raw files in same AWS Region	✅ Single region (ap-northeast-1)
Encryption at rest	S3 Tables: SSE-S3; FSx: at-rest encryption	✅ Default encryption
Encryption in transit	TLS 1.2+ for all API calls	✅ AWS default
Raw data access boundary	File reads governed by S3 AP policy + ONTAP permissions	✅ Documented
Metadata access boundary	Lake Formation table-level + CloudTrail audit	✅ Verified
AI processing data flow	Content sent to Bedrock API, not stored by provider	✅ Per AWS data protection policy
PII detection limitations	English (Comprehend) + Japanese (Claude) only	⚠️ Other languages not covered
Human review workflow	Low-confidence queue defined	✅ Design documented
Audit log retention	CloudTrail 90-day default; configure Trail for longer	⚠️ Requires Trail setup
Deletion SLA	Define separately for metadata, raw files, and snapshots	⚠️ Requires policy definition
Legal/compliance sign-off	Not in scope for this PoC	❌ Required before production

AI governance note: AI enrichment in this pattern is assistive metadata generation. It does not constitute authoritative regulatory classification. Final classification decisions, data handling approvals, and compliance certifications must be confirmed by data owners, security teams, legal counsel, and compliance officers.

Cross-Platform Access: The Current Reality

Fully Verified ✅

Platform	Access Method	Status
Athena	Direct query via Glue federated catalog	✅ Fully verified
Lambda/Python	PyIceberg SDK	✅ Fully verified
EMR Spark	Glue Iceberg REST (EMR 7.13.0+)	✅ Fully verified (SELECT, COUNT, time travel)
Snowflake	Glue Iceberg REST + VENDED_CREDENTIALS	✅ Fully verified (CREATE TABLE, SELECT, COUNT, DESCRIBE, AUTO_REFRESH)
Snowflake	External Stage (FSx S3 AP) + TO_FILE + Cortex AI	✅ Fully verified

Expected / Requires Validation ⚠️

Platform	Access Method	Status
EMR Trino	Glue Iceberg REST (EMR 7.13.0+)	⚠️ Expected (same EMR SigV4 handling as Spark)
Redshift Spectrum	Same as Athena (Glue catalog)	⚠️ Expected, not fully validated

What Doesn't Work (Yet) ⚠️

Platform	Tested method	Result	Test date	Status
Databricks SQL Warehouse	`CREATE CONNECTION TYPE iceberg_rest` to S3 Tables REST	`CONNECTION_TYPE_NOT_SUPPORTED`	2026-05-31	Observed limitation in this path
Databricks Spark cluster	Iceberg REST + SigV4 via spark.conf.set / cluster config	`NO_SUCH_CATALOG_EXCEPTION` (UC blocks external catalog registration)	2026-06-01	Confirmed: UC Foreign Catalog required
Databricks Delta Sharing	Delta Sharing server accessing S3 AP-backed storage	Sharing server uses same UC storage credentials; cannot bypass session policy	2026-06-01	Confirmed limitation (not a workaround for S3 AP)
Databricks NFS → UC Volume	NFS mount path as UC External Volume	Cloud storage URIs only (s3://, abfss://, gs://); NFS/FUSE paths not supported	2026-06-01	Confirmed limitation; internal feature request exists
Snowflake	External Iceberg Table with S3 Tables direct REST endpoint	Not a supported catalog type (use Glue REST instead)	2026-05-31	Use Glue REST + VENDED_CREDENTIALS (✅ verified)
Snowflake	CATALOG INTEGRATION with default ACCESS_DELEGATION_MODE	Defaults to EXTERNAL_VOLUME_CREDENTIALS which triggers ListObjectsV2 (rejected by S3 Tables)	2026-06-02	✅ Resolved: set explicit `ACCESS_DELEGATION_MODE = VENDED_CREDENTIALS`
Snowflake	Lake Formation column-level via VENDED_CREDENTIALS	AllowFullTableExternalDataAccess=false blocks all VENDED_CREDENTIALS access	2026-06-08	Use Snowflake Horizon (Row Access Policy / Dynamic Masking) for column governance
Snowflake Open Catalog	Polaris as Iceberg catalog	Not tested	TBD	Strategic alternative

Databricks: Three Integration Paths to Validate

Update note (2026-06-09): We revalidated the S3 Tables path after Databricks announced GA for Foreign Iceberg and credential vending (May 28, 2026). Glue Connection creation and credential configuration succeeded, but Unity Catalog External Location validation still failed because S3 Tables internal buckets reject standard S3 API validation (HeadBucket/ListBucket). The S3 Tables path remains blocked in this tested Databricks UC configuration. A new Databricks support case has been submitted.

For Databricks business users: The value is not only table access. The value is turning previously invisible NAS files into governed metadata assets that can be searched, explained, lineage-tracked, and consumed from Databricks SQL, AI/BI, and dashboards.

In this PoC, CREATE CONNECTION TYPE iceberg_rest to the S3 Tables REST endpoint returned CONNECTION_TYPE_NOT_SUPPORTED on Databricks SQL Warehouse (tested 2026-05-31). This does not mean Databricks lacks Iceberg REST support — Databricks provides Unity Catalog Iceberg REST endpoints and Foreign Iceberg capabilities that evolve rapidly.

Confirmed Limitations (2026-06-01)

Path	Result	Confirmed by
Spark cluster + Iceberg REST (spark.conf.set / cluster config)	❌ UC blocks external catalog registration	Databricks support + our testing
Delta Sharing via S3 Access Point	❌ Sharing server uses same UC storage credentials	Databricks support
NFS mount path as UC External Volume	❌ Cloud storage URIs only (s3://, abfss://, gs://)	Databricks support
DataSync → S3 → UC External Delta Table → Delta Sharing	✅ Works (Delta format required)	Databricks support

Delta Sharing note: Delta Sharing is not a workaround for the FSx S3 Access Point session policy limitation in our tested path. The sharing server uses the same UC storage credentials and cannot bypass the session policy that blocks S3 AP ARNs. Note that Databricks has announced first-class Iceberg format support in Delta Sharing (Jan 2026), enabling providers to share Iceberg tables via the Iceberg REST Catalog API. This broader capability is not contradicted by our finding — our limitation is specific to S3 AP-backed storage access through UC credentials, not Delta Sharing's format support in general.

NFS Volume note: UC External Volumes require cloud storage URIs. An internal feature request (AHA) exists for EFS/NFS access via UC. Until this is implemented, DataSync → S3 → UC External Location remains the only supported path.

📢 Databricks users: If S3 Tables access from Databricks is important for your workflow, the UC Foreign Catalog for S3 Tables feature is being tracked internally by Databricks (request DB-I-15824). Contact your Databricks account team to express interest and increase prioritization. Snowflake achieved full S3 Tables access via VENDED_CREDENTIALS in June 2026 — the same architectural pattern should be feasible for UC.

Immediate workaround for Databricks: Use DataSync → S3 → UC External Table to sync metadata into a standard S3 location accessible by Unity Catalog. This is not zero-copy for the synced metadata, but raw files remain on FSx for ONTAP.

Path 1: Spark cluster + Iceberg REST (SigV4)

Best for technical validation and batch processing.

Tested 2026-06-01: On Databricks with Unity Catalog enabled, external Iceberg catalogs cannot be registered via spark.conf.set or cluster Spark config. Unity Catalog controls catalog registration exclusively. Both Serverless (CONFIG_NOT_AVAILABLE) and All-Purpose clusters (NO_SUCH_CATALOG_EXCEPTION) fail. Unity Catalog Foreign Catalog (Path 2) is the required approach.

The configuration we tested has two endpoint options:

# Path 1a: Direct S3 Tables REST endpoint (used in this PoC)
# The warehouse is the table bucket ARN
spark.sql.catalog.s3tables.uri=https://s3tables.ap-northeast-1.amazonaws.com/iceberg
spark.sql.catalog.s3tables.rest.signing-name=s3tables
spark.sql.catalog.s3tables.warehouse=arn:aws:s3tables:ap-northeast-1:<ACCOUNT>:bucket/fsxn-metadata-catalog

# Path 1b: AWS Glue Iceberg REST endpoint (recommended for production + Lake Formation)
# The warehouse is the Glue catalog identifier, not the table bucket ARN
spark.sql.catalog.s3tables.uri=https://glue.ap-northeast-1.amazonaws.com/iceberg
spark.sql.catalog.s3tables.rest.signing-name=glue
spark.sql.catalog.s3tables.warehouse=<ACCOUNT>:s3tablescatalog/fsxn-metadata-catalog

Common config for both:

spark.sql.catalog.s3tables=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.s3tables.catalog-impl=org.apache.iceberg.rest.RESTCatalog
spark.sql.catalog.s3tables.rest.sigv4-enabled=true
spark.sql.catalog.s3tables.rest.signing-region=ap-northeast-1
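
When the same settings are needed on a Spark runtime that accepts them (for example EMR Spark, which is listed as verified above), it can help to generate them from one function and avoid hand-editing two near-identical blocks. The following is a minimal sketch. The helper names are hypothetical, and the warehouse identifier is a parameter, not a constant, so confirm the value your endpoint expects.

```python
def build_catalog_conf(catalog, endpoint, signing_name, region, warehouse):
    """Return Spark properties for an Iceberg REST catalog that signs requests with SigV4."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "org.apache.iceberg.rest.RESTCatalog",
        f"{prefix}.uri": endpoint,
        f"{prefix}.warehouse": warehouse,
        f"{prefix}.rest.sigv4-enabled": "true",
        f"{prefix}.rest.signing-name": signing_name,
        f"{prefix}.rest.signing-region": region,
    }


def to_submit_args(conf):
    """Flatten the properties into spark-submit style --conf arguments."""
    args = []
    for key, value in conf.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


glue_conf = build_catalog_conf(
    catalog="s3tables",
    endpoint="https://glue.ap-northeast-1.amazonaws.com/iceberg",
    signing_name="glue",
    region="ap-northeast-1",
    warehouse="<WAREHOUSE_IDENTIFIER>",  # placeholder: supply the value for your endpoint
)
print(" ".join(to_submit_args(glue_conf)))
```

On Databricks with Unity Catalog enabled, these properties are rejected as described in the tested note, so this sketch does not change the Path 1 result there.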

Path 2: Unity Catalog Foreign Iceberg

Register external Iceberg tables into Unity Catalog if supported for the target catalog/storage path. Best for Databricks governance, lineage, and discovery. Verify refresh semantics and read/write limitations. Retested for S3 Tables on 2026-06-09: Glue Connection and credentials succeeded, but UC External Location validation failed because S3 Tables internal buckets reject standard S3 API validation. This path remains blocked for the tested S3 Tables configuration.

Documentation/version note: Databricks Iceberg capabilities are evolving rapidly. Earlier documentation and our initial validation showed limitations around Foreign Iceberg credential vending and automatic refresh behavior. After the May 2026 GA announcement, we revalidated the S3 Tables path on 2026-06-09. Credential configuration progressed further, but the tested path still failed at UC External Location validation against S3 Tables internal storage. Additionally, Databricks supports catalog federation with AWS Glue (Hive Metastore type), which can expose Glue-registered tables in UC. Whether a future Iceberg REST catalog federation path could bypass the S3 Tables internal bucket constraint is an open question.

Refresh semantics: If UC Foreign Iceberg works for S3 Tables via Glue REST, define refresh semantics explicitly. Our metadata catalog is append-only (new records added on file events). Analysts should know whether Databricks reads the latest Iceberg snapshot automatically or only after REFRESH FOREIGN TABLE. Without auto-refresh, Athena and Databricks may show temporarily different results until the next refresh cycle. Plan for a scheduled refresh job or event-driven trigger.

AWS reference for this path: AWS has published guidance on accessing S3 Iceberg tables from Databricks using the Glue Iceberg REST Catalog. This validates the architectural direction of B-4/B-5, though S3 Tables-specific compatibility requires separate validation.

Path 3: AWS Glue Catalog Federation with Databricks

AWS Glue can federate metadata from Databricks Unity Catalog for Iceberg tables. This is the reverse direction but useful for cross-platform governance patterns.

Federation Directionality

Pattern	Direction	Primary governance	Best for
UC Foreign Catalog / Catalog Federation to Glue	Databricks reads AWS-managed metadata	Unity Catalog	Databricks users querying AWS Iceberg (S3 Tables)
AWS Glue federation to UC	AWS reads Databricks-managed metadata	Lake Formation / Glue	Athena/EMR/Redshift reading UC Iceberg/UniForm

AWS reference: AWS has published guidance on accessing S3 Iceberg tables from Databricks using AWS Glue Iceberg REST Catalog, and on federating Databricks Unity Catalog data into AWS Glue Data Catalog. Both directions are documented.

Why Iceberg here (not Delta Lake)? This architecture uses Iceberg because S3 Tables is Iceberg-native, and the Iceberg REST endpoint enables multi-engine access (Athena, EMR, Snowflake). For Databricks-only environments, Delta Lake on S3 remains the natural choice. This pattern targets multi-platform scenarios.

Databricks UC Audit Logging for External Engines (Confirmed 2026-06-01)

External engine access via the UC Iceberg REST Catalog endpoint is auditable at the metadata-request and credential-issuance level:

Audit aspect	Confirmed behavior
Metadata requests (listNamespaces, listTables, loadTable)	✅ Logged in `system.access.audit` under `uniformIcebergRestCatalog`
Vended credential issuance	✅ Logged as `loadTableCredentials` / `generateTemporaryTableCredential`
Audit fields	user_identity, source_ip_address, user_agent, event_time, action_name, request_params
Distinguish external vs internal	✅ `service_name = 'uniformIcebergRestCatalog'` (external) vs `'unityCatalog'` (internal)

Note: Databricks audit logs record credential issuance, not individual S3 file reads after credentials are vended. Complement with AWS CloudTrail + S3 access logging for file-level audit.
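
Once audit rows are exported from `system.access.audit`, the fields in the table above are enough to separate external from internal access and metadata requests from credential issuance. The sketch below works on plain dictionaries. The function name and the sample rows are synthetic, and the `getTable` action is used only as a placeholder for an internal request.

```python
from collections import Counter

EXTERNAL_SERVICE = "uniformIcebergRestCatalog"
INTERNAL_SERVICE = "unityCatalog"
CREDENTIAL_ACTIONS = {"loadTableCredentials", "generateTemporaryTableCredential"}


def summarize_audit_rows(rows):
    """Count audit rows by (origin, kind) using service_name and action_name."""
    summary = Counter()
    for row in rows:
        service = row.get("service_name")
        if service == EXTERNAL_SERVICE:
            origin = "external"
        elif service == INTERNAL_SERVICE:
            origin = "internal"
        else:
            origin = "other"
        kind = (
            "credential_vending"
            if row.get("action_name") in CREDENTIAL_ACTIONS
            else "metadata"
        )
        summary[(origin, kind)] += 1
    return summary


sample_rows = [  # synthetic rows for illustration
    {"service_name": "uniformIcebergRestCatalog", "action_name": "loadTable"},
    {"service_name": "uniformIcebergRestCatalog", "action_name": "loadTableCredentials"},
    {"service_name": "unityCatalog", "action_name": "getTable"},
]
print(summarize_audit_rows(sample_rows))
```

Each `credential_vending` row marks a point where file reads can follow without further Databricks logging, so those event times are the natural join keys for CloudTrail and S3 access logs.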

Databricks integration documentation:

For UC Foreign Catalog validation steps, see databricks/uc-foreign-iceberg-validation.md

For coexistence planning, see databricks/coexistence-roadmap.md

For audit investigation, see databricks/audit-correlation-guide.md

Snowflake: S3 Tables via Glue REST + VENDED_CREDENTIALS ✅

Working Configuration (Verified 2026-06-05)

Snowflake can directly query S3 Tables Iceberg tables via the Glue Iceberg REST endpoint with VENDED_CREDENTIALS. Here's the complete working setup:

-- 1. Catalog Integration (CRITICAL: explicit ACCESS_DELEGATION_MODE)
CREATE OR REPLACE CATALOG INTEGRATION s3tables_glue_rest_int
  CATALOG_SOURCE = ICEBERG_REST
  TABLE_FORMAT = ICEBERG
  CATALOG_NAMESPACE = 'metadata'
  REST_CONFIG = (
    CATALOG_URI = 'https://glue.ap-northeast-1.amazonaws.com/iceberg'
    CATALOG_API_TYPE = AWS_GLUE
    CATALOG_NAME = '<ACCOUNT_ID>:s3tablescatalog/fsxn-metadata-catalog'
    ACCESS_DELEGATION_MODE = VENDED_CREDENTIALS  -- MUST be explicit
  )
  REST_AUTHENTICATION = (
    TYPE = SIGV4
    SIGV4_IAM_ROLE = 'arn:aws:iam::<ACCOUNT_ID>:role/fsxn-snowflake-verification-role'
    SIGV4_SIGNING_REGION = 'ap-northeast-1'
  )
  ENABLED = TRUE;

-- 2. Schema WITHOUT default EXTERNAL_VOLUME (critical)
CREATE SCHEMA FSXN_LAKEHOUSE.S3TABLES_VENDED;
USE SCHEMA FSXN_LAKEHOUSE.S3TABLES_VENDED;

-- 3. Table WITHOUT EXTERNAL_VOLUME parameter (critical)
CREATE ICEBERG TABLE s3tables_vended_creds_test
  CATALOG = 's3tables_glue_rest_int'
  CATALOG_TABLE_NAME = 'unstructured_files';

AWS prerequisites (must be completed before Snowflake configuration):

# Register S3 Tables resource with Lake Formation (--with-federation is REQUIRED)
aws lakeformation register-resource \
  --resource-arn "arn:aws:s3tables:ap-northeast-1:<ACCOUNT_ID>:bucket/fsxn-metadata-catalog" \
  --role-arn "arn:aws:iam::<ACCOUNT_ID>:role/S3TablesRoleForLakeFormation" \
  --with-federation \
  --region ap-northeast-1

# Grant SELECT + DESCRIBE to Snowflake's IAM role (table-level)
aws lakeformation grant-permissions \
  --principal '{"DataLakePrincipalIdentifier":"arn:aws:iam::<ACCOUNT_ID>:role/fsxn-snowflake-verification-role"}' \
  --resource '{"Table":{"CatalogId":"<ACCOUNT_ID>:s3tablescatalog/fsxn-metadata-catalog","DatabaseName":"metadata","Name":"unstructured_files"}}' \
  --permissions SELECT DESCRIBE \
  --region ap-northeast-1

IAM role policy must include: glue:GetTable, glue:GetDatabase, glue:GetCatalog, lakeformation:GetDataAccess, s3tables:GetTableBucket, s3tables:GetTable, s3tables:GetNamespace, s3tables:GetTableData, s3tables:GetTableMetadataLocation.

Verify integration (expected DESCRIBE output after creation):

DESCRIBE CATALOG INTEGRATION s3tables_glue_rest_int;

Property	Value
ENABLED	true
CATALOG_SOURCE	ICEBERG_REST
TABLE_FORMAT	ICEBERG
CATALOG_NAMESPACE	metadata
REST_CONFIG	{CATALOG_URI=https://glue.ap-northeast-1.amazonaws.com/iceberg, ...}
REST_AUTHENTICATION	{TYPE=SIGV4, SIGV4_IAM_ROLE=arn:aws:iam::<ACCOUNT_ID>:role/..., ...}
API_AWS_IAM_USER_ARN	arn:aws:iam::465774455528:user/<snowflake-user-id>
API_AWS_EXTERNAL_ID	<external-id-for-trust-policy>
REFRESH_INTERVAL_SECONDS	30

Setup step: Copy API_AWS_IAM_USER_ARN and API_AWS_EXTERNAL_ID from this output into your IAM role's trust policy to allow Snowflake to assume the role.
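
The trust policy for this step follows the standard cross-account pattern: allow the Snowflake IAM user to call `sts:AssumeRole`, conditioned on the external ID. The following is a minimal sketch for generating the policy document. The function name and the sample ARN and external ID are placeholders, so substitute the values from your own DESCRIBE output.

```python
import json


def build_trust_policy(snowflake_user_arn, external_id):
    """Return an IAM trust policy that lets the given principal assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": snowflake_user_arn},
                "Action": "sts:AssumeRole",
                # The external ID guards against the confused-deputy problem
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


policy = build_trust_policy(
    snowflake_user_arn="arn:aws:iam::111122223333:user/example-user",  # placeholder
    external_id="EXAMPLE_EXTERNAL_ID",  # placeholder
)
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the role's trust relationship in the IAM console or supplied to your infrastructure-as-code tooling.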

Verified Capabilities (2026-06-08)

Operation	Result	Performance
CREATE ICEBERG TABLE	✅	6.5s
SELECT * LIMIT 5	✅ (5 rows)	1.9s
COUNT(*)	✅ (170 rows)	66ms
DESCRIBE TABLE	✅ (23 columns)	69ms
ALTER ... SET AUTO_REFRESH = TRUE	✅	131ms
SHOW ICEBERG TABLES	✅ (UNMANAGED type)	567ms
Time Travel	✅ (available, snapshot-dependent)	—

Screenshots (URL bar excluded, S3 Tables internal bucket masked):

COUNT(*) returns 170 rows in 66ms

DESCRIBE TABLE shows all 23 Iceberg columns

SHOW ICEBERG TABLES confirms UNMANAGED type with S3TABLES_GLUE_REST catalog

SELECT * LIMIT 5 returns actual file metadata from S3 Tables

AUTO_REFRESH + Time Travel verification:

AUTO_REFRESH verified: PyIceberg appended 1 record → Snowflake COUNT(*) automatically updated from 170 to 171 within 30 seconds

Time Travel: querying 20 minutes ago returns 170 (before the append), confirming snapshot history is accessible

About FILE_PATH: The FILE_PATH column shows the S3 path used during metadata ingestion (via FSx for ONTAP S3 Access Point). This is the path recorded in the Iceberg metadata catalog — it does not mean the files were copied to S3. The actual files remain on FSx for ONTAP and are accessible via NFS, SMB, or S3 Access Point depending on your application's protocol.

Key Insight: Why Previous Attempts Failed

ACCESS_DELEGATION_MODE defaults to EXTERNAL_VOLUME_CREDENTIALS when not explicitly specified. In this default mode, Snowflake validates storage access through the External Volume path, which triggers ListObjectsV2 against S3 Tables internal buckets — an operation that returns MethodNotAllowed.

With VENDED_CREDENTIALS explicit:

Snowflake calls Glue REST loadTable
Lake Formation (via GetTemporaryGlueTableCredentials) returns temporary storage credentials in the loadTable response config map
Snowflake uses these credentials to access data files directly by exact path
No ListObjectsV2 is required — Snowflake reads files by exact path from Iceberg metadata

Note: The Glue REST endpoint does not implement the standard Iceberg REST /credentials endpoint. Credential vending works through Lake Formation's proprietary mechanism embedded in the loadTable response. This is transparent to Snowflake when configured correctly.
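
When debugging this flow, a quick check is whether the loadTable response actually carries storage credentials in its config map. The sketch below assumes the conventional Iceberg S3 property names (`s3.access-key-id`, `s3.secret-access-key`, `s3.session-token`). The function name and the sample responses are hypothetical and contain no real credentials.

```python
REQUIRED_CREDENTIAL_KEYS = (
    "s3.access-key-id",
    "s3.secret-access-key",
    "s3.session-token",
)


def missing_vended_credentials(load_table_response):
    """Return the credential keys that are absent or empty in the response config map."""
    config = load_table_response.get("config") or {}
    return [key for key in REQUIRED_CREDENTIAL_KEYS if not config.get(key)]


# Synthetic responses: one without vended credentials, one with placeholders
without_creds = {"metadata-location": "<elided>", "config": {}}
with_creds = {
    "metadata-location": "<elided>",
    "config": {
        "s3.access-key-id": "EXAMPLEKEY",
        "s3.secret-access-key": "EXAMPLESECRET",
        "s3.session-token": "EXAMPLETOKEN",
    },
}
print(missing_vended_credentials(without_creds))
print(missing_vended_credentials(with_creds))
```

A non-empty result from a signed loadTable call suggests the Lake Formation side (resource registration with federation, grants, or data lake settings) is incomplete, not the Snowflake configuration.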

Governance Limitation: Lake Formation Column-Level (2026-06-08)

Lake Formation column-level filtering is NOT enforced via the VENDED_CREDENTIALS path:

When AllowFullTableExternalDataAccess = false, the entire VENDED_CREDENTIALS path is blocked
Explicit column/table-level grants + ExternalDataFilteringAllowList do not resolve this
AllowFullTableExternalDataAccess = true is required for VENDED_CREDENTIALS to function

Technical context: AllowFullTableExternalDataAccess is a Lake Formation data lake setting that controls whether third-party query engines can obtain data access credentials without IAM session tags when the caller has full table access. When it is set to false, fine-grained column/row filtering is the intended enforcement mechanism. For S3 Tables accessed via VENDED_CREDENTIALS, however, the result in our testing was complete access denial rather than filtered access. This may be a service-specific constraint of the S3 Tables federated catalog path, or it may require additional AllowExternalDataFiltering + ExternalDataFilteringAllowList configuration that was not effective in our testing. A feature request has been submitted to AWS.

Workaround: Use Snowflake Horizon for column-level governance:

-- Row Access Policy: restrict by sensitivity_level
CREATE OR REPLACE ROW ACCESS POLICY metadata_sensitivity_filter AS
  (sensitivity_level VARCHAR) RETURNS BOOLEAN ->
    CASE
      WHEN IS_ROLE_IN_SESSION('SECURITY_ADMIN') THEN TRUE
      WHEN sensitivity_level IN ('public', 'internal') THEN TRUE
      ELSE FALSE
    END;

ALTER TABLE s3tables_vended_creds_test ADD ROW ACCESS POLICY
  metadata_sensitivity_filter ON (sensitivity_level);

-- Dynamic Data Masking: hide embedding vectors from non-ML roles
CREATE OR REPLACE MASKING POLICY mask_embedding AS
  (val BINARY) RETURNS BINARY ->
    CASE
      WHEN IS_ROLE_IN_SESSION('ML_ENGINEER') THEN val
      ELSE NULL
    END;

ALTER TABLE s3tables_vended_creds_test MODIFY COLUMN
  embedding_vector SET MASKING POLICY mask_embedding;

Snowflake Iceberg Access Modes (Summary)

Access mode	Best for	Status
Glue REST + VENDED_CREDENTIALS	S3 Tables direct query	✅ VERIFIED
External Stage (FSx S3 AP) + TO_FILE	File AI analysis (Cortex COMPLETE)	✅ VERIFIED
Metadata sync to Snowflake table	BI / Cortex Search / governance	Available
Object Store Catalog	Direct metadata file read	❌ Blocked (S3 Tables internal bucket)
Snowflake Open Catalog (Polaris)	Alternative Iceberg catalog	Not tested

📖 Investigation History (2026-05-31 to 2026-06-08)

2026-05-31: Tested S3 Tables direct REST endpoint as External Iceberg catalog → not a supported catalog type.

2026-06-01: Created CATALOG INTEGRATION using ICEBERG_REST + AWS_GLUE + VENDED_CREDENTIALS. DESCRIBE succeeded but CREATE ICEBERG TABLE failed with "Failed to retrieve credentials from the Catalog". Root cause identified: Glue REST does not implement /credentials endpoint (UnknownOperationException).

2026-06-02: AWS Support confirmed Lake Formation uses proprietary mechanism (GetTemporaryGlueTableCredentials) for credential vending, not standard Iceberg REST /credentials. Snowflake Support confirmed Error 004174 occurs when s3.access-key-id/secret/token absent from loadTable response.

2026-06-02: Tested Object Store catalog and EXTERNAL_VOLUME_CREDENTIALS mode — both blocked by S3 Tables internal bucket rejecting ListObjectsV2.

2026-06-03: Discovered register-resource --with-federation was missing. After setup, loadTable response included credentials. However, CREATE TABLE still failed at storage validation (ListObjectsV2).

2026-06-05: Snowflake Support identified the critical distinction: ACCESS_DELEGATION_MODE defaults to EXTERNAL_VOLUME_CREDENTIALS. Explicitly setting VENDED_CREDENTIALS + schema without External Volume + CREATE TABLE without External Volume parameter → SUCCESS. CREATE TABLE + SELECT both working.

2026-06-08: Additional testing confirmed COUNT(*), DESCRIBE, AUTO_REFRESH, SHOW ICEBERG TABLES all working. Lake Formation column-level filtering NOT enforced via this path (AllowFullTableExternalDataAccess=false blocks all access).

External Stage note: Snowflake External Stage against the FSx S3 Access Point alias was verified in this PoC (2026-05-31, ap-northeast-1). Update (2026-06-02): TO_FILE (Cortex COMPLETE multimodal) also verified working — Claude Sonnet 4.5 can directly read files from FSx for ONTAP via S3 AP-backed External Stage. See snowflake/external-stage-fsx-s3ap-validation.md for exact DDL and verified operations.

Snowflake Metadata Activation Pattern

If you sync only the metadata into Snowflake (not raw files), you preserve the zero-copy principle for actual data while enabling Snowflake-native use cases:

Governed metadata analytics and executive dashboards
File inventory and PII coverage reporting
Cortex Search over redacted summaries (RAG on metadata)
Snowflake Intelligence / Cortex Analyst style business Q&A
Row Access Policies and Dynamic Masking on synced metadata

Horizon Catalog note: When metadata reaches Snowflake, Snowflake governance features such as Row Access Policies and Dynamic Masking can be applied to Snowflake-managed access paths. For external engine access via Iceberg REST, validate the exact Open Catalog / Horizon behavior for your target engine and security model.

Metadata sync best practice: Sync curated latest-record metadata, not the append-only base table, unless analysts explicitly need history. Preserve scan_run_id, change_type, and is_deleted for audit and reconciliation. Use MERGE INTO keyed by file_id or path_hash to make metadata activation idempotent. See snowflake/metadata-sync-example.sql for the full pattern.
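The idempotency property of a keyed merge can be illustrated without Snowflake. The sketch below models the target table as a dict keyed by `file_id` and assumes that `last_seen_at` is an ISO-8601 string in a single timezone and format, so that string comparison orders rows by time. Using `last_seen_at` as the ordering column is an assumption; the actual pattern is in the referenced SQL file.

```python
def latest_records(rows):
    """Collapse append-only rows to one row per file_id, keeping the newest."""
    latest = {}
    for row in rows:
        key = row["file_id"]
        if key not in latest or row["last_seen_at"] >= latest[key]["last_seen_at"]:
            latest[key] = row
    return latest


def merge_into(target, source_rows):
    """Upsert curated rows into `target`, similar in spirit to MERGE INTO.

    A source row replaces the target row only if it is at least as new,
    so replaying the same batch leaves the target unchanged.
    """
    for file_id, row in latest_records(source_rows).items():
        current = target.get(file_id)
        if current is None or row["last_seen_at"] >= current["last_seen_at"]:
            target[file_id] = dict(row)  # keeps scan_run_id, change_type, is_deleted
    return target
```

Deleted files stay in the target as rows with `is_deleted` set rather than being removed, which keeps the audit fields available for reconciliation.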

Governance policy mapping: When syncing metadata into Snowflake, map AWS-side fields such as sensitivity_level, tenant_id, pii_status, and path_classification to Snowflake tags, masking policies, and row access policies. Track policy drift between Lake Formation and Snowflake governance. See snowflake/path-decision-guide.md for the full policy mapping.

Snowflake Cortex Search Activation Pattern

If redacted metadata and summaries are synced into Snowflake, Cortex Search can provide Snowflake-native enterprise search and RAG over metadata — without managing embeddings, infrastructure, or search quality tuning.

Why Cortex Search here:

Business users can search approved metadata without operating a separate vector database
RAG and enterprise search can run over redacted summaries already governed in Snowflake
Search quality, embedding management, and index refresh are delegated to Snowflake-managed services
This is best suited for Snowflake-first organizations that want business-facing discovery inside the AI Data Cloud

Use Cortex Search for:

Executive metadata search (natural language queries over file inventory)
File inventory Q&A (powered by LLM + retrieval)
PII coverage reporting and compliance dashboards
Governed search over redacted summaries

OpenSearch Serverless NextGen remains the AWS-native serving index for this PoC. Cortex Search is an optional Snowflake-native alternative for organizations that standardize on Snowflake for business discovery.

Role separation: S3 Tables / Iceberg remains the metadata source of truth. OpenSearch (AWS path) or Cortex Search (Snowflake path) are serving indexes for search UX. Choose based on your primary platform. Cortex Search operates over redacted summaries and metadata synced into Snowflake, not raw files, unless the customer explicitly chooses to copy/extract document content into Snowflake (which would break the zero-copy raw data principle).

Cortex Search scope: Cortex Search should operate on redacted metadata and summaries by default. If raw document content is extracted or copied into Snowflake for Cortex use cases, treat that as a separate data movement decision with its own governance, retention, and cost model.

Snowflake activation cost drivers: Activating metadata in Snowflake adds cost drivers that are separate from the AWS-native catalog: warehouse compute for metadata sync tasks and dashboards, Cortex Search service usage (based on corpus size and query volume), task/stream orchestration for refresh, and a small amount of metadata storage. Model these costs separately from the AWS-native catalog cost; the $114/month estimate in Part 1 does not include Snowflake-side compute.

Retention alignment: Confirm Snowflake account edition, table type, and retention settings before promising deletion SLAs. Snowflake Time Travel (1–90 days) and Fail-safe (7 days) operate independently from Iceberg snapshot expiration. Snowflake-side deletion evidence should be retained separately from Iceberg snapshot expiration evidence.
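Because the two retention mechanisms run independently, the longest one determines how long deleted metadata may remain recoverable. A minimal arithmetic sketch follows, assuming a permanent Snowflake table (7-day Fail-safe) and a configurable Iceberg snapshot retention; the function names are hypothetical, and whether Fail-safe data counts against a deletion SLA is a compliance decision, not something this code decides.

```python
def snowflake_residual_days(time_travel_days: int, fail_safe_days: int = 7) -> int:
    """Days a deleted row may remain recoverable on the Snowflake side."""
    if not 0 <= time_travel_days <= 90:
        raise ValueError("time_travel_days must be between 0 and 90")
    return time_travel_days + fail_safe_days


def worst_case_residual_days(time_travel_days: int,
                             iceberg_snapshot_retention_days: int,
                             fail_safe_days: int = 7) -> int:
    """The two systems expire data independently, so the longer window wins."""
    return max(
        snowflake_residual_days(time_travel_days, fail_safe_days),
        iceberg_snapshot_retention_days,
    )
```

For example, 1 day of Time Travel plus 7 days of Fail-safe gives an 8-day Snowflake window, so a 5-day Iceberg snapshot retention would not be the limiting factor for a deletion SLA.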

Snowflake Metadata Product Contract

When activating metadata in Snowflake, expose a curated subset as the governed metadata product:

Recommended curated columns:

Column	Purpose	Governance
file_id	Unique identifier	—
business_domain	Organizational grouping	Row access policy
file_type	File format	—
classification	AI-generated classification	—
sensitivity_level	Data sensitivity tier	Snowflake tag + masking policy
pii_status	PII detection result	Access policy / dashboard filter
redacted_summary	AI-generated (PII-free) summary	Cortex Search source column
owner_team	Business ownership	Business glossary / stewardship
last_seen_at	Last scan timestamp	—
data_quality_status	Enrichment quality flag	—

Snowflake governance mapping:

sensitivity_level → Snowflake tag + masking policy
tenant_id / business_domain → row access policy
pii_status → access policy / dashboard filter
redacted_summary → Cortex Search source column
owner_team → business glossary / stewardship workflow

Databricks Metadata Activation Pattern

If UC Foreign Catalog is not yet validated for your S3 Tables path, sync only the redacted metadata into a UC-managed Delta table. This preserves the zero-copy principle for raw files while enabling Databricks-native use cases:

Databricks SQL dashboards and executive reporting
AI/BI Genie over curated metadata (natural language queries)
UC lineage and audit on metadata usage
ML feature generation from file metadata
Operational reporting on PII coverage and enrichment backlog

Raw files remain on FSx for ONTAP. Only the small metadata table (~MB scale for 100K files) is synced.

This is analogous to the Snowflake metadata activation pattern: it copies only curated metadata, not the original unstructured files. Both patterns preserve the zero-copy principle for raw data.

Databricks Raw File Access Decision:

Requirement	Recommended path
Governed metadata analytics only	UC Foreign Catalog (if validated) or sync metadata to UC Delta
Raw file processing in Databricks	DataSync → S3 → UC External Volume
Zero-copy raw file access from Databricks	Not supported in validated paths (NFS mount works but without UC governance)
Business discovery / BI	Sync redacted metadata to UC Delta table

If metadata is synced into Databricks for BI, include Databricks SQL / Jobs compute cost in the activation model. This does not affect raw-file zero-copy storage, but it is part of the business-facing analytics cost.

Other Lakehouse Engines to Validate

Beyond Databricks and Snowflake, the most natural validation targets for this metadata catalog are:

Engine	Access path	Likely fit	Validation priority
Trino / Starburst	Glue Iceberg REST or S3 Tables REST	Federated SQL, ad hoc query	High
EMR Spark	Glue Iceberg REST (native since EMR 7.5.0; 7.13.0+ required for S3 Tables)	Bulk backfill, batch enrichment	High
Redshift Spectrum	Glue catalog (external schema)	DWH integration, BI	Medium
Dremio	Glue catalog or Iceberg REST	Query acceleration, BI	Medium
StarRocks / Doris	Glue Iceberg REST	Low-latency serving queries	Medium
Apache Flink	Glue Iceberg REST	Streaming metadata updates	Low
dbt (via Athena)	dbt-athena + Iceberg materialization	Analytics engineering, governed marts	Medium
Apache NiFi	Iceberg REST or Polaris	Event-driven ingestion	Low

These engines should be validated against:

S3 Tables direct REST vs AWS Glue Iceberg REST
Read vs write capability
Lake Formation behavior (credential vending, column/row filtering)
Snapshot freshness after external writes
Latest-record view compatibility
Case-sensitivity and lowercase naming requirements

Key finding from validation (2026-06-08): AWS Glue Iceberg REST supports SigV4-authenticated catalog access, and Lake Formation vends credentials through a proprietary mechanism (GetTemporaryGlueTableCredentials) rather than the standard Iceberg REST /credentials endpoint. Engines that can sign requests with their own IAM credentials work out of the box (EMR Spark ✅ verified, PyIceberg ✅ verified, Trino on EMR expected). Snowflake also works (✅ verified 2026-06-05), but only with ACCESS_DELEGATION_MODE = VENDED_CREDENTIALS set explicitly; the default, EXTERNAL_VOLUME_CREDENTIALS, fails. EMR requirement: 7.13.0+ (7.5.0 has a credential resolution bug). Governance note: Lake Formation column-level filtering is NOT enforced via the VENDED_CREDENTIALS path for Snowflake.
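For engines that sign their own requests, the connection is mostly a matter of catalog properties. Below is a minimal PyIceberg sketch, assuming the Glue Iceberg REST endpoint and warehouse naming that AWS documents for S3 Tables (`<account-id>:s3tablescatalog/<table-bucket>`); the region, account ID, bucket name, and helper names are placeholders, not values from this PoC.

```python
def glue_rest_catalog_properties(region: str, account_id: str, table_bucket: str) -> dict:
    """Build PyIceberg REST catalog properties for the Glue Iceberg REST endpoint."""
    return {
        "type": "rest",
        "uri": f"https://glue.{region}.amazonaws.com/iceberg",
        "warehouse": f"{account_id}:s3tablescatalog/{table_bucket}",
        # SigV4: the client signs catalog requests with its own IAM credentials
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "glue",
        "rest.signing-region": region,
    }


def open_catalog(region: str, account_id: str, table_bucket: str):
    """Open the catalog (requires pyiceberg and AWS credentials in the environment)."""
    from pyiceberg.catalog import load_catalog  # lazy import: optional dependency

    props = glue_rest_catalog_properties(region, account_id, table_bucket)
    return load_catalog("s3tables_glue", **props)
```

With valid credentials, `open_catalog(...).list_namespaces()` is a quick first check that SigV4 signing and Lake Formation permissions are in place before attempting table reads.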

Trino note: AWS has published guidance on querying S3 Tables from Trino using the Iceberg REST endpoint. Trino's Iceberg connector supports REST catalogs natively, making it one of the most straightforward third-party validation targets.

EMR Spark note: For large-scale backfill or re-enrichment (100K+ files), Spark on EMR Serverless or EMR on EC2 can be used as an alternative to Lambda/Fargate. Use Glue Iceberg REST for centralized metadata access with Lake Formation governance. Verified (2026-06-02): EMR Serverless Spark 7.13.0 successfully reads S3 Tables metadata via Glue Iceberg REST — SHOW NAMESPACES, SHOW TABLES, SELECT, COUNT, and snapshot history all work. Requires EMR 7.13.0+ (7.5.0 has a credential resolution bug for S3 Tables warehouse format).

Redshift note: Validate separately from Athena — external schema setup, Glue statistics, Lake Formation permissions, and query latency against latest-record views may differ.

For the full compatibility matrix, see lakehouse-tools/tool-compatibility-matrix.yaml.

Catalog Authority Rule

For each Iceberg table, define exactly one authoritative catalog for metadata pointer and commit coordination. Do not operate S3 Tables, Polaris, Gravitino, Nessie, and Glue as independent writable catalogs for the same table unless the integration explicitly supports federation without dual writes.

                    ┌─────────────────────┐
                    │ Authoritative Catalog│
                    │ (ONE per table)      │
                    │ • S3 Tables + Glue   │
                    │   (this PoC)         │
                    └──────────┬──────────┘
                               │
              ┌────────────────┼────────────────┐
              │                │                │
              ▼                ▼                ▼
         Read-only        Read-only        Read-only
         consumers        consumers        consumers
         (Trino,          (Databricks,     (Snowflake,
          Dremio,          UC Foreign       Cortex,
          StarRocks)       Catalog)         Open Catalog)

Split-brain warning: If two catalogs independently write to the same Iceberg table, snapshot pointers can diverge, causing data loss or corruption. Federation (one writer, many readers) is safe. Dual-write is not.
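The difference between one authoritative catalog and two independent writers can be shown with a toy model. The sketch below treats a catalog as a single snapshot pointer updated by compare-and-swap; it is a simplified illustration of the idea, not how any specific catalog implements commits, and the class and function names are hypothetical.

```python
class Catalog:
    """Holds the current-snapshot pointer for one table."""

    def __init__(self, snapshot_id: int = 0):
        self.snapshot_id = snapshot_id

    def commit(self, expected: int, new: int) -> bool:
        """Swap the pointer only if nobody else has moved it since `expected`."""
        if self.snapshot_id != expected:
            return False  # another writer committed first; caller must retry
        self.snapshot_id = new
        return True


def single_catalog_race():
    """Two writers, one catalog: the second commit is rejected."""
    catalog = Catalog()
    first = catalog.commit(expected=0, new=1)
    second = catalog.commit(expected=0, new=2)
    return first, second, catalog.snapshot_id


def dual_catalog_race():
    """Two writers, two catalogs: both commits succeed and the pointers diverge."""
    left, right = Catalog(), Catalog()
    left.commit(expected=0, new=1)
    right.commit(expected=0, new=2)
    return left.snapshot_id, right.snapshot_id
```

In the single-catalog case the losing writer gets a rejection and can retry on top of the new snapshot. In the dual-catalog case nothing fails, which is exactly the problem: each catalog now points at a different snapshot of the same table.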

The Bigger Picture

                    S3 Tables (Iceberg)
                           │
              ┌────────────┼────────────┐
              │            │            │
              ▼            ▼            ▼
         Athena ✅    Databricks ❌  Snowflake ✅
         EMR Spark ✅  (UC Foreign    (Glue REST +
         PyIceberg ✅   path still    VENDED_CREDENTIALS
                      blocked in     verified 2026-06-05)
                      tested config)

Databricks integration summary (confirmed 2026-06-01):

Direct S3 AP access: ❌ (UC session policy)
NFS mount → UC Volume: ❌ (cloud URI only)
Delta Sharing via S3 AP: ❌ (same credentials)
DataSync → S3 → UC: ✅ (supported workaround, not zero-copy for synced data)
UC Foreign Catalog / Foreign Iceberg via Glue: ❌ Retested 2026-06-09; Glue Connection and credentials succeeded, but UC External Location validation failed against S3 Tables internal storage. Support case submitted.

For capability-level details such as read, write, time travel, metadata tables, and governance behavior, see verification-evidence/cross-platform-compatibility.yaml.

We expect this gap to be temporary. S3 Tables is relatively new (GA Dec 2024), and cross-platform federation is actively being developed. Feature requests have been filed with both platforms. The timeline for native S3 Tables support is unknown, but the Iceberg ecosystem is converging rapidly: Unity Catalog 2.0's native Iceberg support and Snowflake's Open Catalog (Polaris) both point toward broader interoperability.

Catalog Decision Guide

In the Iceberg world, the catalog is the system of record for table metadata pointers and atomic operations. Choose based on your primary platform:

Primary platform	Recommended catalog	Notes
AWS-first / Athena-first	S3 Tables + Glue/Lake Formation	Used in this PoC
Databricks-first	Unity Catalog Managed/Foreign Iceberg	Best for UC governance, lineage, discovery
Snowflake-first	Snowflake Open Catalog (Polaris)	Best for Snowflake-governed Iceberg interoperability; validate external engine governance behavior
Neutral / OSS-first	Apache Polaris or other REST catalog	Maximum portability

Dual catalog warning: Avoid running two authoritative catalogs for the same Iceberg table. Use Snowflake Open Catalog / Polaris when Snowflake or a neutral REST catalog should be authoritative. Use S3 Tables when AWS-native Athena / Lake Formation / Glue governance is authoritative. If both platforms need access, use federation (one authoritative catalog, others read via REST).

When to Consider Snowflake Open Catalog / Polaris

Use S3 Tables + Glue/Lake Formation when AWS-native governance is authoritative (this PoC).

Consider Snowflake Open Catalog / Polaris when:

Snowflake should be the primary governance and interoperability plane
Multiple engines need Iceberg REST access through a neutral catalog
Snowflake-managed Iceberg or Snowflake-first AI/Data Cloud workflows are the center of gravity
You want managed Polaris instead of operating your own REST catalog

This would be a different authoritative-catalog design from the current PoC and should not be mixed as a second writer for the same table.

Databricks-first note: For organizations standardizing on Databricks, consider whether the metadata catalog itself should be managed in Unity Catalog as Managed Iceberg or Delta + UniForm, then exposed to AWS engines through Glue federation to UC or the UC Iceberg REST endpoint. Use S3 Tables when AWS-native Athena/Lake Formation is the primary governance path. The choice depends on which governance plane (UC or Lake Formation) is authoritative for your organization.

Format Decision for Databricks Environments

Option	Best for	Tradeoff
S3 Tables Iceberg	AWS-first Athena/LF governance	UC integration pending (Foreign Catalog validation)
UC Managed / Foreign Iceberg	Databricks-first open format governance	Validate current feature availability, region support, and limitations
Delta + UniForm	Databricks-native pipelines + Iceberg read compatibility	Iceberg metadata generated asynchronously; non-Databricks writes constrained
Metadata sync to Delta	BI activation in Databricks SQL	Metadata copy, but raw files remain zero-copy

Summary: What We Built

Layer	Technology	Status
Storage	FSx for ONTAP (files) + S3 Tables (metadata)	✅ Verified
AI	Bedrock Claude Vision + Titan Embeddings V2	✅ Verified
Search	OpenSearch Serverless NextGen (scale-to-zero)	✅ Verified
Governance	Lake Formation (table-level) + CloudTrail	✅ Verified
PII	Comprehend (EN) + Bedrock Claude (JA)	✅ Verified
Cross-platform	Athena ✅, EMR Spark ✅, PyIceberg ✅, Snowflake ✅, Databricks ⚠️	Mostly verified

The Numbers

42 seconds: Full demo execution time
$0.07: Total demo cost
Near $0 idle compute/search cost: Persistent metadata, logs, and audit trails may still incur small charges
$114/month: Projected cost at 100K files, 1000 changes/day
95%: Storage cost reduction vs S3 full copy
0.95: AI classification confidence (invoice detection)
7/7: PII entities detected and redacted

For regulated workloads, align Iceberg snapshot retention with deletion SLAs and audit evidence retention.

What's Next for This Project

Monitor support cases: Databricks UC Foreign Catalog for S3 Tables — timeline unknown
Production hardening: SQS batching, DLQ alerting, reconciliation jobs
Multi-language PII: Extend beyond EN/JA to other languages
Cost optimization: Provisioned Throughput for high-volume Bedrock usage
Production semantics: File identity, latest-record views, index reconciliation, and snapshot retention alignment
ONTAP production hardening: S3 Access Point identity matrix, FPolicy event filtering, SnapMirror catalog rebinding, and FSx performance dashboard
Snowflake governance: Implement Horizon Row Access Policies and Dynamic Data Masking for column-level protection (since Lake Formation column-level is not enforced via VENDED_CREDENTIALS)

Get Involved

⭐ Star the repo if this was useful
🐛 Open an Issue for questions or suggestions
🍴 Fork and adapt for your own unstructured data catalog

This concludes the 3-part series. All code is at github.com/Yoshiki0705/fsxn-lakehouse-integrations. Questions? Open a GitHub Issue.

Governance disclaimer: This article provides governance guidance and architectural patterns. It does not substitute for legal or compliance judgment. Final regulatory determinations should be confirmed with legal and compliance teams.