Yoshiki Fujiwara(藤原善基)@AWS Community Builder for AWS Community Builders

Posted on Jun 7

28 Industry Reference Patterns with FSx for ONTAP S3 Access Points — Phase 15

#aws #amazonfsxfornetappontap #s3accesspoints #serverless

TL;DR

Phase 15 expands the pattern library from 17 to 28 industry-specific use cases, providing reference implementations across major AWS Industry verticals where FSx for ONTAP file processing is relevant. Each new pattern includes a CloudFormation template, Step Functions workflow, Python Lambda functions, 8-language documentation, and property-based tests. Combined with 6 FlexCache/FlexClone patterns and 1 SAP/ERP pattern, the repository now offers 35 deployable reference patterns for enterprise file processing on FSx for ONTAP.

The SAP/ERP pattern focuses on controlled document/report processing around ERP-adjacent file exports (IDoc, spool), not direct transactional SAP data manipulation.

Important: These are reference implementations with production-readiness guidance, not fully certified production systems. Customers must validate against their own regulatory, security, and operational requirements before production use.

For S3 standard bucket users: This library is not a replacement for S3 data lake patterns. It is a file-data integration pattern for customers who want to process FSx ONTAP-resident data through S3-compatible APIs while preserving NAS access paths. See docs/s3-bucket-user-guide.md for a detailed comparison.

Serverless boundary: Compute (Lambda), orchestration (Step Functions), eventing (EventBridge), and AI services (Bedrock, Textract, Rekognition) are serverless/managed. FSx for ONTAP is a fully managed file system with provisioned capacity and operational considerations — it is not scale-to-zero storage. This is a serverless processing pattern over existing enterprise file data, not a pure serverless storage pattern.

When NOT to use this: If your workload is already object-native, does not require NFS/SMB coexistence, and can use standard S3 data lake patterns — prefer S3-native serverless architecture.

Repository: github.com/Yoshiki0705/FSx-for-ONTAP-S3AccessPoints-Serverless-Patterns

Why 28 Use Cases?

AWS organizes customers into 22 industry verticals. When we mapped our existing 17 patterns against these verticals, several gaps stood out:

Telecommunications — No CDR/network log processing pattern
Advertising & Marketing — No creative asset management
Travel & Hospitality — No document processing for reservations
Agriculture & Food — No traceability or crop monitoring
Sustainability/ESG — No ESG metrics extraction
Nonprofit — No grant management automation
Utilities — No drone/SCADA-based asset inspection
Real Estate — No portfolio analysis
HR — No resume screening (with PII protection)
Chemicals — No SDS/lab notebook processing
Transportation (railway) — No deterioration detection

Phase 15 fills all of these gaps and brings coverage to 19 of the 22 AWS industry verticals. The remaining three (Consumer Packaged Goods, Mining, and Software/Internet) have limited file-processing relevance for this pattern type. Together with the 11 Japan-market focus areas, all of which are covered, the repository now addresses most of the enterprise file processing scenarios we mapped.

The 11 New Patterns

P0: Foundation Patterns

UC	Industry	Key AWS Services	Differentiator
UC18	Telecom	Athena, Bedrock	CDR/syslog anomaly detection with 7-day baseline
UC19	AdTech	Rekognition, Textract, Bedrock	Brand compliance scoring + moderation

P1: Document Intelligence

UC	Industry	Key AWS Services	Differentiator
UC20	Travel	Textract, Comprehend, Rekognition	Multilingual reservation extraction + facility inspection
UC21	Agriculture	Rekognition, Textract, Bedrock	GeoTIFF crop analysis + lot traceability
UC22	Transportation	Rekognition, Textract, Bedrock	Safety-critical escalation trigger + deterioration trends

P2: Specialized Processing

UC	Industry	Key AWS Services	Differentiator
UC23	Sustainability	Textract, Bedrock	ESG metric extraction + GRI/TCFD/ISSB mapping
UC24	Nonprofit	Textract, Comprehend, Bedrock	Grant application + outcome matching
UC25	Utilities	Rekognition, Bedrock, Athena	Drone + SCADA + thermal tri-modal inspection
UC26	Real Estate	Rekognition, Textract, Bedrock	Property analysis + lease extraction + PII flagging
UC27	HR	Textract, Comprehend, Bedrock	Recruiting document triage with PII protection
UC28	Chemicals	Textract, Rekognition, Bedrock	SDS hazard extraction + GHS compliance + lab notebook

Architecture: One Pattern, Many Industries

Architecture Classification

Layer	Classification
Workflow orchestration	Serverless (Step Functions)
Compute	Serverless (Lambda)
Eventing / scheduling	Serverless (EventBridge)
AI/ML services	Managed service consumption (Bedrock, Textract, Rekognition, Comprehend)
File storage	Managed/provisioned (FSx for ONTAP)
Operations model	Hybrid: serverless processing + managed file storage

Lambda concurrency must be bounded by FSx ONTAP S3 AP throughput behavior. Do not treat Lambda concurrency as the only scaling control.

Common Workflow Pattern

Every pattern follows the same proven architecture:

EventBridge Scheduler
       │
       ▼
Step Functions State Machine
       │
       ├── Discovery Lambda (VPC-internal, ONTAP API)
       │        │
       │        ▼
       │   S3 Access Point (list + classify files)
       │
       ├── Processing Map (parallel, Retry + Catch)
       │        │
       │        ▼
       │   [Rekognition | Textract | Comprehend | Bedrock | Athena]
       │
       └── Report Lambda
                │
                ├── Output → S3 AP (FSx ONTAP volume)
                └── SNS Notification
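The Processing Map step in the diagram, together with the earlier note about bounding Lambda concurrency, can be sketched as an Amazon States Language (ASL) Map state built from Python. This is a minimal sketch, not the repository's actual state machine: the state names (`ProcessFile`, `RecordFailure`, `Report`), the `$.files` path, the retry values, and the function name `build_processing_map` are all assumptions chosen for illustration. The point is that `MaxConcurrency` gives one explicit knob for keeping parallel reads within what the S3 Access Point can sustain.

```python
import json


def build_processing_map(max_concurrency: int, task_lambda_arn: str) -> dict:
    """Return an ASL Map state that fans out over discovered files.

    All state names and paths here are hypothetical placeholders.
    """
    return {
        "Type": "Map",
        "ItemsPath": "$.files",             # assumed output field of the Discovery Lambda
        "MaxConcurrency": max_concurrency,  # caps parallel iterations of the Map
        "ItemProcessor": {
            "ProcessorConfig": {"Mode": "INLINE"},
            "StartAt": "ProcessFile",
            "States": {
                "ProcessFile": {
                    "Type": "Task",
                    "Resource": task_lambda_arn,
                    # Retry transient task failures with exponential backoff.
                    "Retry": [{
                        "ErrorEquals": ["States.TaskFailed"],
                        "IntervalSeconds": 2,
                        "MaxAttempts": 3,
                        "BackoffRate": 2.0,
                    }],
                    # After retries are exhausted, record the error instead of
                    # failing the whole Map.
                    "Catch": [{
                        "ErrorEquals": ["States.ALL"],
                        "ResultPath": "$.error",
                        "Next": "RecordFailure",
                    }],
                    "End": True,
                },
                "RecordFailure": {"Type": "Pass", "End": True},
            },
        },
        "Next": "Report",
    }


if __name__ == "__main__":
    # The ARN below is a placeholder, not a real function.
    state = build_processing_map(
        10, "arn:aws:lambda:ap-northeast-1:123456789012:function:example"
    )
    print(json.dumps(state, indent=2))
```

A reasonable starting point is a low `MaxConcurrency` that is raised only after measuring S3 AP latency at each step, since the benchmark later in this article shows latency rising sharply between 25 and 50 concurrent requests.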

What changes per industry:

File prefixes and extensions (Discovery Lambda configuration)
AI/ML service selection (Rekognition for images, Textract for documents, Bedrock for reasoning)
Domain-specific schemas (ESG metrics, GHS sections, CDR fields)
Review thresholds (60% escalation trigger for safety-critical defects, 80% standard detection, 90% auto-approve threshold)
Compliance requirements (PII filtering for HR, data classification labels, audit trails)
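To make the first two items in the list above concrete, here is a minimal sketch of how a per-industry discovery configuration might route files to services. The class `DiscoveryConfig`, the function `classify_keys`, the `sds/` prefix, and the extension-to-service mapping are hypothetical and are not taken from the repository.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class DiscoveryConfig:
    """Hypothetical per-industry settings for a Discovery Lambda."""
    prefix: str              # only keys under this prefix are considered
    routes: dict             # lowercase file extension -> service name


def classify_keys(keys, config: DiscoveryConfig) -> dict:
    """Group object keys by the service that should process them."""
    plan = {}
    for key in keys:
        if not key.startswith(config.prefix):
            continue  # outside this use case's prefix
        extension = PurePosixPath(key).suffix.lower()
        service = config.routes.get(extension)
        if service is not None:  # unknown extensions are skipped
            plan.setdefault(service, []).append(key)
    return plan


# Illustrative configuration for a document-plus-image use case.
EXAMPLE_CONFIG = DiscoveryConfig(
    prefix="sds/",
    routes={".pdf": "textract", ".jpg": "rekognition"},
)

if __name__ == "__main__":
    keys = ["sds/acetone.pdf", "sds/label.JPG", "other/x.pdf", "sds/readme.txt"]
    print(classify_keys(keys, EXAMPLE_CONFIG))
```

Keeping the per-industry differences in data rather than in code is one way to let the same workflow serve many use cases.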

For production deployments, validate how S3 AP-generated output files appear from existing NFS/SMB clients, including ownership, permissions, naming convention, and Snapshot/SnapMirror policy impact. See docs/ontap-integration-notes.md.

Shared Modules: The Productivity Multiplier

The 11 new patterns reuse the same shared/ modules that power the original 17:

Module	Purpose	Used By
`s3ap_helper.py`	S3 Access Point abstraction (alias + ARN)	All 28 UCs
`exceptions.py`	Domain exceptions + error handler decorator	All 28 UCs
`observability.py`	EMF metrics + structured logging	All 28 UCs
`human_review.py`	Confidence-based review decisions	UC22, UC25, UC27
`data_classification.py`	Output data labeling (INTERNAL/CUI/etc.)	UC23, UC24, UC27, UC28
`schemas/events.py`	TypedDict event/response schemas	All 28 UCs

Adding a new industry pattern takes 2-3 hours (not days) because the infrastructure is already solved. A new pattern is considered field-shareable only after DemoMode execution, cfn-lint validation, unit/property tests, success metrics, data classification, and human review thresholds are documented.
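As an illustration of the kind of work a shared observability module takes off each pattern, the sketch below builds a CloudWatch Embedded Metric Format (EMF) record and prints it as a structured log line. It is not the repository's `observability.py`: the function names `emf_record` and `emit`, the namespace, and the metric and dimension names are assumptions.

```python
import json
import time


def emf_record(namespace: str, metric_name: str, value: float,
               unit: str = "Count", **dimensions) -> dict:
    """Build one EMF record with a single metric and one dimension set."""
    return {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # epoch milliseconds
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [list(dimensions)],  # one set holding all dimension names
                "Metrics": [{"Name": metric_name, "Unit": unit}],
            }],
        },
        **dimensions,          # dimension values live at the top level
        metric_name: value,    # so does the metric value
    }


def emit(record: dict) -> None:
    """Write the record as a single JSON line to stdout."""
    print(json.dumps(record))


if __name__ == "__main__":
    # Namespace, metric, and dimension names are illustrative only.
    emit(emf_record("ExamplePatterns", "FilesProcessed", 3, UseCase="UC18"))
```

In a Lambda function, a JSON line in this shape written to standard output is picked up by CloudWatch Logs and turned into a metric, so no separate metrics API call is needed.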

Key Design Decisions for New Patterns

1. Safety-Critical Thresholds (UC22)

Railway infrastructure inspection cannot accept false negatives. We use two detection thresholds (standard and safety-critical) plus a separate auto-approve threshold:

STANDARD_THRESHOLD = 80       # General defect detection trigger
SAFETY_CRITICAL_THRESHOLD = 60  # Bridges, signaling, rail joints — lower to reduce false negatives
HUMAN_REVIEW_THRESHOLD = 90    # Auto-approve only above this

Critical design intent: 60% is NOT an auto-approval threshold. It is an escalation trigger — any signal above 60% for safety-critical categories triggers mandatory human review. The system is designed to surface potential defects for expert evaluation, not to automate safety decisions. All detections below 90% confidence require human review regardless of category.
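The routing rule described above can be sketched as a small function. This is one reading of the rule, not the repository's `human_review.py`: the function name `route_detection` and the outcome labels are hypothetical, and two boundary choices are assumptions (a confidence exactly at a trigger counts as a detection, and auto-approval requires a value strictly above 90). Safety-critical detections are always routed to a human here, even above 90, because the article calls that review mandatory.

```python
STANDARD_THRESHOLD = 80         # general defect detection trigger
SAFETY_CRITICAL_THRESHOLD = 60  # escalation trigger for safety-critical categories
HUMAN_REVIEW_THRESHOLD = 90     # auto-approve only above this


def route_detection(confidence: float, safety_critical: bool) -> str:
    """Return 'no_detection', 'human_review', or 'auto_approve'.

    The outcome labels are hypothetical; boundary handling is an assumption.
    """
    trigger = SAFETY_CRITICAL_THRESHOLD if safety_critical else STANDARD_THRESHOLD
    if confidence < trigger:
        return "no_detection"
    if safety_critical:
        # 60% is an escalation trigger, never an approval threshold.
        return "human_review"
    if confidence > HUMAN_REVIEW_THRESHOLD:
        return "auto_approve"
    return "human_review"


if __name__ == "__main__":
    for confidence, critical in [(65, True), (65, False), (85, False), (95, False), (95, True)]:
        print(confidence, critical, route_detection(confidence, critical))
```

Lowering the trigger for safety-critical categories increases the number of items sent to reviewers, so reviewer capacity should be part of the threshold decision.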

2. PII-First Design (UC27)

Recruiting document triage handles personal data. The pattern enforces:

No PII in logs — structured logging strips personal identifiers
Protected characteristic exclusion — Bedrock prompt explicitly excludes age, gender, ethnicity
Encrypted output — all results written with data classification labels
Audit trail — every scoring decision is logged with justification (not content)

Regulatory notice: UC27 is a document triage and summarization workflow, not an automated hiring decision system. Final hiring decisions must remain with qualified human reviewers. Customers must validate against local labor law, privacy regulations (GDPR, APPI, CCPA), and anti-discrimination requirements before any use in recruitment processes. Output must not include ranking by protected attributes, and explanation fields must cite only job-relevant qualifications.
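The "No PII in logs" rule can be illustrated with a key-based redaction step applied before anything is logged. This is a minimal sketch under stated assumptions: the field names in `PII_FIELDS` and the helpers `redact` and `log_decision` are hypothetical and do not describe the repository's logging code. Key-based redaction does not catch personal data embedded in free text, which would need a detection service or stricter input handling.

```python
import json
import logging

# Assumed field names; a real deployment would derive these from its own schema.
PII_FIELDS = frozenset({"name", "email", "phone", "address", "date_of_birth"})


def redact(record):
    """Recursively replace the values of PII-named keys with a marker."""
    if isinstance(record, dict):
        return {
            key: "[REDACTED]" if str(key).lower() in PII_FIELDS else redact(value)
            for key, value in record.items()
        }
    if isinstance(record, list):
        return [redact(item) for item in record]
    return record


def log_decision(logger: logging.Logger, event: dict) -> None:
    """Log a scoring decision as structured JSON with PII fields removed."""
    logger.info(json.dumps(redact(event), ensure_ascii=False))


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log_decision(logging.getLogger("uc27-example"), {
        "name": "Example Person",
        "score": 0.8,
        "justification": "meets listed qualification requirements",
        "contacts": [{"email": "person@example.com"}],
    })
```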

3. Tri-Modal Inspection (UC25)

Utilities asset inspection combines three data modalities in a single workflow:

Visual (drone images) → Rekognition defect detection
Temporal (SCADA logs) → Athena time-series anomaly detection
Thermal (FLIR images) → Hot-spot classification (≥10°C differential)

The Step Functions workflow processes all three in parallel Map states, then merges results for a unified maintenance priority report.
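The merge step can be sketched as follows. Only the 10°C hot-spot differential comes from the article; the `AssetFindings` structure, the priority labels, and the rule of counting how many modalities raised a signal are hypothetical choices made for illustration.

```python
from dataclasses import dataclass

HOT_SPOT_DELTA_C = 10.0  # from the article: hot spot at >= 10 °C differential
PRIORITY_BY_SIGNALS = ("routine", "low", "medium", "high")  # hypothetical labels


@dataclass(frozen=True)
class AssetFindings:
    """Per-asset results from the three parallel branches (assumed shape)."""
    asset_id: str
    visual_defect: bool      # Rekognition branch
    scada_anomaly: bool      # Athena branch
    thermal_delta_c: float   # thermal branch, temperature differential in °C


def maintenance_priority(findings: AssetFindings) -> str:
    """Map the number of modalities that raised a signal to a priority label."""
    signals = (
        int(findings.visual_defect)
        + int(findings.scada_anomaly)
        + int(findings.thermal_delta_c >= HOT_SPOT_DELTA_C)
    )
    return PRIORITY_BY_SIGNALS[signals]


def build_report(all_findings) -> list:
    """Return one row per asset, highest priority first."""
    rows = [
        {"asset_id": f.asset_id, "priority": maintenance_priority(f)}
        for f in all_findings
    ]
    rows.sort(key=lambda row: PRIORITY_BY_SIGNALS.index(row["priority"]), reverse=True)
    return rows


if __name__ == "__main__":
    print(build_report([
        AssetFindings("pole-1", False, False, 3.0),
        AssetFindings("tx-7", True, True, 12.5),
    ]))
```

A production merge would likely weight modalities differently and carry confidence values through, but counting agreeing signals is a simple baseline for a unified report.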

4. ESG Framework Mapping (UC23)

Sustainability reporting requires mapping extracted metrics to multiple frameworks simultaneously:

GRI (Global Reporting Initiative)
TCFD (Task Force on Climate-related Financial Disclosures)
ISSB (International Sustainability Standards Board)

Bedrock performs the mapping using structured prompts with framework-specific indicator definitions.
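A structured prompt of this kind might be assembled as shown below. This sketch only builds the prompt text and does not call Bedrock. The function `build_mapping_prompt`, the metric shape, and the answer format are assumptions, and the indicator entries are illustrative examples that should be checked against the published GRI, TCFD, and ISSB documents before use.

```python
import json

# Illustrative indicator definitions only; verify codes and wording against
# the published standards. The repository's real definitions may differ.
FRAMEWORK_INDICATORS = {
    "GRI": {"GRI 305-1": "Direct (Scope 1) GHG emissions"},
    "TCFD": {"Metrics and Targets": "Metrics used to assess climate-related risks and opportunities"},
    "ISSB": {"IFRS S2": "Climate-related disclosures"},
}


def build_mapping_prompt(metric: dict, indicators: dict = FRAMEWORK_INDICATORS) -> str:
    """Build a prompt asking a model to map one metric to every framework."""
    answer_shape = {framework: ["<indicator id>"] for framework in indicators}
    sections = []
    for framework, definitions in indicators.items():
        rows = "\n".join(f"- {code}: {text}" for code, text in definitions.items())
        sections.append(f"{framework} indicators:\n{rows}")
    return "\n\n".join([
        "Map the extracted ESG metric to the indicators below.",
        "Use only indicator ids that appear in the lists; return an empty list when none applies.",
        *sections,
        "Extracted metric:\n" + json.dumps(metric, ensure_ascii=False),
        "Answer with JSON in this shape:\n" + json.dumps(answer_shape),
    ])


if __name__ == "__main__":
    print(build_mapping_prompt({"name": "scope1_emissions", "value": 1200, "unit": "tCO2e"}))
```

Constraining the model to ids from the supplied lists, and asking for a fixed JSON shape, makes the response easier to validate before it reaches a report.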

Testing: 1,499+ Tests Across 28 Patterns

Each new pattern includes:

Unit tests with moto for AWS service mocking
Property-based tests (Hypothesis) for invariant verification
cfn-lint validation for all CloudFormation templates
ruff linting for Python code quality

Notable property tests:

UC22: severity_level ∈ {critical, major, minor, observation} for all inputs
UC25: SCADA thresholds within physical bounds (voltage ±5%, frequency ±0.5 Hz)
UC27: No protected characteristics appear in any output field
UC28: All GHS mandatory sections validated for completeness
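The UC22 invariant above can be sketched as a randomized check. The repository uses Hypothesis; to keep this example free of third-party dependencies, a seeded loop over the standard library's `random` module stands in for Hypothesis input generation. The classifier `classify_severity` and its cut-off values are hypothetical and exist only so there is something to test.

```python
import random

SEVERITY_LEVELS = {"critical", "major", "minor", "observation"}


def classify_severity(confidence: float, safety_critical: bool) -> str:
    """Hypothetical classifier; the cut-offs are illustrative only."""
    if safety_critical and confidence >= 60:
        return "critical"
    if confidence >= 80:
        return "major"
    if confidence >= 40:
        return "minor"
    return "observation"


def check_severity_invariant(trials: int = 1000, seed: int = 0) -> int:
    """Assert the output is always a known level, including out-of-range inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        confidence = rng.uniform(-50.0, 150.0)  # deliberately wider than 0..100
        safety_critical = rng.random() < 0.5
        level = classify_severity(confidence, safety_critical)
        assert level in SEVERITY_LEVELS, (confidence, safety_critical, level)
    return trials


if __name__ == "__main__":
    print(f"checked {check_severity_invariant()} random inputs")
```

With Hypothesis, the same property would be written as a test function decorated with `@given(st.floats(...), st.booleans())`, which also shrinks any failing input to a minimal counterexample.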

Responsible AI and Human Review

These patterns are reference workflows, not fully automated decision systems. For regulated or safety-critical domains (healthcare, finance, transportation, HR, public sector), customers must define:

Human review thresholds — what confidence level requires expert validation
Appeal/escalation process — how incorrect classifications are corrected
Audit trail requirements — what decisions need immutable logging
Data retention policy — how long intermediate results are kept
Model evaluation criteria — accuracy, hallucination rate, bias testing on domain data
Local regulatory review — jurisdiction-specific compliance (FISC, HIPAA, GDPR, NARA, labor law)

The shared/human_review.py module provides a framework for confidence-based routing, but threshold values and escalation procedures must be defined by domain experts, not by template defaults.

Customers are responsible for validating these workflows against their own policies, risk classification, and regulatory obligations before production use.

Pattern Selection Guide

Customer Situation	Recommended Starting Pattern
FSx ONTAP already used for shared files	UC by industry + DemoMode=false
No FSx ONTAP yet, wants to evaluate workflow	Any UC + DemoMode=true
Document-heavy workload (PDF, contracts, reports)	UC20 / UC23 / UC24 / UC26 / UC27 / UC28
Image-heavy inspection workload	UC19 / UC21 / UC22 / UC25
Logs / time-series / analytics workload	UC18 / UC25 (SCADA)
Safety-critical review required	UC22 / UC25 with human_review module
PII-sensitive workflow	UC27 / UC26 with data_classification module
ESG / sustainability reporting	UC23 with framework mapping
Greenfield object-native workload (no NAS)	Prefer standard S3 + serverless-native architecture

DemoMode to Production Path

Area	DemoMode (evaluation)	Production (FSx ONTAP)
Input source	Regular S3 bucket	FSx ONTAP S3 Access Point
Permissions	S3 IAM only	IAM + S3 AP policy + ONTAP file identity
Network	Public AWS service path	Internet-origin or VPC-origin design decision (NetworkOrigin is immutable after creation)
Data	Sample/synthetic data	Customer-controlled NAS data
Governance	Demo labels only	Data classification + lineage + retention
Cost	~$0.10/execution	+ FSx ONTAP infrastructure (~$194/month base)
Code compatibility	Standard S3 bucket semantics	Validate the FSx ONTAP S3 AP API subset and unsupported S3 bucket features before production
Access point lifecycle	N/A	NetworkOrigin changes require creating a new S3 AP

Cost varies by region, deployment type, SSD capacity, throughput capacity, backups, and data transfer; the figure above is a baseline estimate for Single-AZ / 128 MBps / 1 TB SSD. FSx for ONTAP is not scale-to-zero storage, so this baseline applies even when no workflows run. Use this pattern when the value of processing existing NAS data in place outweighs the baseline FSx ONTAP infrastructure cost.
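The cost trade-off can be worked through with the two estimates from the table (about $0.10 per execution and about $194 per month for the FSx ONTAP base). The sketch below is simple arithmetic on those published estimates, not a pricing calculator; the function names are hypothetical.

```python
EXECUTION_COST_USD = 0.10     # article's estimate per workflow execution
FSX_BASE_MONTHLY_USD = 194.0  # article's baseline: Single-AZ / 128 MBps / 1 TB SSD


def monthly_cost(executions: int, production: bool = True) -> float:
    """Estimated monthly cost in USD; DemoMode has no FSx ONTAP base cost."""
    total = executions * EXECUTION_COST_USD
    if production:
        total += FSX_BASE_MONTHLY_USD
    return round(total, 2)


def base_cost_share(executions: int) -> float:
    """Fraction of the production monthly cost that is fixed infrastructure."""
    return FSX_BASE_MONTHLY_USD / monthly_cost(executions, production=True)


if __name__ == "__main__":
    for executions in (30, 300, 3000):
        print(executions, monthly_cost(executions), f"{base_cost_share(executions):.0%}")
```

At 300 executions a month, for example, the estimate is $224, of which roughly 87% is the fixed file system cost. That is why this pattern fits best where the FSx ONTAP file system already exists for NAS workloads.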

Deployment: 30 Minutes to First Result

Every pattern includes a samconfig.toml.example and step-by-step deployment instructions:

# 1. Copy and configure
cp samconfig.toml.example samconfig.toml
# Edit: S3AccessPointAlias, VpcId, SubnetIds, etc.

# 2. Deploy
sam build && sam deploy --guided

# 3. Execute
aws stepfunctions start-execution \
  --state-machine-arn <ARN from outputs>

# 4. Verify
aws stepfunctions describe-execution --execution-arn <ARN>
# Status: SUCCEEDED

If you do not have FSx for ONTAP yet, DemoMode=true uses a regular S3 bucket instead, which is ideal for evaluation without infrastructure commitment.
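Steps 3 and 4 above can also be scripted. The sketch below starts an execution and polls until it reaches a terminal state, using the `start_execution` and `describe_execution` calls of the Step Functions client. The helper name `run_and_wait`, the empty input payload, and the polling interval are assumptions; the client is passed in so the function can be exercised with a stub.

```python
import json
import time

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "TIMED_OUT", "ABORTED"}


def run_and_wait(sfn_client, state_machine_arn: str, payload: dict,
                 poll_seconds: float = 5.0, timeout_seconds: float = 900.0) -> str:
    """Start one execution and return its final status.

    Raises TimeoutError if no terminal state is reached in time.
    """
    started = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(payload),
    )
    execution_arn = started["executionArn"]
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = sfn_client.describe_execution(executionArn=execution_arn)["status"]
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"execution still {status} after {timeout_seconds}s")
        time.sleep(poll_seconds)


# Usage with boto3 (requires AWS credentials and a deployed state machine):
#   import boto3
#   status = run_and_wait(boto3.client("stepfunctions"), "<ARN from outputs>", {})
#   print(status)
```

Whether an empty input is acceptable depends on the pattern's state machine definition, so check the pattern's demo guide for the expected input.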

Benchmark Insight: Small Files Don't Need More Throughput

During Phase 15 deployment verification, we benchmarked reads of a 202-byte JSON manifest at 128, 256, and 512 MBps throughput capacity. The table shows the two runs added in this phase (256 and 512 MBps):

Throughput	P50 @ conc=1	P50 @ conc=25	P50 @ conc=50
256 MBps	56.9 ms	60.3 ms	257.9 ms
512 MBps	59.8 ms	59.9 ms	246.1 ms

Conclusion: For metadata-heavy workloads (JSON manifests, small config files, document headers), raising throughput capacity had no meaningful effect on P50 latency in these runs. The bottleneck is connection overhead (TLS + S3 AP routing), not bandwidth, so staying at 128 MBps saves cost for these workloads.

These figures are a sizing reference from a specific test environment, not a service limit.
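The comparison behind that conclusion can be reproduced from the table values. The sketch below computes the relative change in P50 latency when moving from 256 to 512 MBps at each concurrency level; the data structure and function name are illustrative, and the numbers are copied from the table above.

```python
# P50 latency in milliseconds, keyed by throughput capacity (MBps) and concurrency.
P50_MS = {
    256: {1: 56.9, 25: 60.3, 50: 257.9},
    512: {1: 59.8, 25: 59.9, 50: 246.1},
}


def relative_change_pct(baseline: float, candidate: float) -> float:
    """Percentage change from baseline to candidate (positive means slower)."""
    return (candidate - baseline) / baseline * 100.0


if __name__ == "__main__":
    for concurrency in (1, 25, 50):
        change = relative_change_pct(P50_MS[256][concurrency], P50_MS[512][concurrency])
        print(f"conc={concurrency}: {change:+.1f}% when doubling throughput")
```

Doubling throughput capacity moves P50 by only a few percent in either direction, while going from 25 to 50 concurrent requests roughly quadruples it at both capacity levels. For small files, concurrency is the variable that matters in this test.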

Documentation: 8 Languages × 28 Patterns

Every pattern includes documentation in:
🇯🇵 Japanese (primary) · 🇺🇸 English · 🇰🇷 Korean · 🇨🇳 Chinese (Simplified) · 🇹🇼 Chinese (Traditional) · 🇫🇷 French · 🇩🇪 German · 🇪🇸 Spanish

Each language includes:

README.md — Overview, deployment, success metrics
docs/architecture.md — Mermaid data flow diagram
docs/demo-guide.md — Step-by-step demo with verification checklist

Each UC README includes Success Metrics with Business Outcome, Technical KPI, Quality KPI, Cost KPI, and Go/No-Go criteria. This article summarizes the portfolio; detailed success criteria live with each pattern.

What Changed Since Phase 14

Metric	Phase 14	Phase 15	Delta
Use cases	17	28	+11
Total patterns	24	35	+11
Test count	~800	1,499+	+699
Industries covered	14/22	19/22	+5
Languages	8	8	—
Shared modules	8	11	+3
Documentation files	~400	~700	+300

Who Should Use Each New Pattern?

Recommended Starting Patterns

Start here if...	Pattern	Why
You want document intelligence	UC20 or UC26	Multilingual extraction + property/lease analysis
You want log analytics	UC18	CDR/syslog anomaly detection with baseline
You need PII-safe document triage	UC27	Protected characteristic exclusion built-in
You need inspection workflows	UC22 or UC25	Safety-critical escalation + tri-modal
You want ESG extraction	UC23	Multi-framework mapping (GRI/TCFD/ISSB)

Full Pattern List

If you are...	Start with...	Why
Telecom operator with CDR data	UC18	Anomaly detection across network logs
Ad agency managing creative assets	UC19	Automated brand compliance scoring
Hotel chain with inspection photos	UC20	Facility condition monitoring at scale
Agricultural cooperative	UC21	Crop health + traceability in one workflow
Railway/transit operator	UC22	Safety-critical deterioration detection
ESG reporting team	UC23	Multi-framework metric extraction
Grant-making foundation	UC24	Application processing + outcome matching
Power utility with drone programs	UC25	Tri-modal inspection (visual + SCADA + thermal)
Real estate portfolio manager	UC26	Property analysis + lease extraction
Recruiting team (APAC/EMEA)	UC27	PII-compliant recruiting document triage
Chemical manufacturer	UC28	SDS compliance + lab notebook digitization

What's Next

VPC-internal Lambda benchmark — True VPC path performance (eliminates Internet latency)
FPolicy TCP-level Replay Storm — Real ONTAP event replay (requires ECS rebuild)
Cross-repository integration — Link patterns to fsxn-lakehouse-integrations for analytics pipelines
Glue Data Catalog integration — Schema versioning and data quality checks for output datasets
Community contributions — Pattern template for community-submitted industry use cases

Resolved from Phase 14: FlexCache × S3 AP integration confirmed as not currently supported by AWS — tracked in Field Feedback Log. FC1 Recovery Metrics depend on this feature. Both remain pending AWS feature availability.

Ownership Model

Layer	Recommended Owner
Shared modules (`shared/`)	Platform / DevOps team
UC business logic (`functions/`)	Application / data team
FSx ONTAP and S3 AP infrastructure	Storage / platform team
IAM, data classification, encryption	Security team
Success metrics and Go/No-Go	Business owner
Regulatory compliance mapping	GRC / legal team

Compliance Positioning

These templates do not certify compliance with any specific regulation. They provide implementation hooks for audit logging, retention, classification, and human review that customers can map to their regulatory controls. Each organization must independently validate compliance with applicable regulations (FISC, HIPAA, GDPR, NARA, local labor law, etc.).

NetApp / ONTAP Operational Notes

For production deployments on FSx for ONTAP, review the ONTAP-specific guidance in docs/ontap-integration-notes.md, including:

SVM / volume / protocol scope assumptions
NFS/SMB visibility of S3 AP-generated outputs (file ownership = AP file system identity)
IAM + S3 AP policy + ONTAP file identity behavior, separate from NFS export policy evaluation
Snapshot / SnapMirror / retention impact on output artifacts
Scheduler vs FPolicy trigger mode selection
FlexCache / FlexClone combination patterns per UC
NetApp support diagnostic bundle
OT/manufacturing safety caveat

FlexCache/FlexClone note: UC × FC combination patterns describe adjacent architecture patterns. Validate current AWS/FSx feature support before assuming direct S3 AP access to cached or cloned paths.

Benchmark scope: Results are from Single-AZ, First-generation FSx ONTAP. Validate separately for Multi-AZ or newer generation file systems.

Regulated research workflows (UC7, UC28, FC5): Capture input dataset version, model/prompt version, reviewer action, and output checksum as lineage metadata. See shared/lineage.py v2 fields.

Stats

New patterns: 11 (UC18-UC28)
New Lambda functions: 44 (4 per pattern average)
New tests: 699
New documentation files: ~300 (across 8 languages)
New shared modules: data_classification.py, human_review.py, schemas/events.py
Deployment verified: All 28 UCs achieved SUCCEEDED status in ap-northeast-1
Benchmark runs: 2 additional (256/512 MBps small-file comparison)
Cost: ~$10 total for deployment verification (Lambda + Step Functions + Bedrock Nova Lite)

Try It Today

git clone https://github.com/Yoshiki0705/FSx-for-ONTAP-S3AccessPoints-Serverless-Patterns.git
cd FSx-for-ONTAP-S3AccessPoints-Serverless-Patterns

# Quick test (no AWS account needed)
make test-quick

# Deploy any pattern with DemoMode (no FSx ONTAP needed)
cd telecom-network-analytics
cp samconfig.toml.example samconfig.toml
sam build && sam deploy --guided