TL;DR
We built a serverless Lambda pipeline that ships FSx for ONTAP audit logs to Dynatrace via the Log Ingest API v2. The real value: Dynatrace's Davis AI can automatically correlate file access anomalies with application performance degradation — answering "why is the app slow?" with "because 500 users hit the same NFS share simultaneously."
FSx for ONTAP → S3 Access Point → EventBridge Scheduler → Lambda → Dynatrace Log Ingest API v2
│
▼
Davis AI correlates:
  • File access anomalies
  • APM metrics
  • Infrastructure health
  → Root cause in seconds
Verified on Dynatrace SaaS Trial (Tokyo-equivalent region). Logs visible in Logs Viewer within 1-2 minutes.
This is Part 11 of the Serverless Observability for FSx for ONTAP series.
Why Dynatrace for FSx for ONTAP?
Most observability tools treat storage logs as isolated data. Dynatrace is different — it builds a topology map of your entire stack and uses Davis AI to find causal relationships through time-window correlation and entity connectivity:
| Scenario | Without Dynatrace | With Dynatrace |
|---|---|---|
| App latency spike | "Check the logs" | Davis AI detects temporal correlation: file access to /vol/data/ increased 10x within the same 5-minute window as app response time degradation, connected via topology (app → NFS mount → SVM) |
| Storage I/O anomaly | Manual investigation | Automatic correlation via shared topology entities — Davis identifies which services are affected based on entity relationships |
| User reports slow file access | Grep through audit logs | DQL query + topology view showing the full dependency path from user request to storage operation |
The key differentiator: Davis AI correlates events across entities that share topology connections within overlapping time windows — not just keyword matching or manual dashboard correlation.
Architecture
Event sources and their paths to Dynatrace:

Audit logs: EventBridge Scheduler, rate(5 minutes) ──→ Lambda (lists new files via S3 Access Point, checkpoint in SSM) ──→ Dynatrace Log Ingest API v2 (Api-Token auth)
EMS: EMS Webhook ──→ API GW ──→ Lambda (ems_handler) ──→ Dynatrace Log Ingest API v2
FPolicy: FPolicy ──→ ECS Fargate ──→ SQS ──→ Bridge Lambda ──→ EventBridge ──→ Lambda (fpolicy_handler) ──→ Dynatrace Log Ingest API v2
Dynatrace: Logs Viewer, Davis AI, DQL, Dashboards
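The audit-log path above (list new files, keep a checkpoint) can be sketched as a small pure function. This is a minimal illustration, not the project's actual handler: it assumes the checkpoint is the last processed object key and that date-partitioned keys sort lexicographically in chronological order. The name `select_new_keys` is hypothetical, and the S3 listing and SSM read/write calls are left out.

```python
from typing import Iterable, List, Optional, Tuple


def select_new_keys(
    listed_keys: Iterable[str], checkpoint: Optional[str]
) -> Tuple[List[str], Optional[str]]:
    """Return (keys to process in order, checkpoint to store afterwards).

    Assumption: keys sort chronologically, so "newer than the checkpoint"
    is a plain string comparison.
    """
    ordered = sorted(listed_keys)
    if checkpoint is not None:
        ordered = [key for key in ordered if key > checkpoint]
    # If nothing new was found, the checkpoint stays where it was.
    next_checkpoint = ordered[-1] if ordered else checkpoint
    return ordered, next_checkpoint
```

In a real poller the new checkpoint would only be written after the batch has been accepted by Dynatrace, so that a failed run is retried on the next schedule.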
Davis AI: The Correlation Engine
When you ship FSx for ONTAP logs to Dynatrace alongside your APM data, Davis AI can detect patterns like:
- Storage contention → App slowdown: Spike in file operations correlates with increased response times
- Ransomware activity → Service impact: ARP (Anti-Ransomware Protection) EMS events correlate with unusual file encryption patterns
- Quota exhaustion → Write failures: ONTAP quota warnings correlate with application write errors
This works because Dynatrace maps your FSx for ONTAP SVM as a custom device entity in its topology, connecting it to the applications that access it.
Quick Start (30 Minutes)
1. Create Dynatrace API Token
- Log in to your Dynatrace environment
- Go to Access Tokens (Settings → Integration → Access tokens)
- Create a token with scope: logs.ingest
- Token format: dt0c01.<TOKEN_ID>.<TOKEN_SECRET>
2. Store Credentials
aws secretsmanager create-secret \
--name "dynatrace/fsxn-api-token" \
--secret-string '{"api_token":"dt0c01.XXXXXXXX.YYYYYYYY"}' \
--region ap-northeast-1
3. Deploy CloudFormation Stack
aws cloudformation deploy \
--template-file integrations/dynatrace/template.yaml \
--stack-name fsxn-dynatrace-integration \
--parameter-overrides \
S3AccessPointArn=arn:aws:s3:ap-northeast-1:123456789012:accesspoint/fsxn-audit-ap \
DynatraceApiTokenSecretArn=arn:aws:secretsmanager:ap-northeast-1:123456789012:secret:dynatrace/fsxn-api-token-XXXXXX \
DynatraceEnvUrl=https://abc12345.live.dynatrace.com \
S3BucketName=my-fsxn-audit-bucket \
--capabilities CAPABILITY_NAMED_IAM \
--region ap-northeast-1
4. Verify in Dynatrace
Navigate to Logs → View logs → Run query:
fetch logs
| filter log.source == "fsxn-ontap"
Logs should appear within 1-2 minutes.
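For a manual check outside the stack, the request the Lambda sends can be approximated with the standard library. This is a sketch under assumptions: `build_ingest_request` is a hypothetical helper, and it assumes the environment URL from step 3 plus the `/api/v2/logs/ingest` path and an `Api-Token` authorization header.

```python
import json
import urllib.request
from typing import Dict, List


def build_ingest_request(
    env_url: str, api_token: str, entries: List[Dict]
) -> urllib.request.Request:
    """Assemble a POST to the Log Ingest API v2 without sending it."""
    url = env_url.rstrip("/") + "/api/v2/logs/ingest"
    body = json.dumps(entries).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # Token created in step 1 (scope: logs.ingest).
            "Authorization": f"Api-Token {api_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
    )
```

Passing the result to `urllib.request.urlopen` would send it. A test entry such as `{"content": "hello", "log.source": "fsxn-ontap"}` should then show up in the query above.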
Log Entry Format
Each audit log event is shipped with structured attributes for DQL querying:
{
  "content": "{\"EventID\":\"4663\",\"UserName\":\"admin@corp.local\",...}",
  "log.source": "fsxn-ontap",
  "dt.source_entity": "CUSTOM_DEVICE-fsxn-svm-prod-01",
  "timestamp": "2026-01-15T12:00:00Z",
  "severity": "info",
  "fsxn.svm": "svm-prod-01",
  "fsxn.operation": "ReadData",
  "fsxn.user": "admin@corp.local",
  "fsxn.path": "/vol/data/file.txt",
  "fsxn.s3_key": "audit/2026/01/15/audit-001.json"
}
The dt.source_entity field links logs to a custom device in Dynatrace's topology, enabling Davis AI correlation.
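One way such an entry could be assembled is sketched below. The function name and its parameters are hypothetical. How the operation and path are extracted from the raw ONTAP event is not shown in this post, so they are passed in explicitly here, and only `UserName` is read from the raw event.

```python
import json
from typing import Any, Dict


def build_log_entry(
    raw_event: Dict[str, Any],
    *,
    svm: str,
    operation: str,
    path: str,
    s3_key: str,
    timestamp: str,
) -> Dict[str, str]:
    """Wrap one raw audit event in the attribute layout shown above."""
    return {
        # The original event is kept verbatim as a JSON string.
        "content": json.dumps(raw_event, separators=(",", ":")),
        "log.source": "fsxn-ontap",
        # Same naming convention as the example entry.
        "dt.source_entity": f"CUSTOM_DEVICE-fsxn-{svm}",
        "timestamp": timestamp,
        "severity": "info",
        "fsxn.svm": svm,
        "fsxn.operation": operation,
        "fsxn.user": str(raw_event.get("UserName", "")),
        "fsxn.path": path,
        "fsxn.s3_key": s3_key,
    }
```

Keeping the raw event in `content` while promoting a few fields to attributes means DQL filters stay cheap and the full record is still available for forensics.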
DQL Query Examples
Dynatrace Query Language (DQL) provides powerful analytics:
Basic Investigation
// All failed file access attempts (using structured attributes)
fetch logs
| filter log.source == "fsxn-ontap"
| filter fsxn.result == "Failure"
| summarize count(), by: {fsxn.user, fsxn.path}
// Top operations by volume
fetch logs
| filter log.source == "fsxn-ontap"
| summarize count(), by: {fsxn.operation}
| sort count() desc
// Access timeline for a specific SVM
fetch logs
| filter fsxn.svm == "svm-prod-01"
| makeTimeseries count(), interval: 5m
APM Correlation Queries
// File access volume vs app response time (side-by-side)
fetch logs
| filter log.source == "fsxn-ontap"
| makeTimeseries file_ops = count(), interval: 5m
// Correlate with service metrics in a dashboard
// (Place this next to a service response time tile)
// Find users causing the most I/O during a performance incident
fetch logs
| filter log.source == "fsxn-ontap"
| filter timestamp >= now() - 1h
| summarize ops = count(), by: {fsxn.user}
| sort ops desc
| limit 10
Security Queries
// Detect potential ransomware (mass file modifications)
fetch logs
| filter log.source == "fsxn-ontap"
| filter fsxn.operation == "WriteData" OR fsxn.operation == "Delete"
| makeTimeseries write_ops = count(), interval: 1m
| filter arrayMax(write_ops) > 100
// After-hours access
fetch logs
| filter log.source == "fsxn-ontap"
| filter getHour(timestamp) < 7 OR getHour(timestamp) > 19
| summarize count(), by: {fsxn.user, fsxn.path}
Deployment Options
| Deployment | URL Format | Data Location |
|---|---|---|
| SaaS | https://<env-id>.live.dynatrace.com | Dynatrace-managed (region-specific) |
| Managed | https://<your-domain>/e/<env-id> | Your infrastructure |
| ActiveGate | https://<host>:9999/e/<env-id> | Your network (proxy) |
For data sovereignty requirements, Dynatrace Managed or ActiveGate keeps all data within your infrastructure.
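The three URL formats differ only in the base, so the ingest endpoint can be derived from the deployment type. The helper below is a hypothetical sketch. It assumes the `/api/v2/logs/ingest` path is appended to each base format from the table.

```python
def ingest_url(deployment: str, *, env_id: str, host: str = "") -> str:
    """Build the log ingest endpoint for a given deployment type."""
    if deployment == "saas":
        base = f"https://{env_id}.live.dynatrace.com"
    elif deployment in ("managed", "activegate"):
        if not host:
            raise ValueError(f"{deployment} deployments need a host")
        # ActiveGate listens on port 9999 in the table above.
        port = ":9999" if deployment == "activegate" else ""
        base = f"https://{host}{port}/e/{env_id}"
    else:
        raise ValueError(f"unknown deployment type: {deployment}")
    return base + "/api/v2/logs/ingest"
```

The resulting base (without the API path) is what would be passed as DynatraceEnvUrl to the stack.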
Cost Analysis
Dynatrace pricing is based on Davis Data Units (DDU):
| Monthly Log Volume | DDU/month (est.) | Monthly DDU Cost |
|---|---|---|
| 1 GB | ~1 DDU | Minimal (within base allocation) |
| 10 GB | ~10 DDU | ~$25/month (at $2.50/DDU) |
| 100 GB | ~100 DDU | ~$250/month |
| Component | Monthly Cost (10 GB/month) |
|---|---|
| Lambda (5-min polling) | ~$3 |
| EventBridge Scheduler | ~$1 |
| Secrets Manager | ~$1 |
| Dynatrace DDU | ~$25 |
| Total | ~$30 |
DDU pricing varies by contract. The 14-day trial includes generous DDU allocation for validation. Check your license terms for production estimates.
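The arithmetic behind the tables can be reproduced with a few lines. The rates are the ones implied above (roughly 1 DDU per GB and $2.50 per DDU) and are assumptions that depend on your contract. The $5 of AWS overhead is the sum of the Lambda, EventBridge Scheduler, and Secrets Manager rows at 10 GB/month, and it is treated here as flat, which is a simplification.

```python
from typing import Dict


def estimate_monthly_cost(
    gb_per_month: float,
    ddu_per_gb: float = 1.0,        # assumption taken from the table above
    usd_per_ddu: float = 2.50,      # assumption; varies by contract
    aws_overhead_usd: float = 5.0,  # Lambda + Scheduler + Secrets Manager
) -> Dict[str, float]:
    """Rough monthly cost estimate for the pipeline."""
    ddu = gb_per_month * ddu_per_gb
    dynatrace_usd = ddu * usd_per_ddu
    return {
        "ddu": ddu,
        "dynatrace_usd": dynatrace_usd,
        "total_usd": dynatrace_usd + aws_overhead_usd,
    }
```

For example, `estimate_monthly_cost(10)` gives a total of 30.0, matching the component table.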
Gotchas & Lessons Learned
| # | Discovery | Impact |
|---|---|---|
| 1 | API returns HTTP 204 on success (not 200) | Lambda must treat 204 as success |
| 2 | Trial environment has 1-2 minute ingestion lag | Wait before checking Logs Viewer |
| 3 | logs.ingest scope is required; ReadConfig/WriteConfig won't work | Token creation must select correct scope |
| 4 | logs.read scope needed separately for API-based queries | Create a second token for automation |
| 5 | Log entries older than 24 hours may be rejected | Use current timestamps in test data |
| 6 | Max 1MB per request (smallest batch limit in this series) | Lambda splits large batches |
| 7 | Firehose delivery requires ActiveGate (not direct to SaaS) | Use Lambda direct for simplicity |
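Gotchas 1 and 6 translate into two small pieces of Lambda logic, sketched below. The names are hypothetical, and the byte limit uses a conservative reading of "1MB" as 1,000,000 bytes. Treating only 204 as full success follows gotcha 1. Flagging every other status for follow-up is this sketch's choice.

```python
import json
from typing import Dict, Iterator, List

MAX_REQUEST_BYTES = 1_000_000  # assumption: conservative reading of "1MB"


def split_batches(
    entries: List[Dict], max_bytes: int = MAX_REQUEST_BYTES
) -> Iterator[List[Dict]]:
    """Yield lists of entries whose JSON array encoding fits in max_bytes."""
    batch: List[Dict] = []
    size = 2  # the enclosing "[" and "]"
    for entry in entries:
        entry_size = len(json.dumps(entry).encode("utf-8"))
        if entry_size + 2 > max_bytes:
            raise ValueError("single entry exceeds the request size limit")
        # json.dumps separates array items with ", " (2 bytes).
        added = entry_size + (2 if batch else 0)
        if batch and size + added > max_bytes:
            yield batch
            batch, size = [], 2
            added = entry_size
        batch.append(entry)
        size += added
    if batch:
        yield batch


def is_success(status_code: int) -> bool:
    """The ingest API signals success with 204, not 200."""
    return status_code == 204
```

Measuring the encoded size per entry, rather than counting entries, keeps batches under the limit even when audit events vary widely in length.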
Davis AI Integration Pattern
To get the most from Davis AI correlation, all three prerequisites must be in place:
- Ship FSx for ONTAP logs (this integration) with the dt.source_entity field set
- Deploy OneAgent on application hosts that access FSx for ONTAP via NFS/SMB. This creates the application-side topology
- Create a custom device for each SVM (dt.source_entity). This creates the storage-side topology node. Use the Entity API (POST /api/v2/entities/custom) or Settings API to pre-create the device entity before first log ingestion
Prerequisites for correlation: Davis AI correlation only activates when all three components are connected in the topology. Without OneAgent on the application hosts, Davis AI cannot establish the causal link between file access patterns and application performance. The custom device entity must use a consistent naming convention (e.g., CUSTOM_DEVICE-fsxn-{svm-name}) across all log entries.
Application (OneAgent) ──→ NFS/SMB ──→ FSx for ONTAP (SVM)
│ │
│ APM metrics │ Audit logs
▼ ▼
Dynatrace Davis AI
(automatic correlation)
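Because correlation depends on every log entry pointing at the same custom device, a pre-send check for naming drift can be useful. The sketch below is a hypothetical helper. It assumes each entry carries both `fsxn.svm` and `dt.source_entity`, and that the convention is exactly `CUSTOM_DEVICE-fsxn-{svm-name}`.

```python
from typing import Dict, Iterable, List


def find_naming_drift(entries: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return entries whose dt.source_entity does not match their SVM name."""
    drifted = []
    for entry in entries:
        expected = f"CUSTOM_DEVICE-fsxn-{entry.get('fsxn.svm', '')}"
        if entry.get("dt.source_entity") != expected:
            drifted.append(entry)
    return drifted
```

Entries returned by this check would land on a different (or no) topology node, so they are worth logging or routing to the DLQ rather than sending silently.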
Production Readiness
This integration follows the project's Production Readiness Levels:
| Level | What You Get | Go/No-Go to Next |
|---|---|---|
| Level 1 (this Quick Start) | Audit poller + DLQ | Logs arrive, checkpoint advances, DLQ empty 24h |
| Level 2 | + DQL dashboards + alerts | SLOs met 7 days, security review done |
| Level 3 | + DynamoDB ledger + Davis AI correlation | SLOs met 30 days, compliance pack |
| Level 4 | + OTel Collector + redaction + OneAgent | Multi-backend, PII redaction, full topology |
Data classification: Dynatrace receives fsxn.user and fsxn.path fields (PII/sensitive). Dynatrace SaaS environments are region-specific, so select a region matching your data residency requirements. For Managed/ActiveGate deployments, data stays in your infrastructure. See Data Classification Guide.
Full criteria: Pipeline SLO Definitions | DLQ Replay Runbook
CloudFormation Templates
| Template | Purpose | Key Parameters |
|---|---|---|
| template.yaml | FSx audit log poller | S3AccessPointArn, DynatraceApiTokenSecretArn, DynatraceEnvUrl |
| template-ems.yaml | EMS webhook handler | DynatraceApiTokenSecretArn, DynatraceEnvUrl |
| template-fpolicy.yaml | FPolicy EventBridge handler | DynatraceApiTokenSecretArn, DynatraceEnvUrl, EventBusName |
Resources
- GitHub: integrations/dynatrace/
- Dynatrace Log Ingest API: API v2 Documentation
- Davis AI: Davis AI Overview
- DQL Reference: Dynatrace Query Language
- Series GitHub: github.com/Yoshiki0705/fsxn-observability-integrations
Series Navigation
- Part 1: Why Your FSx for ONTAP Logs Deserve Better
- Part 2: Shipping FSx for ONTAP Logs to Datadog — The Serverless Way
- Part 3: Event-Driven Ransomware Detection with ONTAP ARP + Datadog
- Part 4: FPolicy File Activity Pipeline — ONTAP to Datadog via ECS Fargate
- Part 5: Escape Vendor Lock-in with OTel Collector
- Part 6: Direct-to-Grafana: Shipping Logs via OTLP Gateway
- Part 7: New Relic: 100GB Free Tier for FSx Audit Logs
- Part 8: EC2 to Serverless: Modernizing Splunk Integration
- Part 9: Data Sovereignty with Elastic
- Part 10: High-Cardinality Analysis with Honeycomb
- Part 11: AI-Powered Root Cause with Dynatrace (this post)
Questions about the Dynatrace integration or Davis AI correlation? Drop a comment below.