TL;DR
We built a serverless pipeline that ships FSx for ONTAP audit logs to Honeycomb, whose query engine handles high-cardinality file access data at full resolution. Two delivery paths are verified:
[Path A: Direct]
FSx for ONTAP → S3 Access Point → EventBridge Scheduler → Lambda → Honeycomb Events Batch API
[Path B: OTel Collector]
FSx for ONTAP → S3 Access Point → EventBridge Scheduler → Lambda → OTel Collector → OTLP → Honeycomb
Why Honeycomb for file access logs? Because file access data is inherently high-cardinality: thousands of users × millions of file paths × dozens of operations × multiple SVMs. Traditional log tools force you to pre-aggregate or sample. Honeycomb lets you query the raw events at full resolution.
┌──────────────────────────────────────────────────────┐
│ Honeycomb Query Engine │
│ │
│ "Show me which users accessed /vol/finance/* │
│ between 2am-4am last Tuesday" │
│ │
│ → BubbleUp: auto-detect anomalous dimensions │
│ → Heatmap: visualize access density over time │
│ → GROUP BY user, path, operation — no pre-indexing │
│ │
│ 20M events/month FREE │
└──────────────────────────────────────────────────────┘
This is Part 10 of the Serverless Observability for FSx for ONTAP series.
Why Honeycomb for File Access Logs?
Most observability tools index a fixed set of fields. When you have high-cardinality dimensions — like file paths (/vol/data/project-alpha/2026/Q1/report-final-v3.docx) or Active Directory usernames — you hit index bloat, slow queries, or forced sampling.
Honeycomb's columnar storage handles this natively:
| Capability | Traditional Logs | Honeycomb |
|---|---|---|
| Query by arbitrary field | Pre-index or full scan | Instant (columnar) |
| GROUP BY high-cardinality field | Expensive / limited | Native |
| BubbleUp (anomaly detection) | Manual investigation | Semi-automatic (select time range, BubbleUp identifies differing dimensions) |
| Heatmap visualization | Requires pre-aggregation | Raw events |
For FSx for ONTAP audit logs, this means you can ask questions like:
- "Which users accessed the most files in the last hour?" (GROUP BY user)
- "What's different about the spike at 3am?" (BubbleUp)
- "Show me the access pattern heatmap for /vol/finance/" (Heatmap)
Architecture
Three event sources feed Honeycomb:
[Audit log poller]
EventBridge Scheduler (rate(5 minutes)) → Lambda (lists new files via S3 Access Point, checkpoint in SSM) → Honeycomb Events Batch API (x-honeycomb-team header) → Honeycomb
[EMS webhook]
EMS Webhook → API GW → Lambda (ems_handler) → Honeycomb
[FPolicy]
FPolicy → ECS Fargate → SQS → Bridge Lambda → EventBridge → Lambda (fpolicy_handler) → Honeycomb
In Honeycomb, the events are analyzed with BubbleUp, Heatmap, and Explore.
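The poller's checkpoint step can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes the checkpoint is the last object key already shipped and that audit log keys sort chronologically. `select_new_keys` and `next_checkpoint` are hypothetical helper names.

```python
def select_new_keys(keys, checkpoint):
    """Return keys that sort after the checkpoint, oldest first.

    Assumes object keys are lexicographically ordered by time
    (for example a date-based prefix). `checkpoint` is the last key
    already shipped, or None on the first run.
    """
    ordered = sorted(keys)
    if checkpoint is None:
        return ordered
    return [key for key in ordered if key > checkpoint]


def next_checkpoint(processed_keys, checkpoint):
    """Advance the checkpoint only if something was processed."""
    return max(processed_keys) if processed_keys else checkpoint
```

In a Lambda, the keys would typically come from boto3's `list_objects_v2` against the S3 Access Point, and the checkpoint would be read and written with SSM `get_parameter` and `put_parameter`. Writing the checkpoint only after a successful send keeps a failed run from skipping files.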
Two Verified Delivery Paths
Path A: Direct Events Batch API
Simplest path. Lambda sends events directly to Honeycomb's Events Batch API.
# Batch format
[
  {
    "time": "2026-01-15T12:00:00Z",
    "data": {
      "source": "fsxn-ontap",
      "service": "ontap-audit",
      "event_type": "4663",
      "svm": "svm-prod-01",
      "user": "admin@corp.local",
      "operation": "ReadData",
      "path": "/vol/data/file.txt",
      "result": "Success",
      "client_ip": "10.0.x.x"
    }
  }
]
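A minimal sketch of how a Lambda might post that batch, using only the standard library. The function names are hypothetical, and the URL shape (`/1/batch/<dataset>`) reflects our reading of the Events Batch API, so check it against the Honeycomb API reference before relying on it.

```python
import json
import urllib.parse
import urllib.request

HONEYCOMB_API = "https://api.honeycomb.io"  # assumption: the US API host


def build_batch_request(events, dataset, api_key):
    """Build a POST request for Honeycomb's Events Batch API.

    `events` is a list of {"time": ..., "data": {...}} dicts in the
    batch format shown above.
    """
    body = json.dumps(events).encode("utf-8")
    dataset_path = urllib.parse.quote(dataset, safe="")
    return urllib.request.Request(
        url=f"{HONEYCOMB_API}/1/batch/{dataset_path}",
        data=body,
        method="POST",
        headers={
            "X-Honeycomb-Team": api_key,
            "Content-Type": "application/json",
        },
    )


def send_batch(events, dataset, api_key, timeout=10):
    """Send one batch and return the HTTP status code."""
    request = build_batch_request(events, dataset, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Splitting request construction from sending keeps the first half testable without network access. A production version would probably also inspect the response body, since a batch endpoint can accept the request while rejecting individual events.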
Path B: OTel Collector (OTLP)
For multi-backend delivery or when you want enrichment/redaction in the pipeline. Verified in Part 5 with Honeycomb as one of the backends.
The OTel Collector uses the OTLP/HTTP exporter (otlphttp in the config below) with the x-honeycomb-team and x-honeycomb-dataset headers:
exporters:
  otlphttp/honeycomb:
    endpoint: https://api.honeycomb.io
    headers:
      x-honeycomb-team: ${HONEYCOMB_API_KEY}
      x-honeycomb-dataset: fsxn-audit
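On the Lambda side of Path B, the events have to reach the Collector in OTLP form. The sketch below converts events in the Path A batch shape into an OTLP/HTTP JSON logs payload. It assumes the pipeline ships log records (rather than spans) and that every attribute is sent as a string. `to_otlp_logs` is a hypothetical helper, not the project's code.

```python
import json
from datetime import datetime


def to_otlp_logs(events, service_name="ontap-audit"):
    """Convert audit events into an OTLP/HTTP JSON logs payload."""
    records = []
    for event in events:
        ts = datetime.fromisoformat(event["time"].replace("Z", "+00:00"))
        records.append({
            # 64-bit integers are carried as strings in OTLP JSON;
            # sub-second precision is dropped in this sketch
            "timeUnixNano": str(int(ts.timestamp()) * 1_000_000_000),
            "body": {"stringValue": json.dumps(event["data"], sort_keys=True)},
            "attributes": [
                {"key": key, "value": {"stringValue": str(value)}}
                for key, value in event["data"].items()
            ],
        })
    return {
        "resourceLogs": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeLogs": [{"logRecords": records}],
        }]
    }
```

The resulting dict would be posted as JSON to the Collector's OTLP/HTTP receiver (path `/v1/logs`, port 4318 by default). The Collector then adds the Honeycomb headers through the exporter config, so the Lambda never needs the API key.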
Quick Start (30 Minutes)
1. Get a Honeycomb Ingest Key
- Sign up at honeycomb.io (free tier: 20M events/month)
- Go to Account → Team Settings → API Keys
- Create an Ingest Key (starts with hcaik_)
⚠️ Critical: You MUST use an Ingest Key (hcaik_*). Environment Keys (hcxik_*) will be rejected.
2. Store Credentials
aws secretsmanager create-secret \
  --name "honeycomb/fsxn-api-key" \
  --secret-string '{"api_key":"hcaik_01abc..."}' \
  --region ap-northeast-1
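For reference, here is a sketch of how a Lambda could read that secret back. It assumes the JSON shape stored by the command above and checks the key prefix early, so a wrong key type fails loudly instead of dropping events. Both function names are hypothetical.

```python
import json


def parse_api_key(secret_string):
    """Extract the key from the {"api_key": "..."} secret stored above."""
    api_key = json.loads(secret_string)["api_key"]
    if not api_key.startswith("hcaik_"):
        raise ValueError("expected an Ingest Key with the hcaik_ prefix")
    return api_key


def load_api_key(secret_id="honeycomb/fsxn-api-key", region="ap-northeast-1"):
    """Fetch and parse the secret; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so parse_api_key works without it

    client = boto3.client("secretsmanager", region_name=region)
    response = client.get_secret_value(SecretId=secret_id)
    return parse_api_key(response["SecretString"])
```

Calling `load_api_key()` once per cold start, outside the handler, avoids a Secrets Manager round trip on every invocation.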
3. Deploy CloudFormation Stack
aws cloudformation deploy \
  --template-file integrations/honeycomb/template.yaml \
  --stack-name fsxn-honeycomb-integration \
  --parameter-overrides \
    S3AccessPointArn=arn:aws:s3:ap-northeast-1:123456789012:accesspoint/fsxn-audit-ap \
    HoneycombApiKeySecretArn=arn:aws:secretsmanager:ap-northeast-1:123456789012:secret:honeycomb/fsxn-api-key-XXXXXX \
    HoneycombDataset=fsxn-audit \
    S3BucketName=my-fsxn-audit-bucket \
  --capabilities CAPABILITY_NAMED_IAM \
  --region ap-northeast-1
4. Verify in Honeycomb
Navigate to your dataset → Explore Data:
WHERE service = "ontap-audit" | COUNT
Events should appear within seconds of the next poller run (the schedule fires every 5 minutes).
Honeycomb Query Examples
Basic Investigation
# All failed access attempts
WHERE result = "Failure" | GROUP BY user, path | COUNT
# Top 20 users by file access volume
GROUP BY user | COUNT | ORDER BY COUNT DESC | LIMIT 20
# Operations breakdown
GROUP BY operation | COUNT
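The queries above are written in shorthand. In the UI you build them in the Query Builder. If you script them, they map roughly onto a JSON query specification. The sketch below builds such a spec as a Python dict. The field names (`calculations`, `breakdowns`, `filters`, `orders`, `limit`, `time_range`) follow our understanding of Honeycomb's query specification, so treat them as assumptions and check the API reference. `count_query` is a hypothetical helper.

```python
def count_query(group_by, filters=None, limit=None, time_range=3600):
    """Build a COUNT query in the shape of Honeycomb's query specification."""
    spec = {
        "time_range": time_range,  # seconds back from now
        "calculations": [{"op": "COUNT"}],
        "breakdowns": list(group_by),
        "orders": [{"op": "COUNT", "order": "descending"}],
    }
    if filters:
        spec["filters"] = [
            {"column": column, "op": op, "value": value}
            for column, op, value in filters
        ]
    if limit is not None:
        spec["limit"] = limit
    return spec


# The three shorthand queries above, expressed as specs
failed_access = count_query(["user", "path"], filters=[("result", "=", "Failure")])
top_users = count_query(["user"], limit=20)
by_operation = count_query(["operation"])
```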
High-Cardinality Analysis (Honeycomb's Strength)
# BubbleUp: What's different about the 3am spike?
# Select the spike in the time series → click BubbleUp
# Honeycomb auto-identifies which dimensions differ
# Heatmap: Access density by hour
WHERE operation = "ReadData" | HEATMAP(timestamp)
# Trace a specific user's activity
WHERE user = "admin@corp.local" | VISUALIZE COUNT | GROUP BY operation, path
# Find unusual path access patterns
GROUP BY path | COUNT | WHERE COUNT > 100
Security Investigation
# After-hours access to sensitive paths
WHERE path CONTAINS "confidential" AND hour(timestamp) NOT BETWEEN 9 AND 17
| GROUP BY user | COUNT
# Users accessing paths they haven't accessed before
# (Use Honeycomb's "compare to baseline" feature)
# Bulk file operations (potential exfiltration)
WHERE operation = "ReadData" | GROUP BY user | COUNT | WHERE COUNT > 1000
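If computing the hour inside a query turns out to be awkward, one option is to stamp it on each event at ingest time, so the after-hours filter becomes a plain field comparison. The sketch below does that in the Lambda. The field names `hour_utc` and `after_hours` are hypothetical and not part of the schema in this post, and the 9 to 17 window is evaluated in UTC, which is an assumption you would adjust to your local business hours.

```python
from datetime import datetime, timezone


def add_time_fields(event, business_hours=(9, 17)):
    """Return a copy of the event with hour_utc and after_hours added."""
    ts = datetime.fromisoformat(event["time"].replace("Z", "+00:00"))
    hour = ts.astimezone(timezone.utc).hour
    start, end = business_hours
    # Inclusive on both ends, matching "NOT BETWEEN 9 AND 17" above
    after_hours = not (start <= hour <= end)
    data = dict(event["data"], hour_utc=hour, after_hours=after_hours)
    return {"time": event["time"], "data": data}
```

With these fields in place, the after-hours query reduces to a filter on `after_hours = true` plus a GROUP BY on user.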
Event Schema (13 Fields)
All fields are queryable at full cardinality without pre-indexing:
| Field | Example | Cardinality |
|---|---|---|
| source | fsxn-ontap | Low |
| service | ontap-audit | Low |
| event_type | 4663 | Low (~10 types) |
| svm | svm-prod-01 | Low (~5-20) |
| user | admin@corp.local | High (thousands) |
| operation | ReadData | Low (~10 types) |
| path | /vol/data/report.pdf | Very High (millions) |
| result | Success / Failure | Low (2) |
| client_ip | 10.0.x.x | Medium (hundreds) |
| s3_key | audit/svm-prod-01/2026/... | Very High |
Cost Analysis
Honeycomb pricing is event-based, not volume-based:
| Monthly Log Volume | Estimated Events | Honeycomb Cost |
|---|---|---|
| 1 GB | ~500K events | Free (20M/month included) |
| 10 GB | ~5M events | Free |
| 30 GB | ~15M events | Free |
| 50 GB | ~25M events | Paid tier (~$100/month) |
Cost breakdown by component:
| Component | Monthly Cost (10 GB/month) |
|---|---|
| Lambda (5-min polling) | ~$3 |
| EventBridge Scheduler | ~$1 |
| Secrets Manager | ~$1 |
| Honeycomb | Free (5M events < 20M limit) |
| Total | ~$5 |
The 20M events/month free tier covers most FSx for ONTAP deployments. Estimate ~500 events per MB of audit log data.
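The sizing rule is simple enough to script. This sketch uses only the two numbers quoted in this post (about 500 events per MB and a 20M events/month free tier) and treats 1 GB as 1,000 MB, which matches the table above. The function names are hypothetical.

```python
EVENTS_PER_MB = 500            # estimate from this post
FREE_TIER_EVENTS = 20_000_000  # free tier quoted in this post


def estimate_events(log_gb_per_month):
    """Estimated events per month, treating 1 GB as 1,000 MB."""
    return int(log_gb_per_month * 1000 * EVENTS_PER_MB)


def fits_free_tier(log_gb_per_month):
    """True if the estimated volume stays within the free tier."""
    return estimate_events(log_gb_per_month) <= FREE_TIER_EVENTS


for gb in (1, 10, 30, 50):
    print(gb, estimate_events(gb), fits_free_tier(gb))
```

Under this estimate the free tier runs out at about 40 GB of audit logs per month (20M events / 500 events per MB = 40,000 MB).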
Gotchas & Lessons Learned
| # | Discovery | Impact |
|---|---|---|
| 1 | Must use Ingest Key (hcaik_*); Environment Key (hcxik_*) is silently rejected | Events disappear without error if wrong key type |
| 2 | Events with timestamps older than ~4 hours are rejected | Test data must use current timestamps |
| 3 | 5MB max request body size; our implementation batches in chunks of 100 events for reliability | Lambda splits large files into multiple requests |
| 4 | The api.honeycomb.io endpoint used here is Honeycomb's US instance (an EU instance, api.eu1.honeycomb.io, is offered separately) | Evaluate cross-border data transfer requirements |
| 5 | Dataset auto-created on first event if it doesn't exist | No pre-provisioning needed |
| 6 | OTel Collector path requires x-honeycomb-dataset header | Without it, events go to a default dataset |
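Gotcha 3 can be sketched as a small generator that respects both limits at once. The 100-event chunk size comes from the table above. The byte budget is a conservative reading of the 5MB limit, and `chunk_events` is a hypothetical name rather than the project's function.

```python
import json

MAX_EVENTS = 100            # chunk size used by this post's implementation
MAX_BODY_BYTES = 5_000_000  # conservative reading of the 5MB body limit


def chunk_events(events, max_events=MAX_EVENTS, max_bytes=MAX_BODY_BYTES):
    """Yield lists of events that respect both the count and size limits."""
    chunk, size = [], 2  # 2 bytes for the enclosing "[]"
    for event in events:
        # +2 covers the ", " separator json.dumps puts between items
        event_size = len(json.dumps(event).encode("utf-8")) + 2
        if chunk and (len(chunk) >= max_events or size + event_size > max_bytes):
            yield chunk
            chunk, size = [], 2
        chunk.append(event)
        size += event_size
    if chunk:
        yield chunk
```

A single event larger than the byte budget is still emitted as its own chunk, so a production version would need to drop or truncate such events before sending.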
Direct vs OTel Collector: When to Use Which
| Criteria | Direct (Path A) | OTel Collector (Path B) |
|---|---|---|
| Simplicity | ✅ Fewer components | More infrastructure |
| Multi-backend | ❌ Honeycomb only | ✅ Any OTLP backend |
| Enrichment/redaction | ❌ In Lambda only | ✅ Collector processors |
| Cost | Lower (no Collector) | Collector compute cost |
| Recommendation | Single-backend PoC | Production multi-backend |
Note from Honeycomb: Honeycomb recommends OTLP as the primary ingest path for new production deployments. The Events Batch API (Path A) remains fully supported and is simpler for single-backend PoCs. If you start with Path A, migrating to Path B (OTLP) requires no changes to your Honeycomb queries — only the delivery mechanism changes.
Production Readiness
This integration follows the project's Production Readiness Levels:
| Level | What You Get | Go/No-Go to Next |
|---|---|---|
| Level 1 (this Quick Start) | Audit poller + DLQ | Logs arrive, checkpoint advances, DLQ empty 24h |
| Level 2 | + Honeycomb queries + alerts | SLOs met 7 days, security review done |
| Level 3 | + DynamoDB ledger + poison-pill | SLOs met 30 days, compliance pack |
| Level 4 | + OTel Collector + redaction | Multi-backend, PII redaction, DR tested |
Data classification note: Honeycomb receives the user and path fields, which are classified as PII/sensitive. Because this setup sends data to Honeycomb's US endpoint (api.honeycomb.io), evaluate cross-border transfer requirements. For PII-sensitive deployments, use the OTel Collector path (Path B) with redaction processors. See Data Classification Guide.
Full criteria: Pipeline SLO Definitions | DLQ Replay Runbook
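If you stay on Path A, any redaction of the user and path fields has to happen in the Lambda. One possible approach, sketched here, is a keyed hash: values stay stable, so GROUP BY still works, but they are no longer readable. This swaps in HMAC-SHA256 as an illustration and is not the redaction this project ships. The function names and the 16-character truncation are assumptions.

```python
import hashlib
import hmac


def pseudonymize(value, secret):
    """Return a stable keyed hash so values stay groupable but unreadable."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def redact_event(event, secret, fields=("user", "path")):
    """Return a copy of the event with the given data fields pseudonymized."""
    data = dict(event["data"])
    for field in fields:
        if field in data:
            data[field] = pseudonymize(data[field], secret)
    return {"time": event["time"], "data": data}
```

The trade-off is that hashed paths no longer support prefix or substring filters such as path CONTAINS "confidential", so you may want to keep a coarse, non-sensitive path prefix as a separate field.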
CloudFormation Templates
| Template | Purpose | Key Parameters |
|---|---|---|
| template.yaml | FSx audit log poller | S3AccessPointArn, HoneycombApiKeySecretArn, HoneycombDataset |
| template-ems.yaml | EMS webhook handler | HoneycombApiKeySecretArn, HoneycombDataset |
| template-fpolicy.yaml | FPolicy EventBridge handler | HoneycombApiKeySecretArn, HoneycombDataset, EventBusName |
Resources
- GitHub: integrations/honeycomb/
- OTel Collector path: integrations/otel-collector/
- Honeycomb Docs: docs.honeycomb.io
- Honeycomb BubbleUp: BubbleUp Guide
- Series GitHub: github.com/Yoshiki0705/fsxn-observability-integrations
Series Navigation
- Part 1: Why Your FSx for ONTAP Logs Deserve Better
- Part 2: Shipping FSx for ONTAP Logs to Datadog — The Serverless Way
- Part 3: Event-Driven Ransomware Detection with ONTAP ARP + Datadog
- Part 4: FPolicy File Activity Pipeline — ONTAP to Datadog via ECS Fargate
- Part 5: Escape Vendor Lock-in with OTel Collector
- Part 6: Direct-to-Grafana: Shipping Logs via OTLP Gateway
- Part 7: Ship FSx for ONTAP Audit Logs to New Relic via Serverless Lambda Pipeline
- Part 8: EC2 to Serverless: Modernizing Splunk Integration
- Part 9: Data Sovereignty with Elastic
- Part 10: High-Cardinality File Access Analysis with Honeycomb (this post)
Questions about high-cardinality analysis or the Honeycomb integration? Drop a comment below.