Yoshiki Fujiwara(藤原善基)@AWS Community Builder for AWS Community Builders

Posted on May 31

High-Cardinality File Access Analysis with Honeycomb + OTel

#aws #honeycomb #observability #amazonfsxfornetappontap

TL;DR

We built a serverless pipeline that ships FSx for ONTAP audit logs to Honeycomb, whose high-cardinality query engine turns file access data into actionable insights. We verified two delivery paths:

[Path A: Direct]
FSx for ONTAP → S3 Access Point → EventBridge Scheduler → Lambda → Honeycomb Events Batch API

[Path B: OTel Collector]
FSx for ONTAP → S3 Access Point → EventBridge Scheduler → Lambda → OTel Collector → OTLP → Honeycomb

Why Honeycomb for file access logs? Because file access data is inherently high-cardinality: thousands of users × millions of file paths × dozens of operations × multiple SVMs. Traditional log tools force you to pre-aggregate or sample. Honeycomb lets you query the raw events at full resolution.

┌──────────────────────────────────────────────────────┐
│  Honeycomb Query Engine                              │
│                                                      │
│  "Show me which users accessed /vol/finance/*        │
│   between 2am-4am last Tuesday"                      │
│                                                      │
│  → BubbleUp: auto-detect anomalous dimensions        │
│  → Heatmap: visualize access density over time       │
│  → GROUP BY user, path, operation — no pre-indexing  │
│                                                      │
│  20M events/month FREE                               │
└──────────────────────────────────────────────────────┘

This is Part 10 of the Serverless Observability for FSx for ONTAP series.

Why Honeycomb for File Access Logs?

Most observability tools index a fixed set of fields. When you have high-cardinality dimensions — like file paths (/vol/data/project-alpha/2026/Q1/report-final-v3.docx) or Active Directory usernames — you hit index bloat, slow queries, or forced sampling.

Honeycomb's columnar storage handles this natively:

Capability	Traditional Logs	Honeycomb
Query by arbitrary field	Pre-index or full scan	Instant (columnar)
GROUP BY high-cardinality field	Expensive / limited	Native
BubbleUp (anomaly detection)	Manual investigation	Semi-automatic (select time range, BubbleUp identifies differing dimensions)
Heatmap visualization	Requires pre-aggregation	Raw events

For FSx for ONTAP audit logs, this means you can ask questions like:

"Which users accessed the most files in the last hour?" (GROUP BY user)
"What's different about the spike at 3am?" (BubbleUp)
"Show me the access pattern heatmap for /vol/finance/" (Heatmap)

Architecture

┌─────────────────────────────────────────────────────────┐
│ Event Sources                                           │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  EventBridge Scheduler                                  │
│  rate(5 minutes) ──→ Lambda                             │
│                       │ lists new files via             │
│                       │ S3 Access Point                 │
│                       │ (checkpoint in SSM)             │
│                       ▼                                 │
│              Honeycomb Events Batch API                 │
│              (x-honeycomb-team header)                  │
│                       │                                 │
│  EMS Webhook          │                                 │
│  ──→ API GW ──→ Lambda ─────────────┤                   │
│     (ems_handler)                   │                   │
│                                     ▼                   │
│  FPolicy                        Honeycomb               │
│  ──→ ECS Fargate ──→ SQS       (BubbleUp,               │
│  ──→ Bridge Lambda               Heatmap,               │
│  ──→ EventBridge                 Explore)               │
│  ──→ Lambda (fpolicy_handler) ──────────────────────────┤
└─────────────────────────────────────────────────────────┘
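
The checkpoint step in the poller can be sketched roughly as follows. This is a minimal illustration, not the project's code: it assumes the value stored in SSM is the last processed S3 key and that keys sort chronologically (as a date-based prefix would). `select_new_keys` is a hypothetical helper name.

```python
from typing import Iterable, List, Optional, Tuple


def select_new_keys(
    listed_keys: Iterable[str], checkpoint: Optional[str]
) -> Tuple[List[str], Optional[str]]:
    """Return the keys strictly after the checkpoint, plus the next checkpoint.

    Assumption: keys sort lexicographically in time order.
    """
    ordered = sorted(listed_keys)
    if checkpoint is not None:
        ordered = [key for key in ordered if key > checkpoint]
    # If nothing is new, the checkpoint stays where it was.
    next_checkpoint = ordered[-1] if ordered else checkpoint
    return ordered, next_checkpoint
```

A reasonable ordering is to write the new checkpoint back only after delivery succeeds, so a failed run is retried on the next schedule.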

Two Verified Delivery Paths

Path A: Direct Events Batch API

Simplest path. Lambda sends events directly to Honeycomb's Events Batch API.

# Batch format
[
  {
    "time": "2026-01-15T12:00:00Z",
    "data": {
      "source": "fsxn-ontap",
      "service": "ontap-audit",
      "event_type": "4663",
      "svm": "svm-prod-01",
      "user": "admin@corp.local",
      "operation": "ReadData",
      "path": "/vol/data/file.txt",
      "result": "Success",
      "client_ip": "10.0.x.x"
    }
  }
]
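
As a rough sketch of what the Lambda does on this path, the snippet below builds a POST to the Events Batch API (`/1/batch/<dataset>`) with the `X-Honeycomb-Team` header. The function names are hypothetical and the real handler may differ. The response handling assumes Honeycomb returns one status object per event.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List

HONEYCOMB_API = "https://api.honeycomb.io"


def build_batch_request(
    events: List[Dict[str, Any]], dataset: str, api_key: str
) -> urllib.request.Request:
    """Build a POST to /1/batch/<dataset> carrying events in batch format."""
    body = json.dumps(events).encode("utf-8")
    dataset_path = urllib.parse.quote(dataset, safe="")
    return urllib.request.Request(
        url=f"{HONEYCOMB_API}/1/batch/{dataset_path}",
        data=body,
        method="POST",
        headers={
            "X-Honeycomb-Team": api_key,
            "Content-Type": "application/json",
        },
    )


def send_batch(
    request: urllib.request.Request, timeout: float = 10.0
) -> List[Dict[str, Any]]:
    """Send the request and return the parsed response (assumed per-event statuses)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())
```

Keeping request construction separate from the network call makes the payload easy to unit test without reaching Honeycomb.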

Path B: OTel Collector (OTLP)

Use this path for multi-backend delivery, or when you want enrichment/redaction in the pipeline. It was verified in Part 5 with Honeycomb as one of the backends.

The OTel Collector uses the OTLP/HTTP exporter (otlphttp in the config below) and sets the x-honeycomb-dataset header:

exporters:
  otlphttp/honeycomb:
    endpoint: https://api.honeycomb.io
    headers:
      x-honeycomb-team: ${HONEYCOMB_API_KEY}
      x-honeycomb-dataset: fsxn-audit

Quick Start (30 Minutes)

1. Get a Honeycomb Ingest Key

Sign up at honeycomb.io (free tier: 20M events/month)
Go to Account → Team Settings → API Keys
Create an Ingest Key (starts with hcaik_)

⚠️ Critical: You MUST use an Ingest Key (hcaik_*). Environment Keys (hcxik_*) will be rejected.

2. Store Credentials

aws secretsmanager create-secret \
  --name "honeycomb/fsxn-api-key" \
  --secret-string '{"api_key":"hcaik_01abc..."}' \
  --region ap-northeast-1
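
For reference, a Lambda could read this secret back along the following lines. This is a hedged sketch: `load_api_key` is a hypothetical helper, and the prefix check simply mirrors the Ingest Key warning above. `get_secret_value` is the standard boto3 Secrets Manager call.

```python
import json
from typing import Any, Optional


def load_api_key(secret_id: str, client: Optional[Any] = None) -> str:
    """Fetch the Honeycomb key from Secrets Manager and check its type."""
    if client is None:
        # Imported lazily so the function can be tested with a stub client.
        import boto3

        client = boto3.client("secretsmanager")
    secret = client.get_secret_value(SecretId=secret_id)
    api_key = json.loads(secret["SecretString"])["api_key"]
    if not api_key.startswith("hcaik_"):
        raise ValueError("expected a Honeycomb Ingest Key (hcaik_*)")
    return api_key
```

Failing fast on the wrong key type avoids shipping events with a key that will not be accepted.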

3. Deploy CloudFormation Stack

aws cloudformation deploy \
  --template-file integrations/honeycomb/template.yaml \
  --stack-name fsxn-honeycomb-integration \
  --parameter-overrides \
    S3AccessPointArn=arn:aws:s3:ap-northeast-1:123456789012:accesspoint/fsxn-audit-ap \
    HoneycombApiKeySecretArn=arn:aws:secretsmanager:ap-northeast-1:123456789012:secret:honeycomb/fsxn-api-key-XXXXXX \
    HoneycombDataset=fsxn-audit \
    S3BucketName=my-fsxn-audit-bucket \
  --capabilities CAPABILITY_NAMED_IAM \
  --region ap-northeast-1

4. Verify in Honeycomb

Navigate to your dataset → Explore Data:

WHERE service = "ontap-audit" | COUNT

Events should appear within seconds of a poller run. With the rate(5 minutes) schedule, allow up to five minutes for the first run.

Honeycomb Query Examples

Basic Investigation

# All failed access attempts
WHERE result = "Failure" | GROUP BY user, path | COUNT

# Top 20 users by file access volume
GROUP BY user | COUNT | ORDER BY COUNT DESC | LIMIT 20

# Operations breakdown
GROUP BY operation | COUNT

High-Cardinality Analysis (Honeycomb's Strength)

# BubbleUp: What's different about the 3am spike?
# Select the spike in the time series → click BubbleUp
# Honeycomb auto-identifies which dimensions differ

# Heatmap: Access density by hour
WHERE operation = "ReadData" | HEATMAP(timestamp)

# Trace a specific user's activity
WHERE user = "admin@corp.local" | VISUALIZE COUNT | GROUP BY operation, path

# Find unusual path access patterns
GROUP BY path | COUNT | HAVING COUNT > 100

Security Investigation

# After-hours access to sensitive paths
WHERE path CONTAINS "confidential" AND hour(timestamp) NOT BETWEEN 9 AND 17
| GROUP BY user | COUNT

# Users accessing paths they haven't accessed before
# (Use Honeycomb's "compare to baseline" feature)

# Bulk file operations (potential exfiltration)
WHERE operation = "ReadData" | GROUP BY user | COUNT | HAVING COUNT > 1000
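
The after-hours query needs an hour value for each event. One option is to compute it in the Lambda and send it as extra fields. This is an assumption, not necessarily how the project does it. `hour_utc` and `after_hours` are hypothetical field names that are not part of the schema in this post. The 9 to 17 window is copied from the query and evaluated in UTC here.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def add_hour_fields(
    event: Dict[str, Any], start: int = 9, end: int = 17
) -> Dict[str, Any]:
    """Return a copy of the event with hour_utc and after_hours added."""
    moment = datetime.fromisoformat(
        event["time"].replace("Z", "+00:00")
    ).astimezone(timezone.utc)
    data = dict(event["data"])  # copy so the input event is left untouched
    data["hour_utc"] = moment.hour
    # Inclusive bounds, matching "NOT BETWEEN 9 AND 17".
    data["after_hours"] = not (start <= moment.hour <= end)
    return {"time": event["time"], "data": data}
```

A real deployment would probably convert to the site's local time zone before applying the business-hours window.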

Event Schema (13 Fields)

All fields are queryable at full cardinality without pre-indexing. The table lists ten of the fields:

Field	Example	Cardinality
`source`	fsxn-ontap	Low
`service`	ontap-audit	Low
`event_type`	4663	Low (~10 types)
`svm`	svm-prod-01	Low (~5-20)
`user`	admin@corp.local	High (thousands)
`operation`	ReadData	Low (~10 types)
`path`	/vol/data/report.pdf	Very High (millions)
`result`	Success / Failure	Low (2)
`client_ip`	10.0.x.x	Medium (hundreds)
`s3_key`	audit/svm-prod-01/2026/...	Very High
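
To make the schema concrete, here is a small sketch that models the ten fields listed in the table as a dataclass and converts a record into the batch format shown under Path A. `AuditEvent` and `to_batch_item` are hypothetical names. Fields not shown in the table are left out rather than guessed.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class AuditEvent:
    """The ten event fields listed in the schema table."""

    source: str
    service: str
    event_type: str
    svm: str
    user: str
    operation: str
    path: str
    result: str
    client_ip: str
    s3_key: str

    def to_batch_item(self, time: str) -> Dict[str, Any]:
        """Return the {"time", "data"} shape used by the Events Batch API."""
        return {"time": time, "data": asdict(self)}
```

A typed record like this catches a missing or misspelled field at construction time instead of at query time.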

Cost Analysis

Honeycomb bills by event count, not by ingested bytes, so cost depends on how many audit events a given log volume contains:

Monthly Log Volume	Estimated Events	Honeycomb Cost
1 GB	~500K events	Free (20M/month included)
10 GB	~5M events	Free
30 GB	~15M events	Free
50 GB	~25M events	Paid tier (~$100/month)

Component	Monthly Cost (10 GB/month)
Lambda (5-min polling)	~$3
EventBridge Scheduler	~$1
Secrets Manager	~$1
Honeycomb	Free (5M events < 20M limit)
Total	~$5

The 20M events/month free tier covers most FSx for ONTAP deployments. Estimate ~500 events per MB of audit log data, which puts the free-tier ceiling at roughly 40 GB of audit logs per month.
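
The estimate can be written as a few lines of arithmetic. The sketch below uses only the figures from this section (~500 events per MB, 20M free events per month) and treats 1 GB as 1000 MB, as the table does. The function names are hypothetical.

```python
EVENTS_PER_MB = 500            # estimate from this post
FREE_TIER_EVENTS = 20_000_000  # free tier figure from this post


def estimate_monthly_events(log_gb: float) -> int:
    """Estimate monthly events for a given audit log volume in GB."""
    return int(log_gb * 1000 * EVENTS_PER_MB)


def fits_free_tier(log_gb: float) -> bool:
    """True if the estimated event count stays within the free tier."""
    return estimate_monthly_events(log_gb) <= FREE_TIER_EVENTS
```

Actual event density depends on how verbose the audit policy is, so it is worth measuring a sample file before relying on the 500 events/MB figure.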

Gotchas & Lessons Learned

#	Discovery	Impact
1	Must use an Ingest Key (`hcaik_*`); an Environment Key (`hcxik_*`) is silently rejected	Events disappear without error if the wrong key type is used
2	Events with timestamps older than ~4 hours are rejected	Test data must use current timestamps
3	5MB max request body size; our implementation batches in chunks of 100 events for reliability	Lambda splits large files into multiple requests
4	The `api.honeycomb.io` endpoint used here is Honeycomb's US instance	Evaluate cross-border data transfer requirements
5	Dataset auto-created on first event if it doesn't exist	No pre-provisioning needed
6	OTel Collector path requires `x-honeycomb-dataset` header	Without it, events go to a default dataset
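
The batching rule in row 3 can be sketched as a generator that closes a chunk at 100 events or before the serialized body would pass the size limit. This is an illustration under assumptions, not the project's implementation: `chunk_events` is a hypothetical name, and 5MB is read conservatively as 5,000,000 bytes.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List

MAX_EVENTS = 100
MAX_BODY_BYTES = 5_000_000  # conservative reading of the 5MB limit


def chunk_events(
    events: Iterable[Dict[str, Any]],
    max_events: int = MAX_EVENTS,
    max_bytes: int = MAX_BODY_BYTES,
) -> Iterator[List[Dict[str, Any]]]:
    """Yield lists of events that respect the count and body-size limits."""
    chunk: List[Dict[str, Any]] = []
    size = 2  # the enclosing "[" and "]"
    for event in events:
        encoded_len = len(json.dumps(event).encode("utf-8"))
        # json.dumps separates list items with ", " (2 bytes).
        projected = size + encoded_len + (2 if chunk else 0)
        if chunk and (len(chunk) >= max_events or projected > max_bytes):
            yield chunk
            chunk = []
            projected = 2 + encoded_len
        chunk.append(event)
        size = projected
    if chunk:
        yield chunk
```

A single event larger than the limit is still emitted as its own chunk, so a caller would need to handle that case separately.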

Direct vs OTel Collector: When to Use Which

Criteria	Direct (Path A)	OTel Collector (Path B)
Simplicity	✅ Fewer components	More infrastructure
Multi-backend	❌ Honeycomb only	✅ Any OTLP backend
Enrichment/redaction	❌ In Lambda only	✅ Collector processors
Cost	Lower (no Collector)	Collector compute cost
Recommendation	Single-backend PoC	Production multi-backend

Note from Honeycomb: Honeycomb recommends OTLP as the primary ingest path for new production deployments. The Events Batch API (Path A) remains fully supported and is simpler for single-backend PoCs. If you start with Path A, migrating to Path B (OTLP) requires no changes to your Honeycomb queries — only the delivery mechanism changes.
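
On Path A, any redaction has to happen inside the Lambda before the event is sent. The sketch below shows one way this might look: it pseudonymizes `user` with a keyed HMAC-SHA256 and truncates `path` to its leading components. The function name, the choice of fields, the key handling, and the default depth are all assumptions for illustration.

```python
import hashlib
import hmac
from typing import Any, Dict


def redact_event_data(
    data: Dict[str, Any], key: bytes, keep_path_depth: int = 2
) -> Dict[str, Any]:
    """Return a copy of the event data with user and path redacted."""
    redacted = dict(data)
    if "user" in redacted:
        digest = hmac.new(
            key, str(redacted["user"]).encode("utf-8"), hashlib.sha256
        ).hexdigest()
        # A stable pseudonym: the same user always maps to the same token.
        redacted["user"] = f"user-{digest[:16]}"
    if "path" in redacted:
        parts = [part for part in str(redacted["path"]).split("/") if part]
        redacted["path"] = "/" + "/".join(parts[:keep_path_depth])
    return redacted
```

Truncating paths lowers cardinality, which trades away some of the per-file detail that makes Honeycomb useful here, so the depth is a judgment call per deployment.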

Production Readiness

This integration follows the project's Production Readiness Levels:

Level	What You Get	Go/No-Go to Next
Level 1 (this Quick Start)	Audit poller + DLQ	Logs arrive, checkpoint advances, DLQ empty 24h
Level 2	+ Honeycomb queries + alerts	SLOs met 7 days, security review done
Level 3	+ DynamoDB ledger + poison-pill	SLOs met 30 days, compliance pack
Level 4	+ OTel Collector + redaction	Multi-backend, PII redaction, DR tested

Data classification note: Honeycomb receives the user and path fields, which are classified as PII/sensitive. Because the api.honeycomb.io endpoint used here is hosted in the US, evaluate cross-border transfer requirements. For PII-sensitive deployments, use the OTel Collector path (Path B) with redaction processors. See Data Classification Guide.

Full criteria: Pipeline SLO Definitions | DLQ Replay Runbook

CloudFormation Templates

Template	Purpose	Key Parameters
`template.yaml`	FSx audit log poller	S3AccessPointArn, HoneycombApiKeySecretArn, HoneycombDataset
`template-ems.yaml`	EMS webhook handler	HoneycombApiKeySecretArn, HoneycombDataset
`template-fpolicy.yaml`	FPolicy EventBridge handler	HoneycombApiKeySecretArn, HoneycombDataset, EventBusName

Resources

GitHub: integrations/honeycomb/
OTel Collector path: integrations/otel-collector/
Honeycomb Docs: docs.honeycomb.io
Honeycomb BubbleUp: BubbleUp Guide
Series GitHub: github.com/Yoshiki0705/fsxn-observability-integrations

Series Navigation

Part 1: Why Your FSx for ONTAP Logs Deserve Better
Part 2: Shipping FSx for ONTAP Logs to Datadog — The Serverless Way
Part 3: Event-Driven Ransomware Detection with ONTAP ARP + Datadog
Part 4: FPolicy File Activity Pipeline — ONTAP to Datadog via ECS Fargate
Part 5: Escape Vendor Lock-in: Multi-Backend Log Delivery with OTel Collector for FSx for ONTAP
Part 6: Direct-to-Grafana: Shipping FSx for ONTAP Logs to Grafana Cloud Loki via OTLP Gateway
Part 7: Ship FSx for ONTAP Audit Logs to New Relic via Serverless Lambda Pipeline
Part 8: EC2 to Serverless: Modernizing FSx for ONTAP Splunk Integration
Part 9: Data Sovereignty: FSx for ONTAP Logs in Your VPC with Elastic
Part 10: High-Cardinality File Access Analysis with Honeycomb (this post)
Part 11: AI-Powered Root Cause: Correlating File Access with APM via Dynatrace
Part 12: FSx for ONTAP Audit Logs with Data Residency in your region with Sumo Logic
Part 13: 9 Vendors, One Architecture

Questions about high-cardinality analysis or the Honeycomb integration? Drop a comment below.

