
High-Cardinality File Access Analysis with Honeycomb + OTel

TL;DR

We built a serverless pipeline that ships FSx for ONTAP audit logs to Honeycomb, where its high-cardinality query engine turns file access data into actionable insights. Two delivery paths verified:

[Path A: Direct]
FSx for ONTAP → S3 Access Point → EventBridge Scheduler → Lambda → Honeycomb Events Batch API

[Path B: OTel Collector]
FSx for ONTAP → S3 Access Point → EventBridge Scheduler → Lambda → OTel Collector → OTLP → Honeycomb

Why Honeycomb for file access logs? Because file access data is inherently high-cardinality: thousands of users × millions of file paths × dozens of operations × multiple SVMs. Traditional log tools force you to pre-aggregate or sample. Honeycomb lets you query the raw events at full resolution.

┌──────────────────────────────────────────────────────┐
│  Honeycomb Query Engine                              │
│                                                      │
│  "Show me which users accessed /vol/finance/*        │
│   between 2am-4am last Tuesday"                      │
│                                                      │
│  → BubbleUp: auto-detect anomalous dimensions        │
│  → Heatmap: visualize access density over time       │
│  → GROUP BY user, path, operation — no pre-indexing  │
│                                                      │
│  20M events/month FREE                               │
└──────────────────────────────────────────────────────┘

This is Part 10 of the Serverless Observability for FSx for ONTAP series.


Why Honeycomb for File Access Logs?

Most observability tools index a fixed set of fields. When you have high-cardinality dimensions — like file paths (/vol/data/project-alpha/2026/Q1/report-final-v3.docx) or Active Directory usernames — you hit index bloat, slow queries, or forced sampling.

Honeycomb's columnar storage handles this natively:

| Capability | Traditional Logs | Honeycomb |
|---|---|---|
| Query by arbitrary field | Pre-index or full scan | Instant (columnar) |
| GROUP BY high-cardinality field | Expensive / limited | Native |
| BubbleUp (anomaly detection) | Manual investigation | Semi-automatic (select time range, BubbleUp identifies differing dimensions) |
| Heatmap visualization | Requires pre-aggregation | Raw events |

For FSx for ONTAP audit logs, this means you can ask questions like:

  • "Which users accessed the most files in the last hour?" (GROUP BY user)
  • "What's different about the spike at 3am?" (BubbleUp)
  • "Show me the access pattern heatmap for /vol/finance/" (Heatmap)

Architecture

┌─────────────────────────────────────────────────────────┐
│ Event Sources                                           │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  EventBridge Scheduler                                  │
│  rate(5 minutes) ──→ Lambda                             │
│                       │ lists new files via             │
│                       │ S3 Access Point                 │
│                       │ (checkpoint in SSM)             │
│                       ▼                                 │
│              Honeycomb Events Batch API                 │
│              (x-honeycomb-team header)                  │
│                       │                                 │
│  EMS Webhook          │                                 │
│  ──→ API GW ──→ Lambda ─────────────┤                   │
│     (ems_handler)                   │                   │
│                                     ▼                   │
│  FPolicy                        Honeycomb               │
│  ──→ ECS Fargate ──→ SQS       (BubbleUp,               │
│  ──→ Bridge Lambda               Heatmap,               │
│  ──→ EventBridge                 Explore)               │
│  ──→ Lambda (fpolicy_handler) ──────────────────────────┤
└─────────────────────────────────────────────────────────┘
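The polling step in the diagram can be sketched roughly as follows. This is a minimal illustration, not the project's actual handler: it assumes the checkpoint stored in SSM is the last processed S3 key and that keys sort in creation order. `select_new_keys` and the sample key names are hypothetical.

```python
from typing import Iterable, Optional


def select_new_keys(
    listed_keys: Iterable[str], checkpoint: Optional[str]
) -> tuple[list[str], Optional[str]]:
    """Return keys newer than the checkpoint, plus the checkpoint to store next.

    Assumption: keys sort lexicographically in creation order, so the
    checkpoint can simply be the last key that was processed.
    """
    new_keys = sorted(k for k in listed_keys if checkpoint is None or k > checkpoint)
    next_checkpoint = new_keys[-1] if new_keys else checkpoint
    return new_keys, next_checkpoint


# In a Lambda, listed_keys would come from boto3's s3.list_objects_v2(...)
# against the access point, and the checkpoint from ssm.get_parameter(...).
keys = ["audit/2026-01-15T11-55.json", "audit/2026-01-15T12-00.json"]  # hypothetical
new_keys, checkpoint = select_new_keys(keys, "audit/2026-01-15T11-55.json")
print(new_keys, checkpoint)
```

Writing the checkpoint back only after a batch has been delivered means a failed run reprocesses the same files on the next schedule tick rather than skipping them.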

Two Verified Delivery Paths

Path A: Direct Events Batch API

Simplest path. Lambda sends events directly to Honeycomb's Events Batch API.

# Batch format
[
  {
    "time": "2026-01-15T12:00:00Z",
    "data": {
      "source": "fsxn-ontap",
      "service": "ontap-audit",
      "event_type": "4663",
      "svm": "svm-prod-01",
      "user": "admin@corp.local",
      "operation": "ReadData",
      "path": "/vol/data/file.txt",
      "result": "Success",
      "client_ip": "10.0.x.x"
    }
  }
]
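A request for this batch format might be assembled as in the sketch below. It uses only the standard library and builds the request without sending it. The `/1/batch/<dataset>` endpoint and the `X-Honeycomb-Team` header follow Honeycomb's Events API; the helper name `build_batch_request` and the placeholder key are assumptions for illustration.

```python
import json
import urllib.parse
import urllib.request


def build_batch_request(
    events: list[dict], dataset: str, api_key: str
) -> urllib.request.Request:
    """Build (but do not send) a POST to the Events Batch API."""
    url = "https://api.honeycomb.io/1/batch/" + urllib.parse.quote(dataset, safe="")
    body = json.dumps(events).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "X-Honeycomb-Team": api_key,
            "Content-Type": "application/json",
        },
    )


events = [{
    "time": "2026-01-15T12:00:00Z",
    "data": {"service": "ontap-audit", "user": "admin@corp.local", "operation": "ReadData"},
}]
req = build_batch_request(events, "fsxn-audit", "hcaik_example")  # placeholder key
# To send: urllib.request.urlopen(req, timeout=10). The response body is a
# JSON array with one status entry per event, so check each entry.
print(req.full_url, req.get_method())
```

Separating request construction from sending keeps the handler easy to unit-test without network access.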

Path B: OTel Collector (OTLP)

Use this path for multi-backend delivery, or when you want enrichment or redaction in the pipeline. It was verified in Part 5 with Honeycomb as one of the backends.

The OTel Collector uses the otlphttp exporter with the x-honeycomb-team and x-honeycomb-dataset headers:

exporters:
  otlphttp/honeycomb:
    endpoint: https://api.honeycomb.io
    headers:
      x-honeycomb-team: ${HONEYCOMB_API_KEY}
      x-honeycomb-dataset: fsxn-audit
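On the Lambda side, events must reach the Collector as OTLP. The sketch below wraps the flat audit events in the OTLP/JSON logs envelope using only the standard library. It assumes the Collector exposes the default OTLP HTTP receiver (typically port 4318, path `/v1/logs`); `to_otlp_logs` is a hypothetical helper, and the project's own Lambda may use the OpenTelemetry SDK instead.

```python
import json
from datetime import datetime, timezone


def to_otlp_logs(events: list[dict], service_name: str = "ontap-audit") -> dict:
    """Wrap flat audit events in the OTLP/JSON logs envelope."""
    records = []
    for event in events:
        ts = datetime.fromisoformat(event["time"].replace("Z", "+00:00"))
        if ts.tzinfo is None:  # assumption: naive timestamps are UTC
            ts = ts.replace(tzinfo=timezone.utc)
        records.append({
            # OTLP/JSON encodes 64-bit integers as strings.
            # Sub-second precision is dropped in this sketch.
            "timeUnixNano": str(int(ts.timestamp()) * 1_000_000_000),
            "body": {"stringValue": event["data"].get("operation", "")},
            "attributes": [
                {"key": k, "value": {"stringValue": str(v)}}
                for k, v in event["data"].items()
            ],
        })
    return {
        "resourceLogs": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}}
            ]},
            "scopeLogs": [{"logRecords": records}],
        }]
    }


payload = to_otlp_logs([{
    "time": "2026-01-15T12:00:00Z",
    "data": {"user": "admin@corp.local", "operation": "ReadData"},
}])
# POST json.dumps(payload) with Content-Type: application/json to the
# Collector, e.g. http://<collector-host>:4318/v1/logs (assumed address).
print(json.dumps(payload)[:80])
```

Each audit field becomes a log record attribute, so the same column names are available to query in Honeycomb regardless of the delivery path.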

Quick Start (30 Minutes)

1. Get a Honeycomb Ingest Key

  1. Sign up at honeycomb.io (free tier: 20M events/month)
  2. Go to Account → Team Settings → API Keys
  3. Create an Ingest Key (starts with hcaik_)

⚠️ Critical: You MUST use an Ingest Key (hcaik_*). Environment Keys (hcxik_*) will be rejected.

2. Store Credentials

aws secretsmanager create-secret \
  --name "honeycomb/fsxn-api-key" \
  --secret-string '{"api_key":"hcaik_01abc..."}' \
  --region ap-northeast-1
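Because a wrong key type fails quietly, it can help to validate the key as soon as the secret is read. The sketch below parses the secret's JSON and checks the prefix. `extract_ingest_key` is a hypothetical helper; in a Lambda the input string would come from boto3's `get_secret_value(...)["SecretString"]`.

```python
import json


def extract_ingest_key(secret_string: str) -> str:
    """Parse the secret JSON and verify the key looks like an Ingest Key."""
    api_key = json.loads(secret_string)["api_key"]
    if not api_key.startswith("hcaik_"):
        # Fail loudly at startup instead of losing events later.
        raise ValueError("expected an Ingest Key starting with 'hcaik_'")
    return api_key


print(extract_ingest_key('{"api_key":"hcaik_01abc"}'))
```

Raising at cold start turns a missing-data mystery into an immediate, visible Lambda error.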

3. Deploy CloudFormation Stack

aws cloudformation deploy \
  --template-file integrations/honeycomb/template.yaml \
  --stack-name fsxn-honeycomb-integration \
  --parameter-overrides \
    S3AccessPointArn=arn:aws:s3:ap-northeast-1:123456789012:accesspoint/fsxn-audit-ap \
    HoneycombApiKeySecretArn=arn:aws:secretsmanager:ap-northeast-1:123456789012:secret:honeycomb/fsxn-api-key-XXXXXX \
    HoneycombDataset=fsxn-audit \
    S3BucketName=my-fsxn-audit-bucket \
  --capabilities CAPABILITY_NAMED_IAM \
  --region ap-northeast-1

4. Verify in Honeycomb

Navigate to your dataset → Explore Data:

WHERE service = "ontap-audit" | COUNT

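Events should appear within seconds of the poller's first successful run. With the rate(5 minutes) schedule, allow up to five minutes after deployment.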

Honeycomb Query Examples

Basic Investigation

# All failed access attempts
WHERE result = "Failure" | GROUP BY user, path | COUNT

# Top 20 users by file access volume
GROUP BY user | COUNT | ORDER BY COUNT DESC | LIMIT 20

# Operations breakdown
GROUP BY operation | COUNT
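The pipe notation above is shorthand for what you assemble in the Query Builder. If you want to run the same queries programmatically, Honeycomb's Query API takes a JSON query specification. The sketch below builds such specifications as Python dicts. The field names (`breakdowns`, `calculations`, `filters`, `orders`, `limit`, `time_range`) are taken from that specification as I understand it, so verify them against the current API reference. `count_query` is a hypothetical helper.

```python
import json


def count_query(breakdowns, filters=None, limit=None, time_range=3600):
    """Build a COUNT query spec grouped by the given columns."""
    spec = {
        "time_range": time_range,  # seconds back from now
        "breakdowns": list(breakdowns),
        "calculations": [{"op": "COUNT"}],
    }
    if filters:
        spec["filters"] = filters
    if limit is not None:
        spec["orders"] = [{"op": "COUNT", "order": "descending"}]
        spec["limit"] = limit
    return spec


# All failed access attempts, grouped by user and path
failed_access = count_query(
    ["user", "path"],
    filters=[{"column": "result", "op": "=", "value": "Failure"}],
)
# Top 20 users by file access volume
top_users = count_query(["user"], limit=20)

print(json.dumps(top_users, indent=2))
```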

High-Cardinality Analysis (Honeycomb's Strength)

# BubbleUp: What's different about the 3am spike?
# Select the spike in the time series → click BubbleUp
# Honeycomb auto-identifies which dimensions differ

# Heatmap: Access density by hour
WHERE operation = "ReadData" | HEATMAP(timestamp)

# Trace a specific user's activity
WHERE user = "admin@corp.local" | VISUALIZE COUNT | GROUP BY operation, path

# Find unusual path access patterns
GROUP BY path | COUNT | WHERE COUNT > 100

Security Investigation

# After-hours access to sensitive paths
WHERE path CONTAINS "confidential" AND hour(timestamp) NOT BETWEEN 9 AND 17
| GROUP BY user | COUNT

# Users accessing paths they haven't accessed before
# (Use Honeycomb's "compare to baseline" feature)

# Bulk file operations (potential exfiltration)
WHERE operation = "ReadData" | GROUP BY user | COUNT | WHERE COUNT > 1000
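The after-hours check can also be prototyped locally against raw events before encoding it as a Honeycomb query or trigger. The sketch below mirrors the first query's logic in plain Python. It assumes events in the batch format shown earlier, business hours of 09:00 to 17:59 in the timestamp's own time zone, and a hypothetical helper name `after_hours_by_user`.

```python
from collections import Counter
from datetime import datetime


def after_hours_by_user(events, start_hour=9, end_hour=17, needle="confidential"):
    """Count after-hours accesses to sensitive paths, per user."""
    counts = Counter()
    for event in events:
        hour = datetime.fromisoformat(event["time"].replace("Z", "+00:00")).hour
        data = event["data"]
        # Equivalent of: path CONTAINS needle AND hour NOT BETWEEN start AND end
        if needle in data["path"] and not (start_hour <= hour <= end_hour):
            counts[data["user"]] += 1
    return counts


sample = [
    {"time": "2026-01-15T03:10:00Z",
     "data": {"user": "alice", "path": "/vol/hr/confidential/a.xlsx"}},
    {"time": "2026-01-15T12:00:00Z",
     "data": {"user": "bob", "path": "/vol/hr/confidential/b.xlsx"}},
]
print(after_hours_by_user(sample))
```

A local version like this is handy for unit-testing detection thresholds on exported sample data.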

Event Schema (13 Fields)

All fields are queryable at full cardinality without pre-indexing:

| Field | Example | Cardinality |
|---|---|---|
| source | fsxn-ontap | Low |
| service | ontap-audit | Low |
| event_type | 4663 | Low (~10 types) |
| svm | svm-prod-01 | Low (~5-20) |
| user | admin@corp.local | High (thousands) |
| operation | ReadData | Low (~10 types) |
| path | /vol/data/report.pdf | Very High (millions) |
| result | Success / Failure | Low (2) |
| client_ip | 10.0.x.x | Medium (hundreds) |
| s3_key | audit/svm-prod-01/2026/... | Very High |

Cost Analysis

Honeycomb pricing is event-based, not volume-based:

| Monthly Log Volume | Estimated Events | Honeycomb Cost |
|---|---|---|
| 1 GB | ~500K events | Free (20M/month included) |
| 10 GB | ~5M events | Free |
| 30 GB | ~15M events | Free |
| 50 GB | ~25M events | Paid tier (~$100/month) |

| Component | Monthly Cost (10 GB/month) |
|---|---|
| Lambda (5-min polling) | ~$3 |
| EventBridge Scheduler | ~$1 |
| Secrets Manager | ~$1 |
| Honeycomb | Free (5M events < 20M limit) |
| Total | ~$5 |

The 20M events/month free tier covers most FSx for ONTAP deployments. Estimate ~500 events per MB of audit log data.
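The rule of thumb above is easy to turn into a quick estimate. The sketch below uses the article's own figures (500 events per MB, 20M free events per month) and treats 1 GB as 1,000 MB, matching the table. The function names are hypothetical.

```python
EVENTS_PER_MB = 500            # rule of thumb from this article
FREE_TIER_EVENTS = 20_000_000  # free-tier allowance quoted above


def estimate_monthly_events(log_gb: float) -> int:
    """Estimate monthly event count from audit log volume in GB."""
    return int(log_gb * 1000 * EVENTS_PER_MB)


def fits_free_tier(log_gb: float) -> bool:
    """True if the estimated event count stays within the free tier."""
    return estimate_monthly_events(log_gb) <= FREE_TIER_EVENTS


for gb in (1, 10, 30, 50):
    print(gb, estimate_monthly_events(gb), fits_free_tier(gb))
```

Under these assumptions the break-even point is 40 GB of audit logs per month. Measure your own events-per-MB ratio before relying on it, since audit record sizes vary.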

Gotchas & Lessons Learned

| # | Discovery | Impact |
|---|---|---|
| 1 | Must use Ingest Key (hcaik_*); Environment Key (hcxik_*) is silently rejected | Events disappear without error if wrong key type |
| 2 | Events with timestamps older than ~4 hours are rejected | Test data must use current timestamps |
| 3 | 5MB max request body size; our implementation batches in chunks of 100 events for reliability | Lambda splits large files into multiple requests |
| 4 | Honeycomb processes data in US regions only | Evaluate cross-border data transfer requirements |
| 5 | Dataset auto-created on first event if it doesn't exist | No pre-provisioning needed |
| 6 | OTel Collector path requires x-honeycomb-dataset header | Without it, events go to a default dataset |
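The batching behaviour in item 3 might look like the sketch below: split on an event count and on an approximate serialized size, whichever is hit first. It is an illustration under stated assumptions, not the project's code. The 5 MB limit is read conservatively as 5,000,000 bytes, and `chunk_events` is a hypothetical helper.

```python
import json

MAX_EVENTS = 100            # chunk size used in this article
MAX_BODY_BYTES = 5_000_000  # conservative reading of the 5 MB body limit


def chunk_events(events, max_events=MAX_EVENTS, max_bytes=MAX_BODY_BYTES):
    """Yield lists of events that respect both the count and size limits."""
    chunk, size = [], 2  # 2 bytes for the enclosing "[]"
    for event in events:
        # +2 for the ", " separator; slightly overestimates, which is safe.
        encoded = len(json.dumps(event).encode("utf-8")) + 2
        if chunk and (len(chunk) >= max_events or size + encoded > max_bytes):
            yield chunk
            chunk, size = [], 2
        chunk.append(event)
        size += encoded
    if chunk:
        yield chunk  # note: a single oversized event is still emitted alone


batches = list(chunk_events([{"n": i} for i in range(250)]))
print([len(b) for b in batches])
```

Small chunks also limit how much has to be resent when one request fails.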

Direct vs OTel Collector: When to Use Which

| Criteria | Direct (Path A) | OTel Collector (Path B) |
|---|---|---|
| Simplicity | ✅ Fewer components | More infrastructure |
| Multi-backend | ❌ Honeycomb only | ✅ Any OTLP backend |
| Enrichment/redaction | ❌ In Lambda only | ✅ Collector processors |
| Cost | Lower (no Collector) | Collector compute cost |
| Recommendation | Single-backend PoC | Production multi-backend |

Note from Honeycomb: Honeycomb recommends OTLP as the primary ingest path for new production deployments. The Events Batch API (Path A) remains fully supported and is simpler for single-backend PoCs. If you start with Path A, migrating to Path B (OTLP) requires no changes to your Honeycomb queries — only the delivery mechanism changes.

Production Readiness

This integration follows the project's Production Readiness Levels:

| Level | What You Get | Go/No-Go to Next |
|---|---|---|
| Level 1 (this Quick Start) | Audit poller + DLQ | Logs arrive, checkpoint advances, DLQ empty 24h |
| Level 2 | + Honeycomb queries + alerts | SLOs met 7 days, security review done |
| Level 3 | + DynamoDB ledger + poison-pill | SLOs met 30 days, compliance pack |
| Level 4 | + OTel Collector + redaction | Multi-backend, PII redaction, DR tested |

Data classification note: Honeycomb receives user and path fields which are classified as PII/sensitive. Since Honeycomb processes data in US regions only, evaluate cross-border transfer requirements. For PII-sensitive deployments, use the OTel Collector path (Path B) with redaction processors. See Data Classification Guide.
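If you stay on Path A, any redaction has to happen inside the Lambda. One possible approach, sketched below, is keyed hashing of the sensitive fields so that GROUP BY still works on stable pseudonyms. This is an assumption-laden illustration, not the project's method: `pseudonymize` is hypothetical, the secret would need to live in Secrets Manager, and hashing `path` gives up prefix queries such as /vol/finance/*.

```python
import hashlib
import hmac


def pseudonymize(event_data: dict, secret: bytes, fields=("user", "path")) -> dict:
    """Return a copy of the event with sensitive fields replaced by keyed hashes."""
    redacted = dict(event_data)  # leave the caller's dict untouched
    for field in fields:
        if field in redacted:
            digest = hmac.new(secret, str(redacted[field]).encode("utf-8"), hashlib.sha256)
            # 16 hex chars (64 bits) keeps columns readable; lengthen if needed.
            redacted[field] = digest.hexdigest()[:16]
    return redacted


original = {"user": "admin@corp.local", "path": "/vol/data/file.txt", "operation": "ReadData"}
print(pseudonymize(original, secret=b"example-secret"))  # placeholder secret
```

Using an HMAC rather than a bare hash makes it harder to recover usernames by hashing a list of known accounts, as long as the secret stays private.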

Full criteria: Pipeline SLO Definitions | DLQ Replay Runbook

CloudFormation Templates

| Template | Purpose | Key Parameters |
|---|---|---|
| template.yaml | FSx audit log poller | S3AccessPointArn, HoneycombApiKeySecretArn, HoneycombDataset |
| template-ems.yaml | EMS webhook handler | HoneycombApiKeySecretArn, HoneycombDataset |
| template-fpolicy.yaml | FPolicy EventBridge handler | HoneycombApiKeySecretArn, HoneycombDataset, EventBusName |



Questions about high-cardinality analysis or the Honeycomb integration? Drop a comment below.

GitHub: github.com/Yoshiki0705/fsxn-observability-integrations
