DEV Community

Cover image for Designing an Audit Log That Actually Helps: Timestamps, Actions, and Forensics
HelperX
HelperX

Posted on

Designing an Audit Log That Actually Helps: Timestamps, Actions, and Forensics

Most SaaS audit logs are write-only graveyards — thousands of rows nobody reads until something goes wrong, at which point they don't contain the information you need.

We built HelperX's audit log to be the first thing operators check every morning. Not because we forced them — because the log actually tells them what they need to know. Here's how we designed it.

What goes into every log entry

Every automated action produces a log entry with this structure:

const logEntry = {
  id: crypto.randomUUID(),
  slotId: 'slot_abc123',
  module: 'reply_search',      // which module generated the action
  action: 'reply_sent',        // what happened
  target: '@username',          // who/what was acted upon
  targetTweetId: '1234567890',
  detail: 'Generated reply: "..."',
  status: 'success',           // success | failed | skipped | rate_limited
  duration_ms: 1240,           // how long the action took
  timestamp: '2026-06-07T14:23:17.442Z',
  metadata: {
    query: 'looking for project management tool',
    authorFollowers: 12400,
    replyLength: 47,
    promptVariant: 'B',
    dailyCount: 42,
    dailyCap: 100
  }
};
Enter fullscreen mode Exit fullscreen mode

Every field exists for a reason. Let's walk through the non-obvious ones.

status: 'skipped'

Most log systems only record successes and failures. We also record skips — cases where the system evaluated an action and decided not to take it.

14:23:15 [reply_search] SKIPPED @user123 — author below minimum followers (240 < 1000)
14:23:17 [reply_search] SKIPPED @bot_account — author in geo blacklist (RU)
14:23:19 [reply_search] SENT @real_user — reply: "Interesting approach to..."
Enter fullscreen mode Exit fullscreen mode

Skip logs answer the question "why isn't the module doing more?" Without them, operators see 30 replies in a day and wonder why it's not 100. The skips show that the filters are working — the system evaluated 200 tweets and correctly filtered down to 30 worth replying to.

duration_ms

How long each action took, including network round-trip. This is the first thing we check when diagnosing performance issues.

Normal:    reply_sent — 1,240ms
Slow:      reply_sent — 4,800ms  ← proxy latency issue
Very slow: reply_sent — 12,300ms ← possible rate limiting
Timeout:   reply_failed — 30,000ms ← proxy dead
Enter fullscreen mode Exit fullscreen mode

A sudden increase in average duration means something changed — the proxy is degrading, the API is throttling, or the AI model is responding slower.

metadata.dailyCount / dailyCap

Every action records where the account stands relative to its daily cap:

14:23:19 [reply_search] SENT — dailyCount: 42/100
14:25:03 [reply_search] SENT — dailyCount: 43/100
...
18:45:12 [reply_search] SENT — dailyCount: 100/100
18:47:00 [reply_search] CAP_REACHED — dailyCount: 100/100, next reset: 00:00 UTC
Enter fullscreen mode Exit fullscreen mode

Operators can see exactly when they hit their cap and plan accordingly. No guessing, no surprises.

Storage: SQLite, not Postgres

We store audit logs in SQLite — one database per slot. This is a deliberate choice.

function createAuditTable(db) {
  db.exec(`
    CREATE TABLE IF NOT EXISTS audit_log (
      id TEXT PRIMARY KEY,
      module TEXT NOT NULL,
      action TEXT NOT NULL,
      target TEXT,
      detail TEXT,
      status TEXT NOT NULL,
      duration_ms INTEGER,
      timestamp TEXT NOT NULL DEFAULT (datetime('now')),
      metadata TEXT
    )
  `);

  db.exec(`
    CREATE INDEX IF NOT EXISTS idx_audit_timestamp
    ON audit_log(timestamp DESC)
  `);

  db.exec(`
    CREATE INDEX IF NOT EXISTS idx_audit_module_status
    ON audit_log(module, status)
  `);
}
Enter fullscreen mode Exit fullscreen mode

Why SQLite per slot:

  1. Isolation. Each slot's data is physically separate. Deleting a slot deletes its log file. No WHERE slot_id = ? on every query.

  2. Performance. Writes are local, no network round-trip. At 100-300 log entries per day per slot, SQLite handles this without breaking a sweat.

  3. Portability. Need to debug a slot? Copy the .db file. Send it to someone. Open it in any SQLite tool.

  4. No shared state. Slot A's log can't accidentally leak into Slot B's queries. The database boundary enforces what the application logic promises.

Querying patterns

The log needs to answer specific questions quickly. Here are the queries we optimized for:

"What happened today?"

SELECT module, action, target, status, timestamp, detail
FROM audit_log
WHERE timestamp >= datetime('now', 'start of day')
ORDER BY timestamp DESC
LIMIT 200;
Enter fullscreen mode Exit fullscreen mode

"How many actions per module today?"

SELECT module,
       COUNT(*) FILTER (WHERE status = 'success') as sent,
       COUNT(*) FILTER (WHERE status = 'skipped') as skipped,
       COUNT(*) FILTER (WHERE status = 'failed') as failed
FROM audit_log
WHERE timestamp >= datetime('now', 'start of day')
GROUP BY module;
Enter fullscreen mode Exit fullscreen mode

"Are we getting rate-limited?"

SELECT date(timestamp) as day,
       COUNT(*) FILTER (WHERE status = 'rate_limited') as limited,
       COUNT(*) as total,
       ROUND(100.0 * COUNT(*) FILTER (WHERE status = 'rate_limited') / COUNT(*), 1) as pct
FROM audit_log
WHERE timestamp >= datetime('now', '-7 days')
GROUP BY day
ORDER BY day;
Enter fullscreen mode Exit fullscreen mode

"What's the average response time trending?"

SELECT date(timestamp) as day,
       ROUND(AVG(duration_ms)) as avg_ms,
       MAX(duration_ms) as max_ms,
       MIN(duration_ms) as min_ms
FROM audit_log
WHERE status = 'success'
  AND timestamp >= datetime('now', '-7 days')
GROUP BY day;
Enter fullscreen mode Exit fullscreen mode

Retention and cleanup

Logs grow. At 200 entries/day, a slot accumulates 6,000 entries/month or 72,000/year. That's ~15MB in SQLite — manageable, but we still set boundaries.

function pruneAuditLog(db, retentionDays = 90) {
  const result = db.prepare(`
    DELETE FROM audit_log
    WHERE timestamp < datetime('now', '-' || ? || ' days')
  `).run(retentionDays);

  if (result.changes > 0) {
    db.exec('VACUUM');
  }

  return result.changes;
}
Enter fullscreen mode Exit fullscreen mode

90 days of retention. Enough for trend analysis and incident investigation. Old enough that storage is bounded.

We run pruning daily at the slot level — each slot manages its own lifecycle.

The dashboard view

Raw logs are useful for debugging. But the daily operator experience is a dashboard summary:

┌─────────────────────────────────────────┐
│ Slot: @tech_account      June 7, 2026  │
├─────────────────────────────────────────┤
│ Reply Search    42/100 sent  │ 12 skip  │
│ Reply List       8/50  sent  │  3 skip  │
│ Regular Post     2/2   sent  │  0 skip  │
│ Top Repost       1/5   sent  │  2 skip  │
│ Welcome DM       5/15  sent  │  1 skip  │
│ UnFollow         0/10  done  │  0 skip  │
├─────────────────────────────────────────┤
│ Avg response: 1.2s │ Rate limits: 0    │
│ Errors: 0          │ Cap resets: 00:00  │
└─────────────────────────────────────────┘
Enter fullscreen mode Exit fullscreen mode

This is generated from the same audit log data. The operator sees the summary; the detail is one click away.

What makes a log useful vs useless

After building and iterating on this system for a year, here's what I've learned:

1. Log skips, not just actions. The absence of action is as informative as the action itself. Operators who understand why the system didn't do something trust it more.

2. Include resource consumption. dailyCount: 42/100 tells the operator their budget for the day. Without it, they're guessing.

3. Duration is a canary. Latency increases precede failures. If average duration_ms doubles, something is degrading — and you have time to fix it before it breaks.

4. One log per isolation boundary. Don't mix slots in one log table. The query complexity isn't worth the storage savings.

5. Make the log the default view. If operators have to navigate to a separate page to see the log, they won't check it. Put the summary on the dashboard. Make the detail expandable.

6. Prune automatically. Users won't manage retention. Set a default (90 days) and clean up silently.

The audit log isn't a compliance checkbox. It's the operator's dashboard. Design it that way and people will actually use it.


HelperX logs every automated action with timestamps, status, and metadata — viewable in real-time from the dashboard. Free 30-day trial.

Top comments (1)

Some comments may only be visible to logged-in visitors. Sign in to view all comments.