Build Log (April 8, 2026)
Today I implemented the first production-ready telemetry collectors for heka-insights-agent and wired them into the main polling loop.
What I built
- Added an optimized CPUCollector in src/collectors/cpu.py
- Added a MemoryCollector in src/collectors/memory.py
- Added a DiskCollector in src/collectors/disk.py
- Wired all collectors into src/main.py with a shared loop
- Added environment-based poll interval support via CPU_POLL_INTERVAL_SECONDS
- Added python-dotenv in requirements.txt
CPU collector design
I built CPU collection around psutil.cpu_times(...) snapshots and delta math (single source), instead of calling both cpu_percent and cpu_times_percent per cycle.
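The delta math can be sketched roughly as follows. This is a minimal illustration rather than the agent's actual code: the function name cpu_percentages and the dict-based snapshot format are assumptions, and each snapshot could come from something like psutil.cpu_times()._asdict().

```python
from typing import Dict, Mapping

# On Linux, guest time is already counted inside user/nice time,
# so these fields are left out of the total to avoid double counting.
_ALREADY_COUNTED = ("guest", "guest_nice")


def cpu_percentages(prev: Mapping[str, float],
                    curr: Mapping[str, float]) -> Dict[str, float]:
    """Turn two cumulative CPU-time snapshots into per-field percentages.

    Hypothetical helper: both arguments map field names (user, system,
    idle, ...) to cumulative seconds.
    """
    # Clamp at zero in case a counter appears to move backwards.
    deltas = {k: max(0.0, curr[k] - prev[k]) for k in curr if k in prev}
    total = sum(v for k, v in deltas.items() if k not in _ALREADY_COUNTED)
    if total <= 0.0:
        # No measurable time elapsed between the two snapshots.
        return {k: 0.0 for k in deltas}
    return {k: 100.0 * v / total for k, v in deltas.items()}
```

With this shape, a single pair of snapshots yields every per-field percentage, and overall utilization is simply 100 minus the idle share.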
Key design points:
- No thread offloading (to_thread) for this workload
- First cycle is warm-up by design
- Supports basic and detailed output modes
- Optional per-core output
- Uses MonotonicTicker to keep fixed cadence without drift
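The MonotonicTicker implementation itself is not shown in this log, so the following is only a hypothetical sketch of how a drift-free ticker could work. The class name MonotonicTickerSketch, its constructor parameters, and the overrun policy (skip missed deadlines) are all assumptions.

```python
import time
from typing import Callable


class MonotonicTickerSketch:
    """Fixed-cadence ticker (illustrative only).

    Deadlines sit on a fixed grid, so the time spent collecting
    does not accumulate as drift.
    """

    def __init__(self, interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if interval <= 0:
            raise ValueError("interval must be positive")
        self._interval = interval
        self._clock = clock
        self._sleep = sleep
        self._next = clock() + interval  # first deadline

    def wait(self) -> float:
        """Block until the next deadline; return the seconds slept."""
        delay = self._next - self._clock()
        if delay > 0:
            self._sleep(delay)
            self._next += self._interval
            return delay
        # The cycle overran: skip missed deadlines but stay on the grid.
        missed = int(-delay // self._interval) + 1
        self._next += missed * self._interval
        return 0.0
```

Sleeping until an absolute deadline, rather than sleeping for the full interval after each cycle, is what keeps the cadence fixed: a cycle that takes 0.5 s of work on a 5 s interval sleeps 4.5 s, not 5 s.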
Memory collector design
Memory collection is intentionally lightweight:
- One call each to psutil.virtual_memory() and psutil.swap_memory()
- basic mode returns compact key fields
- detailed mode returns full psutil fields
- Raw byte values are preserved (server-side compute handles transformations)
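As a rough illustration of the two modes, here is a sketch that shapes already-collected readings. The function name shape_memory, the output keys, and the specific fields chosen for basic mode are assumptions; the inputs are assumed to be the dicts returned by psutil.virtual_memory()._asdict() and psutil.swap_memory()._asdict().

```python
from typing import Any, Dict, Mapping

# Hypothetical choice of "compact key fields"; the real collector may differ.
BASIC_VM_FIELDS = ("total", "available", "used", "free")
BASIC_SWAP_FIELDS = ("total", "used", "free")


def shape_memory(vm: Mapping[str, Any],
                 swap: Mapping[str, Any],
                 mode: str = "basic") -> Dict[str, Dict[str, Any]]:
    """Build a memory payload from one virtual and one swap reading.

    Values are passed through untouched, so byte counts stay raw.
    """
    if mode == "detailed":
        # Every field psutil reported, unchanged.
        return {"virtual": dict(vm), "swap": dict(swap)}
    if mode != "basic":
        raise ValueError(f"unknown mode: {mode!r}")
    return {
        "virtual": {k: vm[k] for k in BASIC_VM_FIELDS if k in vm},
        "swap": {k: swap[k] for k in BASIC_SWAP_FIELDS if k in swap},
    }
```

Keeping the shaping step separate from the psutil calls means each cycle still makes exactly one call per source, and the mode only decides which fields are forwarded.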
Disk collector design
For disk, I chose to report cumulative I/O counters rather than rates, because derived values are computed centrally on the server.
- Uses psutil.disk_io_counters(perdisk=True)
- Returns aggregate and per-disk counters
- Filters to physical devices only
- Excludes partitions from per-disk payload
- Added device-name cache with periodic refresh to reduce repeated filtering overhead
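One way the filtering and caching could fit together is sketched below. Everything here is an assumption for illustration: the /sys/block heuristic for spotting physical devices, the 60-second refresh period, the names list_sys_block, DeviceNameCache, and build_disk_payload, and the choice to compute the aggregate by summing the physical devices. The per_disk input is assumed to map device names to counter dicts, for example built from psutil.disk_io_counters(perdisk=True).

```python
import os
import time
from typing import Callable, Dict, FrozenSet, Iterable, List, Mapping, Optional


def list_sys_block(root: str = "/sys/block") -> List[str]:
    """Linux heuristic: whole disks that have a backing 'device' entry.

    Partitions are not top-level entries here, and loop/ram/dm devices
    normally lack a 'device' entry, so both are skipped.
    """
    try:
        names = os.listdir(root)
    except OSError:
        return []
    return [n for n in names if os.path.exists(os.path.join(root, n, "device"))]


class DeviceNameCache:
    """Caches the physical-device list and refreshes it periodically."""

    def __init__(self, lister: Callable[[], Iterable[str]] = list_sys_block,
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lister = lister
        self._ttl = ttl_seconds
        self._clock = clock
        self._names: Optional[FrozenSet[str]] = None
        self._loaded_at = 0.0

    def names(self) -> FrozenSet[str]:
        now = self._clock()
        if self._names is None or now - self._loaded_at >= self._ttl:
            self._names = frozenset(self._lister())
            self._loaded_at = now
        return self._names


def build_disk_payload(per_disk: Mapping[str, Mapping[str, int]],
                       cache: DeviceNameCache) -> Dict[str, dict]:
    """Keep physical devices only and sum their cumulative counters."""
    allowed = cache.names()
    kept = {name: dict(c) for name, c in per_disk.items() if name in allowed}
    total: Dict[str, int] = {}
    for counters in kept.values():
        for field, value in counters.items():
            total[field] = total.get(field, 0) + value
    return {"disk_io": total, "disk_io_perdisk": kept}
```

Summing only whole devices avoids counting the same I/O twice, since a partition's traffic is already included in its parent disk's counters.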
Main loop wiring
src/main.py now runs:
- CPU collector
- Memory collector
- Disk collector
All on the same interval, with separate log lines per collector.
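A single pass of such a shared loop might look like the sketch below. The function name run_cycle, the logger name, and the JSON log format are assumptions; the collectors are represented as plain callables so the example stays independent of the real collector classes.

```python
import json
import logging
from typing import Any, Callable, Dict, Mapping

# Logger name is an assumption for this sketch.
logger = logging.getLogger("heka-insights-agent")


def run_cycle(collectors: Mapping[str, Callable[[], Any]],
              log: Callable[[str], None] = logger.info) -> Dict[str, Any]:
    """Run every collector once and emit one log line per collector."""
    results: Dict[str, Any] = {}
    for name, collect in collectors.items():
        payload = collect()
        results[name] = payload
        # One line per collector keeps the output easy to grep.
        log(f"{name} {json.dumps(payload, sort_keys=True)}")
    return results
```

The surrounding loop would then call run_cycle once per tick, with the ticker deciding how long to wait between passes.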
Poll interval is loaded from .env via:
CPU_POLL_INTERVAL_SECONDS
Invalid values fall back safely to default 5.0s.
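The fallback behaviour could be implemented along these lines. This is a sketch under assumptions: the function name read_poll_interval is hypothetical, and treating non-numeric, non-finite, zero, and negative values as invalid is my guess at what "invalid" covers. In the agent, load_dotenv() from python-dotenv would be called first so that values from .env are present in os.environ.

```python
import math
import os
from typing import Mapping

DEFAULT_POLL_INTERVAL = 5.0


def read_poll_interval(env: Mapping[str, str] = os.environ,
                       default: float = DEFAULT_POLL_INTERVAL) -> float:
    """Read CPU_POLL_INTERVAL_SECONDS, falling back to the default."""
    raw = env.get("CPU_POLL_INTERVAL_SECONDS")
    if raw is None:
        return default
    try:
        value = float(raw)
    except ValueError:
        # Not a number at all, e.g. "fast".
        return default
    if not math.isfinite(value) or value <= 0:
        # Reject nan, inf, zero, and negative intervals.
        return default
    return value
```

Passing the environment in as a mapping makes the parsing easy to test without touching the real process environment.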
Profiling notes
I profiled a 120-second run and reviewed both process stats and cProfile output.
Key findings:
- Agent CPU cost is very low (near-idle for this polling interval)
- Max RSS is about 15 MB
- Runtime is dominated by intentional sleep (expected)
- Collector costs are small; disk collection is the heaviest of the three
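For anyone wanting to take similar measurements, here is a small sketch using only the standard library. The helper names profile_call and max_rss_kib are hypothetical, and this is not necessarily how the run above was profiled. Note that resource is Unix-only and that ru_maxrss is reported in kilobytes on Linux.

```python
import cProfile
import io
import pstats
import resource
from typing import Any, Callable, Tuple


def profile_call(func: Callable[..., Any], *args: Any,
                 top: int = 10, **kwargs: Any) -> Tuple[Any, str]:
    """Run func under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # Sort by cumulative time and print only the top entries.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()


def max_rss_kib() -> int:
    """Peak resident set size of this process (kilobytes on Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
```

Wrapping the main loop in profile_call for a fixed duration and printing max_rss_kib() at exit would give both views: where time goes per function and how much memory the process peaked at.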
What changed after profiling
Based on profile output, I optimized disk collection further:
- Added cached physical-device list to avoid filtering every cycle
- Kept output shape unchanged (disk_io + disk_io_perdisk)
Current status
The agent now has a clean baseline telemetry pipeline with low overhead and clear extension points for transport/shipping.
Next planned work:
- Add payload shipping to backend endpoint
- Add bounded retry/backoff
- Add collector-focused tests
Repo URL
ronin1770 / heka-insights-agent
A lightweight agent for collecting essential Linux system telemetry and shipping it to a configurable backend.