DEV Community: Farhan Munir

Heka-Insights-Agent: Milestone 6 Complete: Datadog OTLP + Native Integration Paths

Farhan Munir — Sun, 17 May 2026 06:04:48 +0000

Milestone 6 is complete for the Heka Insights Agent.

Repository: https://github.com/ronin1770/heka-insights-agent

In this milestone, we implemented two Datadog-compatible delivery paths so teams can choose what fits their needs:

Datadog OTLP preset mode (EXPORTER_TYPE=datadog_otlp)
Datadog native API mode (EXPORTER_TYPE=datadog_native)

Milestone 6 Goal

Provide Datadog integration through both:

OTLP-based delivery for portability
Datadog-native delivery for backend-specific control

In plain terms, your agent can now send metrics to Datadog in either a standards-oriented way (OTLP) or a Datadog-specific way (native API), without changing your collectors.

What We Implemented

1. Datadog OTLP preset mode (`datadog_otlp`)

We added a Datadog preset resolver that derives endpoint + auth headers from Datadog config.

endpoint is derived from site: https://otlp.<DATADOG_SITE>/v1/metrics
auth header is injected: dd-api-key: <DATADOG_API_KEY>
optional Datadog hostname/tags are mapped into resource attributes
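
To make the preset resolution above concrete, here is a minimal sketch of how the endpoint and auth header could be derived from Datadog settings. The function name resolve_datadog_otlp_preset, the returned dictionary shape, and the resource attribute keys (host.name, plus one attribute per tag key) are assumptions for illustration, not the agent's actual code.

```python
from typing import Dict, Optional


def resolve_datadog_otlp_preset(
    site: str,
    api_key: str,
    hostname: Optional[str] = None,
    tags: Optional[Dict[str, str]] = None,
) -> dict:
    """Build OTLP HTTP exporter inputs from Datadog settings (hypothetical helper)."""
    resource_attributes: Dict[str, str] = {}
    # Assumption: each Datadog tag becomes a resource attribute with the same key.
    for key, value in sorted((tags or {}).items()):
        resource_attributes[key] = value
    # Assumption: the optional hostname maps to the host.name resource attribute.
    if hostname:
        resource_attributes["host.name"] = hostname
    return {
        "endpoint": f"https://otlp.{site}/v1/metrics",
        "headers": {"dd-api-key": api_key},
        "resource_attributes": resource_attributes,
    }


preset = resolve_datadog_otlp_preset("us5.datadoghq.com", "example-key", hostname="web-1")
print(preset["endpoint"])  # https://otlp.us5.datadoghq.com/v1/metrics
```

Because the result has the same shape as generic OTLP HTTP settings, the existing OTLP exporter can consume it unchanged.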

2. Datadog native exporter (`datadog_native`)

We implemented a native Datadog metrics exporter for POST /api/v1/series.

endpoint is derived from site: https://api.<DATADOG_SITE>/api/v1/series
canonical gauge -> Datadog gauge
canonical counter -> Datadog count
timestamp_unix_ms -> Unix seconds
count.interval is derived from CPU_POLL_INTERVAL_SECONDS
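
The mapping rules above can be sketched roughly as follows. This is an illustration only: the function name to_datadog_series and the flattening of labels into sorted key:value tags are assumptions, and the sketch passes counter values through unchanged because the post does not describe any delta computation.

```python
from typing import Any, Dict, List


def to_datadog_series(metrics: List[Dict[str, Any]], poll_interval_seconds: int) -> Dict[str, Any]:
    """Map canonical metrics to a /api/v1/series request body (illustrative sketch)."""
    type_map = {"gauge": "gauge", "counter": "count"}
    series = []
    for metric in metrics:
        dd_type = type_map[metric["type"]]  # KeyError for unknown canonical types
        entry: Dict[str, Any] = {
            "metric": metric["name"],
            "type": dd_type,
            # Datadog v1 points are [unix_seconds, value] pairs.
            "points": [[metric["timestamp_unix_ms"] // 1000, metric["value"]]],
            # Assumption: labels become key:value tags in sorted order.
            "tags": [f"{k}:{v}" for k, v in sorted(metric.get("labels", {}).items())],
        }
        if dd_type == "count":
            # The count interval follows CPU_POLL_INTERVAL_SECONDS.
            entry["interval"] = poll_interval_seconds
        series.append(entry)
    return {"series": series}


example_body = to_datadog_series(
    [
        {"name": "heka_cpu_usage_percent", "type": "gauge", "value": 12.5,
         "labels": {}, "timestamp_unix_ms": 1700000000123},
        {"name": "heka_disk_reads_total", "type": "counter", "value": 42,
         "labels": {"device": "sda"}, "timestamp_unix_ms": 1700000000123},
    ],
    poll_interval_seconds=10,
)
```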

3. Validation hardening (M6-3)

We fail fast on invalid/missing Datadog config:

DATADOG_SITE must be an allowed full domain
DATADOG_API_KEY must be non-empty
DATADOG_TAGS must be strict key:value format
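
A minimal sketch of this kind of fail-fast validation is shown below. The allow-list contents, the comma-separated tag list, the tag pattern, and the function name validate_datadog_config are assumptions for illustration; the agent's actual rules live in src/config/runtime.py and may differ.

```python
import re
from typing import Dict, Mapping

# Assumption: illustrative allow-list; the agent's real list is not shown in this post.
ALLOWED_SITES = frozenset({
    "datadoghq.com", "us3.datadoghq.com", "us5.datadoghq.com",
    "datadoghq.eu", "ap1.datadoghq.com",
})
# Assumption: "strict" means exactly one key and one value joined by a single colon.
_TAG_PATTERN = re.compile(r"^[^\s:,]+:[^\s:,]+$")


def validate_datadog_config(env: Mapping[str, str]) -> Dict[str, object]:
    """Raise RuntimeError at startup if any Datadog setting is invalid."""
    site = env.get("DATADOG_SITE", "").strip()
    if site not in ALLOWED_SITES:
        raise RuntimeError(f"DATADOG_SITE must be one of {sorted(ALLOWED_SITES)}, got {site!r}")
    api_key = env.get("DATADOG_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("DATADOG_API_KEY must be non-empty")
    tags: Dict[str, str] = {}
    raw_tags = env.get("DATADOG_TAGS", "").strip()
    if raw_tags:
        # Assumption: tags are supplied as a comma-separated list.
        for item in raw_tags.split(","):
            item = item.strip()
            if not _TAG_PATTERN.match(item):
                raise RuntimeError(f"DATADOG_TAGS entry {item!r} is not in key:value format")
            key, value = item.split(":", 1)
            tags[key] = value
    return {"site": site, "api_key": api_key, "tags": tags}
```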

4. Deterministic mapping rules (M6-4)

We made mapping predictable:

DATADOG_HOSTNAME overrides label-derived host
tag conflicts are deterministic (DATADOG_TAGS override label tags by key)
metric prefixing is idempotent (won't double-prefix)
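
The three rules above could look roughly like this in code. The helper names, the "host" label key, and the "heka." prefix used in the example are hypothetical; only the override and idempotency behavior comes from the rules listed here.

```python
from typing import Dict, List, Optional


def resolve_host(labels: Dict[str, str], datadog_hostname: Optional[str]) -> Optional[str]:
    # DATADOG_HOSTNAME wins; otherwise fall back to a label-derived host.
    # Assumption: the label key is "host".
    return datadog_hostname or labels.get("host")


def merge_tags(label_tags: Dict[str, str], datadog_tags: Dict[str, str]) -> List[str]:
    merged = dict(label_tags)
    merged.update(datadog_tags)  # DATADOG_TAGS override label tags by key
    # Sorting keeps the output order deterministic.
    return [f"{key}:{value}" for key, value in sorted(merged.items())]


def apply_prefix(name: str, prefix: str) -> str:
    # Idempotent: a name that already carries the prefix is returned unchanged.
    if not prefix or name.startswith(prefix):
        return name
    return prefix + name


print(merge_tags({"env": "dev", "device": "sda"}, {"env": "prod"}))  # ['device:sda', 'env:prod']
print(apply_prefix(apply_prefix("cpu_usage", "heka."), "heka."))  # heka.cpu_usage
```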

5. OTLP vs native comparison docs (M6-5)

We added side-by-side docs so teams can quickly choose a mode based on:

transport style
portability
mapping behavior
counter interval semantics

6. Docker-based milestone-6 integration tests (M6-6)

We added dedicated tests under:

tests/milestone-6/test_datadog_live_integration.py

These are intentionally gated and only run when explicitly enabled.
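
One common way to gate live tests like this is an environment-variable check combined with unittest.skipUnless. The sketch below assumes the flag must equal "1" (as in the Docker command later in this post); the helper and class names are hypothetical and the real test module may be structured differently.

```python
import os
import unittest


def live_tests_enabled(env=os.environ) -> bool:
    """Return True only when the opt-in flag is explicitly set to "1"."""
    return env.get("RUN_DATADOG_LIVE_INTEGRATION") == "1"


@unittest.skipUnless(live_tests_enabled(), "set RUN_DATADOG_LIVE_INTEGRATION=1 to run live tests")
class DatadogLiveIntegrationSketch(unittest.TestCase):
    def test_api_key_is_present(self):
        # A real test would export metrics to Datadog; this placeholder only
        # checks that credentials were supplied alongside the opt-in flag.
        self.assertTrue(os.environ.get("DATADOG_API_KEY"))
```

With this pattern, a plain test run reports the class as skipped instead of failing when no credentials are available.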

Key Files Added/Updated

src/config/runtime.py
src/config/__init__.py
src/exporters/factory.py
src/exporters/datadog_native.py
tests/test_config_otlp_env.py
tests/test_datadog_exporters.py
tests/milestone-6/test_datadog_live_integration.py
docs/configuration.md
docs/architecture.md
docs/development.md
README.md

Implementation Strategy

We followed the existing exporter architecture:

Keep collectors unchanged
Extend runtime config with strict validation and preset resolution
Reuse OTLP HTTP exporter for Datadog OTLP mode
Add a dedicated native exporter for Datadog API v1 series
Add deterministic mapping rules + comprehensive tests
Add Docker-backed live tests in a milestone-specific folder

Test Commands and Outputs

Local Datadog-focused test suite

PYTHONPATH=src python3 -m unittest -v \
  tests.test_config_otlp_env \
  tests.test_datadog_exporters

Example output:

Ran 21 tests in 0.008s

OK

Full local suite

PYTHONPATH=src python3 -m unittest discover -s tests -v

Example output:

Ran 42 tests in 0.014s

OK (skipped=1)

Docker live Datadog integration tests (milestone-6)

docker compose -f docker-compose.test.yml run --rm \
  -e RUN_OTLP_INTEGRATION=1 \
  -e RUN_DATADOG_LIVE_INTEGRATION=1 \
  -e DATADOG_SITE=us5.datadoghq.com \
  -e DATADOG_API_KEY=<REDACTED_DATADOG_API_KEY> \
  test-runner \
  pytest -vv -s -rs tests/milestone-6/test_datadog_live_integration.py

Observed output:

collected 2 items

tests/milestone-6/test_datadog_live_integration.py::DatadogLiveIntegrationTests::test_datadog_native_exports_gauge_and_count_metrics PASSED
tests/milestone-6/test_datadog_live_integration.py::DatadogLiveIntegrationTests::test_datadog_otlp_preset_exports_gauge_metric PASSED

2 passed in 2.63s

Why This Matters

With Milestone 6 complete, Datadog users can now choose the integration path that matches their operations model:

choose OTLP preset for standards-aligned portability
choose native mode for Datadog-specific control and semantics

Both paths are now validated, tested, and documented.

Contribute

If you're interested in observability agents, OTLP pipelines, or exporter design, contributions are welcome:

https://github.com/ronin1770/heka-insights-agent

Useful contribution areas:

additional Datadog live integration coverage
CI automation for gated integration scenarios
docs/examples for production deployments
future milestone reliability and operability improvements

Milestone 5 Complete: New Relic OTLP Integration for Heka Insights Agent

Farhan Munir — Tue, 12 May 2026 05:33:12 +0000

In this milestone, we focused on making New Relic integration first-class in the Heka Insights Agent, while still using the same OTLP HTTP exporter foundation.

Repository: https://github.com/ronin1770/heka-insights-agent

Milestone 5 Goal

Milestone 5 was about shipping a reliable, low-friction New Relic path without creating a separate proprietary exporter.

Scope covered:

New Relic preset configuration layer
Automatic New Relic auth header injection
Endpoint and required field validation
Documentation updates with practical examples
Integration tests for preset behavior

What We Changed

1. New Relic preset mode (`EXPORTER_TYPE=newrelic_otlp`)

We added a preset resolver that maps New Relic-specific environment variables into OTLP HTTP exporter inputs.

Required variables:

NEWRELIC_OTLP_ENDPOINT
NEWRELIC_API_KEY
NEWRELIC_SERVICE_NAME

Optional variables:

NEWRELIC_ENVIRONMENT
NEWRELIC_HOST_NAME

2. Automatic auth header injection

When EXPORTER_TYPE=newrelic_otlp is selected, the runtime now injects:

api-key: <NEWRELIC_API_KEY>

No manual OTLP_HTTP_HEADERS setup is required for baseline New Relic auth.

3. Precedence behavior

In preset mode, NEWRELIC_* values take precedence over conflicting generic OTLP_* values.

Examples:

NEWRELIC_API_KEY overrides any conflicting OTLP_HTTP_HEADERS api-key
NEWRELIC_SERVICE_NAME overrides service.name from OTLP_RESOURCE_ATTRIBUTES
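
As a rough illustration of the preset mapping and precedence rules, the sketch below merges NEWRELIC_* values over generic OTLP_* values. The function names, the example endpoint, and the resource attribute keys chosen for the optional variables (deployment.environment, host.name) are assumptions, not the agent's confirmed behavior.

```python
from typing import Dict, Mapping


def parse_kv(raw: str) -> Dict[str, str]:
    """Parse key=value,key2=value2 into a dict, splitting each item on its first '='."""
    result: Dict[str, str] = {}
    for item in filter(None, (part.strip() for part in raw.split(","))):
        key, _, value = item.partition("=")
        result[key.strip()] = value.strip()
    return result


def resolve_newrelic_preset(env: Mapping[str, str]) -> Dict[str, object]:
    headers = parse_kv(env.get("OTLP_HTTP_HEADERS", ""))
    attributes = parse_kv(env.get("OTLP_RESOURCE_ATTRIBUTES", ""))
    # NEWRELIC_* values take precedence over conflicting generic OTLP_* values.
    headers = {k: v for k, v in headers.items() if k.lower() != "api-key"}
    headers["api-key"] = env["NEWRELIC_API_KEY"]
    attributes["service.name"] = env["NEWRELIC_SERVICE_NAME"]
    # Assumption: the optional variables map to these resource attribute keys.
    if env.get("NEWRELIC_ENVIRONMENT"):
        attributes["deployment.environment"] = env["NEWRELIC_ENVIRONMENT"]
    if env.get("NEWRELIC_HOST_NAME"):
        attributes["host.name"] = env["NEWRELIC_HOST_NAME"]
    return {
        "endpoint": env["NEWRELIC_OTLP_ENDPOINT"],
        "headers": headers,
        "resource_attributes": attributes,
    }


resolved = resolve_newrelic_preset({
    "NEWRELIC_OTLP_ENDPOINT": "https://otlp.example.com/v1/metrics",  # placeholder endpoint
    "NEWRELIC_API_KEY": "nr-key",
    "NEWRELIC_SERVICE_NAME": "heka-insights-agent",
    "OTLP_HTTP_HEADERS": "api-key=stale,x-extra=1",
    "OTLP_RESOURCE_ATTRIBUTES": "service.name=old,host.name=localhost",
})
```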

4. Validation hardening

We now fail fast on invalid or missing required New Relic settings:

missing required keys -> startup error
invalid endpoint format -> startup error (must be absolute http:// or https://)
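
A minimal sketch of these startup checks, using only the standard library. The function name validate_newrelic_env and the error messages are hypothetical; the required keys and the absolute http/https rule come from this post.

```python
from typing import Mapping
from urllib.parse import urlparse

REQUIRED_KEYS = ("NEWRELIC_OTLP_ENDPOINT", "NEWRELIC_API_KEY", "NEWRELIC_SERVICE_NAME")


def validate_newrelic_env(env: Mapping[str, str]) -> None:
    """Raise RuntimeError if required settings are missing or the endpoint is malformed."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]
    if missing:
        raise RuntimeError(f"missing required New Relic settings: {', '.join(missing)}")
    parsed = urlparse(env["NEWRELIC_OTLP_ENDPOINT"].strip())
    # An absolute URL needs both an http(s) scheme and a host part.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise RuntimeError("NEWRELIC_OTLP_ENDPOINT must be an absolute http:// or https:// URL")
```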

5. Documentation updates

We updated project docs and examples to include New Relic preset usage and expected test flows.

Commands to Run Tests

Unit tests for config behavior

PYTHONPATH=src python3 -m unittest -v tests.test_config_otlp_env

New Relic Docker integration tests (explicit)

docker compose -f docker-compose.test.yml run --rm \
  -e RUN_OTLP_INTEGRATION=1 \
  -e OTLP_IT_HOST=host.docker.internal \
  test-runner \
  pytest -vv -s -rs tests/milestone-5/test_newrelic_otlp_integration.py

Expected summary:

collected 3 items
...
3 passed

Full OTLP/HTTP Docker test stack

docker compose -f docker-compose.test.yml up --build --abort-on-container-exit --exit-code-from test-runner

Why This Matters

With Milestone 5 complete, teams can onboard New Relic using a predictable preset path:

less config ambiguity
safer startup behavior through validation
standardized OTLP transport path
stronger confidence through Docker-backed integration tests

Next steps will continue building on this exporter foundation while keeping the runtime predictable and vendor-friendly.

Milestone 4 Complete — OTLP HTTP Exporter for Heka Insights Agent

Farhan Munir — Wed, 29 Apr 2026 06:49:27 +0000

In this milestone, I completed the OTLP HTTP exporter implementation for Heka Insights Agent and validated it with both unit tests and Docker-backed integration tests.

Repo: https://github.com/ronin1770/heka-insights-agent

What was implemented in Milestone 4

Milestone 4 includes:

OTLP HTTP exporter wiring
OTLP metric payload mapping
OTLP auth header support
OTLP resource attribute mapping
Retry/backoff behavior for transient failures
Unit test coverage for config + sender + exporter + mapping
Docker integration tests against a real OpenTelemetry Collector

Configuration added

OTLP configuration is environment-driven through .env:

LOG_LOCATION=./log/heka_agent.log
CPU_POLL_INTERVAL_SECONDS=10
EXPORTER_TYPE=otlp_http
OTLP_HTTP_ENDPOINT=http://localhost:4318/v1/metrics
OTLP_HTTP_HEADERS=key=Bearer abcd1234
OTLP_RESOURCE_ATTRIBUTES=service.name=heka-insights-agent,host.name=localhost
OTLP_HTTP_TIMEOUT_SECONDS=10
OTLP_HTTP_RETRY_MAX_ATTEMPTS=5
OTLP_HTTP_RETRY_INITIAL_BACKOFF_SECONDS=1
OTLP_HTTP_RETRY_MAX_BACKOFF_SECONDS=5

Notes

OTLP_HTTP_HEADERS format: key=value,key2=value2
OTLP_RESOURCE_ATTRIBUTES format: key=value,key2=value2
Retryable failures:
- transport errors
- HTTP 408, 429, and 5xx
Non-retryable failures:
- HTTP 400, 401, 403, 404, and similar client-side errors
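
The retry classification above can be expressed in a few lines. The sketch below also shows one possible backoff schedule; the doubling strategy and both function names are assumptions, since the notes only state the initial and maximum backoff settings.

```python
from typing import List, Optional


def is_retryable(status: Optional[int]) -> bool:
    """Classify a send outcome; status is None for transport errors (no HTTP response)."""
    if status is None:
        return True
    return status in (408, 429) or 500 <= status <= 599


def backoff_schedule(max_attempts: int, initial: float, maximum: float) -> List[float]:
    # Assumption: the delay doubles after each failed attempt and is capped at maximum.
    # There is one delay between consecutive attempts, so max_attempts - 1 delays in total.
    return [min(initial * (2 ** i), maximum) for i in range(max(max_attempts - 1, 0))]


# Using the values from the .env example above (5 attempts, 1s initial, 5s max).
print(backoff_schedule(5, 1, 5))  # [1, 2, 4, 5]
```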

Unit tests

Unit tests cover:

OTLP env parsing and validation
OTLP payload mapping
OTLP HTTP sender behavior
Retry/backoff behavior
Exporter initialization and wiring

Run unit tests:

PYTHONPATH=src python3 -m unittest discover -s tests -v

Unit test output

Ran 27 tests in 0.008s

OK

Integration tests (Docker + real collector)

Integration tests validate:

auth success (200)
auth reject path (401) and no retry on non-retryable status
no-auth collector path

Run OTLP integration tests:

RUN_OTLP_INTEGRATION=1 PYTHONPATH=src python3 -m unittest -v tests.test_otlp_http_integration

Integration test output

test_auth_rejected_without_retry_for_401 ... ok
test_auth_success_with_bearer_header ... ok
test_no_auth_collector_accepts_without_headers ... ok

----------------------------------------------------------------------
Ran 3 tests in 2.417s

OK

Collector config used in tests

The project includes collector fixtures under tests/fixtures/otlp/, including auth-required and no-auth variants for scenario testing.

Final result

Milestone 4 now has:

working OTLP HTTP exporter
production-style config controls
retry/backoff logic for transient failures
unit + integration test coverage
updated docs and run instructions

Repo again: https://github.com/ronin1770/heka-insights-agent

Milestone 4 (Part 1): Implementing OTLP HTTP Core in Heka Insights Agent (M4-1, M4-2)

Farhan Munir — Mon, 27 Apr 2026 09:57:30 +0000

Heka Insights Agent already had a canonical metrics pipeline from Milestone 3.

In this part of Milestone 4, I implemented the OTLP HTTP core in two focused steps:

M4-1: Canonical metrics -> OTLP payload mapping layer
M4-2: OTLP HTTP request sender and exporter wiring

This post covers only these two items. Auth headers, resource attributes, and retry/compression controls are intentionally deferred to later M4 tasks.

Why This Split Matters

By separating mapping from transport, we get:

stable internal metric model
explicit OTLP payload construction
transport logic that can evolve independently
clean foundation for New Relic/Datadog-style OTLP integrations later

What Was Implemented

M4-1: OTLP Payload Mapping Layer

I added a dedicated mapper that converts canonical metric records into OTLP HTTP JSON payloads.

Core behavior:

validates required canonical fields before send
supports explicit type mapping:
- gauge -> OTLP gauge.dataPoints
- counter -> OTLP sum.dataPoints with cumulative temporality
maps canonical labels to OTLP metric attributes
maps timestamp_unix_ms to OTLP timeUnixNano
rejects malformed metrics early with explicit errors

Result: malformed payloads are blocked before network transport.
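
To illustrate the mapping behavior, here is a minimal sketch of a canonical-to-OTLP JSON mapper. The function names, the choice of required fields, the use of asDouble and stringValue, and isMonotonic=True for counters are assumptions; the field names themselves follow the OTLP JSON encoding.

```python
from typing import Any, Dict, List

REQUIRED_FIELDS = ("name", "type", "value")  # assumption: minimal required set
AGGREGATION_TEMPORALITY_CUMULATIVE = 2  # enum value from the OTLP metrics proto


def to_otlp_metric(metric: Dict[str, Any]) -> Dict[str, Any]:
    missing = [field for field in REQUIRED_FIELDS if field not in metric]
    if missing:
        raise ValueError(f"canonical metric is missing fields: {missing}")
    point: Dict[str, Any] = {
        "asDouble": float(metric["value"]),
        "attributes": [
            {"key": key, "value": {"stringValue": str(value)}}
            for key, value in sorted(metric.get("labels", {}).items())
        ],
    }
    if "timestamp_unix_ms" in metric:
        # OTLP JSON encodes 64-bit integers as strings; ms -> ns is a factor of 1e6.
        point["timeUnixNano"] = str(metric["timestamp_unix_ms"] * 1_000_000)
    otlp: Dict[str, Any] = {
        "name": metric["name"],
        "description": metric.get("description", ""),
        "unit": metric.get("unit", ""),
    }
    if metric["type"] == "gauge":
        otlp["gauge"] = {"dataPoints": [point]}
    elif metric["type"] == "counter":
        otlp["sum"] = {
            "dataPoints": [point],
            "aggregationTemporality": AGGREGATION_TEMPORALITY_CUMULATIVE,
            "isMonotonic": True,  # assumption: canonical counters only increase
        }
    else:
        raise ValueError(f"unsupported metric type: {metric['type']!r}")
    return otlp


def to_otlp_payload(metrics: List[Dict[str, Any]]) -> Dict[str, Any]:
    return {"resourceMetrics": [{"scopeMetrics": [{"metrics": [to_otlp_metric(m) for m in metrics]}]}]}
```

Raising ValueError before any request is built is what keeps malformed records from reaching the network layer.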

M4-2: OTLP HTTP Sender + Exporter

I added OTLP HTTP sender/exporter flow and wired it into exporter selection.

Core behavior:

EXPORTER_TYPE=otlp_http now creates OTLP exporter
validates OTLP endpoint format (http/https absolute URL) at startup
fails fast when endpoint is missing/invalid
sends JSON payload via HTTP POST
treats only 2xx responses as success
raises explicit errors for HTTP failures and transport errors

Result: working end-to-end OTLP HTTP delivery with fail-fast startup safety.
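
The sender behavior can be sketched with the standard library alone. Everything named here (OtlpSendError, validate_endpoint, send_payload, the injectable opener parameter) is hypothetical and exists only to show the 2xx-only success rule and explicit error paths; the agent's real sender may differ.

```python
import contextlib
import json
import types
import urllib.error
import urllib.request
from urllib.parse import urlparse


class OtlpSendError(RuntimeError):
    """Hypothetical error type for failed deliveries."""


def validate_endpoint(endpoint: str) -> str:
    parsed = urlparse(endpoint or "")
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise RuntimeError(f"OTLP_HTTP_ENDPOINT must be an absolute http(s) URL, got {endpoint!r}")
    return endpoint


def send_payload(endpoint: str, payload: dict, timeout: float = 10.0,
                 opener=urllib.request.urlopen) -> int:
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with opener(request, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as exc:  # must precede URLError (its parent class)
        raise OtlpSendError(f"OTLP endpoint returned HTTP {exc.code}") from exc
    except urllib.error.URLError as exc:
        raise OtlpSendError(f"OTLP transport error: {exc.reason}") from exc
    if not 200 <= status < 300:
        raise OtlpSendError(f"OTLP endpoint returned HTTP {status}")
    return status


# Exercise the sender without a network by injecting a fake opener.
fake_ok = lambda request, timeout: contextlib.nullcontext(types.SimpleNamespace(status=200))
demo_status = send_payload(
    validate_endpoint("http://localhost:4318/v1/metrics"),
    {"resourceMetrics": []},
    opener=fake_ok,
)
```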

Local Test Setup with OpenTelemetry Collector (Docker)

I used OTel Collector debug exporter to validate incoming metrics.

Collector config (`otel-collector-config.yaml`)

receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318

exporters:
  debug:

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [debug]

Run collector

docker run --rm \
  -p 4318:4318 \
  -v "$(pwd)/otel-collector-config.yaml:/etc/otelcol/config.yaml" \
  otel/opentelemetry-collector:latest \
  --config=/etc/otelcol/config.yaml

Agent `.env` for this test

LOG_LOCATION=./log/heka_agent.log
CPU_POLL_INTERVAL_SECONDS=10
EXPORTER_TYPE=otlp_http
OTLP_HTTP_ENDPOINT=http://localhost:4318/v1/metrics

Run agent

python src/main.py

Verification Signals

From runtime behavior:

agent starts with exporter_type=otlp_http
collector logs periodic metric batches every ~10 seconds
no exporter exceptions during dispatch
first cycle has fewer points due to CPU warm-up, then normalizes

Example collector signal:

resource metrics: 1
metrics: 24
data points: 24

Tests Added

I added focused tests for M4-1/M4-2:

payload mapping correctness (gauge/counter, labels, timestamps)
validation failures for malformed canonical metrics
HTTP sender request behavior and error handling
exporter wiring and missing-endpoint startup failure

All tests pass:

PYTHONPATH=src python3 -m unittest discover -s tests -v

What Is Intentionally Not Included Yet

Deferred to later M4 items:

auth headers (M4-3)
resource attribute mapping (M4-4)
timeout/compression/retry controls (M4-5)
broader OTLP docs and expanded test matrix (M4-6, M4-7)

Closing

M4-1 and M4-2 establish the OTLP core path: canonical metrics are now mapped deterministically and sent over HTTP with fail-fast validation.

This gives a production-friendly base to layer auth, resource metadata, and resiliency controls next.

Repo URL: https://github.com/ronin1770/heka-insights-agent

Heka-Insights-Agent: Milestone M3-4: Fail-Fast Exporter Validation at Startup

Farhan Munir — Fri, 24 Apr 2026 06:30:06 +0000

On April 24, 2026, I completed M3-4 in heka-insights-agent:

M3-4: Add configuration validation for exporter settings

This milestone closed a critical gap in the exporter foundation by removing silent fallback behavior and enforcing explicit startup failures for invalid exporter configuration.

Context

In Milestone M3-3, we routed output through the exporter interface and lifecycle:

initialize()
export(metrics)
shutdown()

That established the architecture.

M3-4 focused on correctness and operational safety: startup must fail early when exporter configuration is invalid or points to an unimplemented adapter.

The Problem Before M3-4

Prior behavior was permissive:

invalid EXPORTER_TYPE values defaulted back to console
unimplemented exporter types also downgraded to console

Why this was risky:

misconfiguration could go unnoticed in production
users could think they were exporting to one backend while actually exporting to console
behavior violated milestone acceptance criteria requiring explicit errors

M3-4 Goals

The implementation targeted three concrete outcomes:

invalid EXPORTER_TYPE must fail fast with a clear startup error
configured but unimplemented exporter adapters must fail fast with a clear startup error
docs must reflect strict validation (remove pre-M3-4 fallback messaging)

Implementation Summary

1) Strict validation in runtime config

Updated src/config/runtime.py in get_exporter_type(...).

Behavior now:

missing EXPORTER_TYPE still defaults to console
unsupported value now raises RuntimeError with explicit supported values list
optional logger records an error message before raising

This changed exporter selection from “best-effort fallback” to deterministic validation.

2) Removed fallback from exporter factory

Updated src/exporters/factory.py in create_exporter(...).

Behavior now:

console returns ConsoleExporter
other configured values currently raise RuntimeError because adapters are not implemented yet

This is intentional. It prevents false confidence and makes readiness of each exporter explicit.
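
A simplified sketch of the two layers described above (selector validation, then implementation availability). The supported-values tuple, the lowercasing and whitespace handling, and the stand-in ConsoleExporter are assumptions for illustration; the function names match those mentioned in this post, but the bodies are not the project's code.

```python
from typing import Mapping

# Assumption: illustrative list of declared exporter types at the time of M3-4.
SUPPORTED_EXPORTER_TYPES = ("console", "otlp_http", "datadog_native", "newrelic_otlp")


class ConsoleExporter:
    """Stand-in for the real console exporter."""

    def initialize(self) -> None:
        pass

    def export(self, metrics) -> None:
        print(metrics)

    def shutdown(self) -> None:
        pass


def get_exporter_type(env: Mapping[str, str], logger=None) -> str:
    raw = env.get("EXPORTER_TYPE", "").strip().lower()
    if not raw:
        return "console"  # only a missing value defaults to console
    if raw not in SUPPORTED_EXPORTER_TYPES:
        message = f"unsupported EXPORTER_TYPE {raw!r}; supported values: {', '.join(SUPPORTED_EXPORTER_TYPES)}"
        if logger is not None:
            logger.error(message)
        raise RuntimeError(message)
    return raw


def create_exporter(exporter_type: str):
    if exporter_type == "console":
        return ConsoleExporter()
    # Declared but unimplemented adapters fail fast instead of downgrading to console.
    raise RuntimeError(f"exporter {exporter_type!r} is not implemented yet")
```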

3) Documentation updated to M3-4 semantics

Updated:

docs/configuration.md
README.md

Both now state that:

unsupported exporter values fail fast
unimplemented exporters fail fast
only missing value defaults to console

This removed the “fallback to console with warning” wording from the pre-M3-4 behavior.

Validation Performed

Validation included compile checks and runtime behavior checks.

Compile validation

python3 -m compileall src passed

Behavior validation

Tested startup resolution paths for:

missing EXPORTER_TYPE
EXPORTER_TYPE=console
EXPORTER_TYPE=invalid_value
EXPORTER_TYPE=otlp_http (declared but not implemented adapter)

Observed results:

missing value -> resolves to console
console -> exporter creation succeeds
invalid_value -> immediate RuntimeError with supported-values message
otlp_http -> immediate RuntimeError stating exporter not implemented

These outcomes align with M3-4 requirements.

Why this change matters operationally

Fail-fast startup validation improves reliability in real deployments:

misconfigurations are caught immediately
no hidden routing to fallback output
deployment behavior is explicit and auditable
future exporter rollout can be gated by implementation readiness

This is especially important when teams automate deployments and rely on env-based configuration.

Architectural Impact

After M3-4, exporter behavior is now strictly layered:

config validates selector
factory enforces implementation availability
runtime starts only on valid+implemented exporter path

That gives us a clean foundation for adding real transports (otlp_http, datadog_native, newrelic_otlp) without changing collector logic.

What’s Next

The natural next step after M3-4 is M3-5:

document exporter lifecycle and responsibilities in architecture docs
include startup validation expectations as part of operational guidance
capture adapter implementation contract for future backend integrations

M3-4 makes sure the system fails loudly when exporter config is wrong, which is exactly what a transport foundation should do before adding real outbound integrations.

Milestone M3-3: Refactoring Console Output Into an Exporter Pathway

Farhan Munir — Fri, 24 Apr 2026 05:49:23 +0000

On April 24, 2026, I completed M3-3 for heka-insights-agent: moving console output from direct printing in the main loop to a proper exporter lifecycle.

This was part of Milestone #3 (Transport And Exporter Foundation) and specifically targeted:

M3-3: Refactor console output to use exporter interface

The result is a cleaner delivery architecture where collectors remain untouched and output transport is now pluggable.

Why M3-3 mattered

Before this change, the runtime loop handled collection, formatting, and printing in one place. That worked for a console-only stage, but it tightly coupled runtime behavior to one output path.

For Milestone #3, the architecture needs to support:

a canonical metric model
reusable formatters
exporter lifecycle boundaries
future transport adapters

M3-3 was the bridge from “it prints metrics” to “it exports metrics through a contract.”

What the code looked like before

src/main.py previously:

collected CPU/memory/disk payloads
formatted them with PrometheusFormatter.format(...)
called print(prometheus_output, end="", flush=True) directly

That meant no exporter-owned lifecycle (initialize, shutdown), and no central place to swap output behavior.

Design goals for this refactor

The implementation targeted four practical goals:

Keep collectors unchanged.
Remove direct console printing from main.py.
Route output through Exporter.initialize(), Exporter.export(...), and Exporter.shutdown().
Preserve current Prometheus console output shape for backward compatibility.

What was implemented

1) Console exporter implementation

Created src/exporters/console.py with ConsoleExporter(Exporter).

Responsibilities:

initialize(): prepare exporter state
export(metrics): format and emit canonical metrics to stdout
shutdown(): close lifecycle cleanly

ConsoleExporter now owns console writes. main.py no longer writes metrics directly.

2) Exporter factory wiring

Created src/exporters/factory.py with:

create_exporter(exporter_type, logger=...) -> Exporter

Current behavior:

returns ConsoleExporter for console
falls back to ConsoleExporter for unimplemented exporter types with a warning

This keeps runtime deterministic while other exporters are still pending.

3) Canonical metric normalization pipeline

Created:

src/pipeline/canonical_metrics.py
src/pipeline/__init__.py

Added build_canonical_metrics(payloads, timestamp_unix_ms=...) to map collector payloads into canonical records.

Canonical record fields:

name
description
type
unit
value
labels
optional timestamp_unix_ms

This separated normalization from transport and moved us closer to the milestone’s delivery pipeline model.

4) Formatter support for canonical metrics

Extended src/formatters/prometheus.py with:

format_canonical(metrics)

This lets formatters operate on canonical data (not collector-specific payload dictionaries) while preserving the existing Prometheus text exposition output.

5) Main loop refactor to exporter lifecycle

Updated src/main.py so startup and runtime flow is now:

resolve EXPORTER_TYPE
create_exporter(...)
exporter.initialize()
collect payloads each cycle
normalize to canonical metrics
exporter.export(canonical_metrics)
exporter.shutdown() in finally

Direct print(...) of telemetry output was removed from main.py.
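
The lifecycle flow above can be sketched as a small loop. The run_agent signature (including the max_cycles and sleep parameters, added here so the loop can be exercised without waiting) and the RecordingExporter are hypothetical; only the initialize / export / shutdown-in-finally ordering comes from this post.

```python
import time
from typing import Callable, List, Optional


def run_agent(exporter, collect: Callable[[], List[dict]], normalize: Callable[[List[dict]], List[dict]],
              interval_seconds: float, max_cycles: Optional[int] = None,
              sleep: Callable[[float], None] = time.sleep) -> None:
    exporter.initialize()
    try:
        cycle = 0
        while max_cycles is None or cycle < max_cycles:
            payloads = collect()                 # collect payloads each cycle
            exporter.export(normalize(payloads))  # normalize, then export
            cycle += 1
            sleep(interval_seconds)
    finally:
        exporter.shutdown()  # always runs, even if a cycle raises


class RecordingExporter:
    """Test double that records lifecycle calls in order."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def initialize(self) -> None:
        self.events.append("initialize")

    def export(self, metrics) -> None:
        self.events.append("export")

    def shutdown(self) -> None:
        self.events.append("shutdown")


demo = RecordingExporter()
run_agent(demo, collect=lambda: [{"cpu": 1.0}], normalize=lambda payloads: payloads,
          interval_seconds=0, max_cycles=2, sleep=lambda _seconds: None)
print(demo.events)  # ['initialize', 'export', 'export', 'shutdown']
```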

6) Python 3.10 typing compatibility fix

src/exporters/base.py originally used typing.NotRequired, which is not available in Python 3.10’s typing module.

The CanonicalMetric TypedDict was adjusted to a Python-3.10-safe form using:

required base TypedDict
total=False extension for optional timestamp

This kept type intent intact without requiring runtime upgrades.
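
For readers unfamiliar with this pattern, the Python-3.10-safe form looks roughly like the sketch below. The base class name and the exact field types are assumptions; the field names come from the canonical record listed earlier.

```python
from typing import Dict, TypedDict


class _CanonicalMetricRequired(TypedDict):
    # Keys declared here are required (total=True is the default).
    name: str
    description: str
    type: str
    unit: str
    value: float
    labels: Dict[str, str]


class CanonicalMetric(_CanonicalMetricRequired, total=False):
    # Keys declared in a total=False subclass are optional,
    # which replaces NotRequired[int] on Python versions before 3.11.
    timestamp_unix_ms: int


print(sorted(CanonicalMetric.__optional_keys__))  # ['timestamp_unix_ms']
```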

Validation and runtime output

Validation performed:

python3 -m compileall src
smoke execution path for exporter creation + canonical conversion + console export

Observed runtime behavior after refactor:

collector logs still emitted
Prometheus lines still emitted each cycle
timestamps populated in Unix milliseconds
disk metrics emitted as aggregate and per-device series

In other words: behavior stayed stable, but ownership moved to exporter architecture.

What M3-3 achieved (and what it didn’t)

Completed in M3-3:

console output now runs through exporter interface
runtime flow is wired to exporter lifecycle
normalization and formatting layers are separated from collection

Not part of M3-3 (handled in later items):

strict fail-fast invalid exporter value handling (M3-4)
additional backend exporters (Datadog, OTLP HTTP, New Relic)
retry/buffering semantics beyond base hooks

Key lesson

This milestone was less about adding features and more about installing architectural seams.

The code now has explicit boundaries:

collectors collect
pipeline normalizes
formatter renders
exporter delivers

That separation is what will let future backend integrations land without rewriting collectors.

Next up

The immediate next step is M3-4: enforce strict startup validation for unsupported EXPORTER_TYPE values so runtime fails fast with explicit errors instead of warning + fallback.

Next Milestones for Heka Insights Agent: From Console Output to Real Telemetry Delivery

Farhan Munir — Tue, 21 Apr 2026 08:48:33 +0000

We’re moving beyond console-only output and into real telemetry delivery.

Over the next few milestones, heka-insights-agent will introduce a proper exporter layer, OTLP HTTP support, and first-class integrations for New Relic and Datadog. The focus is to keep the agent vendor-agnostic at its core while making it easier to route system metrics into modern observability platforms using clean, configurable dispatch patterns.

Project board: https://github.com/users/ronin1770/projects/3/views/1

#python #observability #opentelemetry #devops #monitoring #prometheus #datadog #newrelic #opensource

✅ Milestone Completed: Prometheus Data Format Integration

Farhan Munir — Sun, 19 Apr 2026 07:36:58 +0000

This sprint focused on making telemetry output fully Prometheus-compatible while preserving our existing collector pipeline for CPU, memory, and disk I/O.

What I implemented

Added Prometheus text exposition format (v0.0.4) output
Included # HELP and # TYPE metadata
Supported gauge and counter metric families
Added label-based dimensions (CPU modes, disk devices)
Kept output deterministic and scrape-ready for a /metrics style endpoint

Metrics now covered

CPU usage and per-mode CPU time distribution
Virtual memory: used, available, total
Swap memory: used, total
Disk I/O bytes: read/write (aggregate + per-device)
Disk I/O operations: read/write (aggregate + per-device)

Validation results

Output verified against Prometheus exposition rules
No syntax violations found
Metrics are parseable and scrape-ready

Prior milestone context (OpenMetrics)

Before this, I shipped OpenMetrics-aligned output for the same core Linux host metrics with:

# HELP, # TYPE, # UNIT metadata
# EOF termination
Valid CPU, memory, and disk payload output

Next improvements

Evaluate ratio-based CPU values (0–1) vs percentage
Extend optional unit handling for stronger tooling interoperability
Continue OpenMetrics alignment where needed

Repo: https://github.com/ronin1770/heka-insights-agent

Milestone 2: Standardizing Telemetry Output with JSON, Prometheus, and OpenMetrics

Farhan Munir — Thu, 16 Apr 2026 15:25:23 +0000

In this milestone, we are focusing on one thing only: data format standardization.

The Heka Insights Agent already collects CPU, memory, and disk telemetry.

Now the goal is to emit the same logical metrics in three standard output formats:

JSON
Prometheus text exposition
OpenMetrics text format

Why This Milestone Matters

If an agent has no clear format strategy, every downstream integration becomes custom work.

That slows down adoption and increases maintenance cost.

By standardizing format early, we get:

stable contracts for integrations
easier validation and testing
portability across observability stacks
clearer boundaries between collection and export

Milestone Scope (Only Data Format)

This milestone does not include transports, retry logic, or backend adapters.

It only covers how telemetry is represented and serialized.

Included:

canonical internal metric model
naming/type/unit rules
serializers for json, prometheus, openmetrics
deterministic output behavior
contract tests with golden files

Out of scope:

Datadog/New Relic senders
batching/compression/persistence
new collector domains

Canonical Metric Contract

Every metric will be representable through one shared contract:

name (string)
description (string)
type (gauge or counter)
unit (e.g. bytes, seconds, percent, count)
value (number)
labels (map of string to string; empty allowed)
timestamp_unix_ms (optional integer)

This contract is the core design decision in Milestone 2.

Serializers consume this model and render format-specific output without changing metric meaning.

Naming and Semantics Rules

To keep the output stable and machine-friendly:

metric names are lowercase snake_case
all names are prefixed with heka_
counters end in _total
unit suffixes are explicit (_bytes, _seconds, _percent)
label keys are lowercase snake_case
metric identity must stay consistent across formats
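
Some of these rules are mechanical enough to check in code. The sketch below covers the snake_case, heka_ prefix, and _total rules only; the regular expression and the function name check_metric_name are assumptions, not part of the project.

```python
import re
from typing import List

# Lowercase snake_case with the heka_ prefix, e.g. heka_disk_read_bytes_total.
_NAME_PATTERN = re.compile(r"^heka_[a-z][a-z0-9]*(?:_[a-z0-9]+)*$")


def check_metric_name(name: str, metric_type: str) -> List[str]:
    """Return a list of rule violations (empty when the name is acceptable)."""
    problems: List[str] = []
    if not _NAME_PATTERN.match(name):
        problems.append("name must be lowercase snake_case with the heka_ prefix")
    if metric_type == "counter" and not name.endswith("_total"):
        problems.append("counter names must end in _total")
    return problems


print(check_metric_name("heka_disk_read_bytes_total", "counter"))  # []
print(check_metric_name("heka_disk_reads", "counter"))  # ['counter names must end in _total']
```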

Current Metric Mapping

Initial canonical mapping includes:

heka_cpu_usage_percent (gauge)
heka_cpu_time_percent (gauge with mode=<field>)
heka_memory_virtual_used_bytes (gauge)
heka_memory_virtual_available_bytes (gauge)
heka_memory_virtual_total_bytes (gauge)
heka_memory_swap_used_bytes (gauge)
heka_memory_swap_total_bytes (gauge)
heka_disk_read_bytes_total (counter)
heka_disk_write_bytes_total (counter)
heka_disk_reads_total (counter)
heka_disk_writes_total (counter)

Format-Specific Requirements

JSON

UTF-8 JSON object
includes schema_version (starting at v1)
includes generated_at (RFC3339 UTC)
includes top-level metrics array
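
A minimal sketch of a JSON envelope that meets these requirements. The function name render_json, the sorted keys, and the optional now parameter (added so the output is reproducible) are assumptions for illustration.

```python
import json
from datetime import datetime, timezone
from typing import Iterable, Optional


def render_json(metrics: Iterable[dict], now: Optional[datetime] = None) -> str:
    now = now or datetime.now(timezone.utc)
    envelope = {
        "schema_version": "v1",
        # RFC3339 timestamp in UTC with a trailing Z.
        "generated_at": now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "metrics": list(metrics),
    }
    # sort_keys keeps key order deterministic; ensure_ascii=False keeps UTF-8 text readable.
    return json.dumps(envelope, sort_keys=True, ensure_ascii=False)


example = render_json([], now=datetime(2026, 4, 16, 15, 25, 23, tzinfo=timezone.utc))
print(example)  # {"generated_at": "2026-04-16T15:25:23Z", "metrics": [], "schema_version": "v1"}
```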

Prometheus

Prometheus text exposition format (0.0.4)
include # HELP and # TYPE lines
deterministic label ordering
no OpenMetrics-only directives

OpenMetrics

OpenMetrics text format
include # HELP, # TYPE, and # UNIT when known
terminate payload with # EOF
metric names and labels remain aligned with Prometheus mode

Configuration Contract

One selector controls serialization:

OUTPUT_FORMAT=json|prometheus|openmetrics

default: json
invalid values: fail fast with a clear startup error

Acceptance Criteria

Milestone 2 is done when:

same logical metric set is emitted in all three formats
names/types/units are consistent
Prometheus and OpenMetrics outputs validate
JSON includes schema metadata and metrics array
output order is deterministic
golden-file tests exist for each format

GitHub Milestone Breakdown

Work is tracked through:

M2-1 canonical metric model
M2-2 collector-to-canonical mapping
M2-3 JSON serializer
M2-4 Prometheus serializer
M2-5 OpenMetrics serializer
M2-6 output format config + validation
M2-7 fixture/contract tests
M2-8 docs update

Repo: https://github.com/ronin1770/heka-insights-agent

Heka Insights Agent Update: Architecture + Configuration Docs Now Reflect Runtime Behavior

Farhan Munir — Thu, 16 Apr 2026 05:37:17 +0000

Build Update (April 16, 2026)

This week I focused on documentation quality and operational clarity for heka-insights-agent.

The goal was simple: make docs match the code exactly, so contributors and operators can reason about behavior without reading every module first.

What I updated

Rewrote docs/architecture.md from scratch
Rewrote docs/configuration.md from scratch
Expanded README.md with project context, setup, and environment guidance

Architecture documentation improvements

docs/architecture.md now documents:

the actual runtime topology and control loop in src/main.py
collector boundaries and behavior (CPUCollector, MemoryCollector, DiskCollector)
logging subsystem behavior in src/logger/config.py
current payload shapes emitted by collectors
known gaps (no sender layer yet, no tests yet, no schema versioning yet)
practical extension points for next phases

This gives a real “as-implemented” architecture baseline instead of aspirational text.

Configuration documentation improvements

docs/configuration.md now includes exact behavior for the two active runtime settings:

LOG_LOCATION
CPU_POLL_INTERVAL_SECONDS

It also documents:

source/precedence rules
defaults and validation behavior
failure modes
local setup and production recommendations

Important behavior clarified

Current config loading is split across two env files:

root .env is used for LOG_LOCATION
src/.env is used for CPU_POLL_INTERVAL_SECONDS

That split is now explicitly documented to reduce startup/debug confusion.

Why this matters

For an agent project, docs are part of reliability.

Operators need to know what can fail at startup, where config is read from, and what telemetry shape to expect downstream.

This update makes onboarding and future refactors safer.

Next steps

add transport/sender layer (backend adapters)
add collector-focused tests
consolidate config loading into a single source
define schema/versioning strategy for emitted payloads

Repo: https://github.com/ronin1770/heka-insights-agent

Reel Quick - Added Docker Support

Farhan Munir — Wed, 15 Apr 2026 14:17:25 +0000

Date: 2026-04-15

Project: Reel Quick (FastAPI + Next.js + ARQ + Mongo + Redis + optional GPU workers)

Context

We containerized the stack and tried to run it in production mode with Docker Compose.

Initial startup failed for both frontend build and backend runtime.

Issues Found

Docker Compose path mismatches:
- Wrong env_file paths (docker/env/*.env expected but files were in docker/).
- Wrong nginx config mount path (./docker/nginx/nginx.conf while actual file was docker/nginx.conf).
- Build context was incorrect for a compose file located inside docker/.
Frontend TypeScript build failure:
- location field type mismatch in frontend/app/create_video/page.tsx.
- Value inferred as string | undefined but state expects string | null.
Backend container crash on startup:
- ModuleNotFoundError: No module named 'db'.
- backend/main.py used non-package imports like from db import ....

Root Causes

Relative paths in compose were not aligned with actual file layout.
Optional API response property (file_location?) was used directly inside state update.
Backend entrypoint (uvicorn backend.main:app) requires package-safe imports (backend.*).

Fixes Applied

Docker and Compose

Updated docker/docker-compose.yml:
- build.context changed from . to .. (the repo root, since the compose file lives in docker/).
- env_file paths corrected to backend.env and mongo.env.
- nginx bind mount fixed to ./nginx.conf.
Updated docker/backend.env:
- Added:
- UPLOAD_FILES_LOCATION=/app/video_files
- INPUT_FILES_LOCATION=/app/video_files
Added repo-root .dockerignore (Docker uses ignore file from build context root).
Synced docker/dockerignore entries.
Updated docker/README-docker-prod.md run commands.

Frontend

Fixed type narrowing in frontend/app/create_video/page.tsx:
- Captured file_location into uploadedLocation.
- Guarded before setFiles(...).
- Used guaranteed string value in state update.

Backend

Converted backend imports to package imports in backend/main.py:
- from db import ... -> from backend.db import ...
- Similar conversion for logger, models, objects, workers.
Updated backend/objects/sound_prompt_preset.py import to backend.objects....
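
To see why the bare import fails, here is a small self-contained sketch that reproduces the situation with a throwaway package. The names `demo_backend` and `demo_db` are hypothetical stand-ins for `backend` and `db`; only the package's parent directory is on `sys.path`, which is effectively what happens when the app is started as `uvicorn backend.main:app`.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def try_import(module_name: str) -> str:
    """Return 'ok' if the module imports, else the exception class name."""
    try:
        importlib.import_module(module_name)
        return "ok"
    except ImportError as exc:
        return type(exc).__name__


def demo() -> dict:
    """Build a temporary package and compare bare vs package-qualified imports."""
    with tempfile.TemporaryDirectory() as root:
        pkg = Path(root) / "demo_backend"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "demo_db.py").write_text("NAME = 'db'\n")
        # Only the package's parent directory is importable, not the package dir itself.
        sys.path.insert(0, root)
        try:
            importlib.invalidate_caches()
            return {
                "demo_db": try_import("demo_db"),
                "demo_backend.demo_db": try_import("demo_backend.demo_db"),
            }
        finally:
            sys.path.remove(root)
            for name in ("demo_backend.demo_db", "demo_backend"):
                sys.modules.pop(name, None)


if __name__ == "__main__":
    print(demo())
```

The bare name is not resolvable because the package directory itself is never on the import path; the package-qualified name is.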

Commands Used for Deploy

# Stop all running containers (host-wide)
docker ps -q | xargs -r docker stop

# Start Reel Quick with GPU workers
cd /home/farhan/reel-quick/docker
docker compose --profile gpu up -d --build

# Verify
docker compose ps
docker compose logs -f api --tail=200

Validation Checklist

docker compose ps shows api, frontend, nginx, mongo, redis, workers as running.
api logs no longer show ModuleNotFoundError: No module named 'db'.
Frontend image builds successfully (npm run build passes in container build stage).
Upload endpoint works (POST /uploads).
Workers/control panel endpoints return expected data.

Key Takeaways

Keep compose file paths consistent with its directory and build context.
Use package-qualified imports for Python app modules in containerized runtimes.
Narrow optional API fields before state updates in strict TypeScript projects.
Add .dockerignore at the actual build context root to avoid bloated builds.

Build Log: Implementing Full Text Overlay Feature in Reel Quick (with Accurate Live Preview)

Farhan Munir — Fri, 10 Apr 2026 05:46:36 +0000

In this build, I implemented the complete text overlay workflow in Reel Quick: from UI controls to background processing, plus a preview system that better matches final output.

Repo: https://github.com/ronin1770/reel-quick

Why this feature mattered

The earlier flow allowed adding overlay text, but styling control was limited and preview confidence was low.

Users needed to:

control text appearance (size, color)
control placement (top/center/bottom)
preview changes instantly before processing
avoid trial-and-error renders

The goal was to make text overlays practical for real reel production, not just a placeholder UI.

What the feature now includes

1. Overlay content + timing

Users can create overlays with:

text
start time
end time

2. Style controls

Added styling inputs in the dialog:

Font size range: 40–200
Text color:
- HTML5 color picker (input type="color")
- HEX input (synced with picker)
Position selector:
- top
- center
- bottom

3. Live preview (client-side only)

The overlay updates instantly while editing:

no backend call
no queue call
no video reprocessing

This lets users iterate quickly before clicking process.

4. Dialog UX improvements

The modal was redesigned to be usable at production scale:

controls on the left
preview on the right
scrollable modal for smaller viewports
action buttons always reachable

The key technical challenge: preview/output mismatch

A big issue was that the font size selected in the preview didn’t visually match the rendered output.

Root cause

The frontend preview drew text in raw CSS px inside a display container, while the backend renders text at the actual output video resolution.

Same number, different render context.

Fix strategy

I changed preview scaling logic to account for real video dimensions:

Read intrinsic source dimensions from video metadata (videoWidth, videoHeight)
Measure actual preview frame dimensions in the modal
Compute scale factor:
- scale = min(previewWidth/sourceWidth, previewHeight/sourceHeight)
Render preview text as:
- previewFontSize = selectedFontSize * scale
Preserve source aspect ratio in preview container
Mirror vertical placement behavior (top/center/bottom with scaled edge padding)

Result: preview size and placement now feel much closer to final rendered video.
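
The scaling arithmetic above can be sketched as follows. The real implementation lives in the frontend; this Python version only illustrates the formula, and the function names are hypothetical.

```python
def preview_scale(preview_w: float, preview_h: float,
                  source_w: float, source_h: float) -> float:
    """scale = min(previewWidth / sourceWidth, previewHeight / sourceHeight)."""
    if min(preview_w, preview_h, source_w, source_h) <= 0:
        raise ValueError("all dimensions must be positive")
    return min(preview_w / source_w, preview_h / source_h)


def preview_font_size(selected_font_size: float, scale: float) -> float:
    """previewFontSize = selectedFontSize * scale."""
    return selected_font_size * scale


if __name__ == "__main__":
    # A 1080x1920 source shown in a 270x480 preview frame scales by 0.25,
    # so a selected size of 80 is drawn at 20 in the preview.
    scale = preview_scale(270, 480, 1080, 1920)
    print(scale, preview_font_size(80, scale))
```

Taking the minimum of the two ratios keeps the whole frame inside the preview container when the container's aspect ratio differs from the source's.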

API + backend integration

Good news: backend already supported style fields, so no backend API redesign was needed.

The frontend sends per-overlay payload with:

style.font_size
style.text_color
position.preset
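
For orientation, a per-overlay payload might look like the sketch below. Only the nesting of `style.font_size`, `style.text_color`, and `position.preset` comes from the list above; the top-level key names (`text`, `start_time`, `end_time`) and the helper name are assumptions, so check the backend models for the real schema.

```python
import json


def build_overlay_payload(text: str, start_time: float, end_time: float,
                          font_size: int, text_color: str, preset: str) -> dict:
    """Assemble one overlay entry; top-level key names are assumed, not confirmed."""
    return {
        "text": text,
        "start_time": start_time,
        "end_time": end_time,
        "style": {"font_size": font_size, "text_color": text_color},
        "position": {"preset": preset},
    }


if __name__ == "__main__":
    payload = build_overlay_payload("Hello", 0.0, 2.5, 96, "#ffffff", "bottom")
    print(json.dumps(payload, indent=2))
```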

Then it follows the existing pipeline:

Save overlays: POST /videos/{video_id}/text-overlays
Enqueue processing: POST /enqueue/text-overlay
An ARQ worker picks up the job and runs the MoviePy text overlay composition

Data flow (end to end)

User opens text overlay dialog
Configures text + timing + style + position
Verifies in live preview
Saves overlay config
Enqueues job
Worker validates and renders final video
Processed text-overlay video becomes available for download

Validation and guardrails

Font size is clamped to configured range
HEX color is normalized/validated
Overlay timing is validated before processing
Position options are constrained to supported presets
Preview updates remain instant and local
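
A minimal sketch of these guardrails, for illustration only. The 40–200 range and the three presets come from this post; the lowercase normalization, the expansion of 3-digit HEX shorthand, and the exact timing rule are assumptions rather than the project's confirmed behavior.

```python
import re

FONT_MIN, FONT_MAX = 40, 200
PRESETS = ("top", "center", "bottom")
HEX_RE = re.compile(r"#?([0-9a-fA-F]{3}|[0-9a-fA-F]{6})")


def clamp_font_size(value: int) -> int:
    """Clamp the requested size into the configured range."""
    return max(FONT_MIN, min(FONT_MAX, int(value)))


def normalize_hex(value: str) -> str:
    """Return '#rrggbb' in lowercase, or raise ValueError for invalid input."""
    match = HEX_RE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"invalid HEX color: {value!r}")
    digits = match.group(1)
    if len(digits) == 3:
        # Expand shorthand such as 'fff' to 'ffffff' (assumed behavior).
        digits = "".join(ch * 2 for ch in digits)
    return "#" + digits.lower()


def validate_timing(start_time: float, end_time: float) -> None:
    """Assumed rule: start is non-negative and end comes strictly after start."""
    if start_time < 0 or end_time <= start_time:
        raise ValueError("overlay timing must satisfy 0 <= start < end")


def validate_preset(preset: str) -> str:
    """Accept only the supported position presets."""
    if preset not in PRESETS:
        raise ValueError(f"unsupported position preset: {preset!r}")
    return preset
```

Clamping the font size rather than rejecting it keeps the dialog forgiving, while color, timing, and preset errors are raised so they surface before a job is enqueued.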

What changed from earlier behavior

Old frontend default style used a very small fixed size
New feature provides interactive style control + accurate preview scaling
Modal UX is now horizontal and production-friendly

What’s next

Potential follow-ups:

font family selection
stroke/shadow controls in UI
drag-and-drop custom placement
multiple overlay tracks/timeline editing

