DEV Community

abyo software LLC


Cutting CloudWatch Logs ingest costs with an OSS Rust gateway

If you've ever stared at a CloudWatch Logs bill for an app that writes a lot of events nobody reads, you've probably noticed that ingest, not storage, dominates the cost. AWS bills CloudWatch Logs ingest at $0.50/GB of raw bytes plus a metered overhead of 26 bytes per event (us-east-1 list pricing, 2026-06), so for write-heavy, rarely-read workloads the expensive part is writing logs that almost no one queries afterwards.
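
To make the pricing concrete, here is a rough estimator built only from the two figures quoted above. The workload numbers are hypothetical, and treating a billed GB as 10^9 bytes is an assumption, so read the result as an order-of-magnitude check rather than a bill prediction.

```python
# Rough CloudWatch Logs ingest estimate from the figures quoted in this post.
INGEST_USD_PER_GB = 0.50        # us-east-1 list price quoted above
OVERHEAD_BYTES_PER_EVENT = 26   # metered per-event overhead quoted above
BYTES_PER_GB = 1_000_000_000    # assumption: a billed GB is 10**9 bytes


def ingest_cost_usd(events: int, avg_event_bytes: int) -> float:
    """Return the estimated ingest charge for `events` events of a given average size."""
    billed_bytes = events * (avg_event_bytes + OVERHEAD_BYTES_PER_EVENT)
    return billed_bytes / BYTES_PER_GB * INGEST_USD_PER_GB


# Hypothetical workload: 2 billion events per month, 174 bytes each on average.
monthly = ingest_cost_usd(events=2_000_000_000, avg_event_bytes=174)
print(f"estimated monthly ingest: ${monthly:,.2f}")
```

Note how the per-event overhead matters more as events get smaller: at 174 bytes per event it already accounts for 13% of the billed bytes.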

S4 Logs is an Apache-2.0 Rust workspace for that pattern. It has two independent modes. This post walks through what each one does, the code surface needed to deploy them, and the measured numbers from a real AWS validation run.

Mode A — drain existing log groups to S3

s4logs drain pulls events from CloudWatch Logs via FilterLogEvents and writes them to S3 as standard RFC 8878 zstd frames containing one JSONL event per line. Work is bucketed into UTC-grid-aligned windows (one hour by default, configurable via --window), each window produces one manifest, and re-runs skip windows that already have a manifest. That makes drains idempotent.
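
The windowing and skip-if-manifested behaviour can be sketched in a few lines. This is an illustration of the idea, not the s4logs implementation: the manifest lookup is modelled as an in-memory set of window starts, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def window_start(ts: datetime, window: timedelta) -> datetime:
    """Floor a timezone-aware timestamp to the UTC grid defined by `window`."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    elapsed = ts.astimezone(timezone.utc) - epoch
    return epoch + (elapsed // window) * window


def pending_windows(start, end, window, done):
    """Yield grid-aligned window starts overlapping [start, end) with no manifest yet.

    `done` stands in for "windows that already have a manifest in S3".
    """
    cursor = window_start(start, window)
    while cursor < end:
        if cursor not in done:
            yield cursor
        cursor += window


hour = timedelta(hours=1)
t0 = datetime(2026, 6, 10, 9, 17, tzinfo=timezone.utc)
t1 = datetime(2026, 6, 10, 12, 0, tzinfo=timezone.utc)
done = {datetime(2026, 6, 10, 10, 0, tzinfo=timezone.utc)}
todo = list(pending_windows(t0, t1, hour, done))  # the 09:00 and 11:00 windows
```

Because window boundaries depend only on the clock grid and not on when the drain started, a re-run computes the same windows and can skip the finished ones.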

# Read-only plan: project per-group savings across the account
s4logs plan --all

# Real drain to S3 Glacier Instant Retrieval (use --dry-run to preview without writing)
s4logs drain --log-group /aws/lambda/payments \
  --bucket my-archive-bucket --prefix s4logs --account 123456789012 \
  --storage-class glacier-ir

Mode A only reclaims the storage line on what's already been ingested; the ingest you've already paid for is a sunk cost. Until you shorten retention, Mode A adds an S3 archive copy on top of the existing CloudWatch storage, so the net storage saving appears only after the retention shrink. That shrink is fail-closed: you can shorten the log group's retention only once every window older than your proposed cutoff has a verified manifest. It's a separate step (--apply-retention) and is report-only by default.
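
The fail-closed rule is simple to state as code. A minimal sketch, assuming window starts are already grid-aligned and that "verified" is a set of window starts whose manifests passed verification; the function name and data model are hypothetical, not the s4logs internals.

```python
from datetime import datetime, timedelta, timezone


def safe_to_shrink(oldest, cutoff, window, verified):
    """Return True only if every window in [oldest, cutoff) has a verified manifest.

    Fail-closed: a single missing or unverified window blocks the shrink.
    """
    cursor = oldest
    while cursor < cutoff:
        if cursor not in verified:
            return False
        cursor += window
    return True


hour = timedelta(hours=1)
oldest = datetime(2026, 6, 1, 0, 0, tzinfo=timezone.utc)
cutoff = oldest + 3 * hour
all_three = {oldest, oldest + hour, oldest + 2 * hour}

ok = safe_to_shrink(oldest, cutoff, hour, all_three)
blocked = safe_to_shrink(oldest, cutoff, hour, all_three - {oldest + hour})
```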

Mode B — bypass ingest with a CloudWatch Logs API-subset gateway

s4logs serve speaks a subset of the CloudWatch Logs AWS JSON 1.1 API:

PutLogEvents
CreateLogGroup
CreateLogStream
DescribeLogGroups
DescribeLogStreams

Point Fluent Bit, the CloudWatch Agent, or any SDK that stays within that subset at the gateway by overriding the endpoint:

# fluent-bit.conf
[OUTPUT]
    Name              cloudwatch_logs
    Match             *
    Region            us-east-1
    log_group_name    /app/api
    log_stream_prefix node-
    auto_create_group On
    Endpoint          s4logs-gateway.internal
    Port              8080

(Fluent Bit's cloudwatch_logs plugin documents a custom endpoint; the host + port split is the form shown in its LocalStack docs. Verify against your exact Fluent Bit image — the plugin behavior has shifted across versions.)
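
For a quick smoke test without an SDK, the endpoint override amounts to sending AWS JSON 1.1 requests to the gateway. The sketch below builds (but does not send) an unsigned PutLogEvents request with the Python standard library. The plain-HTTP scheme, the root path, and the reuse of the host and port from the Fluent Bit example are assumptions; the X-Amz-Target and Content-Type values follow the public CloudWatch Logs wire protocol.

```python
import json
import time
import urllib.request

# Assumption: same host/port as the Fluent Bit example, plain HTTP, root path.
GATEWAY = "http://s4logs-gateway.internal:8080/"


def put_log_events_request(group: str, stream: str, messages: list) -> urllib.request.Request:
    """Build (but do not send) an unsigned PutLogEvents request."""
    now_ms = int(time.time() * 1000)
    body = {
        "logGroupName": group,
        "logStreamName": stream,
        "logEvents": [{"timestamp": now_ms, "message": m} for m in messages],
    }
    return urllib.request.Request(
        GATEWAY,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/x-amz-json-1.1",
            "X-Amz-Target": "Logs_20140328.PutLogEvents",
        },
    )


req = put_log_events_request("/app/api", "node-1", ["hello", "world"])
# To actually send it: urllib.request.urlopen(req, timeout=5)
```

This only works as-is while request auth is off; with SigV4 mode enabled the request would also need a valid Authorization header.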

For traffic routed with the s3 action (as opposed to both, which keeps the CloudWatch copy), CloudWatch ingest charges drop to $0 on that traffic; you pay for S3 PUT requests and S3 storage instead. No application payload transform is required on the ingest path: clients still speak the CloudWatch Logs API subset and only need an endpoint override (SigV4 mode additionally needs matching static credentials).

Routing is a first-match TOML config so you can keep your alert-driven log groups on real CloudWatch and bypass the rest:

# routing.toml
default_action = "s3"          # s3 | cloudwatch | both | drop

[[rule]]
log_group = "/aws/lambda/payments-*"
action    = "cloudwatch"        # keep alerting paths intact

[[rule]]
log_group = "/aws/lambda/batch-*"
action    = "s3"                # archive only
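
First-match semantics mean rule order matters: the first pattern that matches a log group decides its action, and default_action applies only when nothing matches. A minimal sketch of that evaluation, mirroring the config above as Python data. Shell-style glob matching via fnmatch is an assumption here; the gateway's actual pattern syntax may differ in edge cases.

```python
from fnmatch import fnmatchcase

# Mirrors routing.toml above as plain Python data.
DEFAULT_ACTION = "s3"
RULES = [
    ("/aws/lambda/payments-*", "cloudwatch"),
    ("/aws/lambda/batch-*", "s3"),
]


def route(log_group: str) -> str:
    """Return the action of the first rule whose pattern matches, else the default."""
    for pattern, action in RULES:
        if fnmatchcase(log_group, pattern):
            return action
    return DEFAULT_ACTION


for group in ("/aws/lambda/payments-api", "/aws/lambda/batch-nightly", "/app/api"):
    print(group, "->", route(group))
```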

Crash durability is opt-in. With --wal-dir, events are fsynced before ack and replayed on restart (at-least-once; duplicates possible after a crash, but no silent loss of acknowledged events on a normal ext4/xfs disk; FUSE/NFS mounts that reject directory fsync degrade to best-effort and are surfaced in a metric). Without it, anything below the flush thresholds can be lost on a hard kill.
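
The fsync-before-ack pattern looks roughly like this. It is a generic write-ahead-log sketch for POSIX systems, not the S4 Logs WAL format: the file name, the JSONL record encoding, and the function names are all hypothetical.

```python
import json
import os


def wal_append(wal_dir: str, batch: list) -> None:
    """Append a batch and fsync it; the caller acks only after this returns."""
    path = os.path.join(wal_dir, "events.wal")
    is_new = not os.path.exists(path)
    with open(path, "ab") as f:
        for event in batch:
            f.write(json.dumps(event).encode("utf-8") + b"\n")
        f.flush()
        os.fsync(f.fileno())
    if is_new:
        # Persist the new directory entry too; this is the call that some
        # FUSE/NFS mounts reject.
        dir_fd = os.open(wal_dir, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)


def wal_replay(wal_dir: str) -> list:
    """Re-read logged events on restart (at-least-once: may repeat flushed events)."""
    path = os.path.join(wal_dir, "events.wal")
    if not os.path.exists(path):
        return []
    events = []
    with open(path, "rb") as f:
        for line in f:
            try:
                events.append(json.loads(line))
            except ValueError:
                break  # torn tail from a crash mid-write; it was never acked
    return events
```

Replay cannot tell which events were already flushed to S3 before the crash, which is where the at-least-once duplicates come from.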

Request auth defaults to off — the gateway accepts unsigned requests to the supported API subset and trusts the caller, so deploy it behind TLS + a network boundary regardless. If you want request signing on top, --auth-mode sigv4 verifies against a single static key pair (not IAM-backed). Use S4LOGS_AUTH_SECRET rather than the --auth-secret flag because process lists leak flag values; S4LOGS_AUTH_ACCESS_KEY is also supported.
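
For readers unfamiliar with what verifying SigV4 against a static key pair involves, the core is the standard signing-key derivation plus a constant-time comparison. The sketch below shows only that part of the public SigV4 scheme; building the canonical request and string-to-sign is omitted, and none of this is taken from the gateway's source.

```python
import hashlib
import hmac


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sigv4_signing_key(secret: str, date: str, region: str, service: str = "logs") -> bytes:
    """Derive the SigV4 signing key for one date (YYYYMMDD), region, and service."""
    k_date = _hmac(("AWS4" + secret).encode("utf-8"), date)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    return _hmac(k_service, "aws4_request")


def signature_matches(signing_key: bytes, string_to_sign: str, presented_hex: str) -> bool:
    """Compare the expected signature with the presented one in constant time."""
    expected = hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, presented_hex)


# Placeholder secret for illustration; a real deployment would read it from
# the environment (for example S4LOGS_AUTH_SECRET) rather than hard-code it.
key = sigv4_signing_key("example-secret", "20260612", "us-east-1")
```

Because the key is derived from a single static secret, there is nothing to evaluate beyond "does the signature match", which is why this mode carries no IAM semantics.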

v1.1.1 additions

  • ACME (Let's Encrypt) CLI surface for the gateway. Uses TLS-ALPN-01, so cert acquisition and renewal share the same bound TLS socket (typically :443). Requires --acme-domain, --acme-contact, and --acme-cache-dir; mutually exclusive with --tls-cert/--tls-key. Use --acme-staging to verify against Let's Encrypt's staging environment before production. The cache directory is mandatory because re-registering on every restart trips Let's Encrypt rate limits. Because this uses TLS-ALPN-01, deploy it where the ACME validator can reach the TLS listener (typically public TCP/443), and make sure the cache directory is writable by the s4logs process.
  • s4-emf::EmfDocument::samples_bounded() — a pre-materialization expansion guard for untrusted EMF input. EMF flattens (directive × metric × dimension_set), so a ~100 KB document with a 10k-element metric value array and 10k dimension sets expands to ~800 MB of f64 values. samples_bounded(max_samples, max_values) computes the expansion size and rejects over-cap inputs with EmfError::ExpansionTooLarge before allocating the flattened Vec<EmfSample> or cloned f64 value/count arrays. Use it after an upstream body-size cap on any boundary that accepts untrusted EMF — it bounds f64 expansion, not total document bytes, so the upstream cap is the first line of defence.
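
The expansion arithmetic in the samples_bounded() item can be reproduced from counts alone, which is the point of a pre-materialization guard. The following is a Python analogue, not the Rust API: it assumes every directive carries the same number of metrics and dimension sets, and the way the two caps are applied is one reading of the description above.

```python
F64_BYTES = 8


class ExpansionTooLarge(ValueError):
    """Python stand-in for the EmfError::ExpansionTooLarge case named above."""


def check_expansion(directives, metrics, dimension_sets, values_per_metric,
                    max_samples, max_values):
    """Compute flattened sizes from counts alone and reject over-cap input.

    Nothing is allocated here; only integer multiplication happens.
    """
    samples = directives * metrics * dimension_sets
    values = samples * values_per_metric
    if samples > max_samples or values > max_values:
        raise ExpansionTooLarge(f"{samples} samples / {values} values exceeds caps")
    return samples, values


# The example from the text: one metric, 10k values, 10k dimension sets.
samples, values = check_expansion(1, 1, 10_000, 10_000,
                                  max_samples=10**9, max_values=10**9)
expanded_bytes = values * F64_BYTES  # 800,000,000 bytes of f64 values
```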

No proprietary archive format

The contract is in DESIGN.md §14: the persisted on-disk format is frozen for the 1.x line. Data objects are plain standard zstd over JSONL — no S4-only container. The archive remains readable without the S4 Logs binary:

aws s3 cp s3://bucket/path/file.jsonl.zst - | zstd -dc | jq .

Sidecars (.s4index, .s4lts) and manifest JSON are S4-specific but documented. The repository ships an Athena DDL recipe in docs/athena.md: the explicit ADD PARTITION form is what's verified end-to-end; a partition-projection variant over account/loggroup/dt is documented but only parse-validated so far.

Validation against real AWS

The numbers below were measured against a real us-east-1 account with seeded synthetic data. We didn't run this against production-organic logs, and the results are labelled accordingly.

Mode A (2026-06-10): 33,163,647 events / 5.00 GiB of message bytes seeded across 16 streams. The drain covered 5 windows at --concurrency 4 and completed in 94.6 min with zero ThrottlingException errors. Drain output and the Athena full count agreed at 33,163,613 events. The 34 events missing relative to the seeder count were never returned by CloudWatch via FilterLogEvents either, and the gap remains unattributed. JSONL output 9.7 GiB → archive 1.6 GiB zstd (6.2×), in 41 objects.

Mode B + restore (2026-06-12, KB-scale): the live validation path used the AWS CLI / SDK endpoint override. Fluent Bit and the CloudWatch Agent should work within the same five-action subset, but their exact plugin/version combinations were not part of this real-AWS run. With the AWS CLI, gateway-routed PutLogEvents landed at the correct S3 layout (dt=...); the both action reached real CloudWatch and S3; the s3 action created no real CloudWatch log group; and restore --to-log-group re-ingested events at the current time with the original timestamp wrapped, because CloudWatch rejects timestamps older than 14 days or older than the group's retention.
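
The timestamp wrapping on restore can be pictured as follows. The age rule comes from the paragraph above; the envelope shape and the field name original_timestamp are hypothetical and are not the format restore actually writes.

```python
import json

DAY_MS = 24 * 60 * 60 * 1000


def too_old_for_cloudwatch(ts_ms: int, now_ms: int, retention_days=None) -> bool:
    """Apply the rejection rule cited above: older than 14 days or than retention."""
    max_age_ms = 14 * DAY_MS
    if retention_days is not None:
        max_age_ms = min(max_age_ms, retention_days * DAY_MS)
    return now_ms - ts_ms > max_age_ms


def wrap_for_restore(event: dict, now_ms: int) -> dict:
    """Re-stamp an archived event at `now_ms`, keeping the original time in the payload."""
    envelope = {
        "original_timestamp": event["timestamp"],  # hypothetical field name
        "message": event["message"],
    }
    return {"timestamp": now_ms, "message": json.dumps(envelope)}


now_ms = 1_800_000_000_000
archived = {"timestamp": now_ms - 15 * DAY_MS, "message": "payment settled"}
restored = wrap_for_restore(archived, now_ms)
```

Anything querying the restored group has to read the original time out of the message body, since the event's own timestamp now reflects the restore time.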

LocalStack soak (separate, 2 hours): 715,817 requests / 7,158,170 events acked and durable / 0 loss / 0 failures / RSS +2.3 MiB.

Mode A real-AWS experiment cost ~$2.60 by list-price math (Athena and S3 pennies observed; the CloudWatch ingest line had not yet posted before teardown). Mode B validation was at KB scale.

Limitations worth knowing

  • The gateway is a five-action CloudWatch Logs compatibility subset, not a full CloudWatch Logs emulator. Unsupported actions are out of scope.
  • Single AWS account per deployment; multi-account / Organizations support is not implemented here.
  • --shard-streams is the parallelism knob for groups with many streams. It defaults to 1 (unsharded). When you set it >1, the partitioner respects AWS's 100-stream filter limit by auto-growing the shard count if needed. We have not benchmarked the path at thousands-of-streams scale yet.
  • Late-arriving / backdated events can be missed if you drain a window before CloudWatch has indexed them. The default --to cutoff already builds in a 15-minute ingestion-lag buffer (measured visibility lag for backdated events was 3–5.5 min in the real-AWS run), but where agent-delivered or backdated stragglers are possible you should still wait until windows are mature before draining them, or pass --reconcile to re-page a manifested range and append only what's new (dedup by event identity). --reconcile re-pages the full window, so it costs the same CloudWatch time as a fresh drain of that range; it's a repair tool, not a cheap verification sweep.
  • Glacier Instant Retrieval is millisecond-access but has the usual S3 GIR catches: 90-day minimum storage duration, per-GB retrieval charges, and 128 KiB minimum billable object size. --storage-class glacier-ir applies only to data objects; sidecars and manifests stay on S3 Standard.
  • samples_bounded() bounds the f64 expansion, not total document bytes or per-sample string clones (dimension key/value, namespace, metric name, unit). Pair it with an upstream request-body size cap.
  • Default request auth is off (as noted above). SigV4 mode is single static access key/secret only: no IAM policy evaluation, no STS / session tokens, no presigned URLs. /health, /ready, and /metrics remain unauthenticated regardless, so the network boundary stays load-bearing.
  • The gateway applies backpressure at --max-buffered-bytes (default 256 MiB); clients receive 503 once the buffer fills, and are expected to retry.
  • Drain output is byte-deterministic across re-runs only at --shard-streams 1. Shards >1 trade byte-determinism for speed; the record set is the same but the order inside data objects isn't.
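
The shard auto-growth described in the --shard-streams item reduces to a ceiling division. A minimal sketch, assuming that an unsharded drain applies no stream-name filter and that streams are split round-robin; both are guesses about the partitioner, and the function names are hypothetical.

```python
import math

MAX_STREAMS_PER_FILTER = 100  # the per-filter stream limit cited above


def effective_shards(stream_count: int, requested: int = 1) -> int:
    """Grow the requested shard count until no shard exceeds the stream limit."""
    if requested <= 1:
        return 1  # unsharded: assumed to query the whole group without a stream filter
    return max(requested, math.ceil(stream_count / MAX_STREAMS_PER_FILTER))


def partition(streams: list, shards: int) -> list:
    """Round-robin split of stream names across shards (an assumed strategy)."""
    return [streams[i::shards] for i in range(shards)]


streams = [f"node-{i}" for i in range(950)]   # hypothetical stream names
shards = effective_shards(len(streams), requested=4)
parts = partition(streams, shards)
```

With 950 streams and 4 requested shards, the count grows to 10 so that each shard filters on 95 streams.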

Installing

Static musl binaries for Linux x86_64 and aarch64 are available from GitHub Releases, or install from source with cargo:

cargo install --git https://github.com/abyo-software/s4-logs s4logs-cli

Docker: docker build -t s4logs . (~176 MB runtime).

The repo, README, DESIGN.md, and the full validation tables are at github.com/abyo-software/s4-logs. Apache-2.0.
