Arya Gorjipour

Originally published at github.com

I wanted jq with memory, time ranges, and filters. So I built logdive

Your app is in production. Something broke at 2am. Your options are:

  • grep through a rotated log file, squinting at terminal output.
  • Chain together half a dozen jq pipes until the command line becomes unreadable.
  • Page an SRE to query your observability stack, assuming you have one.
  • Spin up Loki or Elastic locally, spend two hours on config, and then do the actual investigation.

All four of these suck. Either you're limited to flat text tooling, or you're paying for infrastructure complexity you don't need.

logdive is what sits in the gap. It's a single Rust binary. You drop it anywhere, point it at a log file or pipe Docker output into it, and you get a fast, queryable index on your local machine. No daemon. No cloud. No YAML. Just cargo install logdive.

# Ingest logs from a file or pipe from stdin.
logdive ingest --file ./logs/app.log
docker logs my-container | logdive ingest --tag my-container

# Query the index.
logdive query 'level=error AND service=payments last 2h'
logdive query 'message contains "timeout"' --format json | jq

# Inspect what you've indexed.
logdive stats

# Optionally expose a read-only HTTP API for remote querying.
logdive-api --db ./logdive.db --port 4000
curl 'http://127.0.0.1:4000/query?q=level%3Derror&limit=100'

That's the whole product surface.

Why logdive exists

Every backend engineer has hit the wall this is built for: your application emits perfectly good structured JSON logs, but the tools for querying that JSON locally are stuck at two extremes:

  • jq is single-file and one-shot: no memory, no time ranges, no filters across files.
  • Loki, Datadog, Elastic, Splunk all demand infrastructure, cost, and configuration that's overkill for a side project, small team, or personal investigation.

The target user is a backend engineer who wants jq with memory, filters, and time ranges — without YAML files, without a running daemon they didn't ask for, without a monthly bill.

Rust makes this credible in a way no other language quite does: a single self-contained binary with no runtime, zero-copy parsing, SQLite bundled directly into the binary, and real concurrency for ingestion. This is the kind of tool Rust is genuinely good at.

Who this is for (and who it isn't)

Good fit:

  • Backend engineers debugging production incidents from local log copies.
  • Small teams without a dedicated observability budget.
  • Anyone who's ever built a 4-stage jq pipeline and wished it was searchable afterward.
  • Folks running Docker locally who want docker logs my-container | logdive ingest and instant querying.
  • CI pipelines that need to grep through structured output of a previous step.

Bad fit (and I'll be explicit about this):

  • Multi-machine, networked indexes. logdive is single-host by design.
  • Real-time log tailing / tail -f style follow mode. Not in v0.1.0.
  • Anything needing authentication on the HTTP endpoint. The v1 API assumes the network layer handles access control.
  • Massive enterprise-scale log volumes. SQLite handles a lot, but if you're indexing 100GB/day, you want Loki.
  • Non-JSON log formats (plaintext, logfmt, syslog). v1 is JSON-only.

The whole scope is deliberately small. v0.1.0 ships what a side project or small team needs, nothing more.

The query language

Small enough to fit in your head, expressive enough to be useful:

level=error
level=error AND service=payments
message contains "database timeout"
level=error last 2h
tag=api AND status > 499 since 2026-04-15
user_id=4812 AND duration_ms > 500

Operators: =, !=, >, <, CONTAINS. Time ranges: last Nm/Nh/Nd or since <datetime>. Clauses chain with AND. Known fields (timestamp, level, message, tag) hit SQLite indexes directly. Unknown fields go through json_extract() on a JSON blob — slower but works for arbitrary JSON shapes.

No OR in v0.1.0 — it's the single biggest v1 non-goal. AND covers the dominant query pattern, and adding OR requires a two-level grammar plus precedence handling that would roughly double the parser. Deferred to v2 deliberately.
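
To make the known-vs-unknown field split concrete, here is roughly how a single clause could turn into SQL. This is an illustration, not logdive's actual code; the logs table, the extra JSON column, and the function names are assumptions.

// Illustrative only: how one parsed clause might become a WHERE fragment.
// Assumes a `logs` table whose known fields are real columns and whose
// remaining JSON lives in an `extra` TEXT column (names are hypothetical).
const KNOWN_FIELDS: &[&str] = &["timestamp", "level", "message", "tag"];

struct Clause {
    field: String,
    op: String, // "=", "!=", ">", "<", "CONTAINS"
    value: String,
}

fn clause_to_sql(c: &Clause, params: &mut Vec<String>) -> String {
    // Known fields hit indexed columns; anything else goes through
    // json_extract() on the JSON blob, which forces a table scan.
    let lhs = if KNOWN_FIELDS.contains(&c.field.as_str()) {
        c.field.clone()
    } else {
        params.push(format!("$.{}", c.field));
        "json_extract(extra, ?)".to_string()
    };
    match c.op.as_str() {
        "CONTAINS" => {
            params.push(format!("%{}%", c.value));
            format!("{lhs} LIKE ?")
        }
        op => {
            params.push(c.value.clone());
            format!("{lhs} {op} ?")
        }
    }
}

Time ranges (last Nh, since <datetime>) would simply add one more timestamp comparison on top of whatever the clauses produce.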

Under the hood

For readers who like the implementation details:

  • Three-crate Cargo workspace. logdive-core is pure library (parser, indexer, query engine — no I/O at module level), logdive is the CLI binary, logdive-api is the HTTP server binary. Each is independently publishable.
  • SQLite via rusqlite with the bundled feature. Zero infrastructure, battle-tested, ships inside the binary at ~500KB.
  • Hybrid storage. Known fields (timestamp, level, message, tag) are real indexed columns. Everything else is stored in a JSON blob queryable via SQLite's json_extract(). This is the only way to handle arbitrary JSON shapes without a schema-bound design (a rough schema sketch follows this list).
  • Hand-written recursive descent query parser. ~200 lines of Rust enums. No parser-combinator dependency. Better error messages than generated parsers, and honestly, it was one of the most satisfying parts of the project to write.
  • Blake3 row hashing for deduplication. INSERT OR IGNORE on a unique hash column means re-ingesting a file (or dealing with log rotation) is free. No duplicate rows. The hash is cheap — negligible per-line cost.
  • Batched inserts at 1000 rows per transaction. Standard SQLite throughput pattern.
  • Axum HTTP API. Read-only via SQLITE_OPEN_READ_ONLY, blocking SQLite work wrapped in tokio::task::spawn_blocking so it doesn't block Tokio's worker threads, graceful shutdown on Ctrl-C and SIGTERM. A minimal handler shape is sketched below.
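
To make the storage bullets concrete, here is a rough sketch of what that schema and batched ingest loop can look like with rusqlite, blake3, and serde_json. The table name, column names, and batch handling are illustrative guesses, not logdive's actual schema.

// Illustrative schema and batched ingest (names are guesses, not logdive's schema).
use rusqlite::Connection;

fn open_db(path: &str) -> rusqlite::Result<Connection> {
    let conn = Connection::open(path)?;
    conn.execute_batch(
        "CREATE TABLE IF NOT EXISTS logs (
             id        INTEGER PRIMARY KEY,
             hash      BLOB NOT NULL UNIQUE, -- blake3 of the raw line: the dedup key
             timestamp TEXT,
             level     TEXT,
             message   TEXT,
             tag       TEXT,
             extra     TEXT -- full JSON line, queried via json_extract() for unknown fields
         );
         CREATE INDEX IF NOT EXISTS idx_logs_timestamp ON logs(timestamp);
         CREATE INDEX IF NOT EXISTS idx_logs_level ON logs(level);
         CREATE INDEX IF NOT EXISTS idx_logs_tag ON logs(tag);",
    )?;
    Ok(conn)
}

fn ingest_batch(conn: &mut Connection, lines: &[String]) -> rusqlite::Result<()> {
    // One transaction per batch (~1000 rows) is the standard SQLite throughput trick.
    let tx = conn.transaction()?;
    {
        let mut stmt = tx.prepare(
            "INSERT OR IGNORE INTO logs (hash, timestamp, level, message, tag, extra)
             VALUES (?1, ?2, ?3, ?4, ?5, ?6)",
        )?;
        for line in lines {
            // INSERT OR IGNORE on the unique hash column makes re-ingestion a no-op.
            let hash = blake3::hash(line.as_bytes());
            let v: serde_json::Value = serde_json::from_str(line).unwrap_or_default();
            stmt.execute(rusqlite::params![
                hash.as_bytes().as_slice(),
                v["timestamp"].as_str(),
                v["level"].as_str(),
                v["message"].as_str(),
                v["tag"].as_str(),
                v.to_string(),
            ])?;
        }
    }
    tx.commit()
}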

The full architecture is documented in the repo's README.
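
One more piece worth sketching: the read-only API handler, with the blocking SQLite call pushed onto Tokio's blocking pool. Again a rough shape with made-up names, not the actual logdive-api source.

use std::sync::Arc;

use axum::{extract::{Query, State}, routing::get, Json, Router};
use rusqlite::{Connection, OpenFlags};
use serde::Deserialize;

#[derive(Deserialize)]
struct Params {
    q: String,
    limit: Option<u32>,
}

async fn query_handler(
    State(db_path): State<Arc<str>>,
    Query(params): Query<Params>,
) -> Json<Vec<serde_json::Value>> {
    // rusqlite is synchronous, so run it on the blocking pool instead of
    // stalling one of Tokio's async worker threads.
    let rows = tokio::task::spawn_blocking(move || -> rusqlite::Result<Vec<serde_json::Value>> {
        let conn = Connection::open_with_flags(&*db_path, OpenFlags::SQLITE_OPEN_READ_ONLY)?;
        // ...compile `params.q` to SQL, run it against `conn`, collect rows as JSON...
        let _ = (conn, params);
        Ok(Vec::new())
    })
    .await
    .expect("blocking task panicked")
    .unwrap_or_default();
    Json(rows)
}

fn router(db_path: Arc<str>) -> Router {
    Router::new().route("/query", get(query_handler)).with_state(db_path)
}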

Performance

Representative numbers on an Acer Nitro 5 laptop, measured via criterion:

Operation                                                   Throughput / Latency
Ingestion, batched insert (10k rows)                        ~210k lines/sec
Ingestion, parse + insert end-to-end (10k rows)             ~166k lines/sec
Query on known field, empty result (100k rows)              ~17 μs
Query on known field, 25% match (100k rows, LIMIT 1000)     ~39 ms
Query on JSON field, 25% match (100k rows, LIMIT 1000)      ~3.6 ms
Query on JSON field, 0% match (full scan, 100k rows)        ~68 ms
CONTAINS full-table scan (100k rows)                        ~36–40 ms
3-clause AND chain (100k rows)                              ~22 ms

Release binaries weigh in at 3.7 MB (logdive) and 4.1 MB (logdive-api), well under the 10 MB target, thanks to LTO + strip + panic=abort in the release profile.

Run cargo bench in the repo to get your own baseline.

Tradeoffs worth being honest about

A few design decisions have real downsides that users should know:

Timestamps compared as lexical TEXT. This is correct for ISO-8601-shaped timestamps (they sort chronologically when compared as strings), but any exotic timestamp format will silently misorder. Default timestamps from modern structured loggers are ISO-8601, so in practice this is rarely a problem — but it's a real sharp edge.
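
A two-line illustration of why this works for ISO-8601 and breaks elsewhere:

// ISO-8601 sorts correctly as text because fields run most-significant-first.
assert!("2026-04-15T02:13:00Z" < "2026-04-15T10:05:00Z"); // lexical and chronological agree
// A US-style date does not: lexically "04/..." < "12/...", chronologically it's the reverse.
assert!("04/15/2026" < "12/01/2025");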

No index on json_extract expressions. Queries on unknown JSON fields fall back to full table scans. A 100k-row scan finishes in ~68 ms, which is still fast, but if you're hammering the same unknown field constantly, you're paying roughly 1000x more per query than a known-column lookup. A future version could promote frequently-queried JSON fields to real columns.

Single-host only. There's no clustering story. If you need distributed query across machines, you want Loki or Elastic.

No authentication on the HTTP API. Deliberate for v1. If you expose logdive-api beyond localhost, put a reverse proxy with auth in front of it. The binary defaults to binding 127.0.0.1 for a reason.

Install and try it

From crates.io:

cargo install logdive logdive-api

From prebuilt binaries: grab the tarball for your platform from the GitHub Releases page. Linux x86_64 and macOS arm64 are built on every tag push.

From source:

git clone https://github.com/Aryagorjipour/logdive
cd logdive
cargo build --release

MSRV: Rust 1.85 (edition 2024). Dual-licensed MIT OR Apache-2.0.

Try the included examples:

logdive --db /tmp/demo.db ingest --file examples/app.log
logdive --db /tmp/demo.db ingest --file examples/nginx.log
logdive --db /tmp/demo.db stats
logdive --db /tmp/demo.db query 'level=error AND service=payments'

Call for contributions

v0.1.0 is deliberately small, but there's a clear set of high-value v2 features that would benefit hugely from community help. If you want to contribute, these are genuine needs:

OR operator in the query language. The most obvious gap left by v1's scope. It means extending the parser to handle a two-level grammar (clauses joined by OR, OR-groups joined by AND) and teaching the SQL generator to emit parenthesized disjunctions. Non-trivial but well-scoped.
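
For orientation, one possible shape of that two-level AST (a sketch, not a prescribed design):

// Hypothetical v2 AST: the top level is an AND of OR-groups, matching the
// precedence described above. The SQL generator would join each group with OR,
// wrap it in parentheses, and join the groups with AND.
struct QueryAst {
    groups: Vec<OrGroup>, // joined by AND
}

struct OrGroup {
    clauses: Vec<Clause>, // joined by OR
}

struct Clause {
    field: String,
    op: Op,
    value: String,
}

enum Op { Eq, Ne, Gt, Lt, Contains }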

Non-JSON log format support. Plaintext and logfmt are the obvious next formats. Would plug in as additional parser implementations alongside parse_line in logdive-core.

Follow mode (-f / tail-and-index). Watch a log file for new lines and ingest them as they appear. Good use of tokio::fs and notify crate patterns.
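
If you want a starting point, the simplest version is offset-based polling; the names here are hypothetical, and a real implementation would also handle partial last lines and file rotation:

use std::fs::File;
use std::io::{BufRead, BufReader, Seek, SeekFrom};

// Re-read anything appended to the file since the last call, tracking a byte offset.
fn read_new_lines(path: &str, offset: &mut u64) -> std::io::Result<Vec<String>> {
    let mut file = File::open(path)?;
    file.seek(SeekFrom::Start(*offset))?;
    let mut reader = BufReader::new(file);
    let mut lines = Vec::new();
    let mut buf = String::new();
    loop {
        buf.clear();
        let n = reader.read_line(&mut buf)?;
        if n == 0 {
            break; // nothing new appended yet
        }
        *offset += n as u64;
        lines.push(buf.trim_end().to_string());
    }
    Ok(lines)
}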

A browser UI. The HTTP API is ready for one — someone with frontend chops could build a single-page React/Svelte/HTMX UI that talks to logdive-api and gives people a browser-based query interface.

Generated columns for frequently-queried JSON fields. The big performance win. Would let users mark certain JSON fields as "promote to indexed column" and get known-field query performance for those.
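
SQLite's virtual generated columns (3.31+) are one plausible mechanism. The sketch below reuses the hypothetical logs/extra naming from earlier and is an idea, not a committed design:

// Promote a frequently-queried JSON field to an indexed virtual column.
// VIRTUAL generated columns can be added with ALTER TABLE; STORED ones cannot.
// `field` must be validated before being spliced into SQL.
fn promote_field(conn: &rusqlite::Connection, field: &str) -> rusqlite::Result<()> {
    conn.execute_batch(&format!(
        "ALTER TABLE logs ADD COLUMN {f} TEXT
             GENERATED ALWAYS AS (json_extract(extra, '$.{f}')) VIRTUAL;
         CREATE INDEX idx_logs_{f} ON logs({f});",
        f = field
    ))
}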

Benchmarks on more hardware. If you run the existing cargo bench suite on your machine, an issue/PR updating the README's Performance section with a broader sample would be genuinely useful.

Docker image for the HTTP API. Dockerfile for logdive-api with a volume mount for the index database. Natural next step for users who want to run the API as a service.

The repo has CI, benchmarks, clean test coverage, and a documented contribution workflow. Issues and pull requests at github.com/Aryagorjipour/logdive.

A note on context

logdive started as the final project in a Rust learning journey. The framing mattered — I wanted a project that was small enough to finish, demanding enough to exercise real Rust (parsers, SQLite, async, concurrency, CLI, HTTP), and useful enough that I'd actually keep using it afterward.

What I underestimated: how much of the effort lives in the parts that aren't writing code. Setting up a clean workspace. Choosing the right abstractions between core and binaries. Writing benchmarks that actually measure what you think they measure. Testing an HTTP server with tower::ServiceExt::oneshot. Packaging a three-crate workspace for crates.io when one crate depends on another and you're publishing for the first time. Each of these had at least one subtle gotcha.

The project is open source because the next person hitting the "jq vs Datadog" wall might as well benefit from it, and because Rust has given me enough that I want to give something back.

About

Arya Gorjipour — backend engineer, Rust learner, logdive maintainer.

Issues, bug reports, and pull requests welcome. If you end up using logdive to debug a real production incident, I'd love to hear about it.
