Arya Gorjipour

Posted on Jun 7 • Originally published at github.com

logdive v0.3.0 — the one where I finally added parens (and four more things my heart wanted)

#opensource #devops #rust #showdev

v0.2.0 was a good release. I was happy with it. I used it.

Then I kept trying to write (level=error OR level=warn) AND service=payments and the tool just... didn't know what parens were. Three separate times. Same query. Same sigh. Same manual rewrite to flatten it.

I shipped logdive to scratch my own itch. My itch wasn't done itching.

v0.3.0 is five things my heart kept asking for while using v0.2. Parenthesised queries are the headline; I literally promised that in the v0.2 article. But there's also pagination in both the CLI and the API, case-insensitive level queries (because level=ERROR and level=error should absolutely be the same thing), a distroless Docker image that doesn't need curl to healthcheck itself, and a website that now exists.

# The thing I kept trying to write
logdive query '(level=error OR level=warn) AND service=payments'

# Page through results instead of drowning in them
logdive query 'service=payments' --limit 50 --offset 100

# Or through the API — same thing, different surface
curl 'http://localhost:4000/query?q=service%3Dpayments&limit=50&offset=100'

# These are finally identical. They were not.
logdive query 'level=ERROR'
logdive query 'level=error'

# Smaller container. No curl. Healthchecks itself.
docker pull ghcr.io/aryagorjipour/logdive:0.3.0

# Install or upgrade both binaries
cargo install logdive logdive-api --force

417 tests passing. Five milestones. Binaries still 3.9 MB and 4.2 MB.

M1 — Parenthesised queries

This was the one. The v0.2 article literally put it at the top of the contributions list and called it "the v0.3 flagship." Accountability shipped.

The v0.2 grammar had two precedence levels: AND binds tighter than OR. Good enough until you need an OR of several conditions ANDed with something else. Then you're distributing AND over OR by hand to flatten the query, and that's not what you want at midnight.

query     := or_expr [ TIME_RANGE ]
or_expr   := and_expr (OR and_expr)*
and_expr  := clause (AND clause)*
clause    := field OP value
           | field CONTAINS string
           | "(" or_expr ")"    ← new
           | TIME_RANGE
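A grammar like this maps naturally onto a recursive-descent parser, one function per rule. The sketch below is not logdive's parser: `Token`, `Node`, `Parser` and `parse` are hypothetical names, it only understands `field=value` clauses, and it leaves out CONTAINS and TIME_RANGE entirely.

```rust
// Hypothetical sketch of a recursive-descent parser for the grammar above.
// Handles only `field=value`, AND, OR and parentheses.

#[derive(Debug, Clone, PartialEq)]
enum Token { LParen, RParen, And, Or, Cond(String, String) }

#[derive(Debug, PartialEq)]
enum Node {
    Cond(String, String),
    And(Vec<Node>),
    Or(Vec<Node>),
    Group(Box<Node>), // the "(" or_expr ")" alternative
}

fn tokenize(input: &str) -> Result<Vec<Token>, String> {
    // Pad parens with spaces so a plain whitespace split separates them.
    let spaced = input.replace('(', " ( ").replace(')', " ) ");
    spaced
        .split_whitespace()
        .map(|word| match word {
            "(" => Ok(Token::LParen),
            ")" => Ok(Token::RParen),
            "AND" => Ok(Token::And),
            "OR" => Ok(Token::Or),
            _ => match word.split_once('=') {
                Some((f, v)) if !f.is_empty() && !v.is_empty() => {
                    Ok(Token::Cond(f.to_string(), v.to_string()))
                }
                _ => Err(format!("expected field=value, got `{}`", word)),
            },
        })
        .collect()
}

struct Parser { tokens: Vec<Token>, pos: usize }

impl Parser {
    fn peek(&self) -> Option<&Token> { self.tokens.get(self.pos) }

    // or_expr := and_expr (OR and_expr)*
    fn or_expr(&mut self) -> Result<Node, String> {
        let mut branches = vec![self.and_expr()?];
        while self.peek() == Some(&Token::Or) {
            self.pos += 1;
            branches.push(self.and_expr()?);
        }
        Ok(if branches.len() == 1 { branches.remove(0) } else { Node::Or(branches) })
    }

    // and_expr := clause (AND clause)*
    fn and_expr(&mut self) -> Result<Node, String> {
        let mut parts = vec![self.clause()?];
        while self.peek() == Some(&Token::And) {
            self.pos += 1;
            parts.push(self.clause()?);
        }
        Ok(if parts.len() == 1 { parts.remove(0) } else { Node::And(parts) })
    }

    // clause := field OP value | "(" or_expr ")"
    fn clause(&mut self) -> Result<Node, String> {
        match self.tokens.get(self.pos).cloned() {
            Some(Token::Cond(field, value)) => {
                self.pos += 1;
                Ok(Node::Cond(field, value))
            }
            Some(Token::LParen) => {
                self.pos += 1;
                let inner = self.or_expr()?; // recurse back to the top rule
                if self.peek() != Some(&Token::RParen) {
                    return Err("missing closing `)`".to_string());
                }
                self.pos += 1;
                Ok(Node::Group(Box::new(inner)))
            }
            other => Err(format!("unexpected token: {:?}", other)),
        }
    }
}

fn parse(input: &str) -> Result<Node, String> {
    let mut parser = Parser { tokens: tokenize(input)?, pos: 0 };
    let node = parser.or_expr()?;
    if parser.pos != parser.tokens.len() {
        return Err("trailing input after query".to_string());
    }
    Ok(node)
}
```

The only change needed to support parens is the second arm of `clause`, which calls back into `or_expr`. That recursion is what lets groups nest to any depth.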

(level=error OR level=warn) AND service=payments generates:

WHERE ((lower(level) = ? OR lower(level) = ?) AND json_extract(fields, '$.service') = ?)

The inner group gets its own SQL sub-expression. You can't construct something that silently breaks precedence — the generator always parenthesises each nesting level.

The new AST variant is Clause::Group(Box<QueryNode>). The Box is there because Rust won't let you have a recursively-sized type without it, which is a very Rust thing to be strict about. An arena would be cleaner if query parsing were a hot path. It isn't, so — Box.
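To make the Box and the "always parenthesise" rule concrete, here is a minimal sketch of a recursive AST and a SQL generator. Only the `Clause::Group(Box<QueryNode>)` shape and the generated SQL text come from this article; the other variants, the `to_sql` / `where_clause` / `eq` helpers and the field handling are assumptions made for illustration.

```rust
// Hypothetical, simplified AST. Only `Group(Box<QueryNode>)` is from the
// article; every other name here is illustrative.
enum QueryNode {
    Or(Vec<QueryNode>),
    And(Vec<QueryNode>),
    Clause(Clause),
}

enum Clause {
    Eq { field: String, value: String },
    // Without the Box, Clause would contain a QueryNode inline, which in turn
    // contains a Clause inline, so neither type would have a finite size.
    Group(Box<QueryNode>),
}

// Convenience constructor for an equality clause.
fn eq(field: &str, value: &str) -> QueryNode {
    QueryNode::Clause(Clause::Eq { field: field.to_string(), value: value.to_string() })
}

// Renders a node to SQL, pushing bind values in placeholder order.
fn to_sql(node: &QueryNode, binds: &mut Vec<String>) -> String {
    match node {
        QueryNode::Or(parts) => join(parts, " OR ", binds),
        QueryNode::And(parts) => join(parts, " AND ", binds),
        QueryNode::Clause(Clause::Eq { field, value }) => {
            if field.as_str() == "level" {
                // Level lookups compare lowercased on both sides.
                binds.push(value.to_lowercase());
                "lower(level) = ?".to_string()
            } else {
                // Assumption: other fields live in a JSON column. A real
                // generator must validate `field` before interpolating it.
                binds.push(value.clone());
                format!("json_extract(fields, '$.{}') = ?", field)
            }
        }
        // Every group is wrapped in its own parentheses.
        QueryNode::Clause(Clause::Group(inner)) => format!("({})", to_sql(inner, binds)),
    }
}

fn join(parts: &[QueryNode], op: &str, binds: &mut Vec<String>) -> String {
    let pieces: Vec<String> = parts.iter().map(|p| to_sql(p, binds)).collect();
    pieces.join(op)
}

// Wraps the whole expression once more, matching the WHERE shown above.
fn where_clause(root: &QueryNode) -> (String, Vec<String>) {
    let mut binds = Vec::new();
    let sql = format!("WHERE ({})", to_sql(root, &mut binds));
    (sql, binds)
}
```

Because the parentheses are emitted by the `Group` arm itself, the generator never has to reason about SQL operator precedence. The structure of the tree decides the structure of the output.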

M2 — `--offset` and the rename I had to make

The additive part: --offset is now a real flag.

logdive query 'service=payments' --limit 50
logdive query 'service=payments' --limit 50 --offset 50
logdive query 'service=payments' --limit 50 --offset 100

--offset 0 and no flag at all are the same thing. Default limit is still 1000. --limit 0 still means "all of them, good luck."

Adding offset meant execute(query, conn, Option<usize>) had to become execute(query, conn, QueryOptions). A bare Option<usize> for one parameter is fine. For two it starts getting philosophical. The struct should have been there from v0.1. It's there now.
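A sketch of what such an options struct could look like. The struct name and the `limit` / `offset` fields appear in the migration snippet later in this post; the `Option<usize>` types and the `limit_clause` helper are assumptions, not the crate's actual definition.

```rust
// Hypothetical reconstruction of an options struct for `execute`.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct QueryOptions {
    limit: Option<usize>,  // None = no limit
    offset: Option<usize>, // None = start at row zero
}

impl QueryOptions {
    /// Renders the trailing LIMIT/OFFSET clause.
    /// SQLite treats a negative LIMIT as "no upper bound", so None maps to -1.
    fn limit_clause(&self) -> String {
        let limit = match self.limit {
            Some(n) => n.to_string(),
            None => "-1".to_string(),
        };
        format!("LIMIT {} OFFSET {}", limit, self.offset.unwrap_or(0))
    }
}
```

Deriving `Default` is what makes `QueryOptions::default()` mean "no limit, no offset", and a third option later becomes a new field rather than another positional argument.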

The breaking part: --format on the query subcommand is now --output.

# v0.2
logdive query 'level=error' --format json

# v0.3
logdive query 'level=error' --output json

--format already existed on ingest to pick the input log format (JSON, logfmt, plain). Two different --format flags doing two different things on two different subcommands is a documentation problem that keeps getting worse. One word to fix it. I did not add a deprecated alias — a deprecated alias that silently works is just confusion that lives for three more versions.

M3 — HTTP pagination

GET /query now takes ?offset=. Mirrors --offset exactly.

curl 'http://localhost:4000/query?q=level%3Derror&limit=50&offset=0'
curl 'http://localhost:4000/query?q=level%3Derror&limit=50&offset=50'

The benchmark number I didn't expect: page 1 at 100k rows costs ~42 ms, deep page at offset 2450 costs ~50 ms. 8 ms overhead to skip 2450 rows. That's because LIMIT x OFFSET y in SQLite counts forward from zero — no scroll cursor, no magic. For building a UI on top of the API it's fine. For "give me rows 500,000 through 500,050" — use a time range query, it'll be faster and make more sense anyway.
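For a client paging through the API, the arithmetic is just offset = page × page size. Here is a small std-only sketch that builds the request URL; the endpoint shape is copied from the curl examples above, while `percent_encode` and `page_url` are hypothetical helper names.

```rust
// Hypothetical client-side helpers for paging through GET /query.

/// Percent-encodes every byte except RFC 3986 unreserved characters.
fn percent_encode(input: &str) -> String {
    let mut out = String::new();
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Builds the URL for a zero-based `page` of `page_size` rows.
fn page_url(base: &str, query: &str, page_size: usize, page: usize) -> String {
    format!(
        "{}/query?q={}&limit={}&offset={}",
        base,
        percent_encode(query),
        page_size,
        page * page_size
    )
}
```

With a page size of 50, zero-based page 49 lands on offset 2450, the deep page from the benchmark.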

M4 — `level=ERROR` and `level=error` are the same query now

This one seems like it should have been there from the start. It wasn't. If your service logs WARN and you searched level=warn, you got nothing and probably thought the tool was broken.

# Case in the query no longer matters; all three go through the same index
logdive query 'level=ERROR'
logdive query 'level=warn'
logdive query 'level=Warning'

Implementation: a functional expression index.

CREATE INDEX IF NOT EXISTS idx_level_norm ON log_entries(lower(level));

The executor routes every level field lookup through lower(level) = ? with a Rust-lowercased bind value.
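The routing step is small enough to sketch in a few lines. `level_predicate` is a hypothetical name, not the executor's real function. The one detail that matters is that the SQL fragment spells `lower(level)` exactly the way the index definition does, because SQLite only considers an expression index when the WHERE clause uses the same expression.

```rust
/// Returns the SQL fragment and bind value for a `level=<value>` clause.
/// Hypothetical helper: the name and signature are illustrative only.
fn level_predicate(value: &str) -> (&'static str, String) {
    // Lowercase the user's value in Rust; the column side is lowercased
    // by the indexed expression lower(level).
    ("lower(level) = ?", value.to_lowercase())
}
```

Note that this normalises case only. `level=Warning` becomes `warning`, which matches rows logged as WARNING or Warning but not rows logged as WARN.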

The wrong path I went down first: a generated column, level_norm TEXT GENERATED ALWAYS AS (lower(level)) STORED. SQLite supports generated columns, but ALTER TABLE ADD COLUMN only accepts VIRTUAL ones, not STORED, so existing databases need a real migration while new installs get the column from CREATE TABLE. You need a version guard to tell them apart. The functional index approach needs none of that: CREATE INDEX IF NOT EXISTS is idempotent, runs on every Indexer::open(), and picks up existing databases automatically.

It's in docs/traps.md now. That file is starting to earn its name.

The benchmark result: lowercase, uppercase, mixed-case level queries on 100k rows — all three at ~51 ms. Identical. The index is doing exactly what it's supposed to do.

M5 — Distroless Docker and `--health-check`

The v0.2 Dockerfile had this healthcheck:

HEALTHCHECK CMD curl -fs http://localhost:4000/version || exit 1

curl is 3.6 MB. It's in the image to make one TCP connection every 30 seconds. I finally stopped accepting this.

Runtime stage is now gcr.io/distroless/cc-debian12:nonroot. No shell, no package manager, no curl, uid 65532. Container drops to ~15 MB.

Since distroless has no shell, the shell-form HEALTHCHECK CMD curl ... has nothing to run it: Docker wraps shell-form commands in /bin/sh -c, and the image has neither /bin/sh nor curl. Good. That forced me to do the right thing.

logdive-api now takes --health-check:

logdive-api --health-check
# opens TcpStream::connect("127.0.0.1:<port>")
# exits 0 if the server is up, exits 1 if it isn't

HEALTHCHECK CMD ["/usr/local/bin/logdive-api", "--health-check"]

The binary checks itself. No curl. No shell.
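A self-check like this needs nothing beyond the standard library. The sketch below is an assumption about how such a flag could be implemented, not the code in logdive-api: `health_check_exit_code` is a hypothetical name, and the two-second timeout is an addition (the description above only mentions `TcpStream::connect`).

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Returns the process exit code for a health check: 0 if something accepts
/// TCP connections on 127.0.0.1:<port>, 1 otherwise.
fn health_check_exit_code(port: u16) -> i32 {
    let addr = SocketAddr::from(([127, 0, 0, 1], port));
    // connect_timeout keeps a hung server from stalling the probe forever.
    match TcpStream::connect_timeout(&addr, Duration::from_secs(2)) {
        Ok(_) => 0,
        Err(_) => 1,
    }
}
```

In a binary, the flag handler would pass the result to `std::process::exit`, which is the exit status Docker reads to mark the container healthy or unhealthy.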

One trap: you can't do RUN mkdir -p /data in a distroless runtime stage. No shell to interpret it. You have to create the directory in the builder stage and copy it across:

# Builder
RUN mkdir -p /data

# Runtime
COPY --from=builder /data /data

The error message when you forget this is not particularly helpful. Now it's here instead.

Breaking changes

Three things that will break something:

Scope	Old	New	Fix
CLI	`query --format json`	`query --output json`	one word in your scripts
`logdive-core` lib	`execute(q, conn, Option<usize>)`	`execute(q, conn, QueryOptions)`	see below
Docker	`curl GET /version` healthcheck	`--health-check` TCP flag	update compose / k8s probes

Library migration:

// v0.2
execute(&query, &conn, Some(1000))?;
execute(&query, &conn, None)?;

// v0.3
use logdive_core::executor::QueryOptions;
execute(&query, &conn, QueryOptions { limit: Some(1000), offset: None })?;
execute(&query, &conn, QueryOptions::default())?;  // no limit, no offset

QueryOptions::default() is zero-offset, no-limit. If you were passing None before, it's a drop-in.

Benchmarks

New groups for v0.3 features (100k rows):

What	Scenario	Number
OR queries	2-branch, 50% match	68 ms
OR queries	4-branch, 100% match	99 ms
OR queries	JSON field	2.5 ms
Paren groups	`(A OR B) AND C`, 12.5% match	45 ms
Case-insensitive level	lowercase / UPPERCASE / Mixed	~51 ms, identical
Pagination	page 1	42 ms
Pagination	deep page (offset 2450)	50 ms

Ingest numbers haven't changed (v0.2 ingest paths weren't touched):

What	Number
Batched insert, 10k rows	~189k rows/s
Parse + insert, 10k rows	~150k rows/s

The 4-branch OR at 99 ms looks alarming until you remember it's returning every single row from a 100k-row corpus. The bottleneck is serialisation. The query engine itself is fine.

Tradeoffs, I'll be honest

--format → --output will break your scripts. No alias. One word, then it's clean. If you want to be annoyed at me about it — fair, but this was the right call.

Distroless means no shell in the container. You can't docker exec -it mycontainer bash anymore. Use gcr.io/distroless/cc-debian12:debug if you need to poke around. The tradeoff is worth it, but it's a real operational change worth knowing about before you're in an incident.

Deep offset pagination has a cost. SQLite walks from row zero every time. For page 50 it's fine. For page 50,000, consider a time-range query instead — it'll be faster and is usually what you actually wanted anyway.

The full landing page redesign is still pending. The site has v0.3.0 content — accurate numbers, updated terminal preview. The full Astro 5 + Tailwind 4 redesign is waiting on a design file. Not v0.3's problem.

The website is live

Speaking of which: aryagorjipour.github.io/logdive is updated and real.

Stat cards reflect the current benchmarks. Terminal preview shows --output json and a paren query. The roadmap section is accurate. There's a GitHub stars counter that does a client-side fetch and falls back gracefully to a dash placeholder if the API is having a day.

Go look at it. Tell me what's wrong with it.

What's next — and I'm taking a break after this

v0.4.0 planned scope:

Benchmark suite at 500k rows (100k isn't stressful enough for the executor's real hot paths)
Query latency improvements
--output yaml and --output csv
Windows --follow — the (dev, ino) rotation check has been Unix-only since v0.2
Configurable retention by source/tag

But honestly: after this release I'm stepping back from logdive for a bit to work on some other projects. v0.3.0 is in a clean state. prerelease-check.sh passes all 11 steps. 417 tests green. The breaking changes are documented.

Good place to breathe.

If something genuinely breaks (security issue, data loss) — file an issue, I'll look. Everything else waits for v0.4.0.

Repo: github.com/Aryagorjipour/logdive
Website: aryagorjipour.github.io/logdive
Crates: logdive · logdive-core · logdive-api
Docker: ghcr.io/aryagorjipour/logdive

Arya Gorjipour — backend engineer, logdive maintainer.
@Aryagorjipour · @Arysmart1

If you run cargo bench on your machine and the numbers are interesting — I want to see them. If you debug a real incident with v0.3.0 — I really want to hear about that.
