Arya Gorjipour

Posted on • Originally published at github.com

logdive v0.3.0 — the one where I finally added parens (and four more things my heart wanted)

v0.2.0 was a good release. I was happy with it. I used it.

Then I kept trying to write (level=error OR level=warn) AND service=payments and the tool just... didn't know what parens were. Three separate times. Same query. Same sigh. Same manual rewrite to flatten it.

I shipped logdive to scratch my own itch. My itch wasn't done itching.

v0.3.0 is five things my heart kept asking for while using v0.2. Parenthesised queries are the headline; I literally promised them in the v0.2 article. But there's also pagination in both the CLI and the API, case-insensitive level queries (because level=ERROR and level=error should absolutely be the same thing), a distroless Docker image that doesn't need curl to healthcheck itself, and a website that now exists.

# The thing I kept trying to write
logdive query '(level=error OR level=warn) AND service=payments'

# Page through results instead of drowning in them
logdive query 'service=payments' --limit 50 --offset 100

# Or through the API — same thing, different surface
curl 'http://localhost:4000/query?q=service%3Dpayments&limit=50&offset=100'

# These are finally identical. They were not.
logdive query 'level=ERROR'
logdive query 'level=error'

# Smaller container. No curl. Healthchecks itself.
docker pull ghcr.io/aryagorjipour/logdive:0.3.0

# Install or upgrade both binaries with cargo
cargo install logdive logdive-api --force

417 tests passing. Five milestones. Binaries still 3.9 MB and 4.2 MB.


M1 — Parenthesised queries

This was the one. The v0.2 article literally put it at the top of the contributions list and called it "the v0.3 flagship." Accountability shipped.

The v0.2 grammar had two precedence levels: OR on the outside, AND on the inside, so AND binds tighter. Good enough until you need an OR of several conditions ANDed with something else. Then you're distributing the AND across every OR branch by hand, and boolean algebra on your query is not what you want at midnight.

query     := or_expr [ TIME_RANGE ]
or_expr   := and_expr (OR and_expr)*
and_expr  := clause (AND clause)*
clause    := field OP value
           | field CONTAINS string
           | "(" or_expr ")"    ← new
           | TIME_RANGE

(level=error OR level=warn) AND service=payments generates:

WHERE ((lower(level) = ? OR lower(level) = ?) AND json_extract(fields, '$.service') = ?)

The inner group gets its own SQL sub-expression. You can't construct something that silently breaks precedence — the generator always parenthesises each nesting level.

The new AST variant is Clause::Group(Box<QueryNode>). The Box is there because a type that contains itself directly would have infinite size, and Rust refuses to compile that without a layer of indirection, which is a very Rust thing to be strict about. An arena would be cleaner if query parsing were a hot path. It isn't, so: Box.
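To make the grammar and the "always parenthesise" rule concrete, here is a minimal sketch of a recursive-descent parser and SQL generator. It is not logdive's actual code: only Clause::Group(Box<QueryNode>) comes from the release, while the other enum variants, the Parser struct, and the parse, to_sql and join functions are hypothetical names. It also assumes a simplified query language with only field=value clauses and no time ranges.

```rust
// Hypothetical AST: only Clause::Group(Box<QueryNode>) is named in the post.
enum QueryNode {
    Or(Vec<QueryNode>),
    And(Vec<QueryNode>),
    Clause(Clause),
}

enum Clause {
    Eq { field: String, value: String },
    Group(Box<QueryNode>), // Box keeps the recursive type a finite size
}

struct Parser {
    tokens: Vec<String>,
    pos: usize,
}

impl Parser {
    fn peek(&self) -> Option<&str> {
        self.tokens.get(self.pos).map(String::as_str)
    }

    fn eat(&mut self, expected: &str) -> bool {
        let hit = self.peek() == Some(expected);
        if hit {
            self.pos += 1;
        }
        hit
    }

    // or_expr := and_expr (OR and_expr)*
    fn or_expr(&mut self) -> Result<QueryNode, String> {
        let mut parts = vec![self.and_expr()?];
        while self.eat("OR") {
            parts.push(self.and_expr()?);
        }
        Ok(if parts.len() == 1 { parts.remove(0) } else { QueryNode::Or(parts) })
    }

    // and_expr := clause (AND clause)*
    fn and_expr(&mut self) -> Result<QueryNode, String> {
        let mut parts = vec![self.clause()?];
        while self.eat("AND") {
            parts.push(self.clause()?);
        }
        Ok(if parts.len() == 1 { parts.remove(0) } else { QueryNode::And(parts) })
    }

    // clause := field=value | "(" or_expr ")"
    fn clause(&mut self) -> Result<QueryNode, String> {
        if self.eat("(") {
            let inner = self.or_expr()?;
            if !self.eat(")") {
                return Err("expected ')'".to_string());
            }
            return Ok(QueryNode::Clause(Clause::Group(Box::new(inner))));
        }
        let tok = self.peek().ok_or("unexpected end of query")?.to_string();
        let (field, value) = tok.split_once('=').ok_or(format!("expected field=value, got {tok}"))?;
        self.pos += 1;
        Ok(QueryNode::Clause(Clause::Eq { field: field.into(), value: value.into() }))
    }
}

fn parse(input: &str) -> Result<QueryNode, String> {
    // Pad parentheses so whitespace splitting yields them as separate tokens.
    let spaced = input.replace('(', " ( ").replace(')', " ) ");
    let tokens = spaced.split_whitespace().map(String::from).collect();
    let mut parser = Parser { tokens, pos: 0 };
    let node = parser.or_expr()?;
    match parser.peek() {
        Some(extra) => Err(format!("unexpected token {extra}")),
        None => Ok(node),
    }
}

// Every Or/And level is wrapped in parentheses, so precedence is always explicit.
fn to_sql(node: &QueryNode, binds: &mut Vec<String>) -> String {
    match node {
        QueryNode::Or(parts) => join(parts, " OR ", binds),
        QueryNode::And(parts) => join(parts, " AND ", binds),
        QueryNode::Clause(Clause::Group(inner)) => to_sql(inner, binds),
        QueryNode::Clause(Clause::Eq { field, value }) if field == "level" => {
            binds.push(value.to_ascii_lowercase());
            "lower(level) = ?".to_string()
        }
        // Real code must validate `field` before splicing it into SQL.
        QueryNode::Clause(Clause::Eq { field, value }) => {
            binds.push(value.clone());
            format!("json_extract(fields, '$.{field}') = ?")
        }
    }
}

fn join(parts: &[QueryNode], op: &str, binds: &mut Vec<String>) -> String {
    let rendered: Vec<String> = parts.iter().map(|p| to_sql(p, binds)).collect();
    format!("({})", rendered.join(op))
}
```

Calling parse on '(level=ERROR OR level=warn) AND service=payments' and passing the result to to_sql yields the WHERE body shown above, with the bind values error, warn and payments collected in order. Because join wraps every Or and And node unconditionally, the output never relies on SQL's own precedence rules.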


M2 — --offset and the rename I had to make

The additive part: --offset is now a real flag.

logdive query 'service=payments' --limit 50
logdive query 'service=payments' --limit 50 --offset 50
logdive query 'service=payments' --limit 50 --offset 100

--offset 0 and no flag at all are the same thing. Default limit is still 1000. --limit 0 still means "all of them, good luck."

Adding offset meant execute(query, conn, Option<usize>) had to become execute(query, conn, QueryOptions). A bare Option<usize> for one parameter is fine. For two it starts getting philosophical. The struct should have been there from v0.1. It's there now.
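For illustration, a sketch of what such an options struct and its translation to SQL might look like. The limit and offset field names match the library migration example in this post; the limit_clause helper is a hypothetical name, and the real executor may build its SQL differently.

```rust
/// Sketch of the options struct. Field names follow the migration example;
/// the derive list is an assumption.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct QueryOptions {
    pub limit: Option<usize>,
    pub offset: Option<usize>,
}

/// Hypothetical helper: render the trailing LIMIT/OFFSET clause for SQLite.
pub fn limit_clause(opts: QueryOptions) -> String {
    // A missing offset and an explicit zero offset mean the same thing.
    match (opts.limit, opts.offset.unwrap_or(0)) {
        (None, 0) => String::new(),
        (Some(limit), 0) => format!(" LIMIT {limit}"),
        // SQLite only accepts OFFSET after a LIMIT; -1 means "no upper bound".
        (None, offset) => format!(" LIMIT -1 OFFSET {offset}"),
        (Some(limit), offset) => format!(" LIMIT {limit} OFFSET {offset}"),
    }
}
```

The struct pays for itself as soon as a third option shows up: callers that use QueryOptions::default() or struct update syntax keep compiling, whereas another positional Option parameter would break every call site again.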

The breaking part: --format on the query subcommand is now --output.

# v0.2
logdive query 'level=error' --format json

# v0.3
logdive query 'level=error' --output json

--format already existed on ingest to pick the input log format (JSON, logfmt, plain). Two different --format flags doing two different things on two different subcommands is a documentation problem that keeps getting worse. One word to fix it. I did not add a deprecated alias — a deprecated alias that silently works is just confusion that lives for three more versions.


M3 — HTTP pagination

GET /query now takes ?offset=. Mirrors --offset exactly.

curl 'http://localhost:4000/query?q=level%3Derror&limit=50&offset=0'
curl 'http://localhost:4000/query?q=level%3Derror&limit=50&offset=50'

The benchmark number I didn't expect: page 1 at 100k rows costs ~42 ms, deep page at offset 2450 costs ~50 ms. 8 ms overhead to skip 2450 rows. That's because LIMIT x OFFSET y in SQLite counts forward from zero — no scroll cursor, no magic. For building a UI on top of the API it's fine. For "give me rows 500,000 through 500,050" — use a time range query, it'll be faster and make more sense anyway.
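If you are building page links on top of the API, the arithmetic is small enough to sketch. Both function names here are hypothetical, and the q value is assumed to be percent-encoded already. With a page size of 50, offset 2450 is the start of page 50.

```rust
/// Offset of a 1-based page number. Page 0 is treated like page 1.
fn offset_for_page(page: usize, limit: usize) -> usize {
    page.saturating_sub(1).saturating_mul(limit)
}

/// Query string for GET /query. `q_encoded` must already be percent-encoded.
fn query_string(q_encoded: &str, page: usize, limit: usize) -> String {
    format!("q={q_encoded}&limit={limit}&offset={}", offset_for_page(page, limit))
}
```

For example, query_string("level%3Derror", 2, 50) produces the same parameters as the second curl call above.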


M4 — level=ERROR and level=error are the same query now

This one seems like it should have been there from the start. It wasn't. If your service logs WARN and you searched level=warn, you got nothing and probably thought the tool was broken.

# Case no longer matters: every one of these goes through the same lower(level) index
logdive query 'level=ERROR'
logdive query 'level=warn'
logdive query 'level=Warning'

Implementation: a functional expression index.

CREATE INDEX IF NOT EXISTS idx_level_norm ON log_entries(lower(level));

The executor routes every level field lookup through lower(level) = ? with a Rust-lowercased bind value.
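One detail worth getting right on the Rust side: SQLite's built-in lower() folds ASCII letters only (unless the ICU extension is loaded), so the bind value should be folded the same way. The sketch below assumes ASCII folding is what you want; level_bind is a hypothetical name, and the post does not say which lowercasing method logdive actually uses.

```rust
/// Hypothetical helper: normalise a user-supplied level so it compares equal
/// to what SQLite's ASCII-only lower(level) produces.
fn level_bind(user_value: &str) -> String {
    user_value.to_ascii_lowercase()
}

/// Unicode-aware alternative, shown only for contrast. It can disagree with
/// SQLite's lower() on non-ASCII input.
fn level_bind_unicode(user_value: &str) -> String {
    user_value.to_lowercase()
}
```

For ordinary level names such as ERROR, warn or Warning the two functions agree. They only diverge on non-ASCII input, where the ASCII version keeps matching what the index stores.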

The wrong path I went down first: a generated column, ALTER TABLE ADD COLUMN level_norm TEXT GENERATED ALWAYS AS (lower(level)) STORED. Generated columns do work in SQLite, with one catch: ALTER TABLE can only add the VIRTUAL kind, so a STORED column has to be declared in CREATE TABLE. Either way, existing databases need a migration while new installs get the column from CREATE TABLE, and you need a version guard to tell them apart. The functional index approach needs none of that: CREATE INDEX IF NOT EXISTS is idempotent, runs on every Indexer::open(), and picks up existing databases automatically.

It's in docs/traps.md now. That file is starting to earn its name.

The benchmark result: lowercase, uppercase, mixed-case level queries on 100k rows — all three at ~51 ms. Identical. The index is doing exactly what it's supposed to do.


M5 — Distroless Docker and --health-check

The v0.2 Dockerfile had this healthcheck:

HEALTHCHECK CMD curl -fs http://localhost:4000/version || exit 1

curl is 3.6 MB. It's in the image to make one TCP connection every 30 seconds. I finally stopped accepting this.

Runtime stage is now gcr.io/distroless/cc-debian12:nonroot. No shell, no package manager, no curl, uid 65532. Container drops to ~15 MB.

Since distroless has no shell, CMD curl ... in the Dockerfile gets rejected at build time. Good. That rejection forced me to do the right thing.

logdive-api now takes --health-check:

logdive-api --health-check
# opens TcpStream::connect("127.0.0.1:<port>")
# exits 0 if the server is up, exits 1 if it isn't

The Dockerfile healthcheck becomes:

HEALTHCHECK CMD ["/usr/local/bin/logdive-api", "--health-check"]

The binary checks itself. No curl. No shell.
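A self-check like this needs nothing beyond the standard library. The sketch below is an assumption about the shape, not the shipped code: is_listening and health_exit_code are hypothetical names, and it uses TcpStream::connect_timeout rather than a plain connect so that a hung probe cannot outlive the healthcheck interval.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// True if something accepts TCP connections on 127.0.0.1:<port>.
fn is_listening(port: u16, timeout: Duration) -> bool {
    let addr = SocketAddr::from(([127, 0, 0, 1], port));
    TcpStream::connect_timeout(&addr, timeout).is_ok()
}

/// Exit code for a --health-check style flag: 0 if the server is up, 1 if not.
fn health_exit_code(port: u16) -> i32 {
    if is_listening(port, Duration::from_secs(2)) {
        0
    } else {
        1
    }
}
```

In a binary, the flag handler would end with std::process::exit(health_exit_code(port)). Note that a bare TCP connect only proves the socket is accepting connections, not that the HTTP handlers behind it are healthy.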

One trap: you can't do RUN mkdir -p /data in a distroless runtime stage. No shell to interpret it. You have to create the directory in the builder stage and copy it across:

# Builder
RUN mkdir -p /data

# Runtime
COPY --from=builder /data /data

The error message when you forget this is not particularly helpful. Now it's here instead.


Breaking changes

Three things that will break something:

| Scope | Old | New | Fix |
| --- | --- | --- | --- |
| CLI | query --format json | query --output json | one word in your scripts |
| logdive-core lib | execute(q, conn, Option<usize>) | execute(q, conn, QueryOptions) | see below |
| Docker | curl GET /version healthcheck | --health-check TCP flag | update compose / k8s probes |

Library migration:

// v0.2
execute(&query, &conn, Some(1000))?;
execute(&query, &conn, None)?;

// v0.3
use logdive_core::executor::QueryOptions;
execute(&query, &conn, QueryOptions { limit: Some(1000), offset: None })?;
execute(&query, &conn, QueryOptions::default())?;  // no limit, no offset

QueryOptions::default() is zero-offset, no-limit. If you were passing None before, it's a drop-in.


Benchmarks

New groups for v0.3 features (100k rows):

| What | Scenario | Number |
| --- | --- | --- |
| OR queries | 2-branch, 50% match | 68 ms |
| OR queries | 4-branch, 100% match | 99 ms |
| OR queries | JSON field | 2.5 ms |
| Paren groups | (A OR B) AND C, 12.5% match | 45 ms |
| Case-insensitive level | lowercase / UPPERCASE / Mixed | ~51 ms, identical |
| Pagination | page 1 | 42 ms |
| Pagination | deep page (offset 2450) | 50 ms |

Ingest numbers haven't changed (v0.2 ingest paths weren't touched):

| What | Number |
| --- | --- |
| Batched insert, 10k rows | ~189k rows/s |
| Parse + insert, 10k rows | ~150k rows/s |

The 4-branch OR at 99 ms looks alarming until you remember it's returning every single row from a 100k-row corpus. The bottleneck is serialisation. The query engine itself is fine.


Tradeoffs, I'll be honest

--format → --output will break your scripts. No alias. One word, then it's clean. If you want to be annoyed at me about it, fair, but this was the right call.

Distroless means no shell in the container. You can't docker exec -it mycontainer bash anymore. Use gcr.io/distroless/cc-debian12:debug if you need to poke around. The tradeoff is worth it, but it's a real operational change worth knowing about before you're in an incident.

Deep offset pagination has a cost. SQLite walks from row zero every time. For page 50 it's fine. For page 50,000, consider a time-range query instead — it'll be faster and is usually what you actually wanted anyway.
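The difference between the two styles can be modelled in memory. This is only an analogy for the semantics, not a measurement of SQLite: it assumes rows sorted by a unique timestamp, and page_by_offset and page_after are hypothetical names. Paging by "everything after the last timestamp I saw" is sometimes called keyset pagination.

```rust
/// OFFSET-style paging: walk past `offset` rows, then take `limit`.
fn page_by_offset(rows: &[u64], offset: usize, limit: usize) -> Vec<u64> {
    rows.iter().skip(offset).take(limit).copied().collect()
}

/// Time-range style paging: binary-search to the first timestamp strictly
/// after `after_ts`, then take `limit`. Assumes `rows` is sorted ascending.
fn page_after(rows: &[u64], after_ts: u64, limit: usize) -> &[u64] {
    let start = rows.partition_point(|&ts| ts <= after_ts);
    let end = start.saturating_add(limit).min(rows.len());
    &rows[start..end]
}
```

Both return the same rows when the cursor is the last timestamp of the previous page. The second form jumps straight to its starting point the way an indexed range lookup does, while OFFSET in a database has to step over every skipped row first.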

The full landing page redesign is still pending. The site has v0.3.0 content — accurate numbers, updated terminal preview. The full Astro 5 + Tailwind 4 redesign is waiting on a design file. Not v0.3's problem.


The website is live

Speaking of which: aryagorjipour.github.io/logdive is updated and real.

Stat cards reflect the current benchmarks. Terminal preview shows --output json and a paren query. The roadmap section is accurate. There's a GitHub stars counter that does a client-side fetch and falls back gracefully if the API is having a day.

Go look at it. Tell me what's wrong with it.


What's next — and I'm taking a break after this

v0.4.0 planned scope:

  • Benchmark suite at 500k rows (100k isn't stressful enough for the executor's real hot paths)
  • Query latency improvements
  • --output yaml and --output csv
  • Windows --follow — the (dev, ino) rotation check has been Unix-only since v0.2
  • Configurable retention by source/tag

But honestly: after this release I'm stepping back from logdive for a bit to work on some other projects. v0.3.0 is in a clean state. prerelease-check.sh passes all 11 steps. 417 tests green. The breaking changes are documented.

Good place to breathe.

If something genuinely breaks (security issue, data loss) — file an issue, I'll look. Everything else waits for v0.4.0.



Arya Gorjipour — backend engineer, logdive maintainer.
@Aryagorjipour · @Arysmart1

If you run cargo bench on your machine and the numbers are interesting — I want to see them. If you debug a real incident with v0.3.0 — I really want to hear about that.
