<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Density Tech</title>
    <description>The latest articles on DEV Community by Density Tech (@density_tech).</description>
    <link>https://dev.to/density_tech</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3729517%2F405000b6-b607-4f8c-b83c-c607fdf7130e.png</url>
      <title>DEV Community: Density Tech</title>
      <link>https://dev.to/density_tech</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/density_tech"/>
    <language>en</language>
    <item>
      <title>We Tried to Build Analytics Without a Database. It Sort of Worked.</title>
      <dc:creator>Density Tech</dc:creator>
      <pubDate>Thu, 05 Feb 2026 17:40:13 +0000</pubDate>
      <link>https://dev.to/density_tech/we-tried-to-build-analytics-without-a-database-it-sort-of-worked-396</link>
      <guid>https://dev.to/density_tech/we-tried-to-build-analytics-without-a-database-it-sort-of-worked-396</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiu3q2s77jtavneu8nxme.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiu3q2s77jtavneu8nxme.png" alt=" " width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The client needed product analytics.&lt;/p&gt;

&lt;p&gt;The problem wasn’t scale.&lt;br&gt;
The problem was money.&lt;/p&gt;

&lt;p&gt;Snowflake’s minimum was roughly $600 per month. The total infrastructure budget was closer to $200. Given that constraint, debating which data warehouse to use felt like the wrong conversation to have.&lt;/p&gt;

&lt;p&gt;So instead, we asked a different question:&lt;/p&gt;

&lt;p&gt;What if we didn’t use a database at all?&lt;/p&gt;

&lt;h2&gt;
  
  
  Dumping Everything into Object Storage
&lt;/h2&gt;

&lt;p&gt;The first decision was straightforward: use object storage.&lt;/p&gt;

&lt;p&gt;For this engagement, we chose MinIO. Events were ingested, written out as Parquet files, and stored durably. No long-running services. No query engine sitting idle. Just storage.&lt;/p&gt;

&lt;p&gt;That immediately raised the obvious concern:&lt;br&gt;
how do you query this without turning it into an unmaintainable pile of files?&lt;/p&gt;

&lt;p&gt;That’s where DuckDB entered the picture.&lt;/p&gt;

&lt;p&gt;DuckDB is an in-process SQL engine. No server, no cluster, no operational setup. Install it, point it at Parquet files, and start writing SQL.&lt;/p&gt;

&lt;p&gt;Initially, it felt too simple to be serious.&lt;/p&gt;

&lt;p&gt;We tried it anyway.&lt;/p&gt;

&lt;p&gt;Within a day, we had funnel queries, retention calculations, and basic aggregations running directly against Parquet files in S3-compatible storage. No ingestion jobs. No warehouse loaders. No retry logic.&lt;/p&gt;

&lt;p&gt;It worked far better than expected.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Iceberg Detour That Changed Everything
&lt;/h2&gt;

&lt;p&gt;About two weeks into the project, Apache Iceberg came up during a casual discussion.&lt;/p&gt;

&lt;p&gt;At that point, we weren’t actively looking to change anything. The system was working. But Iceberg promised things that were starting to matter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ACID semantics&lt;/li&gt;
&lt;li&gt;Schema evolution&lt;/li&gt;
&lt;li&gt;Snapshot isolation&lt;/li&gt;
&lt;li&gt;Table-level operations on object storage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We treated it as an experiment.&lt;/p&gt;

&lt;p&gt;That experiment quietly became the foundation.&lt;/p&gt;

&lt;p&gt;Once we moved from loosely managed Parquet files to Iceberg tables, several problems disappeared immediately:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Schema changes became manageable&lt;/li&gt;
&lt;li&gt;Bad data was no longer permanent&lt;/li&gt;
&lt;li&gt;Table state became explicit and queryable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;DuckDB’s Iceberg extension made the integration trivial. One &lt;code&gt;ATTACH&lt;/code&gt;, and the tables behaved like standard relational tables.&lt;/p&gt;

&lt;p&gt;The value became obvious the first time a customer sent several hours of malformed events. Previously, fixing that would have meant manually tracking files and rewriting data. With Iceberg, it was a single &lt;code&gt;DELETE&lt;/code&gt; statement.&lt;/p&gt;

&lt;p&gt;That alone justified the decision.&lt;/p&gt;

&lt;h2&gt;
  
  
  Local Development Without Friction
&lt;/h2&gt;

&lt;p&gt;One outcome that surprised me was how much this improved local development.&lt;/p&gt;

&lt;p&gt;DuckDB’s UI mode (&lt;code&gt;duckdb -ui&lt;/code&gt;) provides a browser-based SQL editor running entirely on a developer’s machine. No credentials to manage. No shared environments. No waiting for services to start.&lt;/p&gt;

&lt;p&gt;Even developers who typically avoid SQL were able to explore real analytics data locally.&lt;/p&gt;

&lt;p&gt;That level of accessibility is rare in analytics systems—and for an MVP, it mattered more than raw throughput.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Limitations (Because There Are Always Limitations)
&lt;/h2&gt;

&lt;p&gt;This setup isn’t a silver bullet.&lt;/p&gt;

&lt;p&gt;DuckDB is not designed for high concurrency. Once concurrent access increased, caching became necessary. Event ingestion is buffered and batched. Writes are controlled and intentionally infrequent.&lt;/p&gt;
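&lt;p&gt;The buffering is nothing exotic. Here’s a stripped-down, in-memory sketch of the batch-and-flush idea (the threshold is illustrative; the real system serializes each flushed batch as one Parquet object):&lt;/p&gt;

```python
class EventBuffer:
    """Collect events and flush them in batches, keeping writes infrequent."""

    def __init__(self, flush_size=1000):
        self.buf = []
        self.batches_written = []  # stand-in for Parquet objects in storage
        self.flush_size = flush_size

    def add(self, event):
        self.buf.append(event)
        if len(self.buf) >= self.flush_size:
            self.flush()

    def flush(self):
        if self.buf:
            # real system: write self.buf as one Parquet file, upload, clear
            self.batches_written.append(list(self.buf))
            self.buf.clear()

buf = EventBuffer(flush_size=3)
for i in range(7):
    buf.add({"id": i})
print(len(buf.batches_written), len(buf.buf))  # 2 1: two batches flushed, one pending
```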

&lt;p&gt;This is not a multi-tenant analytics platform.&lt;/p&gt;

&lt;p&gt;And it was never meant to be.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Actually Delivered
&lt;/h2&gt;

&lt;p&gt;In roughly three weeks, the client had:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A functional product analytics backend&lt;/li&gt;
&lt;li&gt;Iceberg-managed tables on object storage&lt;/li&gt;
&lt;li&gt;DuckDB-powered analytical queries and derivations&lt;/li&gt;
&lt;li&gt;Infrastructure costs under $50 per month&lt;/li&gt;
&lt;li&gt;A system solid enough to demo to customers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most importantly, they avoided committing early to an expensive or complex architecture before the product justified it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Takeaway
&lt;/h2&gt;

&lt;p&gt;From my perspective, if you’re pre–product-market fit, a data warehouse is often the wrong first move.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You don’t need infinite scale.&lt;/li&gt;
&lt;li&gt;You don’t need query queues.&lt;/li&gt;
&lt;li&gt;You don’t need vendor contracts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You need speed, flexibility, and the ability to change direction without regret.&lt;/p&gt;

&lt;p&gt;DuckDB plus object storage—and Iceberg when structure starts to matter—isn’t just cheaper. It’s better suited for experimentation.&lt;/p&gt;

&lt;p&gt;We’ll move to something heavier when it’s required.&lt;/p&gt;

&lt;p&gt;For now, this works. And as an engineer, it’s been a genuinely enjoyable system to build.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>opensource</category>
      <category>database</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>What Actually Happens When You Run brew install</title>
      <dc:creator>Density Tech</dc:creator>
      <pubDate>Wed, 28 Jan 2026 18:21:18 +0000</pubDate>
      <link>https://dev.to/density_tech/what-actually-happens-when-you-run-brew-install-3j3n</link>
      <guid>https://dev.to/density_tech/what-actually-happens-when-you-run-brew-install-3j3n</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fou3ix17rq929mjjafkbf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fou3ix17rq929mjjafkbf.png" alt=" " width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
The &lt;code&gt;brew install&lt;/code&gt; command installs software on your machine. Run it, and Homebrew finds the package you asked for, downloads it, installs it, and pulls in any dependencies it needs to work, all in one step. That convenience is why it’s one of the first tools most macOS developers reach for, whether they’re installing text editors or entire programming toolchains.&lt;/p&gt;

&lt;p&gt;Most developers use Homebrew every day, yet few stop to think about what it’s actually doing. On the surface it looks like a package manager and feels like &lt;code&gt;apt&lt;/code&gt; or &lt;code&gt;yum&lt;/code&gt;. Internally, though, it behaves more like Git crossed with a build system, keeping the filesystem in careful, versioned order.&lt;/p&gt;

&lt;p&gt;Once that clicks, a lot of your system’s behavior starts to make sense, and problems that used to seem mysterious, like a toolchain that suddenly stops working, become things you can actually diagnose.&lt;/p&gt;

&lt;p&gt;Let’s open the hood.&lt;/p&gt;

&lt;p&gt;Homebrew is less an installer than a registry: a catalog of build recipes for everything you can install on your machine, and a ledger of what you already have.&lt;/p&gt;

&lt;p&gt;When you run&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;brew install docker
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It does not mean “download &lt;strong&gt;Docker&lt;/strong&gt; and drop it somewhere on disk.” Docker isn’t something you can just place anywhere.&lt;/p&gt;

&lt;p&gt;What it really means is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Look up a build recipe for Docker, find a verified binary (or source), install it into a versioned directory, and expose it through symlinks.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In other words, &lt;strong&gt;Homebrew&lt;/strong&gt; is closer to a build manager than to a conventional package manager: it keeps a meticulous, versioned record of every artifact it has produced, like a library of things you assembled yourself.&lt;/p&gt;

&lt;p&gt;Every package is defined by a &lt;strong&gt;formula&lt;/strong&gt;: a Ruby file stored in a Git repository called &lt;a href="https://github.com/Homebrew/homebrew-core" rel="noopener noreferrer"&gt;homebrew-core&lt;/a&gt;. A formula specifies, among other things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Where to download the binary or source&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The checksum the downloaded file must match&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The instructions for building or unpacking it&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
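&lt;p&gt;For a feel of the shape, here’s a stripped-down, hypothetical formula in Homebrew’s Ruby DSL (not a real homebrew-core file; every value is a placeholder):&lt;/p&gt;

```ruby
# Hypothetical formula -- illustrates the structure, not a real package
class Demo < Formula
  desc "Example command-line tool"
  homepage "https://example.com/demo"
  url "https://example.com/demo-1.0.0.tar.gz"  # where to download
  sha256 "0000000000000000000000000000000000000000000000000000000000000000"  # expected checksum

  def install
    bin.install "demo"  # how to place the tool into its versioned keg
  end
end
```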

&lt;h2&gt;
  
  
  The Cellar: Homebrew’s real filesystem
&lt;/h2&gt;

&lt;p&gt;Everything Homebrew installs goes into one place:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/opt/homebrew/Cellar
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(on Apple Silicon Macs; on Intel Macs the prefix is &lt;code&gt;/usr/local&lt;/code&gt;)&lt;/p&gt;

&lt;p&gt;Inside that directory, every package gets its own folder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/opt/homebrew/Cellar/docker/29.1.5/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That folder contains the real Docker binary and everything it depends on. Homebrew never mutates it. That directory is immutable. If you install a new version, you get a new directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/opt/homebrew/Cellar/docker/29.1.6/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nothing overwrites anything. No files are replaced in place. This is a huge design choice — and it’s why Homebrew is so safe.&lt;/p&gt;

&lt;h2&gt;
  
  
  How commands appear in your PATH
&lt;/h2&gt;

&lt;p&gt;You don’t run binaries from the Cellar directly. Instead, Homebrew creates symlinks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/opt/homebrew/bin/docker → ../Cellar/docker/29.1.5/bin/docker
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your shell sees &lt;code&gt;/opt/homebrew/bin&lt;/code&gt; in &lt;code&gt;PATH&lt;/code&gt;, so when you type &lt;code&gt;docker&lt;/code&gt;, it follows the symlink to the correct version.&lt;/p&gt;

&lt;p&gt;When you upgrade Docker, Homebrew just moves the symlink to point to the new directory. The old version is still there, untouched.&lt;/p&gt;

&lt;p&gt;This is why rollbacks are effectively instant. Older versions of Homebrew even exposed this directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;brew switch docker 29.1.5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All that command did was change symlinks. (&lt;code&gt;brew switch&lt;/code&gt; has since been removed; these days you get the same effect by relinking a versioned formula with &lt;code&gt;brew unlink&lt;/code&gt; and &lt;code&gt;brew link&lt;/code&gt;.)&lt;/p&gt;

&lt;p&gt;No files are copied. Nothing is rebuilt.&lt;/p&gt;
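&lt;p&gt;You can reproduce the whole mechanism with nothing but &lt;code&gt;ln&lt;/code&gt; in a scratch directory (all names here are made up, just to mimic the Cellar layout):&lt;/p&gt;

```shell
cd "$(mktemp -d)"  # work in a throwaway directory

# Mimic the Cellar: one immutable directory per version, plus a bin/ of symlinks
mkdir -p cellar/demo/1.0/bin cellar/demo/2.0/bin bin
printf '#!/bin/sh\necho v1\n' > cellar/demo/1.0/bin/demo
printf '#!/bin/sh\necho v2\n' > cellar/demo/2.0/bin/demo
chmod +x cellar/demo/1.0/bin/demo cellar/demo/2.0/bin/demo

ln -sf ../cellar/demo/1.0/bin/demo bin/demo  # "install" 1.0
./bin/demo                                   # prints: v1

ln -sf ../cellar/demo/2.0/bin/demo bin/demo  # "upgrade": only the symlink moves
./bin/demo                                   # prints: v2
```

&lt;p&gt;Both version directories survive the “upgrade” untouched, which is exactly why rolling back is cheap.&lt;/p&gt;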

&lt;h2&gt;
  
  
  Why Homebrew almost never corrupts your system
&lt;/h2&gt;

&lt;p&gt;Because Homebrew never installs into system locations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;/usr/bin&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;/bin&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;/lib&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;system frameworks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything lives under &lt;code&gt;/opt/homebrew&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;That’s not an accident — it’s a safety boundary. If something breaks, you can literally delete &lt;code&gt;/opt/homebrew&lt;/code&gt; and your OS is still intact.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bottles vs source builds
&lt;/h2&gt;

&lt;p&gt;When Homebrew can, it downloads a bottle — a precompiled binary built by the Homebrew maintainers. That’s why installs are fast.&lt;/p&gt;

&lt;p&gt;If no bottle exists for your OS + CPU combo, Homebrew falls back to compiling from source using the instructions in the formula. Same recipe, different execution path.&lt;/p&gt;

&lt;p&gt;Either way, the result still lands in the Cellar.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Homebrew&lt;/strong&gt; looks simple because its architecture is deliberate. Its maintainers put real thought into hiding the complicated parts (the versioned kegs, the symlink farm, the formula recipes) behind one friendly command. The next time you run &lt;code&gt;brew install&lt;/code&gt;, you’ll know just how much is going on behind the scenes.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>beginners</category>
      <category>docker</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Do You Really Need Kafka? A Practical Alternative with Postgres</title>
      <dc:creator>Density Tech</dc:creator>
      <pubDate>Sun, 25 Jan 2026 09:18:31 +0000</pubDate>
      <link>https://dev.to/density_tech/do-you-really-need-kafka-a-practical-alternative-with-postgres-2de8</link>
      <guid>https://dev.to/density_tech/do-you-really-need-kafka-a-practical-alternative-with-postgres-2de8</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fovlnkrzhmz952fkcjv7e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fovlnkrzhmz952fkcjv7e.png" alt=" " width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Kafka: The Right Tool, Used Too Often
&lt;/h2&gt;

&lt;p&gt;Apache Kafka has become the default answer to almost any asynchronous or event-driven problem. It is powerful, proven at scale, and excellent at handling large volumes of data with strong guarantees. If you are building a real-time data platform, a streaming system, or anything that needs to fan out events to many consumers, Kafka is often the right tool.&lt;/p&gt;

&lt;p&gt;But Kafka also comes with real cost. Running it in production means dealing with brokers, partitions, consumer groups, rebalancing, retention policies, and monitoring. Even with managed services, you are still paying in infrastructure and in engineering time.&lt;/p&gt;

&lt;p&gt;In practice, many teams end up using Kafka not because they need a streaming platform, but because they just need a reliable queue.&lt;/p&gt;

&lt;p&gt;And in those cases, Kafka is usually correct — but often overkill.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hidden Cost of “Just Use Kafka”
&lt;/h2&gt;

&lt;p&gt;For early-stage systems, internal tools, or moderate workloads, Kafka tends to introduce more complexity than actual value.&lt;/p&gt;

&lt;p&gt;You run a distributed system even when your problem is not distributed.&lt;br&gt;
You operate a streaming platform even when your use case is just background jobs.&lt;br&gt;
You manage offsets and consumer groups when a simple retry would be enough.&lt;br&gt;
You debug through dashboards and logs instead of just looking at the data.&lt;/p&gt;

&lt;p&gt;The result is a system that works well, but feels heavy. Heavy to run, heavy to reason about, and heavy to change.&lt;/p&gt;

&lt;p&gt;This is exactly where Postgres-based queues start to look very attractive.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why Postgres Can Win in Economics and Debuggability
&lt;/h2&gt;

&lt;p&gt;Postgres is already there in almost every backend. It is monitored, backed up, and familiar. Using it as a lightweight message queue adds no new service, no new cluster, and no new operational surface.&lt;/p&gt;

&lt;p&gt;From a cost perspective, it is hard to beat:&lt;br&gt;
there are no brokers to run, no separate infrastructure, and no extra managed services to pay for.&lt;/p&gt;

&lt;p&gt;From a debugging perspective, it is even better:&lt;br&gt;
every message is just a row,&lt;br&gt;
every failure is visible,&lt;br&gt;
retries are tracked,&lt;br&gt;
and stuck messages can be inspected or fixed with plain SQL.&lt;/p&gt;

&lt;p&gt;Instead of debugging a distributed system, you debug data.&lt;br&gt;
And for most engineers, that is a much simpler and more productive mental model.&lt;/p&gt;
&lt;h2&gt;
  
  
  When Postgres Is Actually the Right Choice
&lt;/h2&gt;

&lt;p&gt;Postgres works well as a message queue when async processing is just a part of your system, not the main thing your system exists to do. In these cases, you usually care more about simplicity and reliability than extreme scale or global distribution.&lt;/p&gt;

&lt;p&gt;pgmq fits nicely for things like background jobs, webhook handling, retry systems, internal workflows, and small ETL pipelines. These setups usually have a few producers, a few consumers, and traffic that is steady but not massive. What they really need is visibility and control, not a full-blown streaming platform.&lt;/p&gt;

&lt;p&gt;This is where Postgres shines. You can wrap business logic and queue operations in the same transaction. You don’t need to run any extra infrastructure. And you can see exactly what’s happening just by querying tables. If something breaks, you can inspect the message, fix it, and retry it directly.&lt;/p&gt;

&lt;p&gt;pgmq is not meant for high-throughput streaming, analytics pipelines, or cross-region event systems. Once the queue becomes the core of your architecture, and not just a helper, you are in Kafka territory.&lt;/p&gt;

&lt;p&gt;The simple rule is: use Postgres when the queue supports your system. Use Kafka when the queue is your system.&lt;/p&gt;
&lt;h2&gt;
  
  
  pgmq: How It Works
&lt;/h2&gt;

&lt;p&gt;At a high level, &lt;strong&gt;pgmq&lt;/strong&gt; is not doing anything magical. It is just using Postgres tables, locks, and timestamps to behave like a message queue. There is no separate broker, no background service, and no hidden state. Everything lives inside the database.&lt;/p&gt;

&lt;p&gt;When you create a queue in pgmq, it creates two main tables for you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;pgmq.q_events&lt;/code&gt; – the live queue&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;pgmq.a_events&lt;/code&gt; – the archive of processed messages&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The live table is where all active messages sit. Each row is one message. The important columns are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;msg_id&lt;/code&gt; – unique ID for the message&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;enqueued_at&lt;/code&gt; – when the message was produced&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;vt&lt;/code&gt; – when the message becomes visible for reading again&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;read_ct&lt;/code&gt; – how many times it has been delivered&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;message&lt;/code&gt; – your actual JSON payload&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;headers&lt;/code&gt; – any message headers&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This single table gives you most queue features in one place:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Durability → rows stored in Postgres&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Visibility timeout → &lt;code&gt;vt&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Retry count → &lt;code&gt;read_ct&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ordering → &lt;code&gt;ORDER BY msg_id&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Backlog → &lt;code&gt;SELECT count(*)&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When a consumer reads messages, pgmq simply locks rows using &lt;code&gt;FOR UPDATE SKIP LOCKED&lt;/code&gt; and moves &lt;code&gt;vt&lt;/code&gt; into the future. If the consumer crashes, the lock is released and the message becomes visible again. That is your retry mechanism.&lt;/p&gt;
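&lt;p&gt;The visibility-timeout mechanics are easy to model. Here’s a tiny in-memory stand-in (plain Python, no Postgres, purely illustrative) showing how &lt;code&gt;vt&lt;/code&gt; and &lt;code&gt;read_ct&lt;/code&gt; interact:&lt;/p&gt;

```python
import time

class TinyQueue:
    """In-memory stand-in for pgmq's visibility-timeout semantics (illustration only)."""

    def __init__(self):
        self.rows = []
        self.next_id = 1

    def send(self, message):
        self.rows.append({"msg_id": self.next_id, "vt": 0.0,
                          "read_ct": 0, "message": message})
        self.next_id += 1

    def read(self, vt_seconds, qty):
        now = time.monotonic()
        out = []
        for row in self.rows:
            if row["vt"] <= now and len(out) < qty:
                row["vt"] = now + vt_seconds  # hide the row until the timeout expires
                row["read_ct"] += 1           # delivery counter, like pgmq's read_ct
                out.append(row)
        return out

    def delete(self, msg_id):
        self.rows = [r for r in self.rows if r["msg_id"] != msg_id]

q = TinyQueue()
q.send({"id": 1})
first = q.read(vt_seconds=30, qty=5)   # delivered, then hidden for 30s
second = q.read(vt_seconds=30, qty=5)  # empty: the message is invisible
print(len(first), first[0]["read_ct"], len(second))  # 1 1 0
```

&lt;p&gt;If the consumer crashes instead of deleting, the timeout simply expires and the message becomes readable again, with the delivery counter incremented on the next read.&lt;/p&gt;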

&lt;p&gt;When the consumer finishes, calling &lt;code&gt;pgmq.delete()&lt;/code&gt; removes the row from &lt;code&gt;q_events&lt;/code&gt; for good, while calling &lt;code&gt;pgmq.archive()&lt;/code&gt; instead moves it into &lt;code&gt;a_events&lt;/code&gt;. That archive table is extremely useful in practice: it gives you a full audit trail of what was processed, when, and how many times.&lt;/p&gt;

&lt;p&gt;There is also a small &lt;code&gt;pgmq.meta&lt;/code&gt; table which stores queue-level configuration like visibility timeouts and creation metadata. Think of it as the control plane.&lt;/p&gt;

&lt;p&gt;The key thing to understand is this: pgmq is just SQL implementing queue semantics. If you can read the tables, you can understand the system. There is no black box. What you see in the database is exactly what the queue is doing.&lt;/p&gt;

&lt;p&gt;And that is precisely why pgmq feels so easy to debug compared to traditional brokers.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;In the next section, we’ll set up a minimal pgmq environment using Docker and Kubernetes, and walk through a working producer–consumer example.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  Setting Up pgmq Locally (A Minimal Working Example)
&lt;/h2&gt;

&lt;p&gt;We’ll start by running Postgres with pgmq using a single Kubernetes deployment.&lt;/p&gt;

&lt;p&gt;Prerequisites: Docker, Minikube&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# postgres.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: ghcr.io/pgmq/pg18-pgmq:v1.7.0
          imagePullPolicy: IfNotPresent
          env:
            - name: POSTGRES_USER
              value: xxx
            - name: POSTGRES_PASSWORD
              value: xxx
            - name: POSTGRES_DB
              value: queue_db
          ports:
            - containerPort: 5432
---
apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  type: NodePort
  selector:
    app: postgres
  ports:
    - port: 5432
      nodePort: 30007


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl apply -f postgres.yaml
kubectl get pods
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl port-forward svc/postgres 5432:5432
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In a new terminal&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;psql -h localhost -U xxx -d queue_db

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To create a queue&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE EXTENSION pgmq;
SELECT pgmq.create('events');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Producer Setup (Python)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# producer.py

import psycopg2
import json
import time

conn = psycopg2.connect(
    host="postgres",
    port=5432,
    user="xxx",
    password="xxx",
    dbname="queue_db"
)

cur = conn.cursor()
i = 0

while True:
    payload = {"id": i, "type": "order_created"}
    cur.execute("SELECT pgmq.send('events', %s)", [json.dumps(payload)])
    conn.commit()
    print("Produced:", payload)
    i += 1
    time.sleep(1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Dockerfile

FROM python:3.11-slim
WORKDIR /app
RUN pip install psycopg2-binary
COPY producer.py .
CMD ["python", "producer.py"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# producer.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: pg-producer
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pg-producer
  template:
    metadata:
      labels:
        app: pg-producer
    spec:
      containers:
        - name: producer
          image: pg-producer
          imagePullPolicy: IfNotPresent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
# Build inside Minikube's Docker daemon so the cluster can find the local image
eval $(minikube docker-env)
docker build -t pg-producer .
kubectl apply -f producer.yaml

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Consumer Setup (Python)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# consumer.py

import psycopg2
import time

conn = psycopg2.connect(
    host="postgres",
    port=5432,
    user="xxx",
    password="xxx",
    dbname="queue_db"
)

cur = conn.cursor()

while True:
    # Select named columns so the code does not depend on pgmq's column order
    cur.execute("SELECT msg_id, message FROM pgmq.read('events', 1, 5)")
    rows = cur.fetchall()

    if not rows:
        conn.commit()   # end the read transaction before sleeping
        time.sleep(1)
        continue

    for msg_id, body in rows:
        print("Consumed:", body)
        cur.execute("SELECT pgmq.delete('events', %s)", [msg_id])
        conn.commit()

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Dockerfile

FROM python:3.11-slim
WORKDIR /app
RUN pip install psycopg2-binary
COPY consumer.py .
CMD ["python", "consumer.py"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# consumer.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: pg-consumer
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pg-consumer
  template:
    metadata:
      labels:
        app: pg-consumer
    spec:
      containers:
        - name: consumer
          image: pg-consumer
          imagePullPolicy: IfNotPresent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
# Build inside Minikube's Docker daemon so the cluster can find the local image
eval $(minikube docker-env)
docker build -t pg-consumer .
kubectl apply -f consumer.yaml

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ensure all pods are running&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;m-mfkghf4fgk producer % kubectl get pods

NAME                          READY   STATUS    RESTARTS   AGE
pg-consumer-c976b84f8-t84mt   1/1     Running   0          27s
pg-producer-85cd846b4-llj72   1/1     Running   0          92s
postgres-5f9b95c698-pjjnp     1/1     Running   0          20m

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify the logs&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl logs deployment/pg-producer

kubectl logs deployment/pg-consumer

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Optional: pgweb UI&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;m-mfkghf4fgk producer % kubectl run pgweb --image=sosedoff/pgweb -- \
  --host=postgres \
  --port=5432 \
  --user=xxx \
  --pass=xxx \
  --db=queue_db \
  --ssl=disable
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl port-forward pod/pgweb 8081:8081
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Visit in browser - &lt;em&gt;&lt;a href="http://localhost:8081" rel="noopener noreferrer"&gt;http://localhost:8081&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Run a sample query to see the events&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM pgmq.q_events LIMIT 20;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyajmtcq0dl5y0z2uu36e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyajmtcq0dl5y0z2uu36e.png" alt=" " width="800" height="468"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Not Everything Needs to Be Kafka
&lt;/h2&gt;

&lt;p&gt;pgmq is not trying to replace Kafka, and it shouldn’t. It solves a different problem. If you need high-throughput streaming, multiple independent consumers, or large-scale event processing, Kafka is still the right tool.&lt;/p&gt;

&lt;p&gt;But if all you need is a reliable, observable queue for background work, retries, or internal workflows, Postgres is often more than enough. You already run it, you already trust it, and you can see exactly what is happening inside it.&lt;/p&gt;

&lt;p&gt;In many systems, the queue is not the product — it is just plumbing. And for plumbing, simple and boring is usually better than powerful and complex.&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>postgres</category>
      <category>eventdriven</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
