DEV Community: Sven Herrmann

Encrypting PostgreSQL Columns in Scala with skunk-crypt

Sven Herrmann — Mon, 08 Jun 2026 17:48:12 +0000

Some columns shouldn't sit in your database as plain text. Email addresses, phone numbers, tax IDs, anything a regulator or a breach-disclosure law cares about. The usual answers are either heavy (Postgres pgcrypto with key material living next to the data) or invasive (a bespoke encryption layer threaded through every query you write).

skunk-crypt takes a smaller, sharper angle: encryption is just another Skunk codec. You keep writing ordinary Skunk queries; you just swap text for crypt.text. Plain values in your application, AES-256-GCM ciphertext in the database, and the key never leaves your process.

// before
sql"INSERT INTO users (email) VALUES ($text)".command

// after — same query, encrypted column
sql"INSERT INTO users (email) VALUES (${crypt.text})".command

That's the whole idea. The rest of this post is how to use it well: getting set up, the one design decision that actually matters (searchable vs. not), key rotation, and the sharp edges worth knowing before you ship.

The mental model: encryption as a codec

A Skunk Codec[A] knows two things — how to encode an A into the string Postgres stores, and how to decode it back. skunk-crypt slots encryption into exactly that seam:

Encode: take your String/Int/UUID/…, serialize it, encrypt the bytes, store the result as TEXT.
Decode: read the TEXT, decrypt, parse back into your type.

Two consequences fall out of this design, and they're worth internalizing up front:

Every encrypted column is a TEXT column, regardless of its logical type. An encrypted Int is stored as text, an encrypted timestamptz is stored as text. The ciphertext is opaque to Postgres.
Postgres never sees a plaintext value and never holds a key. It stores and returns opaque strings. All crypto happens in your JVM. That also means Postgres can't do anything clever with the value — no server-side range queries, no LIKE, no arithmetic. More on that below, because it's the main constraint you design around.

Setup

skunk-crypt is published for Scala 2.13 and 3. Add it (check the Maven Central badge for the current version):

libraryDependencies += "de.thatscalaguy" %% "skunk-crypt" % "1.0.0"

Skunk, Cats, and Cats Effect are declared provided, so skunk-crypt reuses whatever versions are already on your classpath instead of dragging in its own. Make sure Skunk itself is present:

libraryDependencies += "org.tpolecat" %% "skunk-core" % "1.0.0"

Generate a key

Keys are raw AES keys, hex-encoded. 64 hex characters gives you AES-256 (32 and 48 are accepted too, for AES-128/192 — but there's no reason not to use 256):

openssl rand -hex 32

Treat this string like any other top-tier secret. Keep it out of source control; load it from an environment variable, a secrets manager, or your config of choice. If you lose it, the encrypted columns are gone — that's the point.

Build a validated `CryptContext`

The key is wrapped in a CryptContext. Construction is validated and returns an Either, so a malformed or wrong-length key fails immediately with a reason instead of detonating later inside a query:

import de.thatscalaguy.skunkcrypt.*

given CryptContext =
  CryptContext
    .keysFromHex(sys.env("DB_ENC_KEY"))
    .fold(reason => sys.error(s"Invalid encryption key: $reason"), identity)

The given matters: every crypt/cryptd codec takes the CryptContext implicitly. Define it once where your wiring lives, and the codecs pick it up everywhere. No effect wrappers, no threading a key parameter through your repository layer.

💡 Fail loud on bad keys. The .fold(sys.error, identity) above turns a bad key into an immediate startup crash. That's deliberate — you want "wrong key" to be a deploy-time failure, not a 2 a.m. decryption exception.

Your first encrypted round-trip

Here's a complete, runnable program. The columns are declared TEXT in Postgres even though age is logically an Int:

CREATE TABLE users (
  email TEXT,
  age   TEXT
)

import cats.effect.*
import skunk.*
import skunk.implicits.*
import org.typelevel.otel4s.trace.Tracer
import org.typelevel.otel4s.metrics.Meter
import de.thatscalaguy.skunkcrypt.*

object Demo extends IOApp.Simple:

  given Tracer[IO] = Tracer.Implicits.noop
  given Meter[IO]  = Meter.Implicits.noop

  // Generate with: openssl rand -hex 32
  given CryptContext =
    CryptContext
      .keysFromHex(sys.env("DB_ENC_KEY"))
      .fold(reason => sys.error(s"Invalid encryption key: $reason"), identity)

  val session: Resource[IO, Session[IO]] =
    Session
      .Builder[IO]
      .withHost("localhost")
      .withPort(5432)
      .withUserAndPassword("postgres", "postgres")
      .withDatabase("postgres")
      .single

  def run: IO[Unit] = session.use: s =>
    for
      _ <- s.execute(
             sql"INSERT INTO users (email, age) VALUES (${cryptd.text}, ${crypt.int4})".command
           )(("alice@example.com", 30))
      // The database now holds ciphertext; we read it back as plain values:
      rows <- s.execute(
                sql"SELECT email, age FROM users".query(cryptd.text ~ crypt.int4)
              )
      _ <- IO.println(rows) // List((alice@example.com, 30))
    yield ()

Run it, then peek at the raw row from psql:

       email                          |        age
--------------------------------------+----------------------
 b3pQ7t1mZ2k9c0Aa.0.9sFh2Kx...==      | 7Yk2p0qL4mZ1aa.0.Qf...

What you're looking at is the on-disk format: base64(iv).keyIndex.base64(ciphertext). The IV and the key index travel with the value, which is what makes per-row IVs and key rotation work without any extra bookkeeping tables. Your application reads alice@example.com and 30; the database only ever held the opaque strings.

The one decision that matters: `crypt` vs `cryptd`

skunk-crypt gives you two codec objects with the same set of codecs. Picking between them per column is the main design choice you make, so it's worth understanding precisely.

Object	Mode	Same input → same ciphertext?	Use it for
`crypt`	Non-deterministic	No (random IV per write)	The safe default — anything you don't search
`cryptd`	Deterministic	Yes (synthetic IV)	Columns you must match with `WHERE x = ?`

crypt is the safe default. Every write gets a fresh random IV, so encrypting "alice@example.com" twice produces two completely different ciphertexts. An observer with full read access to the table can't even tell which rows share a value. The cost: you can never query by that column — there's nothing to match against.

cryptd is deterministic on purpose. The same plaintext always encrypts to the same ciphertext, which is exactly what lets you do WHERE email = ?. It achieves this without the classic fixed-IV footgun: instead of reusing one IV everywhere (which would leak the XOR of your plaintexts through GCM's keystream), it derives the IV from the plaintext itself via HMAC — a synthetic IV, the same idea as AES-GCM-SIV. Distinct values still get distinct keystreams; equal values line up so they stay searchable.

The trade-off is unavoidable and you should name it out loud when you choose cryptd: a deterministic column reveals which rows share the same value. An attacker who can read the table can see that 4,000 rows have the same country ciphertext, or run a frequency analysis. That's fine for a high-cardinality identifier like an email you need to look up by; it's a poor choice for a low-cardinality field like gender or subscription_tier.

Rule of thumb: reach for cryptd only on columns you genuinely need to look up by exact value, and only when leaking equality is acceptable. Everything else gets crypt.

A more realistic table

Mixed columns is the normal case — some searchable, most not:

CREATE TABLE customers (
  id          TEXT,   -- encrypted UUID, looked up by id  -> cryptd
  email       TEXT,   -- looked up at login               -> cryptd
  full_name   TEXT,   -- displayed, never queried         -> crypt
  tax_id      TEXT,   -- sensitive, never queried         -> crypt
  balance     TEXT,   -- a BigDecimal, never queried      -> crypt
  created_at  TEXT    -- a timestamptz, never queried     -> crypt
)

val insert =
  sql"""INSERT INTO customers (id, email, full_name, tax_id, balance, created_at)
        VALUES (${cryptd.uuid}, ${cryptd.text}, ${crypt.text},
                ${crypt.text}, ${crypt.numeric}, ${crypt.timestamptz})""".command

// pass the values in the same order:
session.execute(insert)(
  (id, "alice@example.com", "Alice Example", "DE123456789", BigDecimal("42.00"), createdAt)
)

Notice the codecs read just like Skunk's built-in ones — cryptd.uuid, crypt.numeric, crypt.timestamptz. You combine them with ~ (or map to a case class) exactly as you always would. The encryption is invisible at the call site.

Searching encrypted columns

Because email was written with cryptd, the encrypted form of a given address is stable, so you can match it directly. The parameter is encrypted deterministically and compared against the stored ciphertext — Postgres does an ordinary string equality:

val byEmail: Query[String, BigDecimal] =
  sql"SELECT balance FROM customers WHERE email = ${cryptd.text}".query(crypt.numeric)

// you pass the plaintext; skunk-crypt encrypts it to the same ciphertext that's stored
session.execute(byEmail)("alice@example.com")

This is the payoff of deterministic mode, and it composes with a real performance win: the deterministic ciphertext is stored as exact TEXT, so an ordinary B-tree index on the column makes the lookup fast.

CREATE INDEX customers_email_idx ON customers (email);

The index is over ciphertext, but since equal plaintext means equal ciphertext, an equality probe lands right on it. (Indexing a crypt column is pointless — every row's ciphertext differs, so nothing ever matches.)

What you cannot do on any encrypted column:

LIKE / prefix / substring search — the ciphertext bears no resemblance to the plaintext.
Range queries (age > 18, created_at BETWEEN …) or ORDER BY — ordering is destroyed by encryption.
Server-side arithmetic, SUM, GROUP BY on the value, etc.

If you need those, the column probably can't be encrypted at rest this way — or you keep a separate, coarse, non-sensitive column to filter on (e.g. an unencrypted age_band) and encrypt the precise value.

Supported types

Both crypt and cryptd expose the same codec set, so the search/no-search choice is orthogonal to the column's type:

Codec	Scala type
`text`	`String`
`int2`	`Short`
`int4`	`Int`
`int8`	`Long`
`float4`	`Float`
`float8`	`Double`
`bool`	`Boolean`
`uuid`	`java.util.UUID`
`numeric`	`BigDecimal`
`date`	`java.time.LocalDate`
`timestamp`	`java.time.LocalDateTime`
`timestamptz`	`java.time.OffsetDateTime`

Key rotation

Security policies (and auditors) eventually ask you to rotate keys. skunk-crypt builds this in: keysFromHex accepts several keys. Encryption always uses the last one, and the index of the key used is embedded in each stored value — so older keys keep decrypting the rows they originally encrypted.

// new key encrypts; both keys still decrypt
given CryptContext =
  CryptContext.keysFromHex(oldKeyHex, newKeyHex).fold(sys.error, identity)

The mechanics:

Append the new key. New writes use it (key index 1); existing rows still carry index 0 and decrypt with the old key.
Over time — lazily, or in a batch job — re-encrypt old rows by reading and writing them back. Each rewrite re-encrypts with the newest key.
Once every row is on the new key, you could drop the old one.

⚠️ Only ever append keys. Never reorder or remove them. The key index is stored inside each row. Reorder the list and every existing row now points at the wrong key; remove a key still in use and those rows become undecryptable. Append-only is the whole contract.

The rotation gotcha for `cryptd` columns

Here's the sharp edge that the happy-path docs gloss over, and it's worth stopping on. A cryptd column's ciphertext depends on the encryption key, because the synthetic IV is derived from the plaintext using the last key. So the moment you append a rotation key:

Old rows still decrypt fine (their key index is embedded). Reading is unaffected.
But a fresh WHERE email = ${cryptd.text} lookup now encrypts the parameter with the new key — producing a different ciphertext than the one stored under the old key. The equality match silently misses the old rows.

In other words: rotating a key doesn't just change new writes, it temporarily breaks search over deterministic columns that haven't been re-encrypted yet. The fix is to re-encrypt those columns promptly after rotation rather than lazily:

// Re-encrypt every customer's deterministic columns onto the newest key.
// Reading decrypts with whatever key the row used; writing re-encrypts with the latest.
val reencrypt: IO[Unit] =
  session.use: s =>
    for
      rows <- s.execute(sql"SELECT id, email FROM customers".query(cryptd.uuid ~ cryptd.text))
      _    <- rows.traverse_ { case id ~ email =>
                s.execute(
                  sql"UPDATE customers SET email = ${cryptd.text} WHERE id = ${cryptd.uuid}".command
                )((email, id))
              }
    yield ()

For crypt (non-deterministic) columns this never bites, because you never search them — lazy re-encryption is perfectly fine there. It's specifically the searchable columns that need an eager rewrite.

Error handling

Two failure modes are typed, and both surface through Skunk's normal codec path.

Bad key — caught at construction. keysFromHex returns Either[String, CryptContext], with the reason (and the offending key's index, for multi-key lists). A bad key can't reach a query.

Bad ciphertext — caught at decode. Decryption raises a CryptError (a RuntimeException subtype, so it propagates like any decode failure):

Error	Meaning
`MalformedCiphertext`	The stored value isn't `iv.keyIndex.data` — e.g. legacy plaintext
`DecryptionFailure`	Wrong key, unknown key index, or a failed GCM authentication tag

MalformedCiphertext is the one you'll meet first in real life: it's what you get when an existing, unencrypted column still holds plaintext and you point an encrypted codec at it. That's your signal to run a one-time migration — read each row with the plain Skunk codec, write it back with the encrypted codec — before switching reads over.

A DecryptionFailure from a modified value is a feature, not a bug: AES-GCM is authenticated, so a tampered or truncated ciphertext fails to decrypt rather than quietly returning garbage. If someone edits a byte in the database, you find out.

What it protects — and what it doesn't

Be honest with your threat model. skunk-crypt encrypts column values. It is strong against the specific scenario of someone reading your database — a leaked backup, a stolen replica, an over-privileged analyst, a SELECT-only breach. They get ciphertext and, without your key, nothing else.

It does not:

Hide column names, row counts, or access patterns.
Protect data in your application's memory (it's plaintext there by design).
Replace transport security (use TLS to Postgres) or at-rest disk encryption — those defend different layers.
Conceal equality on cryptd columns (covered above).

It's one focused layer: confidentiality of sensitive values against database-level exposure. Stack it with the others; don't expect it to be all of them.

Testing your integration

Codec round-trips are pure, so you can unit-test encrypt-then-decrypt without a database at all. For the real thing — values actually flowing through Postgres — skunk-crypt's own suite uses Testcontainers to spin up a throwaway postgres:16, which is a good pattern to copy:

test("round-trips text and int columns through Postgres") {
  withContainers { case db: GenericContainer =>
    session(db.container.getMappedPort(5432)).use { s =>
      for
        _    <- s.execute(
                  sql"INSERT INTO test (string, numbers) VALUES (${cryptd.text}, ${cryptd.int4})".command
                )(("Hello", 123))
        rows <- s.execute(
                  sql"SELECT string, numbers FROM test".query(cryptd.text ~ cryptd.int4)
                )
      yield assertEquals(rows, List("Hello" -> 123))
    }
  }
}

The thing worth asserting beyond a plain round-trip: that the stored form really is ciphertext (read the column with a plain text codec and check it doesn't equal the plaintext), and that a cryptd write is stable across runs while a crypt write isn't.

A checklist before you ship

[ ] Encryption key loaded from a secret, never committed. Startup fails loudly on a bad key.
[ ] Every sensitive column is TEXT in the schema.
[ ] cryptd only where you must search by exact value and leaking equality is acceptable; crypt everywhere else.
[ ] B-tree index on each cryptd column you look up by.
[ ] A migration plan for any column that currently holds plaintext (watch for MalformedCiphertext).
[ ] A rotation runbook that appends keys and eagerly re-encrypts cryptd columns.
[ ] You've written down what this does and doesn't protect, so nobody assumes it's more than column-value confidentiality.

Wrapping up

The thing I like about skunk-crypt is that it refuses to be a framework. It's a handful of codecs and one implicit context. You don't restructure your code around it — you swap text for crypt.text, decide per column whether you need to search it, and carry on writing ordinary Skunk. The hard parts (authenticated encryption, per-row IVs, embedded key indices for rotation) are handled; the parts you have to think about (deterministic-equality leakage, rotation + re-encryption, threat model) are the parts no library can decide for you anyway.

Source, issues, and the full API reference:

👉 github.com/ThatScalaGuy/skunk-crypt · Scaladoc

It's Apache-2.0 licensed and contributions are welcome. If you put it into production, I'd genuinely like to hear how it goes — and if your project needs a hand with Scala, data, or the boring-but-critical security plumbing, get in touch.

[Boost]

Sven Herrmann — Thu, 28 May 2026 18:37:55 +0000

Sven Herrmann

May 25

GPGPU.js: Run JavaScript on Your GPU With Zero Shader Knowledge

#webgpu #javascript #typescript #performance

4 min read

GPGPU.js: Run JavaScript on Your GPU With Zero Shader Knowledge

Sven Herrmann — Mon, 25 May 2026 08:15:44 +0000

JavaScript has a parallelism problem. The language is single-threaded by design, and while Web Workers help, they top out at a handful of cores. Meanwhile, the most powerful parallel processor in your machine — the GPU, with its thousands of cores — has been almost completely off-limits to web developers.

WebGPU changes that. It exposes the GPU to the browser for general-purpose compute, not just graphics. The catch? Using it directly means writing WGSL shaders, managing buffers and bind groups, handling device initialization, and orchestrating async data transfers. That's a lot of boilerplate for what is conceptually just "run this function over an array, fast."

GPGPU.js removes that boilerplate. You write plain JavaScript; it runs on the GPU.

import { gpu } from "@thatscalaguy/gpgpu.js";

const doubled = await gpu.map([1, 2, 3, 4], x => x * 2); // [2, 4, 6, 8]

No shaders. No buffers. No device setup. That x => x * 2 arrow function is parsed and compiled to a WGSL compute shader, dispatched across the GPU, and the result comes back as a typed array.

🎮 Try it right now in your browser — no install needed: thatscalaguy.github.io/GPGPU.js

The API in 30 seconds

The whole point is that the surface area is tiny. Here's most of it:

import { gpu } from "@thatscalaguy/gpgpu.js";

// Element-wise math (arrays or array + scalar)
await gpu.add([1, 2, 3], [4, 5, 6]);   // [5, 7, 9]
await gpu.multiply([1, 2, 3], 10);     // [10, 20, 30]

// Map with a JS arrow function — compiled to a GPU shader
await gpu.map(data, x => Math.sqrt(x) + 1);

// Reductions
await gpu.sum([1, 2, 3, 4, 5]);        // 15
await gpu.max([3, 1, 4, 1, 5, 9]);     // 9

// Sort (GPU bitonic sort) and prefix sum
await gpu.sort(data);
await gpu.scan(data);

// Matrix multiply (tiled, uses shared memory)
await gpu.matmul(matA, matB, { rowsA: 64, colsA: 64, colsB: 64 });

Everything is async because the GPU works asynchronously — you await the result and get a Float32Array back.

Arrow functions become shaders

The magic trick is the codegen. When you pass x => x * 2 + 1, GPGPU.js:

Parses the function and builds an intermediate representation of the expression.
Emits WGSL from that IR — x * 2 + 1 becomes output[idx] = (x * 2.0) + 1.0;.
Compiles and dispatches the shader through WebGPU across thousands of cores.
Returns the result as a typed array.

A useful subset of JavaScript is supported inside expressions:

Arithmetic: + - * / %
Comparisons: < > <= >= == !=
Ternary: a > 0 ? a : -a
Math functions: Math.abs, Math.sqrt, Math.pow, Math.min, Math.max, Math.floor, Math.ceil, Math.sin, Math.cos, Math.tan, Math.exp, Math.log

If you're worried about minifiers mangling your arrow functions in production, you can pass a string expression instead — it compiles to exactly the same shader:

await gpu.map(data, "x * x + 1"); // minifier-safe

Pipelines: keep data on the GPU

The expensive part of GPU computing usually isn't the math — it's shuttling data back and forth across the PCIe bus between CPU and GPU. If you naively chain operations, you pay that round-trip every step:

// ❌ Three CPU↔GPU round-trips
let r = await gpu.map(data, x => x * 2);
r = await gpu.map(r, x => x + 1);
const total = await gpu.sum(r);

Pipelines fix this. The data stays resident on the GPU between steps, and you only transfer at the start and end:

// ✅ One round-trip, data stays on GPU between steps
const result = await gpu.pipeline()
  .map(x => x * 2)
  .map(x => x + 1)
  .reduce((a, b) => a + b, 0)
  .run(data);

For any non-trivial workload, this is the difference between "the GPU is slower than the CPU" and a real speedup.

Escape hatch: write your own WGSL

The high-level API covers a lot, but sometimes you need full control. The createKernel API lets you write raw WGSL while still letting GPGPU.js handle device setup, buffer pooling, and dispatch:

const kernel = await gpu.createKernel({
  workgroupSize: 64,
  shader: `
    @group(0) @binding(0) var<storage, read> input0: array<f32>;
    @group(0) @binding(1) var<storage, read_write> output: array<f32>;
    @compute @workgroup_size(64)
    fn main(@builtin(global_invocation_id) gid: vec3u) {
      let idx = gid.x;
      output[idx] = input0[idx] * input0[idx];
    }
  `,
  inputs: [{ type: "f32", size: 1024 }],
  output: { type: "f32", size: 1024 },
});

const result = await kernel.run(inputData);

It runs everywhere — even without WebGPU

WebGPU support is good and growing, but it's not universal yet:

Chrome 113+ / Edge 113+
Firefox 141+ (Windows), 145+ (macOS)
Safari 18+

GPGPU.js doesn't make you choose. When WebGPU is unavailable, every operation transparently falls back to a CPU implementation. Your code doesn't change; it just runs slower where there's no GPU and faster where there is. That makes it safe to ship in production today without feature-detection branches scattered through your code.

Built for modern JavaScript projects

A few things that make it pleasant to actually use:

TypeScript-first — full type safety, zero runtime dependencies.
Tree-shakeable — ships both ESM and CJS; import only what you use.
No build step required — works straight from npm.

npm install @thatscalaguy/gpgpu.js

You can also create isolated instances instead of using the default singleton, and clean them up explicitly:

import { GPU } from "@thatscalaguy/gpgpu.js";

const myGpu = new GPU();
// ... use it ...
myGpu.destroy(); // release GPU resources when done

Where this is useful

Anything that's "the same operation over a lot of data" is a candidate:

Image and signal processing (per-pixel transforms, convolutions)
Numerical simulations and physics
Data transforms over large arrays
Linear algebra — matrix multiplication is built in
ML inference building blocks

If your workload is small or branch-heavy, the CPU may still win — GPU shines when you have thousands to millions of independent elements. As always: measure.

Try it

The fastest way to get a feel for it is the hosted playground — write an expression, hit run, see it execute on your actual GPU:

👉 thatscalaguy.github.io/GPGPU.js

And the source, issues, and docs live on GitHub:

👉 github.com/ThatScalaGuy/GPGPU.js

It's MIT-licensed and contributions are welcome. If you build something with it, I'd love to hear about it.

GPGPU.js is open source under the MIT license.

DEV Community: Sven Herrmann

Encrypting PostgreSQL Columns in Scala with skunk-crypt

The mental model: encryption as a codec

Setup

Generate a key

Build a validated CryptContext

Your first encrypted round-trip

The one decision that matters: crypt vs cryptd

A more realistic table

Searching encrypted columns

Supported types

Key rotation

The rotation gotcha for cryptd columns

Error handling

What it protects — and what it doesn't

Testing your integration

A checklist before you ship

Wrapping up

[Boost]

GPGPU.js: Run JavaScript on Your GPU With Zero Shader Knowledge

GPGPU.js: Run JavaScript on Your GPU With Zero Shader Knowledge

The API in 30 seconds

Arrow functions become shaders

Pipelines: keep data on the GPU

Escape hatch: write your own WGSL

It runs everywhere — even without WebGPU

Built for modern JavaScript projects

Where this is useful

Try it

Build a validated `CryptContext`

The one decision that matters: `crypt` vs `cryptd`

The rotation gotcha for `cryptd` columns