DEV Community

Sven Herrmann
Sven Herrmann

Posted on • Originally published at thatscalaguy.de

Encrypting PostgreSQL Columns in Scala with skunk-crypt

Some columns shouldn't sit in your database as plain text. Email addresses, phone numbers, tax IDs, anything a regulator or a breach-disclosure law cares about. The usual answers are either heavy (Postgres pgcrypto with key material living next to the data) or invasive (a bespoke encryption layer threaded through every query you write).

skunk-crypt takes a smaller, sharper angle: encryption is just another Skunk codec. You keep writing ordinary Skunk queries; you just swap text for crypt.text. Plain values in your application, AES-256-GCM ciphertext in the database, and the key never leaves your process.

// before
sql"INSERT INTO users (email) VALUES ($text)".command

// after — same query, encrypted column
sql"INSERT INTO users (email) VALUES (${crypt.text})".command
Enter fullscreen mode Exit fullscreen mode

That's the whole idea. The rest of this post is how to use it well: getting set up, the one design decision that actually matters (searchable vs. not), key rotation, and the sharp edges worth knowing before you ship.

The mental model: encryption as a codec

A Skunk Codec[A] knows two things — how to encode an A into the string Postgres stores, and how to decode it back. skunk-crypt slots encryption into exactly that seam:

  • Encode: take your String/Int/UUID/…, serialize it, encrypt the bytes, store the result as TEXT.
  • Decode: read the TEXT, decrypt, parse back into your type.

Two consequences fall out of this design, and they're worth internalizing up front:

  1. Every encrypted column is a TEXT column, regardless of its logical type. An encrypted Int is stored as text, an encrypted timestamptz is stored as text. The ciphertext is opaque to Postgres.
  2. Postgres never sees a plaintext value and never holds a key. It stores and returns opaque strings. All crypto happens in your JVM. That also means Postgres can't do anything clever with the value — no server-side range queries, no LIKE, no arithmetic. More on that below, because it's the main constraint you design around.

Setup

skunk-crypt is published for Scala 2.13 and 3. Add it (check the Maven Central badge for the current version):

libraryDependencies += "de.thatscalaguy" %% "skunk-crypt" % "1.0.0"
Enter fullscreen mode Exit fullscreen mode

Skunk, Cats, and Cats Effect are declared provided, so skunk-crypt reuses whatever versions are already on your classpath instead of dragging in its own. Make sure Skunk itself is present:

libraryDependencies += "org.tpolecat" %% "skunk-core" % "1.0.0"
Enter fullscreen mode Exit fullscreen mode

Generate a key

Keys are raw AES keys, hex-encoded. 64 hex characters gives you AES-256 (32 and 48 are accepted too, for AES-128/192 — but there's no reason not to use 256):

openssl rand -hex 32
Enter fullscreen mode Exit fullscreen mode

Treat this string like any other top-tier secret. Keep it out of source control; load it from an environment variable, a secrets manager, or your config of choice. If you lose it, the encrypted columns are gone — that's the point.

Build a validated CryptContext

The key is wrapped in a CryptContext. Construction is validated and returns an Either, so a malformed or wrong-length key fails immediately with a reason instead of detonating later inside a query:

import de.thatscalaguy.skunkcrypt.*

given CryptContext =
  CryptContext
    .keysFromHex(sys.env("DB_ENC_KEY"))
    .fold(reason => sys.error(s"Invalid encryption key: $reason"), identity)
Enter fullscreen mode Exit fullscreen mode

The given matters: every crypt/cryptd codec takes the CryptContext implicitly. Define it once where your wiring lives, and the codecs pick it up everywhere. No effect wrappers, no threading a key parameter through your repository layer.

💡 Fail loud on bad keys. The .fold(sys.error, identity) above turns a bad key into an immediate startup crash. That's deliberate — you want "wrong key" to be a deploy-time failure, not a 2 a.m. decryption exception.

Your first encrypted round-trip

Here's a complete, runnable program. The columns are declared TEXT in Postgres even though age is logically an Int:

CREATE TABLE users (
  email TEXT,
  age   TEXT
)
Enter fullscreen mode Exit fullscreen mode
import cats.effect.*
import skunk.*
import skunk.implicits.*
import org.typelevel.otel4s.trace.Tracer
import org.typelevel.otel4s.metrics.Meter
import de.thatscalaguy.skunkcrypt.*

object Demo extends IOApp.Simple:

  given Tracer[IO] = Tracer.Implicits.noop
  given Meter[IO]  = Meter.Implicits.noop

  // Generate with: openssl rand -hex 32
  given CryptContext =
    CryptContext
      .keysFromHex(sys.env("DB_ENC_KEY"))
      .fold(reason => sys.error(s"Invalid encryption key: $reason"), identity)

  val session: Resource[IO, Session[IO]] =
    Session
      .Builder[IO]
      .withHost("localhost")
      .withPort(5432)
      .withUserAndPassword("postgres", "postgres")
      .withDatabase("postgres")
      .single

  def run: IO[Unit] = session.use: s =>
    for
      _ <- s.execute(
             sql"INSERT INTO users (email, age) VALUES (${cryptd.text}, ${crypt.int4})".command
           )(("alice@example.com", 30))
      // The database now holds ciphertext; we read it back as plain values:
      rows <- s.execute(
                sql"SELECT email, age FROM users".query(cryptd.text ~ crypt.int4)
              )
      _ <- IO.println(rows) // List((alice@example.com, 30))
    yield ()
Enter fullscreen mode Exit fullscreen mode

Run it, then peek at the raw row from psql:

       email                          |        age
--------------------------------------+----------------------
 b3pQ7t1mZ2k9c0Aa.0.9sFh2Kx...==      | 7Yk2p0qL4mZ1aa.0.Qf...
Enter fullscreen mode Exit fullscreen mode

What you're looking at is the on-disk format: base64(iv).keyIndex.base64(ciphertext). The IV and the key index travel with the value, which is what makes per-row IVs and key rotation work without any extra bookkeeping tables. Your application reads alice@example.com and 30; the database only ever held the opaque strings.

The one decision that matters: crypt vs cryptd

skunk-crypt gives you two codec objects with the same set of codecs. Picking between them per column is the main design choice you make, so it's worth understanding precisely.

Object Mode Same input → same ciphertext? Use it for
crypt Non-deterministic No (random IV per write) The safe default — anything you don't search
cryptd Deterministic Yes (synthetic IV) Columns you must match with WHERE x = ?

crypt is the safe default. Every write gets a fresh random IV, so encrypting "alice@example.com" twice produces two completely different ciphertexts. An observer with full read access to the table can't even tell which rows share a value. The cost: you can never query by that column — there's nothing to match against.

cryptd is deterministic on purpose. The same plaintext always encrypts to the same ciphertext, which is exactly what lets you do WHERE email = ?. It achieves this without the classic fixed-IV footgun: instead of reusing one IV everywhere (which would leak the XOR of your plaintexts through GCM's keystream), it derives the IV from the plaintext itself via HMAC — a synthetic IV, the same idea as AES-GCM-SIV. Distinct values still get distinct keystreams; equal values line up so they stay searchable.

The trade-off is unavoidable and you should name it out loud when you choose cryptd: a deterministic column reveals which rows share the same value. An attacker who can read the table can see that 4,000 rows have the same country ciphertext, or run a frequency analysis. That's fine for a high-cardinality identifier like an email you need to look up by; it's a poor choice for a low-cardinality field like gender or subscription_tier.

Rule of thumb: reach for cryptd only on columns you genuinely need to look up by exact value, and only when leaking equality is acceptable. Everything else gets crypt.

A more realistic table

Mixed columns is the normal case — some searchable, most not:

CREATE TABLE customers (
  id          TEXT,   -- encrypted UUID, looked up by id  -> cryptd
  email       TEXT,   -- looked up at login               -> cryptd
  full_name   TEXT,   -- displayed, never queried         -> crypt
  tax_id      TEXT,   -- sensitive, never queried         -> crypt
  balance     TEXT,   -- a BigDecimal, never queried      -> crypt
  created_at  TEXT    -- a timestamptz, never queried     -> crypt
)
Enter fullscreen mode Exit fullscreen mode
val insert =
  sql"""INSERT INTO customers (id, email, full_name, tax_id, balance, created_at)
        VALUES (${cryptd.uuid}, ${cryptd.text}, ${crypt.text},
                ${crypt.text}, ${crypt.numeric}, ${crypt.timestamptz})""".command

// pass the values in the same order:
session.execute(insert)(
  (id, "alice@example.com", "Alice Example", "DE123456789", BigDecimal("42.00"), createdAt)
)
Enter fullscreen mode Exit fullscreen mode

Notice the codecs read just like Skunk's built-in ones — cryptd.uuid, crypt.numeric, crypt.timestamptz. You combine them with ~ (or map to a case class) exactly as you always would. The encryption is invisible at the call site.

Searching encrypted columns

Because email was written with cryptd, the encrypted form of a given address is stable, so you can match it directly. The parameter is encrypted deterministically and compared against the stored ciphertext — Postgres does an ordinary string equality:

val byEmail: Query[String, BigDecimal] =
  sql"SELECT balance FROM customers WHERE email = ${cryptd.text}".query(crypt.numeric)

// you pass the plaintext; skunk-crypt encrypts it to the same ciphertext that's stored
session.execute(byEmail)("alice@example.com")
Enter fullscreen mode Exit fullscreen mode

This is the payoff of deterministic mode, and it composes with a real performance win: the deterministic ciphertext is stored as exact TEXT, so an ordinary B-tree index on the column makes the lookup fast.

CREATE INDEX customers_email_idx ON customers (email);
Enter fullscreen mode Exit fullscreen mode

The index is over ciphertext, but since equal plaintext means equal ciphertext, an equality probe lands right on it. (Indexing a crypt column is pointless — every row's ciphertext differs, so nothing ever matches.)

What you cannot do on any encrypted column:

  • LIKE / prefix / substring search — the ciphertext bears no resemblance to the plaintext.
  • Range queries (age > 18, created_at BETWEEN …) or ORDER BY — ordering is destroyed by encryption.
  • Server-side arithmetic, SUM, GROUP BY on the value, etc.

If you need those, the column probably can't be encrypted at rest this way — or you keep a separate, coarse, non-sensitive column to filter on (e.g. an unencrypted age_band) and encrypt the precise value.

Supported types

Both crypt and cryptd expose the same codec set, so the search/no-search choice is orthogonal to the column's type:

Codec Scala type
text String
int2 Short
int4 Int
int8 Long
float4 Float
float8 Double
bool Boolean
uuid java.util.UUID
numeric BigDecimal
date java.time.LocalDate
timestamp java.time.LocalDateTime
timestamptz java.time.OffsetDateTime

Key rotation

Security policies (and auditors) eventually ask you to rotate keys. skunk-crypt builds this in: keysFromHex accepts several keys. Encryption always uses the last one, and the index of the key used is embedded in each stored value — so older keys keep decrypting the rows they originally encrypted.

// new key encrypts; both keys still decrypt
given CryptContext =
  CryptContext.keysFromHex(oldKeyHex, newKeyHex).fold(sys.error, identity)
Enter fullscreen mode Exit fullscreen mode

The mechanics:

  1. Append the new key. New writes use it (key index 1); existing rows still carry index 0 and decrypt with the old key.
  2. Over time — lazily, or in a batch job — re-encrypt old rows by reading and writing them back. Each rewrite re-encrypts with the newest key.
  3. Once every row is on the new key, you could drop the old one.

⚠️ Only ever append keys. Never reorder or remove them. The key index is stored inside each row. Reorder the list and every existing row now points at the wrong key; remove a key still in use and those rows become undecryptable. Append-only is the whole contract.

The rotation gotcha for cryptd columns

Here's the sharp edge that the happy-path docs gloss over, and it's worth stopping on. A cryptd column's ciphertext depends on the encryption key, because the synthetic IV is derived from the plaintext using the last key. So the moment you append a rotation key:

  • Old rows still decrypt fine (their key index is embedded). Reading is unaffected.
  • But a fresh WHERE email = ${cryptd.text} lookup now encrypts the parameter with the new key — producing a different ciphertext than the one stored under the old key. The equality match silently misses the old rows.

In other words: rotating a key doesn't just change new writes, it temporarily breaks search over deterministic columns that haven't been re-encrypted yet. The fix is to re-encrypt those columns promptly after rotation rather than lazily:

// Re-encrypt every customer's deterministic columns onto the newest key.
// Reading decrypts with whatever key the row used; writing re-encrypts with the latest.
val reencrypt: IO[Unit] =
  session.use: s =>
    for
      rows <- s.execute(sql"SELECT id, email FROM customers".query(cryptd.uuid ~ cryptd.text))
      _    <- rows.traverse_ { case id ~ email =>
                s.execute(
                  sql"UPDATE customers SET email = ${cryptd.text} WHERE id = ${cryptd.uuid}".command
                )((email, id))
              }
    yield ()
Enter fullscreen mode Exit fullscreen mode

For crypt (non-deterministic) columns this never bites, because you never search them — lazy re-encryption is perfectly fine there. It's specifically the searchable columns that need an eager rewrite.

Error handling

Two failure modes are typed, and both surface through Skunk's normal codec path.

Bad key — caught at construction. keysFromHex returns Either[String, CryptContext], with the reason (and the offending key's index, for multi-key lists). A bad key can't reach a query.

Bad ciphertext — caught at decode. Decryption raises a CryptError (a RuntimeException subtype, so it propagates like any decode failure):

Error Meaning
MalformedCiphertext The stored value isn't iv.keyIndex.data — e.g. legacy plaintext
DecryptionFailure Wrong key, unknown key index, or a failed GCM authentication tag

MalformedCiphertext is the one you'll meet first in real life: it's what you get when an existing, unencrypted column still holds plaintext and you point an encrypted codec at it. That's your signal to run a one-time migration — read each row with the plain Skunk codec, write it back with the encrypted codec — before switching reads over.

A DecryptionFailure from a modified value is a feature, not a bug: AES-GCM is authenticated, so a tampered or truncated ciphertext fails to decrypt rather than quietly returning garbage. If someone edits a byte in the database, you find out.

What it protects — and what it doesn't

Be honest with your threat model. skunk-crypt encrypts column values. It is strong against the specific scenario of someone reading your database — a leaked backup, a stolen replica, an over-privileged analyst, a SELECT-only breach. They get ciphertext and, without your key, nothing else.

It does not:

  • Hide column names, row counts, or access patterns.
  • Protect data in your application's memory (it's plaintext there by design).
  • Replace transport security (use TLS to Postgres) or at-rest disk encryption — those defend different layers.
  • Conceal equality on cryptd columns (covered above).

It's one focused layer: confidentiality of sensitive values against database-level exposure. Stack it with the others; don't expect it to be all of them.

Testing your integration

Codec round-trips are pure, so you can unit-test encrypt-then-decrypt without a database at all. For the real thing — values actually flowing through Postgres — skunk-crypt's own suite uses Testcontainers to spin up a throwaway postgres:16, which is a good pattern to copy:

test("round-trips text and int columns through Postgres") {
  withContainers { case db: GenericContainer =>
    session(db.container.getMappedPort(5432)).use { s =>
      for
        _    <- s.execute(
                  sql"INSERT INTO test (string, numbers) VALUES (${cryptd.text}, ${cryptd.int4})".command
                )(("Hello", 123))
        rows <- s.execute(
                  sql"SELECT string, numbers FROM test".query(cryptd.text ~ cryptd.int4)
                )
      yield assertEquals(rows, List("Hello" -> 123))
    }
  }
}
Enter fullscreen mode Exit fullscreen mode

The thing worth asserting beyond a plain round-trip: that the stored form really is ciphertext (read the column with a plain text codec and check it doesn't equal the plaintext), and that a cryptd write is stable across runs while a crypt write isn't.

A checklist before you ship

  • [ ] Encryption key loaded from a secret, never committed. Startup fails loudly on a bad key.
  • [ ] Every sensitive column is TEXT in the schema.
  • [ ] cryptd only where you must search by exact value and leaking equality is acceptable; crypt everywhere else.
  • [ ] B-tree index on each cryptd column you look up by.
  • [ ] A migration plan for any column that currently holds plaintext (watch for MalformedCiphertext).
  • [ ] A rotation runbook that appends keys and eagerly re-encrypts cryptd columns.
  • [ ] You've written down what this does and doesn't protect, so nobody assumes it's more than column-value confidentiality.

Wrapping up

The thing I like about skunk-crypt is that it refuses to be a framework. It's a handful of codecs and one implicit context. You don't restructure your code around it — you swap text for crypt.text, decide per column whether you need to search it, and carry on writing ordinary Skunk. The hard parts (authenticated encryption, per-row IVs, embedded key indices for rotation) are handled; the parts you have to think about (deterministic-equality leakage, rotation + re-encryption, threat model) are the parts no library can decide for you anyway.

Source, issues, and the full API reference:

👉 github.com/ThatScalaGuy/skunk-crypt · Scaladoc

It's Apache-2.0 licensed and contributions are welcome. If you put it into production, I'd genuinely like to hear how it goes — and if your project needs a hand with Scala, data, or the boring-but-critical security plumbing, get in touch.

Top comments (0)