<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Vladyslav Len</title>
    <description>The latest articles on DEV Community by Vladyslav Len (@levla).</description>
    <link>https://dev.to/levla</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F261563%2F07415045-3ad3-4ffc-bada-2a32dc59cade.jpeg</url>
      <title>DEV Community: Vladyslav Len</title>
      <link>https://dev.to/levla</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/levla"/>
    <language>en</language>
    <item>
      <title>SQLNet is a social network that looks like Twitter, but you write SQL to do anything</title>
      <dc:creator>Vladyslav Len</dc:creator>
      <pubDate>Sat, 03 Jan 2026 15:32:20 +0000</pubDate>
      <link>https://dev.to/levla/i-built-a-social-network-where-you-have-to-write-sql-to-post-i-regret-nothing-23n0</link>
      <guid>https://dev.to/levla/i-built-a-social-network-where-you-have-to-write-sql-to-post-i-regret-nothing-23n0</guid>
      <description>&lt;p&gt;Every time you "like" something on any platform, a database somewhere executes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;likes&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;post_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;you&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;that_thing_you_liked&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We wrap it in a heart icon. Make it feel human. Emotional. Meaningful.&lt;/p&gt;

&lt;p&gt;It's not. It's a row in a table.&lt;/p&gt;

&lt;p&gt;So I built a social network that stops pretending. On SQLNet.cc, you want to post something? You have to type:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;posts&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;author_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;me&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="s1"&gt;'I just got fired. Here is what it taught me about B2B sales.'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hit enter. It posts. That's the interface.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why build this?
&lt;/h2&gt;

&lt;p&gt;The idea is simple: &lt;strong&gt;what if the interface was the query?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Social platforms have spent years abstracting away what's actually happening. Tap a heart, swipe right, scroll infinitely. Underneath, it's all just database operations dressed up in gradients and dopamine loops.&lt;/p&gt;

&lt;p&gt;SQLNet works differently. You see the query. You write the query. You understand exactly what you're asking the system to do.&lt;/p&gt;

&lt;p&gt;This isn't user-hostile design for its own sake. It's honesty about what social media actually is, and in that honesty, something interesting happens. More on that later.&lt;/p&gt;




&lt;h2&gt;
  
  
  The architecture of collective delusion
&lt;/h2&gt;

&lt;p&gt;The core problem: give every user their own database while making it feel like one shared social network.&lt;/p&gt;

&lt;p&gt;I call it the &lt;strong&gt;Three Database Model&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Database 1: System DB (The Bouncer)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Accounts. Passwords. Tenant mappings. The backend queries it; users never see it. Very professional. Very boring. Moving on.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Database 2: Primary DB (The One True Reality)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is where truth lives. Every post, like, comment, and follow that actually happened. Canonical schema (plus some internal sync fields I'm omitting for sanity):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;posts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;author_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;updated_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;likes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;post_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;comments&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;post_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;updated_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;follows&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;follower_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;following_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Database 3: Tenant DBs (The Beautiful Lies)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you register, I provision an entire SQLite database just for you using Turso. Not a schema. Not a namespace. A whole database with its own schema, seeded with everyone else's data.&lt;/p&gt;

&lt;p&gt;And YOUR database has extra columns that the primary doesn't:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;posts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;author_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;updated_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;like_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;comment_count&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why? So you can write &lt;code&gt;ORDER BY like_count DESC&lt;/code&gt; without a JOIN. Convenience! User experience! Things I care about, apparently!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The twist:&lt;/strong&gt; You query YOUR database directly. But your database is a carefully maintained illusion synchronized with everyone else's reality.&lt;/p&gt;

&lt;p&gt;You're in the Matrix, but the Matrix runs on SQL.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1qyqe1d4b0roi0qnusgv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1qyqe1d4b0roi0qnusgv.png" alt=" " width="642" height="389"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The sync engine: an undocumented pragma and a Discord cry for help
&lt;/h2&gt;

&lt;p&gt;I needed Change Data Capture—a way to know every INSERT, UPDATE, DELETE happening across potentially thousands of databases. Without polling like a caveman.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; Turso's CDC isn't available on Turso Cloud. I didn't know that. I was originally planning to create branches of the primary instance for each tenant and add some extra columns and tables for convenience.&lt;/p&gt;

&lt;p&gt;How on earth is this supposed to work?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The solution:&lt;/strong&gt; I messaged the Turso team on Discord. They pointed me in the right direction.&lt;/p&gt;

&lt;p&gt;Since CDC isn't supported on Turso Cloud, I could use local SQLite files instead:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;PRAGMA&lt;/span&gt; &lt;span class="n"&gt;unstable_capture_data_changes_conn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'full'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;See that &lt;code&gt;unstable_&lt;/code&gt; prefix? That's not branding. It's a &lt;em&gt;warning&lt;/em&gt;. This feature is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Not fully supported&lt;/li&gt;
&lt;li&gt;Might break tomorrow&lt;/li&gt;
&lt;li&gt;Exactly what I needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When you run this pragma, Turso creates a table called &lt;code&gt;turso_cdc&lt;/code&gt; that captures every mutation. You read it like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="k"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;change_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;-- 1=INSERT, 0=UPDATE, -1=DELETE&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;bin_record_json_object&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;table_columns_json_array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="k"&gt;before&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;before_json&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;bin_record_json_object&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;table_columns_json_array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="k"&gt;after&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;after_json&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;turso_cdc&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;change_id&lt;/span&gt; &lt;span class="k"&gt;ASC&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Those functions—&lt;code&gt;bin_record_json_object&lt;/code&gt; and &lt;code&gt;table_columns_json_array&lt;/code&gt;—are Turso internals that convert binary records into JSON. I don't fully understand how they work. I just know they do.&lt;/p&gt;
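&lt;p&gt;A sketch of the worker that drains this table. The names and types here are mine, not Turso's&amp;mdash;the real worker reads rows over &lt;code&gt;database/sql&lt;/code&gt;&amp;mdash;but the &lt;code&gt;change_type&lt;/code&gt; encoding is the one from the query above:&lt;br&gt;
&lt;/p&gt;

```go
package main

import "fmt"

// cdcRow is a hypothetical in-memory shape for one turso_cdc row,
// matching the columns selected in the query above.
type cdcRow struct {
	TableName  string
	ChangeType int // 1=INSERT, 0=UPDATE, -1=DELETE
	AfterJSON  string
}

// opName classifies a change_type value into an operation name.
func opName(changeType int) string {
	switch changeType {
	case 1:
		return "INSERT"
	case 0:
		return "UPDATE"
	case -1:
		return "DELETE"
	default:
		return "UNKNOWN"
	}
}

func main() {
	rows := []cdcRow{
		{TableName: "posts", ChangeType: 1, AfterJSON: `{"content":"hello"}`},
		{TableName: "likes", ChangeType: -1},
	}
	for _, r := range rows {
		fmt.Println(r.TableName, opName(r.ChangeType))
	}
}
```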

&lt;p&gt;&lt;strong&gt;THE CATCH THAT ALMOST KILLED ME:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This pragma must execute on &lt;strong&gt;every. single. connection.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not per database. Per connection. Miss it once, and that connection is blind to changes. I spent four hours debugging why sync "randomly" stopped working before realizing one connection pool wasn't running the pragma.&lt;/p&gt;

&lt;p&gt;I harassed the Turso team on Discord for help with that, and they told me to run &lt;code&gt;PRAGMA unstable_capture_data_changes_conn('full');&lt;/code&gt; on every established connection.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"turso"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dbPath&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;// Forget this line and enjoy debugging for 4 hours&lt;/span&gt;
&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Exec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"PRAGMA unstable_capture_data_changes_conn('full');"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
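&lt;p&gt;One subtlety worth spelling out: &lt;code&gt;sql.Open&lt;/code&gt; hands you a &lt;em&gt;pool&lt;/em&gt;, so a single &lt;code&gt;db.Exec&lt;/code&gt; only configures whichever connection happens to serve that call. The shape of the fix is a setup statement applied to every new connection. Here's a toy pool (no real driver) just to show that shape:&lt;br&gt;
&lt;/p&gt;

```go
package main

import "fmt"

// conn is a stand-in for one physical database connection.
type conn struct {
	executed []string
}

func (c *conn) Exec(stmt string) {
	c.executed = append(c.executed, stmt)
}

// pool runs onConnect on every new connection it opens --
// per connection, not per pool, which is the whole point.
type pool struct {
	onConnect string
	conns     []*conn
}

func (p *pool) newConn() *conn {
	c := &conn{}
	c.Exec(p.onConnect) // every connection gets the pragma, not just the first
	p.conns = append(p.conns, c)
	return c
}

func main() {
	p := &pool{onConnect: "PRAGMA unstable_capture_data_changes_conn('full');"}
	for i := 0; i < 3; i++ {
		p.newConn()
	}
	fmt.Println(len(p.conns), "connections configured")
}
```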






&lt;h2&gt;
  
  
  Bidirectional sync: beautiful in theory, terrifying at scale
&lt;/h2&gt;

&lt;p&gt;Two directions. Two kinds of problems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;UPSTREAM: You → Primary (Publishing Your Thoughts to the Void)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You run an INSERT. A NATS message fires. The sync worker picks it up and:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Reads your &lt;code&gt;turso_cdc&lt;/code&gt; table for changes&lt;/li&gt;
&lt;li&gt;Filters out changes that came FROM the primary (loop prevention)&lt;/li&gt;
&lt;li&gt;Strips non-canonical fields (nice try, &lt;code&gt;like_count&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Writes to primary database&lt;/li&gt;
&lt;li&gt;Recalculates &lt;code&gt;post_stats&lt;/code&gt; for affected posts&lt;/li&gt;
&lt;li&gt;Clears your CDC table
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Your Database                           Primary Database
┌──────────────────────────┐            ┌──────────────────────────┐
│ turso_cdc                │            │                          │
│ ┌──────────────────────┐ │            │                          │
│ │ INSERT INTO posts    │ │  extract   │                          │
│ │ content: "hello"     │─┼─ canonical─▶  INSERT INTO posts       │
│ │ like_count: 0        │ │  fields    │  (without like_count)    │
│ │ (you tried)          │ │  (lol no)  │                          │
│ └──────────────────────┘ │            │  → recalc post_stats     │
└──────────────────────────┘            └──────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
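&lt;p&gt;Steps 2 and 3 above, sketched on in-memory data. The names are illustrative, not the real sync worker's API:&lt;br&gt;
&lt;/p&gt;

```go
package main

import "fmt"

// change is a hypothetical stand-in for one decoded CDC row.
type change struct {
	Table string
	Row   map[string]any
}

// cameFromPrimary mirrors the loop-prevention check: downstream writes
// are stamped with _sync_origin = "primary" and must not bounce back up.
func cameFromPrimary(c change) bool {
	origin, _ := c.Row["_sync_origin"].(string)
	return origin == "primary"
}

// canonicalRow strips tenant-only fields before the write to primary.
func canonicalRow(row map[string]any) map[string]any {
	out := map[string]any{}
	for k, v := range row {
		switch k {
		case "like_count", "comment_count", "_sync_origin":
			continue // tenant-only; primary never sees these
		default:
			out[k] = v
		}
	}
	return out
}

func main() {
	c := change{Table: "posts", Row: map[string]any{
		"id": "p1", "content": "hello", "like_count": 999999,
	}}
	if !cameFromPrimary(c) {
		fmt.Println(canonicalRow(c.Row)) // like_count never makes it out
	}
}
```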



&lt;p&gt;&lt;strong&gt;DOWNSTREAM: Primary → Everyone (Your Reality Check)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A background goroutine runs on an interval:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Reads primary's &lt;code&gt;turso_cdc&lt;/code&gt; table&lt;/li&gt;
&lt;li&gt;Figures out who originated each change&lt;/li&gt;
&lt;li&gt;For EVERY OTHER TENANT: opens connection, executes &lt;code&gt;INSERT OR REPLACE&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Clears primary's CDC table&lt;/li&gt;
&lt;li&gt;Repeats until heat death of universe (or server restart)
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Primary Database                         Every Single Tenant
┌──────────────────────────┐            ┌─────────────────────────┐
│ turso_cdc                │            │ Tenant A                │
│ ┌──────────────────────┐ │   write    │ INSERT OR REPLACE       │
│ │ New post appeared    │─┼── to ────▶ ├─────────────────────────┤
│ └──────────────────────┘ │   each     │ Tenant B                │
│                          │   one      │ INSERT OR REPLACE       │
│                          │   (yes     ├─────────────────────────┤
│                          │   really)  │ Tenant C                │
│                          │            │ INSERT OR REPLACE       │
│                          │            ├─────────────────────────┤
│                          │            │ ... Tenant N            │
│                          │            │ You get the idea        │
└──────────────────────────┘            └─────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Loop prevention&lt;/strong&gt; happens via a marker field. Downstream sync stamps every write with &lt;code&gt;_sync_origin = "primary"&lt;/code&gt;. Upstream sync checks for this marker and skips anything that came from primary:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Downstream marks its writes&lt;/span&gt;
&lt;span class="n"&gt;newDataCopy&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"_sync_origin"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"primary"&lt;/span&gt;

&lt;span class="c"&gt;// Upstream checks: did this change come from primary?&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;isReplicatedChange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;change&lt;/span&gt; &lt;span class="n"&gt;CDCChange&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;change&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Operation&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"INSERT"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;origin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;change&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewData&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"_sync_origin"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;origin&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"primary"&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="c"&gt;// UPDATE: was null before, is "primary" now? That's a sync write.&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;change&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Operation&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"UPDATE"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;oldOrigin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;change&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OldData&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"_sync_origin"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;newOrigin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;change&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewData&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"_sync_origin"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;oldOrigin&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;newOrigin&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"primary"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's more sophisticated than I originally planned. Still works. Still not apologizing.&lt;/p&gt;




&lt;h2&gt;
  
  
  This will absolutely not scale, and I'm telling you now
&lt;/h2&gt;

&lt;p&gt;Let's do napkin math together. It'll be fun. (It won't be fun.)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbefp6y5b1a6dk8ckz64b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbefp6y5b1a6dk8ckz64b.png" alt=" " width="800" height="403"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For every change in the primary database, downstream sync:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Lists all tenant databases (filesystem read)&lt;/li&gt;
&lt;li&gt;Looks up originator (system DB query)&lt;/li&gt;
&lt;li&gt;For each of N tenants: opens connection, writes data&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's &lt;strong&gt;O(changes × tenants)&lt;/strong&gt; complexity.&lt;/p&gt;

&lt;p&gt;Let's plug in numbers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1,000 users&lt;/li&gt;
&lt;li&gt;100 changes per sync interval&lt;/li&gt;
&lt;li&gt;= &lt;strong&gt;100,000 database writes per interval&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The sync interval is a few seconds.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;change&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;changes&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tenantName&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;tenants&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;  &lt;span class="c"&gt;// N iterations, baby&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tenantName&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;originatorTenant&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;replicateToTenant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tenantName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;change&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c"&gt;// Disk go brrr&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No parallelization. No batching. No sharding. Two nested for-loops.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;At 10,000 users:&lt;/strong&gt; Slow.&lt;br&gt;
&lt;strong&gt;At 100,000 users:&lt;/strong&gt; Unusable.&lt;br&gt;
&lt;strong&gt;At 1,000,000 users:&lt;/strong&gt; Physically impossible (I didn't check this; just guessing at this point).&lt;/p&gt;
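&lt;p&gt;The same napkin math as code, if you want to scare yourself at other scales:&lt;br&gt;
&lt;/p&gt;

```go
package main

import "fmt"

// writesPerInterval is the fan-out cost: every change is replicated to
// every tenant except its originator, so cost grows as changes x tenants.
func writesPerInterval(tenants, changesPerInterval int) int {
	return changesPerInterval * (tenants - 1)
}

func main() {
	for _, n := range []int{1_000, 10_000, 100_000} {
		fmt.Printf("%d users, 100 changes: %d writes per interval\n",
			n, writesPerInterval(n, 100))
	}
}
```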

&lt;p&gt;But if this ever gets popular enough to break, that's a problem worth having. &lt;strong&gt;Optimization can wait for users.&lt;/strong&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  The vandalism feature (yes, it's a feature now)
&lt;/h2&gt;

&lt;p&gt;You know what's fun? This works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;posts&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;like_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;999999&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'some-post'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your database accepts it. You see 999999 likes. You feel powerful. You feel like a god.&lt;/p&gt;

&lt;p&gt;For about thirty seconds.&lt;/p&gt;

&lt;p&gt;Then downstream sync runs, reads the real &lt;code&gt;post_stats&lt;/code&gt; from primary, and overwrites your delusions with cold, hard, normalized reality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this works mechanically:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The upstream filter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Replicator&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;isCanonicalField&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;field&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;excludedFields&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="s"&gt;"like_count"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;    &lt;span class="no"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"comment_count"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="no"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"_sync_origin"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;  &lt;span class="no"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;excludedFields&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;field&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your &lt;code&gt;like_count = 999999&lt;/code&gt; never leaves your database. It's stripped on the way upstream. Then downstream corrects you on the next cycle.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It's digital graffiti that cleans itself up.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I call it eventual consistency with a side of humiliation.&lt;/p&gt;




&lt;h2&gt;
  
  
  The terminal aesthetic
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sqlnet&amp;gt; SELECT p.content, u.username, p.like_count
            FROM posts p
            JOIN users u ON p.author_id = u.id
            ORDER BY p.created_at DESC LIMIT 5;
┌─────────────────────────────────┬──────────┬────────────┐
│ content                         │ username │ like_count │
├─────────────────────────────────┼──────────┼────────────┤
│ Just built SQL social network   │ vladlen  │ 42         │
│ Anyone else debug at 3am        │ devghost │ 18         │
│ SELECT * FROM motivation...     │ burnout  │ 7          │
└─────────────────────────────────┴──────────┴────────────┘
3 rows returned (23ms)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Social media optimizes for engagement. Tap to post. Swipe to scroll. Algorithmic feeds designed to keep you scrolling.&lt;/p&gt;

&lt;p&gt;SQLNet is the opposite. You type a query. You get a result. Then silence until you ask for more.&lt;/p&gt;

&lt;p&gt;The friction is the point. You have to think before you post because you have to write the query first.&lt;/p&gt;




&lt;h2&gt;
  
  
  The &lt;code&gt;me()&lt;/code&gt; function: three characters that make it personal
&lt;/h2&gt;

&lt;p&gt;My favorite implementation detail. In any query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;posts&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;author_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;me&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;me()&lt;/code&gt; returns your user ID. The backend intercepts your SQL, injects the value from your JWT, and executes it. You never memorize a UUID.&lt;/p&gt;

&lt;p&gt;Three characters. But they make the whole system feel &lt;em&gt;yours&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;You're not querying some abstract database. You're querying YOUR world. And &lt;code&gt;me()&lt;/code&gt; is always there, a tiny reminder that this space belongs to you.&lt;/p&gt;
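&lt;p&gt;A naive sketch of that rewrite. This is plain string replacement; the real backend presumably works on the parsed query (string replacement would also rewrite a &lt;code&gt;me()&lt;/code&gt; that appears inside a string literal), and the ID comes from the verified JWT, not a function argument:&lt;br&gt;
&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// injectMe swaps every me() in the query for the caller's quoted ID.
// Hypothetical helper for illustration: a real implementation should
// rewrite the parsed AST and bind the value, not splice strings.
func injectMe(query, userID string) string {
	return strings.ReplaceAll(query, "me()", "'"+userID+"'")
}

func main() {
	q := "SELECT * FROM posts WHERE author_id = me();"
	fmt.Println(injectMe(q, "u-123"))
}
```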




&lt;h2&gt;
  
  
  Local tables: your private chaos
&lt;/h2&gt;

&lt;p&gt;Not everything syncs. Some tables exist only in YOUR database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Your drafts. Yours alone.&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;drafts&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="nb"&gt;DATETIME&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Your saved queries. Your algorithms.&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;saved_queries&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Upstream sync ignores these tables entirely. Your drafts are yours. Your queries are yours.&lt;/p&gt;

&lt;p&gt;You're curating your own feed algorithm:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;saved_queries&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="s1"&gt;'Hot takes this week'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'SELECT * FROM posts WHERE created_at &amp;gt; date(&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;now&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;, &lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;-7 days&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;) ORDER BY like_count DESC'&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Take that, recommendation systems.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I learned building this
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Abstraction has costs.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every time we hide complexity, we hide control. Sometimes that's good. Sometimes you're building a Skinner box with pretty CSS and calling it "user experience."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. CDC is magic until you need it to work locally.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Massive shoutout to the Turso team for answering Discord messages during holidays. Open source communities are incredible. Go buy them coffee.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Building weird things is its own reward.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not everything needs market fit. Sometimes you build something just to see if you can.&lt;/p&gt;




&lt;h2&gt;
  
  
  The stack (for those who care)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Backend:&lt;/strong&gt; Go. Fast and boring. Perfect.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HTTP:&lt;/strong&gt; Echo framework&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auth:&lt;/strong&gt; JWT, nothing fancy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Databases:&lt;/strong&gt; Turso (SQLite with superpowers)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CDC:&lt;/strong&gt; &lt;code&gt;PRAGMA unstable_capture_data_changes_conn('full')&lt;/code&gt; + &lt;code&gt;turso_cdc&lt;/code&gt; table&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Messaging:&lt;/strong&gt; NATS for sync triggers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DI:&lt;/strong&gt; Uber's fx&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Who is this for?
&lt;/h2&gt;

&lt;p&gt;People who:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read database documentation for fun&lt;/li&gt;
&lt;li&gt;Have looked at an algorithmic feed and thought "I could do this better with a WHERE clause"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If that's you: welcome. Your database is waiting.&lt;/p&gt;




&lt;p&gt;Twitter: &lt;a href="https://x.com/lenvladyslav" rel="noopener noreferrer"&gt;https://x.com/lenvladyslav&lt;/a&gt;&lt;br&gt;
Project Website: &lt;a href="https://sqlnet.cc/" rel="noopener noreferrer"&gt;https://sqlnet.cc/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>sql</category>
      <category>programming</category>
      <category>database</category>
      <category>socialmedia</category>
    </item>
    <item>
      <title>Your Project Shouldn’t Break When You Switch Branches. So I Fixed It</title>
      <dc:creator>Vladyslav Len</dc:creator>
      <pubDate>Wed, 10 Dec 2025 20:23:23 +0000</pubDate>
      <link>https://dev.to/levla/your-project-shouldnt-break-when-you-switch-branches-so-i-fixed-it-3l3a</link>
      <guid>https://dev.to/levla/your-project-shouldnt-break-when-you-switch-branches-so-i-fixed-it-3l3a</guid>
      <description>&lt;p&gt;Look, I’m gonna be honest with you. I was supposed to be working on my startup. I had a 9–5. Life was busy. But then this one thing kept annoying me SO much that I had to stop everything and fix it.&lt;/p&gt;

&lt;p&gt;You know that moment when you’re deep in a feature branch, you’ve run your migrations, everything’s working perfectly, and then you need to switch back to main for a quick hotfix?&lt;/p&gt;

&lt;p&gt;And then your app just… explodes.&lt;/p&gt;

&lt;p&gt;Your database is completely out of sync. The schema doesn’t match the code. ActiveRecord is screaming at you. Rails can’t find the &lt;code&gt;user_preferences&lt;/code&gt; table that doesn’t exist yet on main. Nothing works.&lt;/p&gt;

&lt;p&gt;We’ve all been there. And the “solutions” are genuinely terrible:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Drop the database and re-seed.&lt;/strong&gt; Cool, let me just wait 5 minutes while I lose all my test data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manually roll back migrations.&lt;/strong&gt; Hope you remember exactly which ones to undo and in what order.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maintain multiple databases.&lt;/strong&gt; And constantly remember to switch your connection string. No comment.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these are good. And I was tired of it.&lt;/p&gt;
&lt;h2&gt;
  
  
  The “Wait, PostgreSQL Can Do WHAT?” Moment
&lt;/h2&gt;

&lt;p&gt;So I’m procrastinating one evening (typical dev chores, while Claude is running), reading PostgreSQL docs instead of working on my actual startup, and I stumble upon this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;DATABASE&lt;/span&gt; &lt;span class="n"&gt;new_db&lt;/span&gt; &lt;span class="k"&gt;TEMPLATE&lt;/span&gt; &lt;span class="n"&gt;source_db&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Joking: I recall we actually used this at work for tests. But the procrastination version of the story sounds better, imo.&lt;/p&gt;

&lt;p&gt;Template databases. PostgreSQL can create a database from another database as a template. And here’s the wild part — it’s a file-level copy. It’s not doing some expensive &lt;code&gt;pg_dump&lt;/code&gt; and restore. It’s just copying the data files.&lt;/p&gt;

&lt;p&gt;It’s fast. Like, really fast.&lt;/p&gt;

&lt;p&gt;And my brain immediately went: “Wait. What if I could just… snapshot my database on each git branch? And switch between them instantly?”&lt;/p&gt;

&lt;p&gt;After 1 minute of googling for existing solutions, I decided I needed to build this.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building the Thing
&lt;/h2&gt;

&lt;p&gt;The core idea is stupidly simple. When you run &lt;code&gt;pgbranch branch main&lt;/code&gt;, it creates a database called &lt;code&gt;myapp_dev_pgbranch_main&lt;/code&gt; using your working database as a template. That’s your snapshot.&lt;/p&gt;

&lt;p&gt;When you run &lt;code&gt;pgbranch checkout main&lt;/code&gt;, it drops your working database and recreates it from the snapshot.&lt;/p&gt;

&lt;p&gt;No &lt;code&gt;pg_dump&lt;/code&gt;. No restore. No waiting.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pgbranch init &lt;span class="nt"&gt;-d&lt;/span&gt; myapp_dev
pgbranch branch main           &lt;span class="c"&gt;# snapshot your clean main state&lt;/span&gt;
... switch to feature branch, run migrations, &lt;span class="nb"&gt;break &lt;/span&gt;things ...
pgbranch branch feature-x      &lt;span class="c"&gt;# save this state too&lt;/span&gt;
pgbranch checkout main         &lt;span class="c"&gt;# instantly back to clean state&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
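&lt;p&gt;Under the hood, both commands boil down to &lt;code&gt;CREATE DATABASE ... TEMPLATE&lt;/code&gt; statements. Here’s a rough sketch in Go of what “branch” and “checkout” generate; the function is my illustration, and the real tool also terminates active connections and properly quotes identifiers first:&lt;/p&gt;

```go
package main

import "fmt"

// snapshotSQL builds the statements that pgbranch-style branching boils
// down to. Illustrative only: the real tool also terminates active
// connections and quotes identifiers before running these.
func snapshotSQL(workDB, branch string) (save, restore string) {
	snap := fmt.Sprintf("%s_pgbranch_%s", workDB, branch)
	// "pgbranch branch": snapshot the working DB via a file-level template copy.
	save = fmt.Sprintf("CREATE DATABASE %s TEMPLATE %s", snap, workDB)
	// "pgbranch checkout": recreate the working DB from the snapshot
	// (after dropping the working copy).
	restore = fmt.Sprintf("CREATE DATABASE %s TEMPLATE %s", workDB, snap)
	return save, restore
}

func main() {
	save, restore := snapshotSQL("myapp_dev", "main")
	fmt.Println(save)
	fmt.Println(restore)
}
```

&lt;p&gt;Because the template copy happens at the file level, both statements finish in roughly the time it takes to copy the data files, which is why checkout feels instant.&lt;/p&gt;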



&lt;p&gt;I wrote the whole thing in Go because, honestly, CLI tools in Go just feel right. Single binary, cross-platform, fast startup time. No “installing dependencies” nonsense. (I just don’t know Rust)&lt;/p&gt;

&lt;p&gt;The architecture is pretty clean. There’s a &lt;code&gt;Brancher&lt;/code&gt; core that handles the business logic, a &lt;code&gt;postgres.Client&lt;/code&gt; that talks to PostgreSQL using pgx (the best Go Postgres driver, fight me), and a CLI layer using Cobra.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Gotcha That Almost Broke Me
&lt;/h2&gt;

&lt;p&gt;Here’s something I learned the hard way: you can’t create a database from a template if there are active connections to the source database.&lt;/p&gt;

&lt;p&gt;PostgreSQL will just say “nope, database is being accessed by other users” and refuse to cooperate.&lt;/p&gt;

&lt;p&gt;So every operation that touches a database needs to first terminate all connections:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;TerminateConnectionsTo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dbName&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Exec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;`
        SELECT pg_terminate_backend(pid)
        FROM pg_stat_activity
        WHERE datname = $1 AND pid &amp;lt;&amp;gt; pg_backend_pid()
    `&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dbName&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It’s one of those things that seems obvious in retrospect but had me debugging for way too long.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scope Creep (But Like, The Good Kind)
&lt;/h2&gt;

&lt;p&gt;Okay, so I had the basic branching working. Cool. Ship it. Done.&lt;/p&gt;

&lt;p&gt;But then a guy on Reddit asked… what if it could automatically switch database branches when you switch git branches?&lt;/p&gt;

&lt;p&gt;Git has these things called hooks. Specifically, there’s a &lt;code&gt;post-checkout&lt;/code&gt; hook that runs after every git checkout. So I wrote a little shell script that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Checks if you’re in a pgbranch-initialized directory&lt;/li&gt;
&lt;li&gt;Gets the current git branch name&lt;/li&gt;
&lt;li&gt;Checks if a pgbranch branch with that name exists. If yes, it checks it out; if not, it creates one.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pgbranch hook &lt;span class="nb"&gt;install
&lt;/span&gt;git checkout feature-x  &lt;span class="c"&gt;# automatically switches database branch too&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now your database just… follows your git branches. Automatically. No thinking required.&lt;/p&gt;

&lt;h2&gt;
  
  
  But Wait, There’s More
&lt;/h2&gt;

&lt;p&gt;Then I thought about teams. What if you want to share a database snapshot with a colleague? Or spin up a known-good database state in CI?&lt;/p&gt;

&lt;p&gt;So I added remotes. Like git remotes, but for database snapshots.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pgbranch remote add origin s3://my-bucket/pgbranch
pgbranch push main &lt;span class="nt"&gt;--description&lt;/span&gt; &lt;span class="s2"&gt;"Clean schema with seed data"&lt;/span&gt;
&lt;span class="c"&gt;# On another machine:&lt;/span&gt;
pgbranch pull main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It supports S3, Cloudflare R2 (which is S3-compatible so that was easy), and plain filesystem paths for network drives.&lt;/p&gt;

&lt;p&gt;The archive format is a gzipped tar containing a manifest (JSON with metadata, checksums, pg_dump version) and the actual pg_dump output in custom format. Checksums verify integrity because nobody wants to restore a corrupted database.&lt;/p&gt;

&lt;p&gt;And yes, credentials are encrypted before being stored locally. Because storing AWS keys in plaintext in a JSON file would not be ok.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Part Where I Get Sentimental
&lt;/h2&gt;

&lt;p&gt;Here’s the thing. I wasn’t building this to get famous or make money. I was building it because the problem annoyed me, and solving problems is fun.&lt;/p&gt;

&lt;p&gt;I pushed it to GitHub, posted it on Reddit, and went back to my startup stuff.&lt;/p&gt;

&lt;p&gt;Then the stars started coming in.&lt;/p&gt;

&lt;p&gt;33 stars in the first week. From real developers. People I don’t know. People who apparently had the same pain point and were excited that someone finally solved it.&lt;/p&gt;

&lt;p&gt;And look, 33 stars isn’t going viral. It’s not hitting the front page of Hacker News. But you know what it is?&lt;/p&gt;

&lt;p&gt;It’s 33 developers who found something I built useful.&lt;/p&gt;

&lt;p&gt;That hit different.&lt;/p&gt;

&lt;p&gt;Working on a startup is great. Getting paid at my 9–5 is great. But there’s something special about building something in your spare time, giving it away for free, and watching people actually use it.&lt;/p&gt;

&lt;p&gt;It reminded me why I got into programming in the first place. Not for the career. Not for the money. But because building things is genuinely, deeply satisfying.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Takeaway
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Sometimes the best projects come from scratching your own itch. You don’t need a business plan. You don’t need to validate the market. You just need a problem that annoys you enough to fix it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And if you’re working on migrations across git branches and dealing with the database sync nightmare… maybe give pgbranch a shot.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;go &lt;span class="nb"&gt;install &lt;/span&gt;github.com/le-vlad/pgbranch/cmd/pgbranch@latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It’s free. It’s open source. And if it saves you even one “drop database, re-seed, wait 5 minutes” cycle, I’ll consider it a win.&lt;/p&gt;

&lt;p&gt;Now if you’ll excuse me, I have a startup to get back to.&lt;br&gt;
pgbranch: &lt;a href="https://github.com/le-vlad/pgbranch" rel="noopener noreferrer"&gt;https://github.com/le-vlad/pgbranch&lt;/a&gt;&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>git</category>
      <category>webdev</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Okay, it's time to change your cron job</title>
      <dc:creator>Vladyslav Len</dc:creator>
      <pubDate>Mon, 05 May 2025 15:55:35 +0000</pubDate>
      <link>https://dev.to/levla/okay-its-time-to-change-your-cron-job-37o0</link>
      <guid>https://dev.to/levla/okay-its-time-to-change-your-cron-job-37o0</guid>
      <description>&lt;p&gt;I'll start this post by saying that it's been a while since we got the first version of the CRON, which became de facto a default task scheduling tool for developers. Even more, &lt;code&gt;cron&lt;/code&gt; jobs are older than me and I'm not that young. &lt;/p&gt;

&lt;p&gt;When I first got into software development, we used to deploy our code on EC2 instances with a minimal continuous-delivery setup: webhooks that triggered a git pull and restarted nginx. But we also had a bunch of recurring tasks that had to be invoked at midnight (the classic example), and some of them had to run every couple of minutes.&lt;/p&gt;

&lt;p&gt;I remember learning the cron syntax at that time. It felt almost like RegExp, though surely 1000 times easier. However, that's not what I want to talk about.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cron jobs are not evolving
&lt;/h2&gt;

&lt;p&gt;And they shouldn't, but the way we approach them should. There is nothing wrong with cron itself as a tool. It is what it is, and it has been helping developers for ages now. The problem is not the tool; the problem is the way we use it.&lt;/p&gt;

&lt;p&gt;Over time, a lot of the tools I use for software development have changed and been updated; some of them died (RIP NetBeans IDE).&lt;br&gt;
These updates always bring something to your routine. For example, Docker helps you deploy your code without forgetting to install that tiny package from 2005, pinned to a fixed version, that keeps your legacy PHP project running. And Node.js lets you believe the fairy tale that your JS code will run fine on the server (spoiler alert: it won't). The only thing that remains the same on every single project is cron. &lt;/p&gt;

&lt;p&gt;At some point, almost every project I worked on (I think every one, I just can't recall them all) had a certain number of scheduled tasks running. I can't really explain why, but I always tried to avoid cron jobs at all costs. It felt like you took a wrong turn, almost like your architecture was incorrect and now you're patching it with tasks that fix your mistakes every couple of minutes. Even though it's not really like that, and cron jobs are a powerful tool for solving various problems, sometimes I still feel this way. &lt;/p&gt;

&lt;p&gt;Trying to explain this feeling, I came up with one answer. When you write your backend code, you as a developer have a deep understanding of your program's context (I hope so): its runtime, its ton of dependencies, and how they are injected using that fancy DI lib. But what about cron jobs? Do they exist inside the context of your app? The answer is no. &lt;/p&gt;
&lt;h2&gt;
  
  
  The reality
&lt;/h2&gt;

&lt;p&gt;On Linux, cron jobs are defined in crontab files and triggered by the cron scheduler when needed (unless you use some other planner); by their nature, they simply trigger a script. &lt;br&gt;
If you ask what's wrong with that, you can probably say "nothing", and I'd agree with you, but.&lt;/p&gt;

&lt;p&gt;The reality of modern software development is different: people now tend to scale horizontally rather than vertically, even though vertical scaling is far more affordable and, in most cases (especially in the early stages), easier and, in my opinion, preferable. We might call this horizontal scaling premature optimization, though I'm not entirely convinced that's accurate. Sometimes it's not only about being ready to scale your app to hundreds of thousands of users; it's about the way we deploy our projects nowadays. &lt;/p&gt;

&lt;p&gt;I think this is driven by the fact that user demand for availability and resilience is much higher than it used to be, plus some belief in your next idea. So what do you do? Correct: you go to a cloud provider and enable blue-green deployments with a minimum number of instances, or you go even further and enable cross-regional deployment to place your app closer to the customer. (No one cares that the DB is a single node in us-east-2 while the app instance is in Australia.) &lt;/p&gt;

&lt;p&gt;And now it's time to add a few scheduled tasks. The problem here is Docker. You heard me right, and don't get me wrong, I love Docker, but you can't just place your crontab file in a Docker image and call it a day. With two running instances, you get two simultaneous executions of every cron job. I suspect many developers face this same challenge. Let's see what cloud providers offer us. Or even better, instead of researching on our own, let's follow the modern approach to building projects and consult ChatGPT, Claude, or another LLM.&lt;/p&gt;

&lt;p&gt;Here are a couple of suggestions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Fly.io&lt;/strong&gt;: Fly.io provides a mechanism for scheduled tasks through their "Fly Machines" functionality. You can deploy a separate machine dedicated to running your scheduled jobs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Render&lt;/strong&gt;: Render offers built-in cron job support through their "Cron Jobs" feature, which allows you to set up scheduled tasks directly in the Render dashboard. These jobs run on separate infrastructure from your web services.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS&lt;/strong&gt;: AWS EventBridge (formerly CloudWatch Events) allows you to create rules that run on schedules; AWS Lambda can be triggered on a schedule; AWS Batch handles more resource-intensive scheduled jobs.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I'm not even going to start on k8s jobs, which must spin up a container on every invocation and spend a ton of time doing it, or the fact that on Render.com you have to pay 5 USD/month per cron job just to run a script inside a Docker container.&lt;/p&gt;

&lt;p&gt;Even with all these solutions, you still have to build something that runs outside the scope of your project and either calls an endpoint on your backend or pushes a message into a queue that only one consumer is allowed to read from. It gets worse when you realize you also need to monitor executions, react to failed ones, and prevent long-running tasks from overlapping.&lt;/p&gt;

&lt;p&gt;What I'm trying to say is that modern software engineering is already complex and broad enough, and having to deploy and maintain one more system just to invoke a couple of functions in your backend is, in my opinion, kind of absurd.&lt;/p&gt;

&lt;p&gt;There are some tools and libraries that try to solve this, like the Node.js Bull library, which uses Redis to act as an orchestrator for task executions. But do you really want that?&lt;/p&gt;
&lt;h2&gt;
  
  
  Solution?
&lt;/h2&gt;

&lt;p&gt;I guess at this stage you have the right to say there has been a lot of critique and no solutions offered, so let's talk about it. &lt;/p&gt;

&lt;p&gt;All I can say is that I believe scheduled tasks should be defined and executed from within your code, while the orchestration is done by an external system. They should be scoped by type or name and be able to prevent overlapping. As a developer, you shouldn't need to worry about task synchronization or about building your own monitoring for them.&lt;/p&gt;

&lt;p&gt;The gap between modern application architecture and outdated scheduling tools presented an opportunity to create something better. And if this is starting to feel like the ad segment in your favorite creator's YouTube video, you're not far from the truth. &lt;br&gt;
This article is a reflection of the thoughts that resulted in a project called &lt;a href="https://schedo.dev" rel="noopener noreferrer"&gt;schedo.dev&lt;/a&gt;, which aims to solve these problems. &lt;/p&gt;

&lt;p&gt;It gives developers a way to define, in the runtime, the functions that will execute the code. This prevents accidents such as two prod environments running simultaneously and processing duplicate money withdrawals (a real story I heard).&lt;/p&gt;

&lt;p&gt;While building it, we went through the different solutions so you don't have to. Schedo handles the synchronization and delivers the job to an available consumer once it's ready to be executed.&lt;/p&gt;

&lt;p&gt;Compared to standard cron jobs, you can trigger them on demand, monitor execution times, and read the logs. But even more important: you don't have to build any of this yourself. Cron jobs should be easy; you should be thinking about what actually happens inside the job, not about how to trigger it or how to keep it from overlapping.&lt;/p&gt;

&lt;p&gt;Define a job and run your code, this is how it's supposed to be.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;schedo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;defineJob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;send-weekly-report&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;// Identifier&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;0 9 * * 1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;            &lt;span class="c1"&gt;// Schedule (every Monday at 9 AM)&lt;/span&gt;
  &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;        &lt;span class="c1"&gt;// Handler&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;sendReport&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Report sent&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This article is already long enough, but I'd like to emphasize a few points about the way it works. If you already gave up on this &lt;code&gt;semi-promotional&lt;/code&gt; article, I don't blame you. But if you are still here - let's dive in.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Job definition. The snippet above shows a job defined with the Schedo.dev SDK. Whenever your app starts, it connects to the remote server and checks whether the job already exists for the environment and matches the name &amp;amp; schedule. A job is defined once, no matter how many instances of the app you run.&lt;/li&gt;
&lt;li&gt;Job scheduling. Once the job is defined, it's registered in our cron scheduler, which takes care of invoking it when needed. So your job is stored and scheduled on Schedo's side.&lt;/li&gt;
&lt;li&gt;Job execution. Since your app is connected to Schedo's API, when the time comes, a signal is sent to one of the connected app instances, ensuring there are no simultaneous executions. Once a job execution is picked up, it's locked to that worker.&lt;/li&gt;
&lt;li&gt;Timeouts. You can define a job with two kinds of timeouts. Pickup timeout: a worker must pick up the job within the defined period before it becomes &lt;code&gt;expired&lt;/code&gt;. Execution timeout: simply the time a job is given to execute. &lt;/li&gt;
&lt;li&gt;Blocking jobs. By default, Schedo tries to behave like standard crontab and doesn't prevent jobs from overlapping. But you can, and in many cases should, define a job as blocking if you want the next execution to be skipped while the previous one is still running. The job becomes &lt;code&gt;skipped&lt;/code&gt; in that case.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Someone asked the friend who developed this project with me, "What happened to cron jobs over the last couple of years that made you decide this project must exist?" That on-the-spot question didn't get the answer we wanted at the moment, but now I'd say: "Nothing. And that's exactly why we believe this must exist."&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8zhnn87cqstu1qvmcmj6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8zhnn87cqstu1qvmcmj6.png" alt="Cron JOBS" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>programming</category>
      <category>containers</category>
      <category>automation</category>
      <category>tooling</category>
    </item>
    <item>
      <title>PostgreSQL to NATS Streaming</title>
      <dc:creator>Vladyslav Len</dc:creator>
      <pubDate>Wed, 22 May 2024 20:57:15 +0000</pubDate>
      <link>https://dev.to/levla/postgresql-to-nats-streaming-1o4f</link>
      <guid>https://dev.to/levla/postgresql-to-nats-streaming-1o4f</guid>
      <description>&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;As we all know, Postgres is eating the database world. It stands out as the Swiss Army knife of databases, so more and more developers adopt PostgreSQL in their projects to store their data.&lt;br&gt;
But, as always happens as projects grow, the need arises to stream changes from the database to other services. This is where DataBrew Cloud and the open-source Blink come in.&lt;/p&gt;
&lt;h4&gt;
  
  
  Why would you do that?
&lt;/h4&gt;

&lt;p&gt;Streaming data is not a silver bullet, but it still has a lot of use cases. Here are some of them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Building event-driven architecture&lt;/li&gt;
&lt;li&gt;Real-time analytics&lt;/li&gt;
&lt;li&gt;Sharing data with external systems&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  What are the benefits?
&lt;/h4&gt;

&lt;p&gt;Data streaming from Postgres, also called CDC (Change Data Capture), is the process of reading changes directly from the WAL instead of querying your data, which can put significant load on the database.&lt;/p&gt;

&lt;p&gt;It also means a consumer can be offline for a while and still receive all the changes once it comes back online.&lt;/p&gt;
&lt;h3&gt;
  
  
  Requirements
&lt;/h3&gt;
&lt;h4&gt;
  
  
  Postgres setup
&lt;/h4&gt;

&lt;p&gt;First, let's make sure your database is ready for CDC.&lt;br&gt;
Check your &lt;code&gt;wal_level&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SHOW&lt;/span&gt; &lt;span class="n"&gt;wal_level&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the result is not &lt;code&gt;logical&lt;/code&gt;, you should change it to &lt;code&gt;logical&lt;/code&gt;:&lt;/p&gt;
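&lt;p&gt;One standard way to do that (assuming you have superuser access) is &lt;code&gt;ALTER SYSTEM&lt;/code&gt;:&lt;/p&gt;

```sql
-- Persists the setting in postgresql.auto.conf
ALTER SYSTEM SET wal_level = logical;
```

&lt;p&gt;&lt;code&gt;wal_level&lt;/code&gt; is only read at server start, so restart Postgres afterwards and run &lt;code&gt;SHOW wal_level;&lt;/code&gt; again to confirm it now returns &lt;code&gt;logical&lt;/code&gt;.&lt;/p&gt;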

&lt;p&gt;The &lt;code&gt;wal_level&lt;/code&gt; parameter controls how your database writes to the WAL.&lt;br&gt;
We want it set to &lt;code&gt;logical&lt;/code&gt; because that makes the database write changes to the WAL in a form we can decode and read later.&lt;/p&gt;
&lt;h4&gt;
  
  
  NATS setup
&lt;/h4&gt;

&lt;p&gt;Make sure you have a nats.io server running. You can use the official Docker image:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-p&lt;/span&gt; 4222:4222 &lt;span class="nt"&gt;-ti&lt;/span&gt; nats:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see logs like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;[&lt;/span&gt;1] 2019/05/24 15:42:58.228063 &lt;span class="o"&gt;[&lt;/span&gt;INF] Starting nats-server version &lt;span class="c"&gt;#.#.#&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;1] 2019/05/24 15:42:58.228115 &lt;span class="o"&gt;[&lt;/span&gt;INF] Git commit &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="c"&gt;#######]&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;1] 2019/05/24 15:42:58.228201 &lt;span class="o"&gt;[&lt;/span&gt;INF] Starting http monitor on 0.0.0.0:8222
&lt;span class="o"&gt;[&lt;/span&gt;1] 2019/05/24 15:42:58.228740 &lt;span class="o"&gt;[&lt;/span&gt;INF] Listening &lt;span class="k"&gt;for &lt;/span&gt;client connections on 0.0.0.0:4222
&lt;span class="o"&gt;[&lt;/span&gt;1] 2019/05/24 15:42:58.228765 &lt;span class="o"&gt;[&lt;/span&gt;INF] Server is ready
&lt;span class="o"&gt;[&lt;/span&gt;1] 2019/05/24 15:42:58.229003 &lt;span class="o"&gt;[&lt;/span&gt;INF] Listening &lt;span class="k"&gt;for &lt;/span&gt;route connections on 0.0.0.0:6222
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you are going to use DataBrew Cloud, you must ensure that your Postgres and NATS instances are accessible from the internet. You can use a service like ngrok to expose local services, or deploy them in the cloud.&lt;/p&gt;
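&lt;p&gt;For example, with ngrok you could expose both services over TCP tunnels (a sketch assuming default local ports; note that TCP tunnels may require a registered ngrok account):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Expose local Postgres (default port 5432)
ngrok tcp 5432

# In another terminal, expose local NATS (default port 4222)
ngrok tcp 4222
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Use the forwarding host and port that ngrok prints as the connection details in the steps below.&lt;/p&gt;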

&lt;h3&gt;
  
  
  Start with DataBrew Cloud
&lt;/h3&gt;

&lt;p&gt;First, you need to create a &lt;a href="https://app.databrew.tech" rel="noopener noreferrer"&gt;new account in DataBrew Cloud&lt;/a&gt; or log into an existing one.&lt;/p&gt;

&lt;p&gt;Then you need to create a new pipeline. You can do this by clicking on the "New Pipeline" button in the top right corner.&lt;/p&gt;

&lt;h4&gt;
  
  
  Add Postgres source
&lt;/h4&gt;

&lt;p&gt;First, we must configure our PostgreSQL database as a source for the pipeline. Click on the "Add Connector" button and select "Postgres-CDC" from the list.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiafzijdmqocxzyza3esx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiafzijdmqocxzyza3esx.png" alt="Create new Postgres-CDC Connector" width="800" height="552"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then you need to fill in the connection details for your Postgres database. You need to provide the following information:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo4nad1uuns830iaw6kyw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo4nad1uuns830iaw6kyw.png" alt="Postgres-CDC Connector settings" width="800" height="529"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once you have filled in all the details, press "Check Connection" to ensure the connection works.&lt;/p&gt;

&lt;p&gt;You will then be asked to choose the table you want to stream changes from. Simply select the one you need to proceed.&lt;/p&gt;

&lt;h4&gt;
  
  
  Add NATS sink
&lt;/h4&gt;

&lt;p&gt;To create a full pipeline, you need to add a sink for the data. In our case, it will be NATS.&lt;/p&gt;

&lt;p&gt;Click on the "Add Connector" button and select "NATS" from the list.&lt;br&gt;
The flow is relatively the same as with Postgres-CDC connector. You need to provide the connection details for your NATS server.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0jotsoz9rg0swrtxw3il.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0jotsoz9rg0swrtxw3il.png" alt="Create NATS Connector destination" width="800" height="464"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Provide the connection details and press "Check Connection" to ensure the connection works.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq2iny81hjjp42ud9t0zk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq2iny81hjjp42ud9t0zk.png" alt="NATS Connector settings" width="800" height="464"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  Creating the pipeline
&lt;/h4&gt;

&lt;p&gt;Once you have both connectors configured, you can press the "Create Pipeline" button to create the pipeline.&lt;/p&gt;

&lt;p&gt;Select the previously created Postgres-CDC connection as the source and the NATS connector as the destination. In our case, the connection name is "Taxi rides", since we are going to stream changes from the "taxi_rides" table.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8zyny2zlso2d9g4yjpmy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8zyny2zlso2d9g4yjpmy.png" alt="Select Postgres as pipeline source" width="800" height="267"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu5b9az7244173d99jvaj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu5b9az7244173d99jvaj.png" alt="Select NATS as pipeline destination" width="800" height="464"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now is the time to save and deploy our pipeline. Press the "Save pipeline" button. We are not going to add any processors to our data flow just yet.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuda5nxvqllaqek4i07vk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuda5nxvqllaqek4i07vk.png" alt="Save new pipeline" width="800" height="421"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After you save the pipeline and press the "Deploy" button, you will see the logs of the pipeline execution.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Please keep in mind that the first pipeline deployment may take a few seconds.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Within a few seconds, the logs from the pipeline execution will start streaming in. If everything is configured correctly, you will see output like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;2024-05-22 22:11:59 INFO Metrics: Component has been loaded
2024-05-22 22:11:59 INFO Source: Loaded &lt;span class="nv"&gt;driver&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;postgres_cdc
2024-05-22 22:11:59 INFO Sinks: Loaded &lt;span class="nv"&gt;driver&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;nats
2024/05/22 22:11:59 INFO &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;source&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;: PostgreSQL-CDC: Create publication &lt;span class="k"&gt;for &lt;/span&gt;table schemas with query CREATE PUBLICATION pglog_stream_rs_databrew_replication_slot_174_2231 FOR TABLE public.taxi_rides&lt;span class="p"&gt;;&lt;/span&gt;
2024/05/22 22:11:59 INFO &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;source&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;: PostgreSQL-CDC: Created Postgresql publication &lt;span class="nv"&gt;publication_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;rs_databrew_replication_slot_174_2231
2024/05/22 22:11:59 INFO &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;source&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;: PostgreSQL-CDC: System identification result SystemID:&lt;span class="o"&gt;=&lt;/span&gt;7293538614695768105 Timeline:&lt;span class="o"&gt;=&lt;/span&gt;1 XLogPos:&lt;span class="o"&gt;=&lt;/span&gt;E4/5C009318 DBName:&lt;span class="o"&gt;=&lt;/span&gt;mocks
BEGIN
0
2024/05/22 22:12:00 INFO &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;source&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;: PostgreSQL-CDC: Processing database snapshot &lt;span class="nv"&gt;schema&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;public
  &lt;span class="nv"&gt;table&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;
  │ &lt;span class="o"&gt;{&lt;/span&gt;TableName:public.taxi_rides Schema:schema:
  │   fields: 11
  │     - _cq_sync_time: &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;utf8, nullable
  │     - distance_traveled: &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;float64, nullable
  │     - driver_id: &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;int32, nullable
  │     - duration: &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;int32, nullable
  │     - end_location: &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;utf8, nullable
  │     - fare_amount: &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;float64, nullable
  │     - log_id: &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;int32
  │     - passenger_id: &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;int32, nullable
  │     - payment_method: &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;utf8, nullable
  │     - start_location: &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;utf8, nullable
  │     - timestamp: &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;utf8, nullable&lt;span class="o"&gt;}&lt;/span&gt;
2024/05/22 22:12:00 INFO &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;source&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;: PostgreSQL-CDC: Query snapshot batch-size&lt;span class="o"&gt;=&lt;/span&gt;13500
2024/05/22 22:12:00 INFO &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;source&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;: PostgreSQL-CDC: Query snapshot:  &lt;span class="nv"&gt;table&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;public.taxi_rides &lt;span class="nv"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"[&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;_cq_sync_time&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;distance_traveled&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;driver_id&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;duration&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;end_location&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;fare_amount&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;log_id&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;passenger_id&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;payment_method&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;start_location&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; 
&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;timestamp&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;]"&lt;/span&gt; batch-size&lt;span class="o"&gt;=&lt;/span&gt;13500 &lt;span class="nv"&gt;offset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0
2024/05/22 22:12:05 INFO &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;source&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;: PostgreSQL-CDC: Query snapshot:  &lt;span class="nv"&gt;table&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;public.taxi_rides &lt;span class="nv"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"[&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;_cq_sync_time&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;distance_traveled&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;driver_id&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;duration&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;end_location&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;fare_amount&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;log_id&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;passenger_id&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;payment_method&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;start_location&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; 
&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;timestamp&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;]"&lt;/span&gt; batch-size&lt;span class="o"&gt;=&lt;/span&gt;13500 &lt;span class="nv"&gt;offset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;13500
2024/05/22 22:12:08 INFO &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;source&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;: PostgreSQL-CDC: Query snapshot:  &lt;span class="nv"&gt;table&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;public.taxi_rides &lt;span class="nv"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"[&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;_cq_sync_time&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;distance_traveled&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;driver_id&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;duration&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;end_location&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;fare_amount&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;log_id&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;passenger_id&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;payment_method&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;start_location&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; 
&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;timestamp&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;]"&lt;/span&gt; batch-size&lt;span class="o"&gt;=&lt;/span&gt;13500 &lt;span class="nv"&gt;offset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;27000
2024/05/22 22:12:09 INFO &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;source&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;: PostgreSQL-CDC: Query snapshot:  &lt;span class="nv"&gt;table&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;public.taxi_rides &lt;span class="nv"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"[&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;_cq_sync_time&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;distance_traveled&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;driver_id&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;duration&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;end_location&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;fare_amount&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;log_id&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;passenger_id&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;payment_method&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;start_location&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; 
&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;timestamp&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;]"&lt;/span&gt; batch-size&lt;span class="o"&gt;=&lt;/span&gt;13500 &lt;span class="nv"&gt;offset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;40500
2024-05-22 22:12:09 INFO Stream: Messages &lt;span class="nb"&gt;stat &lt;/span&gt;&lt;span class="nv"&gt;messages_received&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;53649 &lt;span class="nv"&gt;messages_sent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;53649 &lt;span class="nv"&gt;messages_dropped_or_filtered&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0
2024/05/22 22:12:09 INFO &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;source&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;: PostgreSQL-CDC: Query snapshot:  &lt;span class="nv"&gt;table&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;public.taxi_rides &lt;span class="nv"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"[&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;_cq_sync_time&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;distance_traveled&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;driver_id&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;duration&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;end_location&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;fare_amount&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;log_id&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;passenger_id&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;payment_method&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;start_location&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; 
&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;timestamp&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;]"&lt;/span&gt; batch-size&lt;span class="o"&gt;=&lt;/span&gt;13500 &lt;span class="nv"&gt;offset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;54000
2024/05/22 22:12:10 INFO &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;source&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;: PostgreSQL-CDC: Query snapshot:  &lt;span class="nv"&gt;table&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;public.taxi_rides &lt;span class="nv"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"[&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;_cq_sync_time&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;distance_traveled&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;driver_id&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;duration&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;end_location&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;fare_amount&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;log_id&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;passenger_id&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;payment_method&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;start_location&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; 
&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;timestamp&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;]"&lt;/span&gt; batch-size&lt;span class="o"&gt;=&lt;/span&gt;13500 &lt;span class="nv"&gt;offset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;67500
2024/05/22 22:12:11 INFO &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;source&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;: PostgreSQL-CDC: Query snapshot:  &lt;span class="nv"&gt;table&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;public.taxi_rides &lt;span class="nv"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"[&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;_cq_sync_time&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;distance_traveled&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;driver_id&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;duration&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;end_location&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;fare_amount&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;log_id&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;passenger_id&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;payment_method&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;start_location&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; 
&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;timestamp&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;]"&lt;/span&gt; batch-size&lt;span class="o"&gt;=&lt;/span&gt;13500 &lt;span class="nv"&gt;offset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;81000
2024/05/22 22:12:11 INFO &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;source&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;: PostgreSQL-CDC: Query snapshot:  &lt;span class="nv"&gt;table&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;public.taxi_rides &lt;span class="nv"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"[&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;_cq_sync_time&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;distance_traveled&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;driver_id&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;duration&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;end_location&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;fare_amount&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;log_id&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;passenger_id&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;payment_method&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;start_location&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; 
&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;timestamp&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;]"&lt;/span&gt; batch-size&lt;span class="o"&gt;=&lt;/span&gt;13500 &lt;span class="nv"&gt;offset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;94500
2024/05/22 22:12:12 INFO &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;source&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;: PostgreSQL-CDC: Query snapshot:  &lt;span class="nv"&gt;table&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;public.taxi_rides &lt;span class="nv"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"[&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;_cq_sync_time&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;distance_traveled&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;driver_id&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;duration&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;end_location&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;fare_amount&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;log_id&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;passenger_id&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;payment_method&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;start_location&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; 
&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;timestamp&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;]"&lt;/span&gt; batch-size&lt;span class="o"&gt;=&lt;/span&gt;13500 &lt;span class="nv"&gt;offset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;108000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Start with Open Source Blink
&lt;/h3&gt;

&lt;p&gt;Blink is an open-source project from DataBrew that lets you stream data from a variety of sources to a variety of destinations.&lt;/p&gt;

&lt;p&gt;In this section, we will cover how to start with Blink and stream data from Postgres to NATS.&lt;/p&gt;

&lt;p&gt;Assuming you already have Postgres and NATS set up, let's start with Blink.&lt;/p&gt;
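&lt;p&gt;If you don't have Postgres and NATS running yet, here is a minimal local setup sketch using Docker (the image tags and the password are illustrative, not prescribed by Blink):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Postgres with logical replication enabled (required for CDC)
docker run -d --name pg -p 5432:5432 -e POSTGRES_PASSWORD=12345 \
  postgres:16 -c wal_level=logical

# NATS server on the default port
docker run -d --name nats -p 4222:4222 nats:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With &lt;code&gt;wal_level=logical&lt;/code&gt; set, Postgres can create the replication slot that a CDC source relies on.&lt;/p&gt;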

&lt;h4&gt;
  
  
  Download and install Blink
&lt;/h4&gt;

&lt;p&gt;You can read more about the installation here - &lt;a href="https://docs.databrew.tech/open-source/prerequisites" rel="noopener noreferrer"&gt;Installing Blink&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Create a new pipeline
&lt;/h4&gt;

&lt;p&gt;Unlike DataBrew Cloud, Blink is a CLI tool. You create a new pipeline by defining its configuration in a YAML file.&lt;/p&gt;

&lt;p&gt;Here is an example pipeline configuration for our particular use case.&lt;br&gt;
Save the file as &lt;code&gt;blink.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;pipeline_id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;223&lt;/span&gt;
&lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;driver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres_cdc&lt;/span&gt;
  &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;localhost&lt;/span&gt;
    &lt;span class="na"&gt;slot_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;slot_example_name&lt;/span&gt;
    &lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres&lt;/span&gt;
    &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;12345&lt;/span&gt;
    &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5432&lt;/span&gt;
    &lt;span class="na"&gt;schema&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;public&lt;/span&gt;
    &lt;span class="na"&gt;stream_snapshot&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
    &lt;span class="na"&gt;snapshot_memory_safety_factor&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.1&lt;/span&gt;
    &lt;span class="na"&gt;snapshot_batch_size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10000&lt;/span&gt;
    &lt;span class="na"&gt;ssl_required&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="na"&gt;database&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mocks&lt;/span&gt;
  &lt;span class="na"&gt;stream_schema&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;public.taxi_rides&lt;/span&gt;
      &lt;span class="na"&gt;columns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;log_id&lt;/span&gt;
          &lt;span class="na"&gt;databrewType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Int32&lt;/span&gt;
          &lt;span class="na"&gt;nativeConnectorType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;integer&lt;/span&gt;
          &lt;span class="na"&gt;pk&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
          &lt;span class="na"&gt;nullable&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;_cq_sync_time&lt;/span&gt;
          &lt;span class="na"&gt;databrewType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;String&lt;/span&gt;
          &lt;span class="na"&gt;nativeConnectorType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;timestamp without time zone&lt;/span&gt;
          &lt;span class="na"&gt;pk&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
          &lt;span class="na"&gt;nullable&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;distance_traveled&lt;/span&gt;
          &lt;span class="na"&gt;databrewType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Float64&lt;/span&gt;
          &lt;span class="na"&gt;nativeConnectorType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;double precision&lt;/span&gt;
          &lt;span class="na"&gt;pk&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
          &lt;span class="na"&gt;nullable&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;driver_id&lt;/span&gt;
          &lt;span class="na"&gt;databrewType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Int32&lt;/span&gt;
          &lt;span class="na"&gt;nativeConnectorType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;integer&lt;/span&gt;
          &lt;span class="na"&gt;pk&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
          &lt;span class="na"&gt;nullable&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;duration&lt;/span&gt;
          &lt;span class="na"&gt;databrewType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Int32&lt;/span&gt;
          &lt;span class="na"&gt;nativeConnectorType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;integer&lt;/span&gt;
          &lt;span class="na"&gt;pk&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
          &lt;span class="na"&gt;nullable&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;end_location&lt;/span&gt;
          &lt;span class="na"&gt;databrewType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;String&lt;/span&gt;
          &lt;span class="na"&gt;nativeConnectorType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;text&lt;/span&gt;
          &lt;span class="na"&gt;pk&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
          &lt;span class="na"&gt;nullable&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;fare_amount&lt;/span&gt;
          &lt;span class="na"&gt;databrewType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Float64&lt;/span&gt;
          &lt;span class="na"&gt;nativeConnectorType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;double precision&lt;/span&gt;
          &lt;span class="na"&gt;pk&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
          &lt;span class="na"&gt;nullable&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;passenger_id&lt;/span&gt;
          &lt;span class="na"&gt;databrewType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Int32&lt;/span&gt;
          &lt;span class="na"&gt;nativeConnectorType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;integer&lt;/span&gt;
          &lt;span class="na"&gt;pk&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
          &lt;span class="na"&gt;nullable&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;payment_method&lt;/span&gt;
          &lt;span class="na"&gt;databrewType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;String&lt;/span&gt;
          &lt;span class="na"&gt;nativeConnectorType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;text&lt;/span&gt;
          &lt;span class="na"&gt;pk&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
          &lt;span class="na"&gt;nullable&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;start_location&lt;/span&gt;
          &lt;span class="na"&gt;databrewType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;String&lt;/span&gt;
          &lt;span class="na"&gt;nativeConnectorType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;text&lt;/span&gt;
          &lt;span class="na"&gt;pk&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
          &lt;span class="na"&gt;nullable&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;timestamp&lt;/span&gt;
          &lt;span class="na"&gt;databrewType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;String&lt;/span&gt;
          &lt;span class="na"&gt;nativeConnectorType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;text&lt;/span&gt;
          &lt;span class="na"&gt;pk&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
          &lt;span class="na"&gt;nullable&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;processors&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[]&lt;/span&gt;
&lt;span class="na"&gt;sink&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;driver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nats&lt;/span&gt;
  &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;localhost:4222&lt;/span&gt;
    &lt;span class="na"&gt;subject&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;taxi_rides&lt;/span&gt;
    &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
    &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Start the pipeline
&lt;/h4&gt;

&lt;p&gt;If you have Blink installed locally, you can start the pipeline by running the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;blink start &lt;span class="nt"&gt;-c&lt;/span&gt; blink.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see the following output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;2024/05/22 22:12:41 INFO &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;source&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;: PostgreSQL-CDC: Query snapshot:  &lt;span class="nv"&gt;table&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;public.taxi_rides &lt;span class="nv"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"[&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;_cq_sync_time&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;distance_traveled&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;driver_id&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;duration&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;end_location&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;fare_amount&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;log_id&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;passenger_id&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;payment_method&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;start_location&lt;/span&gt;&lt;span 
class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;timestamp&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;]"&lt;/span&gt; batch-size&lt;span class="o"&gt;=&lt;/span&gt;13500 &lt;span class="nv"&gt;offset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;567000
2024/05/22 22:12:42 INFO &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;source&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;: PostgreSQL-CDC: Query snapshot:  &lt;span class="nv"&gt;table&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;public.taxi_rides &lt;span class="nv"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"[&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;_cq_sync_time&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;distance_traveled&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;driver_id&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;duration&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;end_location&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;fare_amount&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;log_id&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;passenger_id&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;payment_method&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;start_location&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; 
&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;timestamp&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;]"&lt;/span&gt; batch-size&lt;span class="o"&gt;=&lt;/span&gt;13500 &lt;span class="nv"&gt;offset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;580500
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The logs above show data being streamed from the snapshot of the existing rows in the Postgres table.&lt;/p&gt;

&lt;p&gt;Your logs will differ depending on the data in your own table.&lt;/p&gt;
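&lt;p&gt;You can also confirm on the Postgres side that the replication slot was created and is in use. A quick check (the slot name and database come from the example config above):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;psql -h localhost -U postgres -d mocks \
  -c "SELECT slot_name, plugin, active FROM pg_replication_slots;"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;An active slot listed here means the source is connected and Postgres is retaining WAL for it.&lt;/p&gt;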

&lt;h4&gt;
  
  
  Check the data in NATS
&lt;/h4&gt;

&lt;p&gt;The last step is to verify the data in NATS. You can use the NATS CLI to subscribe to the subject:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nats sub &lt;span class="nt"&gt;-s&lt;/span&gt; nats://127.0.0.1:4222 &lt;span class="s2"&gt;"taxi_rides"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you did everything correctly, you should see the following logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;22:17:42 Subscribing on taxi_rides

&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="c"&gt;#1] Received on "taxi_rides"&lt;/span&gt;
&lt;span class="o"&gt;[{&lt;/span&gt;&lt;span class="s2"&gt;"_cq_sync_time"&lt;/span&gt;:null,&lt;span class="s2"&gt;"distance_traveled"&lt;/span&gt;:19.15,&lt;span class="s2"&gt;"driver_id"&lt;/span&gt;:1,&lt;span class="s2"&gt;"duration"&lt;/span&gt;:320,&lt;span class="s2"&gt;"end_location"&lt;/span&gt;:&lt;span class="s2"&gt;"55 Schlimgen Road"&lt;/span&gt;,&lt;span class="s2"&gt;"fare_amount"&lt;/span&gt;:309.56,&lt;span class="s2"&gt;"log_id"&lt;/span&gt;:8540,&lt;span class="s2"&gt;"passenger_id"&lt;/span&gt;:583,&lt;span class="s2"&gt;"payment_method"&lt;/span&gt;:&lt;span class="s2"&gt;"cash"&lt;/span&gt;,&lt;span class="s2"&gt;"start_location"&lt;/span&gt;:&lt;span class="s2"&gt;"87 Fisk Driv"&lt;/span&gt;,&lt;span class="s2"&gt;"timestamp"&lt;/span&gt;:&lt;span class="s2"&gt;"08/18/2022"&lt;/span&gt;&lt;span class="o"&gt;}]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this article, we explored how to set up a streaming pipeline from PostgreSQL to NATS using DataBrew Cloud and the open-source tool Blink. We covered the initial setup of PostgreSQL for Change Data Capture (CDC), configuring NATS, and creating a streaming pipeline with DataBrew Cloud. Additionally, we demonstrated how to achieve the same result using Blink, a powerful CLI tool from DataBrew.&lt;/p&gt;

&lt;p&gt;By leveraging these tools, you can build efficient and scalable data streaming solutions for various use cases such as event-driven architectures, real-time analytics, and seamless data integration with external systems. Streaming data from PostgreSQL using CDC ensures minimal load on your database and reliable data delivery, even if your consumer is temporarily offline.&lt;/p&gt;

&lt;p&gt;If you found this guide helpful and are interested in supporting our work, please consider giving a star to our project on GitHub. Your support helps us continue to develop and improve these tools.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzawk4n8765pnkur0cmh8.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzawk4n8765pnkur0cmh8.gif" alt="Give us a star" width="400" height="225"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Give a star on GitHub! &lt;a href="https://github.com/usedatabrew/blink" rel="noopener noreferrer"&gt;https://github.com/usedatabrew/blink&lt;/a&gt;&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>database</category>
      <category>streaming</category>
      <category>nats</category>
    </item>
    <item>
      <title>Logical Replication is easy with DataBrew</title>
      <dc:creator>Vladyslav Len</dc:creator>
      <pubDate>Sun, 24 Sep 2023 20:59:38 +0000</pubDate>
      <link>https://dev.to/levla/logical-replication-is-easy-with-databrew-56al</link>
      <guid>https://dev.to/levla/logical-replication-is-easy-with-databrew-56al</guid>
<description>&lt;p&gt;In the ever-evolving landscape of data management, one of the most pressing challenges organizations face is ensuring seamless, real-time replication of data across their systems. This process underpins everything from maintaining data consistency to enabling analytics and business intelligence. Yet selecting the right approach to data replication is a formidable task: it requires a careful evaluation of the available options and a clear understanding of the nuances and trade-offs of each method.&lt;/p&gt;

&lt;h2&gt;
  
  
  Logical Replication in PostgreSQL
&lt;/h2&gt;

&lt;p&gt;Logical replication in PostgreSQL enables the selective replication of data changes at a logical level. Publishers define what data to replicate through publications, and subscribers receive these changes.&lt;/p&gt;

&lt;p&gt;To create a publication, you specify the tables and types of changes to replicate. For instance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE PUBLICATION my_pub FOR TABLE my_table WITH (publish INSERT, publish UPDATE);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Subscribers express interest in specific publications:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE SUBSCRIPTION my_sub
  CONNECTION 'dbname=remote_db host=remote_host user=replication_user password=secret'
  PUBLICATION my_pub;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;PostgreSQL then streams changes (e.g., INSERT, UPDATE) from the publisher to subscribers, facilitating real-time data replication. Managing this process requires careful consideration of data integrity and schema changes.&lt;/p&gt;

&lt;p&gt;You might hope that these two commands alone would be enough to set up fully-fledged data replication for your instance. I wish they were :)&lt;/p&gt;

&lt;p&gt;As soon as you dive in, you realize you need far more than PostgreSQL provides out of the box. Let’s walk through some of it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Monitoring, because you want to know what is happening right now with your replication.&lt;/li&gt;
&lt;li&gt;Alerting, because you want to react to incidents immediately as they occur.&lt;/li&gt;
&lt;li&gt;Visibility, because once you have more than two databases you may have many logical replication setups, and it gets incredibly hard to keep an eye on them all.&lt;/li&gt;
&lt;/ul&gt;
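&lt;p&gt;To illustrate the point: out of the box, monitoring means polling catalog views by hand. A sketch of what that looks like (queries assume PostgreSQL 10+):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# On the publisher: connected subscribers and their replay lag in bytes
psql -c "SELECT application_name, state, pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes FROM pg_stat_replication;"

# On the subscriber: status of each subscription
psql -c "SELECT subname, received_lsn, latest_end_time FROM pg_stat_subscription;"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;You would still need to run these on a schedule, ship the results somewhere, and alert on them yourself.&lt;/p&gt;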

&lt;p&gt;That’s why you need a tool that solves all of this and more. It’s one of the reasons we created DataBrew.&lt;/p&gt;

&lt;p&gt;We wanted to give developers a way to deploy, observe, and control their data pipelines.&lt;/p&gt;

&lt;h2&gt;
  
  
  Replication with DataBrew
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffu6kb6ek6nf8nyzvb61f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffu6kb6ek6nf8nyzvb61f.png" alt="DataBrew — Next-generation data platform" width="720" height="410"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://databrew.tech" rel="noopener noreferrer"&gt;DataBrew&lt;/a&gt; is a cloud-based data platform that gives the ability to work with different data sources. Combine them, stream, and merge. Currently, provides the ability to adopt Change-Data-Capture for PostgreSQL and MySQL&lt;/p&gt;

&lt;p&gt;Setting up replication for PostgreSQL (and even MySQL) with DataBrew is easy.&lt;/p&gt;

&lt;p&gt;First, you have to create an account &lt;a href="https://databrew.tech" rel="noopener noreferrer"&gt;https://databrew.tech&lt;/a&gt; and verify your email.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;DataBrew provides a free tier for all new accounts, so you can experiment with data replication for free, or even stay on the free tier for as long as you want.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;After logging in, you can create PostgreSQL and MySQL services.&lt;/p&gt;

&lt;p&gt;The drivers are compatible with each other, meaning you can copy data from PostgreSQL tables into MySQL and vice versa.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkwinn6sav5gzkq5hus3x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkwinn6sav5gzkq5hus3x.png" alt="Register your service with DataBrew. Step 1" width="640" height="329"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjbwq68ge171lz5rtqq05.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjbwq68ge171lz5rtqq05.png" alt="Register your service with DataBrew. Step 2" width="640" height="337"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvx43bm6xqpqqobt19wum.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvx43bm6xqpqqobt19wum.png" alt="Register your service with DataBrew. Step 3" width="720" height="413"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After you create your services (source and target databases), it’s time to create DataFlows.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;In DataBrew’s vocabulary, a DataFlow is a connection between two Services: essentially, a data replication pipeline. It can be in several states, such as creating, starting, and stopping.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To create a new DataFlow, simply open the service page and press the “+” button to select the service and the direction of the DataFlow.&lt;/p&gt;

&lt;p&gt;Once the DataFlow is created, you can start it to begin replicating data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb8c9ps57o6773j9eeint.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb8c9ps57o6773j9eeint.png" alt="Press the “Play” button to start your DataFlow" width="720" height="146"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When you run logical replication on DataBrew, we take care of everything for you: you no longer have to write code to maintain your replication.&lt;/p&gt;

&lt;p&gt;You can provide us with your webhook URL, and DataBrew will send updates to your system as events happen, such as a DataFlow failing, starting, or stopping.&lt;/p&gt;

&lt;p&gt;If you manage many replications, meaning you have many DataFlows, you get a real-time visualization of the state of your system.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/yh3oIq3Dqho"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Many features are coming to DataBrew in the following months, so create an account or follow us on social media to stay updated.&lt;/p&gt;

&lt;p&gt;Soon we will release advanced DataFlow transformations, integration with blockchain data streams, and much more.&lt;/p&gt;

&lt;p&gt;Thanks for reading! We hope we’ve sparked your interest enough to give DataBrew a shot.&lt;/p&gt;

&lt;h2&gt;
  
  
  Useful links
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Website: &lt;a href="https://databrew.tech" rel="noopener noreferrer"&gt;https://databrew.tech&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;DataBrew Documentation — &lt;a href="https://docs.databrew.tech/" rel="noopener noreferrer"&gt;https://docs.databrew.tech/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Twitter: &lt;a href="https://twitter.com/@usedatabrew" rel="noopener noreferrer"&gt;https://twitter.com/@usedatabrew&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;LinkedIn: &lt;a href="https://www.linkedin.com/company/databrewinc/" rel="noopener noreferrer"&gt;https://www.linkedin.com/company/databrewinc/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Email: &lt;a href="mailto:contact@databrew.tech"&gt;contact@databrew.tech&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>webdev</category>
      <category>development</category>
      <category>database</category>
      <category>startup</category>
    </item>
    <item>
      <title>DataBrew - a new way of integrating CDC into your project</title>
      <dc:creator>Vladyslav Len</dc:creator>
      <pubDate>Mon, 21 Aug 2023 14:44:29 +0000</pubDate>
      <link>https://dev.to/levla/databrew-a-new-way-of-integrating-cdc-into-your-project-3acd</link>
      <guid>https://dev.to/levla/databrew-a-new-way-of-integrating-cdc-into-your-project-3acd</guid>
<description>&lt;p&gt;Back when I was working at one of my previous companies, I faced the need to set up data replication. We were a fast-growing startup at that point, and like most startups we made quite a few mistakes during the active growth phase. We had a microservice architecture but not enough time to design the services well. That left us with a microservice architecture that looked more like a monolithic one: 95% of communications happened via direct HTTP calls. That was exactly the thing that let us down.&lt;/p&gt;

&lt;p&gt;During peak load times we had more internal calls than external ones. I know what you're thinking: "They must be dumb", and I'd say not entirely :)&lt;br&gt;
We produced most features on short timelines, sacrificing stability with high hopes of fixing it later.&lt;/p&gt;

&lt;p&gt;Most of the problems were caused by services that held important data other services had to rely on. So for each client call, we had to make 2+ underlying calls to return this data. (Caching was not an option, since the data couldn't be stale due to our requirements.)&lt;/p&gt;

&lt;p&gt;These services soon became a &lt;strong&gt;SPOF&lt;/strong&gt; (Single Point of Failure), and we had to do something. We decided to adopt CDC (a.k.a. data replication) and spent countless hours implementing it, finding the proper services, tooling, etc. But in the end, it helped. We managed to build a really solid architecture that could withstand peak hours with no problems.&lt;/p&gt;

&lt;p&gt;Now that the background is set, let's talk about the journey we made to solve our problems with CDC.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I can say for sure: CDC is not a magic pill, and it may not solve all your problems. But it can help you gain precious time to grow, keep your customers engaged, and raise more money to rewrite your architecture down the road.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You see, the problem here is that most replication/ETL services are focused on a few things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Database-to-warehouse replication for analytics&lt;/li&gt;
&lt;li&gt;Database-to-database full replication with no transformations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When you start googling CDC implementations, you will find a ton of links leading to projects like Confluent.io and Debezium. &lt;br&gt;
Don't get me wrong, these are the projects that push the CDC industry forward, but they are extremely complex when you see them for the first time. And as the CTO or tech lead of a startup, you usually can't afford to invest that much time into them with no idea whether they will work out.&lt;/p&gt;

&lt;h2&gt;
  
  
  Meet DataBrew
&lt;/h2&gt;

&lt;p&gt;DataBrew is a SaaS project that provides an easy way to integrate CDC (Change Data Capture) into your architecture. Essentially, it creates a data mesh in which you define the data your services expose, and any service can consume it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fai0s5mwj9a5lgzsonan4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fai0s5mwj9a5lgzsonan4.png" alt="DataBrew Dashboard" width="800" height="469"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Above you can see DataBrew's service dashboard, with data streams flowing to a service. We tried to gather all the knowledge we gained during our CDC experiments and maintenance and turn it into a product that helps developers.&lt;/p&gt;

&lt;p&gt;DataBrew was created with a few things in mind:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We want to give developers more time to work on business logic and write code, not spend countless hours debugging Kafka.&lt;/li&gt;
&lt;li&gt;We want to give developers the most important thing: a representation of the actual data flows, so they can see all the data flowing into and out of each service.&lt;/li&gt;
&lt;li&gt;We want strict data contracts. Even if your service has 45 tables, you can still define that it exports only 2 of them. This prevents people from blindly creating data flows without thinking about system stability.&lt;/li&gt;
&lt;li&gt;We want to make it robust. Adopting CDC may seem like a risky decision, but with proper alerting and monitoring there is nothing to worry about.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Currently we are running a closed beta, but we are going to open DataBrew for public access this September.&lt;/p&gt;

&lt;p&gt;Feel free to apply for early access; we will reach out to you as soon as possible to discuss all the details.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Please keep in mind that during the closed beta we only support the PostgreSQL database.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Thanks for reading the article! We hope we have sparked your interest in giving DataBrew a shot.&lt;/p&gt;

&lt;h2&gt;
  
  
  Useful links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Project website: &lt;a href="https://databrew.tech" rel="noopener noreferrer"&gt;https://databrew.tech&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;DataBrew Documentation - &lt;a href="https://docs.databrew.tech/" rel="noopener noreferrer"&gt;https://docs.databrew.tech/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Twitter: &lt;a href="https://twitter.com/@usedatabrew" rel="noopener noreferrer"&gt;https://twitter.com/@usedatabrew&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Email: &lt;a href="mailto:contact@usedatabrew.com"&gt;contact@usedatabrew.com&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>microservices</category>
      <category>architecture</category>
      <category>database</category>
      <category>cdc</category>
    </item>
    <item>
      <title>Rethink the way you share the data between micro-services with Change-Data-Capture</title>
      <dc:creator>Vladyslav Len</dc:creator>
      <pubDate>Thu, 06 Jul 2023 11:51:17 +0000</pubDate>
      <link>https://dev.to/levla/rethink-the-way-you-share-the-data-between-micro-services-with-change-data-capture-22p8</link>
      <guid>https://dev.to/levla/rethink-the-way-you-share-the-data-between-micro-services-with-change-data-capture-22p8</guid>
      <description>&lt;p&gt;Organizing and sharing your data across the micro-services, these are the questions every developer or architect starts asking himself at some point. This is exactly the question I was asking myself when I realized that something is wrong with the architecture I had built.&lt;/p&gt;

&lt;p&gt;The problem we will be talking about is old and simple, and there are already a few ways to solve it. My goal in this article is to share my experience of dealing with it and some ways to make your life easier :)&lt;br&gt;
Now that we are done with the intro, let's dive deep into the problem.&lt;/p&gt;

&lt;p&gt;When you have more than one service in your system, you need to decide how you will share data across it. The reason this question comes up is that it's almost impossible to build the right microservice architecture using manuals and common practices. In fact, I truly believe there is no such thing as a correct microservice architecture. It can be more or less polished, but at some point it will have to break a few guidelines.&lt;br&gt;
So, let's take a look at a few ways to share data across the system.&lt;/p&gt;

&lt;h4&gt;
  
  
  1. Direct calls to the micro-service
&lt;/h4&gt;

&lt;p&gt;This is probably the easiest way to get data from another microservice. Most developers choose this approach because it's fast, easy to understand, and simple to implement. BUT there is another side to it: &lt;a href="https://en.wikipedia.org/wiki/Cascading_failure" rel="noopener noreferrer"&gt;cascading failure&lt;/a&gt;. This is something I faced for a long time, and it is exactly the reason I started looking for another way to share my data and increase the overall availability of the system.&lt;/p&gt;

&lt;h4&gt;
  
  
  2. Event Sourcing architecture
&lt;/h4&gt;

&lt;p&gt;Generally, I don't mind using event sourcing when building microservices. It's a great way to share data, since each microservice stores only the data it needs, so there is no data duplication. But it requires developers to write more code to deal with asynchronous event handling: each time you create a new service, you need to write code to integrate it with your event bus. It also requires a bit more debugging when something goes wrong, because it's hard to determine where exactly the bug occurred.&lt;/p&gt;

&lt;h4&gt;
  
  
  3. Selective/Logical Data Replication
&lt;/h4&gt;

&lt;p&gt;Selective (or logical) data replication is an approach where developers don't write any code to sync data between services. They simply keep working on the service, querying the data from its database as if the data belonged to that service. Consistency and replication are guaranteed by the infrastructure. Selective/logical replication is made possible by change-data-capture.&lt;/p&gt;

&lt;p&gt;The idea of &lt;a href="https://en.wikipedia.org/wiki/Change_data_capture" rel="noopener noreferrer"&gt;change-data-capture&lt;/a&gt; is simple. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiyu7g7ylb4rsrb75uent.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiyu7g7ylb4rsrb75uent.png" alt="Visualisation of change data capture" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You have a database that is constantly doing something: inserting data, deleting data, and so on. What we do is "subscribe" to the log of these changes. We read the stream of events happening in our database, and from there we can process it downstream: read the inserted data, transform it, etc.&lt;br&gt;
Plenty of databases support change-data-capture integration. Databases like Postgres and MySQL can export their change logs, and you can use different tools to parse them and use the data they contain.&lt;/p&gt;
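
&lt;p&gt;As a minimal sketch of the idea (not DataBrew's actual implementation): in Postgres, assuming &lt;code&gt;wal_level = logical&lt;/code&gt; is enabled, you can peek at the decoded change stream with the built-in &lt;code&gt;test_decoding&lt;/code&gt; plugin. The slot and table names here are purely illustrative:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Create a logical replication slot using the built-in test_decoding plugin
SELECT * FROM pg_create_logical_replication_slot('demo_slot', 'test_decoding');

-- Make a change, then read the decoded events the slot has captured
INSERT INTO users (id, name) VALUES (1, 'Alice');
SELECT data FROM pg_logical_slot_get_changes('demo_slot', NULL, NULL);
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Tools like Debezium and pglogrepl do essentially the same thing, but they stream changes continuously instead of polling and use production-grade decoding plugins.&lt;/p&gt;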

&lt;p&gt;Packages and technologies like &lt;a href="https://pkg.go.dev/github.com/jackc/pglogrepl" rel="noopener noreferrer"&gt;pglogrepl&lt;/a&gt; and &lt;a href="https://debezium.io/documentation/reference/stable/connectors/postgresql.html" rel="noopener noreferrer"&gt;Debezium&lt;/a&gt; can help you build your own change-data-capture framework/layer within your infrastructure.&lt;/p&gt;

&lt;p&gt;At this point, you have probably guessed how we can use this to implement a better way to share data between microservices. By using replication, we can build a system in which services share data but are not coupled and do not impact one another.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frt0i5lxylyhhs4oacuof.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frt0i5lxylyhhs4oacuof.png" alt="Decoupling services with replication" width="800" height="521"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It is worth mentioning that building replication for your microservices is a complex and time-consuming process. But it has its benefits.&lt;/p&gt;

&lt;p&gt;For example, you don't have to write code to fetch data from another microservice, as you would with event sourcing, which means you can create new services (and therefore grow) faster. You will also be able to debug inconsistencies more easily, and, most importantly, you will be able to recover data after an outage through an initial sync that repopulates all services with data from the source.&lt;/p&gt;

&lt;p&gt;This is exactly what we are working on at &lt;a href="https://usedatabrew.com" rel="noopener noreferrer"&gt;DataBrew&lt;/a&gt;. We aim to give developers a simple way to build replication for their services, without the need to build or maintain this complex infrastructure. Please visit our website and see how we can help you tailor your data replication.&lt;/p&gt;

</description>
      <category>replication</category>
      <category>database</category>
      <category>microservices</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
