<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Burnside Project</title>
    <description>The latest articles on DEV Community by Burnside Project (@burnsideproject).</description>
    <link>https://dev.to/burnsideproject</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3852611%2F5e130438-8c20-42d5-a212-b8d97eee8ee2.png</url>
      <title>DEV Community: Burnside Project</title>
      <link>https://dev.to/burnsideproject</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/burnsideproject"/>
    <language>en</language>
    <item>
      <title>pg-stress — Stress Testing PostgreSQL with Claude-powered advisory</title>
      <dc:creator>Burnside Project</dc:creator>
      <pubDate>Thu, 02 Apr 2026 15:38:50 +0000</pubDate>
      <link>https://dev.to/burnsideproject/pg-stress-stress-testing-postgresql-with-claude-powered-advisory-278g</link>
      <guid>https://dev.to/burnsideproject/pg-stress-stress-testing-postgresql-with-claude-powered-advisory-278g</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwiek9qqbksvku8vkeiw9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwiek9qqbksvku8vkeiw9.png" alt=" " width="800" height="412"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test It Like It’s a Machine&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When I started building pg-collector (another project that depends on heavy stress testing), I ran into a problem very quickly:&lt;/p&gt;

&lt;p&gt;I didn’t have a reliable way to break PostgreSQL on demand. There are plenty of tools for generating synthetic data, but no comprehensive stress-testing tool.&lt;/p&gt;

&lt;p&gt;And that’s when the idea for pg-stress was born.&lt;/p&gt;

&lt;p&gt;⸻&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgeecicttz2syawoelf5v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgeecicttz2syawoelf5v.png" alt=" " width="800" height="403"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Built from “eat your own dog food”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;pg-stress didn’t start as a product idea.&lt;br&gt;
It started as a necessity.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv88m2yt4nnfi76lh4zmr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv88m2yt4nnfi76lh4zmr.png" alt=" " width="800" height="406"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;While building pg-collector, I needed:&lt;br&gt;
    • real workload patterns&lt;br&gt;
    • repeatable failure scenarios&lt;br&gt;
    • controlled environments to observe behavior&lt;br&gt;
    • the ability to run these tests in random patterns, continuously, 24/7, for two weeks&lt;/p&gt;

&lt;p&gt;So I built a tool to stress PostgreSQL intentionally — not just benchmark it.&lt;/p&gt;

&lt;p&gt;That tool became pg-stress.&lt;/p&gt;

&lt;p&gt;⸻&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Think of it like automobile testing&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When a new car is built, it’s not just driven on smooth roads.&lt;/p&gt;

&lt;p&gt;It’s tested in:&lt;br&gt;
    • freezing cold&lt;br&gt;
    • extreme heat&lt;br&gt;
    • rough terrain&lt;br&gt;
    • high-speed stress conditions&lt;/p&gt;

&lt;p&gt;Why? Because you want to find the breaking points before your customers do.&lt;/p&gt;

&lt;p&gt;Before releasing a new query into the wild, inject it into pg-stress first. Stress it with hundreds of connections, test the joints, insert 10M rows (kidding! Not really... I inserted 30M records), then hand the findings to Claude for advisory.&lt;/p&gt;

&lt;p&gt;⸻&lt;/p&gt;

&lt;p&gt;Databases are no different.&lt;/p&gt;

&lt;p&gt;Production issues show up under:&lt;br&gt;
    • traffic spikes&lt;br&gt;
    • bad queries&lt;br&gt;
    • ORM inefficiencies&lt;br&gt;
    • long-running transactions&lt;br&gt;
    • unpredictable write patterns&lt;/p&gt;

&lt;p&gt;pg-stress is built to simulate exactly that.&lt;/p&gt;

&lt;p&gt;⸻&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What pg-stress actually tests&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is not just “run SELECT 1 in a loop.”&lt;/p&gt;

&lt;p&gt;We push real-world chaos into PostgreSQL:&lt;/p&gt;

&lt;h2&gt;
  
  
  ORM-like behavior
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;bursty inserts / updates / deletes&lt;/li&gt;
&lt;li&gt;inefficient query patterns&lt;/li&gt;
&lt;li&gt;transactional noise&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;⸻&lt;/p&gt;

&lt;h2&gt;
  
  
  Jitter + randomness
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;non-uniform traffic&lt;/li&gt;
&lt;li&gt;unpredictable workloads&lt;/li&gt;
&lt;li&gt;concurrency spikes&lt;/li&gt;
&lt;/ul&gt;
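&lt;p&gt;The jitter above boils down to non-uniform inter-arrival times. Here is a minimal Python sketch of that idea (illustrative only; pg-stress is its own tool, and every function and parameter name below is hypothetical):&lt;/p&gt;

```python
import random

def jittered_delays(n, base_rate=50.0, burst_prob=0.05, burst_factor=10.0, seed=42):
    """Return n inter-arrival delays (seconds) drawn from an exponential
    distribution, with occasional bursts where the arrival rate spikes.
    The result is non-uniform traffic with unpredictable concurrency peaks."""
    rng = random.Random(seed)
    delays = []
    for _ in range(n):
        burst = rng.random() > (1.0 - burst_prob)   # roughly 5% of arrivals land in a spike
        rate = base_rate * burst_factor if burst else base_rate
        delays.append(rng.expovariate(rate))        # exponential inter-arrival time
    return delays

delays = jittered_delays(1000)
```

&lt;p&gt;Each worker would sleep for the next delay before firing its query, so load arrives in clumps rather than a steady drumbeat.&lt;/p&gt;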

&lt;p&gt;⸻&lt;/p&gt;

&lt;h2&gt;
  
  
  Bloat &amp;amp; connection pressure
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;table/index bloat scenarios&lt;/li&gt;
&lt;li&gt;connection exhaustion&lt;/li&gt;
&lt;li&gt;lock contention&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;⸻&lt;/p&gt;

&lt;h2&gt;
  
  
  Query stress
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;slow queries under load&lt;/li&gt;
&lt;li&gt;joins at scale&lt;/li&gt;
&lt;li&gt;degradation over time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;⸻&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;From stress → context → intelligence&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most stress tools stop here:&lt;/p&gt;

&lt;p&gt;“Here are your TPS and latency numbers.”&lt;/p&gt;

&lt;p&gt;That’s not enough.&lt;/p&gt;

&lt;p&gt;pg-stress produces structured output designed for context:&lt;br&gt;
    • query behavior&lt;br&gt;
    • system response&lt;br&gt;
    • degradation patterns&lt;/p&gt;
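&lt;p&gt;As a rough illustration of what “structured output designed for context” can look like, here is a hypothetical run summary serialized into an LLM prompt. Every field name below is invented for the example; the real pg-stress output format is not shown here.&lt;/p&gt;

```python
import json

# Hypothetical shape of a structured stress-run summary.
run_summary = {
    "scenario": "orm_burst",
    "duration_s": 600,
    "connections": 200,
    "tps": {"p50": 1840, "p95": 1210},
    "latency_ms": {"p50": 4.2, "p95": 38.7},
    "degradation": {"tps_drop_pct": 34, "onset_s": 412},
    "top_waits": ["LWLock:WALWrite", "Lock:transactionid"],
}

# Structured JSON gives the model query behavior, system response,
# and degradation patterns in one self-describing blob.
prompt = (
    "Given this PostgreSQL stress-run summary, explain the likely "
    "bottleneck and suggest tuning steps:\n" + json.dumps(run_summary, indent=2)
)
```

&lt;p&gt;Because the summary is machine-readable, the same blob can feed dashboards, regression checks, or an advisory model without reformatting.&lt;/p&gt;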

&lt;p&gt;⸻&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Built for AI-assisted diagnosis&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The output of pg-stress is optimized for modern workflows:&lt;/p&gt;

&lt;p&gt;Feed it directly into a model like Claude, and you can get:&lt;br&gt;
    • query tuning recommendations&lt;br&gt;
    • index suggestions&lt;br&gt;
    • capacity predictions&lt;br&gt;
    • failure explanations&lt;/p&gt;

&lt;p&gt;⸻&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Part of a bigger system&lt;/strong&gt;&lt;br&gt;
    • pg-stress → generates pressure&lt;br&gt;
    • pg-collector → observes + learns&lt;br&gt;
    • (future) → AI layer predicts and advises&lt;/p&gt;

&lt;p&gt;⸻&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Final thought&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you can’t break your database in a controlled environment…&lt;/p&gt;

&lt;p&gt;It will break in production — on its own terms.&lt;/p&gt;

&lt;p&gt;⸻&lt;/p&gt;

&lt;p&gt;&lt;a href="https://lnkd.in/gSwrvWyw" rel="noopener noreferrer"&gt;Repo:&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;#DataEngineering #AIInfrastructure #MachineLearning #PostgreSQL #OpenSource&lt;/p&gt;

</description>
      <category>postgres</category>
    </item>
    <item>
      <title>AI-powered PostgreSQL observability</title>
      <dc:creator>Burnside Project</dc:creator>
      <pubDate>Tue, 31 Mar 2026 03:43:07 +0000</pubDate>
      <link>https://dev.to/burnsideproject/know-your-postgresql-health-before-it-breaks-ai-powered-postgresql-observability-36he</link>
      <guid>https://dev.to/burnsideproject/know-your-postgresql-health-before-it-breaks-ai-powered-postgresql-observability-36he</guid>
      <description>&lt;p&gt;pg-collector streams live PostgreSQL telemetry into a 7-dimension state machine that predicts failures, detects query regressions, and answers the 5 questions every DBA asks — automatically.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ogclz92tkqkw5oef1nz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ogclz92tkqkw5oef1nz.png" alt=" " width="800" height="413"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is my database healthy?&lt;/strong&gt;&lt;br&gt;
Single-sentence verdict with confidence level, time-in-state, and 7-dimension breakdown. No interpretation needed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxaxejpl8ghuigxg9k0sg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxaxejpl8ghuigxg9k0sg.png" alt=" " width="800" height="420"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;What changed?&lt;/strong&gt;&lt;br&gt;
Causal narrative linking state transitions to query regressions, workload shifts, and configuration changes with timestamps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What will break next?&lt;/strong&gt;&lt;br&gt;
Ranked risk register with 'days to breach' projections. Vacuum wraparound in 18 days. Connection exhaustion by April 12.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe956g6g8lbyeojrw57da.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe956g6g8lbyeojrw57da.png" alt=" " width="800" height="425"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;What caused this spike?&lt;/strong&gt;&lt;br&gt;
Root cause attribution chains: query workload change -&amp;gt; cache eviction -&amp;gt; checkpoint storm -&amp;gt; lock cascade. Automatic cross-dimension correlation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How is performance trending?&lt;/strong&gt;&lt;br&gt;
30-day health report with per-dimension trajectories, volatility metrics, week-over-week comparisons, and prediction accuracy tracking.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://gcp.burnsideproject.ai" rel="noopener noreferrer"&gt;Get an Early Access&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/burnside-project/pg-collector" rel="noopener noreferrer"&gt;Git Repo&lt;/a&gt;&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>sql</category>
      <category>go</category>
      <category>gcp</category>
    </item>
    <item>
<title>pg-warehouse - A local-first data warehouse that mirrors PostgreSQL data at scale, without over-engineering - no pipelines needed!</title>
      <dc:creator>Burnside Project</dc:creator>
      <pubDate>Tue, 31 Mar 2026 03:29:14 +0000</pubDate>
      <link>https://dev.to/burnsideproject/pg-warehouse-a-local-first-data-warehouse-at-scale-without-over-engineering-that-mirrors-2i4c</link>
      <guid>https://dev.to/burnsideproject/pg-warehouse-a-local-first-data-warehouse-at-scale-without-over-engineering-that-mirrors-2i4c</guid>
      <description>&lt;p&gt;PostgreSQL → DuckDB → SQL Engine → Parquet →And beyond ....&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A Local-First Analytics Pipeline&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Data teams often spend more time operating infrastructure than actually building features.&lt;/p&gt;

&lt;p&gt;To construct an AI feature pipeline, organizations frequently spin up heavy stacks consisting of:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Large cloud VMs&lt;br&gt;
Distributed compute clusters&lt;br&gt;
Streaming infrastructure&lt;br&gt;
Data warehouses&lt;br&gt;
Orchestration systems&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;These systems consume significant engineering effort and infrastructure cost before a single feature is produced.&lt;/p&gt;

&lt;p&gt;The irony is that the core goal of most pipelines is simple: transform operational data into features for analytics or machine learning.&lt;/p&gt;

&lt;p&gt;Yet the industry default architecture looks like this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;PostgreSQL → Kafka → Spark/Flink → Data Warehouse → Feature Store&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This architecture is powerful — but also massively over-engineered for many workloads.&lt;/p&gt;

&lt;p&gt;Most teams simply want to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Mirror production data&lt;/li&gt;
&lt;li&gt;Run SQL transformations&lt;/li&gt;
&lt;li&gt;Generate datasets&lt;/li&gt;
&lt;li&gt;Export them to analytics or AI pipelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But instead they end up managing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Kafka clusters&lt;/li&gt;
&lt;li&gt;Spark jobs&lt;/li&gt;
&lt;li&gt;Cloud warehouses&lt;/li&gt;
&lt;li&gt;Orchestration systems&lt;/li&gt;
&lt;li&gt;Infrastructure costs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For many workloads, &lt;strong&gt;this complexity isn’t necessary.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;That observation led to the creation of pg-warehouse.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6bclpqbu68elh4p52szd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6bclpqbu68elh4p52szd.png" alt=" " width="800" height="711"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Idea Behind pg-warehouse&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;pg-warehouse is a local-first pipeline engine that mirrors PostgreSQL OLTP data into a local DuckDB warehouse (analytical database).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It captures both:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The initial snapshot of tables&lt;/li&gt;
&lt;li&gt;Incremental changes from PostgreSQL replication&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then developers can run SQL pipelines on top of the mirrored data.&lt;/p&gt;

&lt;p&gt;The result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PostgreSQL (OLTP)
      ↓
PostgreSQL replication stream
      ↓
pg-warehouse sync engine
      ↓
DuckDB local warehouse
      ↓
SQL feature pipelines
      ↓
Parquet / CSV datasets
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Everything runs locally.&lt;/p&gt;

&lt;p&gt;No Kafka.&lt;/p&gt;

&lt;p&gt;No Spark.&lt;/p&gt;

&lt;p&gt;No warehouse cluster.&lt;/p&gt;

&lt;p&gt;Just:&lt;/p&gt;

&lt;p&gt;PostgreSQL&lt;br&gt;
DuckDB&lt;br&gt;
SQL&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why PostgreSQL Replication?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of polling tables or running ETL queries, pg-warehouse uses PostgreSQL’s native replication capabilities.&lt;/p&gt;

&lt;p&gt;PostgreSQL exposes replication metadata through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Write-Ahead Log (WAL)&lt;/li&gt;
&lt;li&gt;Logical replication slots&lt;/li&gt;
&lt;li&gt;LSN offsets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This allows pg-warehouse to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Capture a consistent snapshot of selected tables&lt;/li&gt;
&lt;li&gt;Track WAL changes after the snapshot&lt;/li&gt;
&lt;li&gt;Apply incremental updates to DuckDB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This approach has several advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;minimal load on the OLTP database&lt;/li&gt;
&lt;li&gt;exactly-once incremental progress&lt;/li&gt;
&lt;li&gt;restart-safe replication&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The sync engine tracks replication progress in a local state database.&lt;/p&gt;
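&lt;p&gt;A minimal sketch of that state tracking, assuming a local SQLite file that maps each table to its last applied WAL position (LSN). The schema and function names here are illustrative, not pg-warehouse’s actual ones.&lt;/p&gt;

```python
import sqlite3

def init_state(conn):
    # One row per mirrored table, holding the last applied LSN.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sync_state ("
        "  table_name TEXT PRIMARY KEY,"
        "  last_lsn   TEXT NOT NULL)"
    )

def record_progress(conn, table, lsn):
    # Upsert the watermark; commit only after the change batch
    # has been applied downstream, so restarts never skip data.
    conn.execute(
        "INSERT INTO sync_state (table_name, last_lsn) VALUES (?, ?) "
        "ON CONFLICT(table_name) DO UPDATE SET last_lsn = excluded.last_lsn",
        (table, lsn),
    )
    conn.commit()

def resume_point(conn, table):
    row = conn.execute(
        "SELECT last_lsn FROM sync_state WHERE table_name = ?", (table,)
    ).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
init_state(conn)
record_progress(conn, "public.orders", "0/16B3748")
```

&lt;p&gt;The idea is that on restart, the engine reads the stored LSN and resumes the replication stream from there instead of re-snapshotting.&lt;/p&gt;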

&lt;p&gt;&lt;strong&gt;Why DuckDB?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;DuckDB is an ideal engine for local analytics workloads.&lt;/p&gt;

&lt;p&gt;Key properties:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;columnar storage&lt;/li&gt;
&lt;li&gt;vectorized execution&lt;/li&gt;
&lt;li&gt;high-performance SQL&lt;/li&gt;
&lt;li&gt;embedded runtime&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This allows pg-warehouse to transform row-oriented PostgreSQL data into &lt;em&gt;columnar analytics tables&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Example workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;plays&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;MAX&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;last_activity&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These pipelines generate derived feature tables that can be exported to Parquet.&lt;/p&gt;
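&lt;p&gt;In DuckDB, the export step itself is a single statement. For instance, the aggregation above could be written straight to Parquet like this (the output file name is chosen for the example):&lt;/p&gt;

```sql
-- Run the feature query and write the result directly to a Parquet file.
COPY (
    SELECT user_id, COUNT(*) AS plays, MAX(created_at) AS last_activity
    FROM raw.events
    GROUP BY user_id
) TO 'user_features.parquet' (FORMAT PARQUET);
```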

&lt;p&gt;&lt;strong&gt;Does It Scale?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Do 90% of AI Data Pipelines Actually Do?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When people talk about PostgreSQL AI data pipelines, the architecture often looks intimidating:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PostgreSQL → SOME HEAVY DISTRIBUTED SYSTEMS → SOME EXPENSIVE CLOUD DATA WAREHOUSE → Feature Store → ML Pipeline&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;But if you examine the actual work performed inside most pipelines, the reality is much simpler.&lt;/p&gt;

&lt;p&gt;Most pipelines are just SQL transformations. In practice, 90% of pipelines reduce to a few simple steps:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0a9pyu954y9mrcxy365h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0a9pyu954y9mrcxy365h.png" alt=" " width="800" height="356"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Typical Pipeline Operations&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;filter rows&lt;/li&gt;
&lt;li&gt;aggregate events&lt;/li&gt;
&lt;li&gt;join metadata&lt;/li&gt;
&lt;li&gt;compute features&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are operations that columnar engines have optimized for decades. You don’t need a distributed compute cluster to do them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Feature Tables Are Much Smaller Than Raw Data&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Another reason pipelines are frequently over-engineered is a misunderstanding of how data volumes evolve across pipeline layers.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Raw event streams can be large, but feature tables are dramatically smaller.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3o7pqhi7n11rvlqhxadk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3o7pqhi7n11rvlqhxadk.png" alt=" " width="800" height="272"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you ingest 200 GB of raw events per day, your final feature tables might only be 2–10 GB. That is well within the capabilities of a single-node columnar engine like DuckDB.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;The 90% Pipeline Design Target&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you optimize a system for the following workload profile:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft2xk93pa0bx8kvq5s5uh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft2xk93pa0bx8kvq5s5uh.png" alt=" " width="800" height="281"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You can cover roughly 90% of real-world AI feature pipelines.&lt;br&gt;
Everything beyond that tends to fall into hyperscale edge cases, such as:&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;trillion-event streaming systems&lt;/li&gt;
&lt;li&gt;global ad-tech platforms&lt;/li&gt;
&lt;li&gt;massive graph processing&lt;/li&gt;
&lt;li&gt;real-time distributed ML training&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are important problems — but they are not the common case.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Over-Engineering Problem&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Modern data infrastructure often assumes that every pipeline must be built with distributed systems.&lt;/p&gt;

&lt;p&gt;This leads to stacks like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Kafka&lt;br&gt;
Spark&lt;br&gt;
Flink&lt;br&gt;
Airflow&lt;br&gt;
Data Warehouse&lt;br&gt;
Feature Store&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;While powerful, these systems introduce significant overhead:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;infrastructure complexity&lt;br&gt;
operational cost&lt;br&gt;
latency&lt;br&gt;
specialized expertise&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For many teams, this complexity is unnecessary.&lt;/p&gt;

&lt;p&gt;A simpler architecture is often sufficient:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PostgreSQL
   ↓
DuckDB
   ↓
SQL feature pipelines
   ↓
Parquet datasets
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Where pg-warehouse fits&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;pg-warehouse is designed specifically for this 90% case.&lt;/p&gt;

&lt;p&gt;It mirrors PostgreSQL OLTP data into a local DuckDB warehouse, where developers can run SQL feature pipelines and export datasets to Parquet.&lt;/p&gt;

&lt;p&gt;Instead of building complex distributed pipelines, developers can focus on the transformations that actually matter.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PostgreSQL
   ↓
Replication stream
   ↓
pg-warehouse
   ↓
DuckDB
   ↓
SQL feature pipelines
   ↓
Parquet datasets
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result is a local-first analytics stack that removes unnecessary infrastructure while still supporting the core transformations used in most AI pipelines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Architecture&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;pg-warehouse is designed around a clean separation of concerns.&lt;/p&gt;

&lt;p&gt;Open Core components include:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sync Engine&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Responsible for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PostgreSQL snapshot&lt;/li&gt;
&lt;li&gt;replication stream processing&lt;/li&gt;
&lt;li&gt;applying changes to DuckDB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Warehouse Layer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;DuckDB acts as the columnar analytics engine.&lt;/p&gt;

&lt;p&gt;Tables are stored under the raw schema.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pipeline Engine&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Developers write SQL files describing feature pipelines.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pipelines/user_features.sql&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;These pipelines read from raw tables and produce derived datasets.&lt;/p&gt;
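&lt;p&gt;An illustrative version of such a file follows; the actual contents of &lt;code&gt;user_features.sql&lt;/code&gt; in the repo may differ.&lt;/p&gt;

```sql
-- pipelines/user_features.sql (illustrative)
-- Reads from the mirrored raw schema and writes a derived feature table.
CREATE OR REPLACE TABLE feature.user_features AS
SELECT
    user_id,
    COUNT(*)        AS plays,
    MAX(created_at) AS last_activity
FROM raw.events
GROUP BY user_id;
```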

&lt;p&gt;&lt;strong&gt;Export Engine&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Exports datasets to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Parquet&lt;/li&gt;
&lt;li&gt;CSV&lt;/li&gt;
&lt;li&gt;external storage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Design Principles&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;pg-warehouse follows several design constraints.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Local-First&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The entire pipeline runs locally.&lt;/p&gt;

&lt;p&gt;No cloud infrastructure required.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Single Binary&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;pg-warehouse is implemented in Go and compiled as a single executable.&lt;/p&gt;

&lt;p&gt;This simplifies deployment dramatically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Config-Driven&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;All workflows are configured using a single YAML file.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;project&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my_warehouse&lt;/span&gt;

&lt;span class="na"&gt;postgres&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres://warehouse:password@pg-host:5432/mydb&lt;/span&gt;
  &lt;span class="na"&gt;schema&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;public&lt;/span&gt;

&lt;span class="na"&gt;duckdb&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;raw&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./raw.duckdb&lt;/span&gt;
  &lt;span class="na"&gt;silver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./silver.duckdb&lt;/span&gt;
  &lt;span class="na"&gt;feature&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./feature.duckdb&lt;/span&gt;

&lt;span class="na"&gt;cdc&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;publication_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pgwh_pub&lt;/span&gt;
  &lt;span class="na"&gt;slot_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pgwh_slot&lt;/span&gt;
  &lt;span class="na"&gt;tables&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;public.orders&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;public.customers&lt;/span&gt;

&lt;span class="na"&gt;sync&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;incremental&lt;/span&gt;
  &lt;span class="na"&gt;tables&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;public.orders&lt;/span&gt;
      &lt;span class="na"&gt;target_schema&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;raw&lt;/span&gt;
      &lt;span class="na"&gt;primary_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;watermark_column&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;updated_at&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;public.customers&lt;/span&gt;
      &lt;span class="na"&gt;target_schema&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;raw&lt;/span&gt;
      &lt;span class="na"&gt;primary_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;watermark_column&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;updated_at&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
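&lt;p&gt;Conceptually, the &lt;code&gt;watermark_column&lt;/code&gt; in the config above reduces each incremental round to “fetch rows newer than the last high-water mark”. A Python sketch of deriving that query from the config (all names here are hypothetical; the real engine works off the replication stream, with watermarks as incremental bookkeeping):&lt;/p&gt;

```python
def incremental_query(table, columns, watermark_column, last_watermark):
    """Build a SELECT that fetches only rows changed since the last sync.
    Naive string formatting is fine for a sketch; real code would use
    parameterized queries and proper identifier quoting."""
    cols = ", ".join(columns)
    return (
        f"SELECT {cols} FROM {table} "
        f"WHERE {watermark_column} > '{last_watermark}' "
        f"ORDER BY {watermark_column}"
    )

q = incremental_query(
    "public.orders",
    ["id", "total", "updated_at"],
    "updated_at",
    "2026-03-30 00:00:00",
)
```

&lt;p&gt;After applying the fetched rows, the engine would advance the stored watermark to the maximum &lt;code&gt;updated_at&lt;/code&gt; seen, making the next round pick up where this one left off.&lt;/p&gt;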



&lt;p&gt;&lt;strong&gt;State Durability&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Replication progress is stored in a SQLite state database.&lt;/p&gt;

&lt;p&gt;This allows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;warehouse rebuilds&lt;/li&gt;
&lt;li&gt;crash recovery&lt;/li&gt;
&lt;li&gt;deterministic incremental sync&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can delete and rebuild your DuckDB warehouse without losing sync state.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hexagonal Architecture&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;pg-warehouse follows a ports and adapters architecture.&lt;/p&gt;

&lt;p&gt;Core logic is isolated from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PostgreSQL adapter&lt;/li&gt;
&lt;li&gt;DuckDB adapter&lt;/li&gt;
&lt;li&gt;export adapters&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This allows enterprise extensions without modifying core logic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Open Core Model&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;pg-warehouse is an open core project.&lt;/p&gt;

&lt;p&gt;Open Source includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PostgreSQL replication-based data sync&lt;/li&gt;
&lt;li&gt;DuckDB local analytics warehouse&lt;/li&gt;
&lt;li&gt;SQL feature pipelines&lt;/li&gt;
&lt;li&gt;Parquet / CSV dataset export&lt;/li&gt;
&lt;li&gt;Local-first pipeline execution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Enterprise features will include:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cloud storage pipelines (S3 / GCS / Azure)&lt;/li&gt;
&lt;li&gt;Version-controlled SQL pipelines&lt;/li&gt;
&lt;li&gt;Distributed synchronization across nodes&lt;/li&gt;
&lt;li&gt;Production observability and monitoring&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why This Matters&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Data engineering is slowly shifting toward simpler architectures.&lt;/p&gt;

&lt;p&gt;Modern embedded engines like DuckDB make it possible to run serious analytics workloads without clusters.&lt;/p&gt;

&lt;p&gt;pg-warehouse aims to cover the 90% case for building datasets and AI features from PostgreSQL.&lt;/p&gt;

&lt;p&gt;For many teams:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;PostgreSQL → DuckDB → Parquet&lt;br&gt;
is enough.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Project&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;pg-warehouse is open core and available on GitHub.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Looking for contributors!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/burnside-project/pg-warehouse" rel="noopener noreferrer"&gt;Repository&lt;br&gt;
&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/burnside-project/pg-warehouse/blob/main/docs/02-quickstart.md" rel="noopener noreferrer"&gt;Quick Start&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/burnside-project/pg-warehouse/blob/main/docs/08-development-workflow.md" rel="noopener noreferrer"&gt;Development Workflow&lt;/a&gt;&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>data</category>
      <category>warehouse</category>
      <category>duckdb</category>
    </item>
  </channel>
</rss>
