<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Fritz Larco</title>
    <description>The latest articles on DEV Community by Fritz Larco (@flarco).</description>
    <link>https://dev.to/flarco</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3283560%2F494dc7b5-96e2-48f0-958f-a9ebf6523eef.jpeg</url>
      <title>DEV Community: Fritz Larco</title>
      <link>https://dev.to/flarco</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/flarco"/>
    <language>en</language>
    <item>
      <title>Sync PostgreSQL to MotherDuck with Sling</title>
      <dc:creator>Fritz Larco</dc:creator>
      <pubDate>Mon, 01 Jun 2026 17:02:53 +0000</pubDate>
      <link>https://dev.to/flarco/sync-postgresql-to-motherduck-with-sling-11oo</link>
      <guid>https://dev.to/flarco/sync-postgresql-to-motherduck-with-sling-11oo</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://motherduck.com/" rel="noopener noreferrer"&gt;MotherDuck&lt;/a&gt; is a serverless analytics service built on DuckDB. It hosts DuckDB databases in the cloud and keeps the same SQL surface you'd use locally. PostgreSQL is what most apps run on for transactional data.&lt;/p&gt;

&lt;p&gt;So you usually want both: Postgres for the app, MotherDuck for analytics. The part in the middle that copies tables across is what &lt;a href="https://slingdata.io" rel="noopener noreferrer"&gt;Sling&lt;/a&gt; does.&lt;/p&gt;

&lt;p&gt;This guide replicates a PostgreSQL schema into MotherDuck with Sling, in both full-refresh and incremental modes. The CLI output and row counts below come from an actual run, not a fabricated one.&lt;/p&gt;

&lt;h1&gt;
  
  
  Installing Sling
&lt;/h1&gt;

&lt;p&gt;Sling is a single binary. Pick whichever install method fits your environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# macOS / Linux&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://slingdata.io/install.sh | bash

&lt;span class="c"&gt;# Windows&lt;/span&gt;
irm https://slingdata.io/install.ps1 | iex

&lt;span class="c"&gt;# Python&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;sling
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Confirm the install:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Full installation notes are in the &lt;a href="https://docs.slingdata.io/sling-cli/getting-started" rel="noopener noreferrer"&gt;Sling CLI Getting Started Guide&lt;/a&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  Configuring the PostgreSQL Source
&lt;/h1&gt;

&lt;p&gt;Sling reads connection details from &lt;code&gt;~/.sling/env.yaml&lt;/code&gt;, environment variables, or &lt;code&gt;sling conns set&lt;/code&gt;. For PostgreSQL you'll need host, port, database, user, and password.&lt;/p&gt;

&lt;p&gt;Using &lt;code&gt;sling conns set&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;set &lt;/span&gt;PG_SOURCE &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;postgres &lt;span class="nv"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;host.ip &lt;span class="nv"&gt;user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;myuser &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nv"&gt;database&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;mydb &lt;span class="nv"&gt;password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;mypass &lt;span class="nv"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;5432
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or in &lt;code&gt;~/.sling/env.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;connections&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;PG_SOURCE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres&lt;/span&gt;
    &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;host.ip&lt;/span&gt;
    &lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;myuser&lt;/span&gt;
    &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mypass&lt;/span&gt;
    &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5432&lt;/span&gt;
    &lt;span class="na"&gt;database&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mydb&lt;/span&gt;
    &lt;span class="na"&gt;sslmode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;require&lt;/span&gt;
    &lt;span class="na"&gt;schema&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;public&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Test it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;test &lt;/span&gt;PG_SOURCE
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;a href="https://docs.slingdata.io/connections/database-connections/postgres" rel="noopener noreferrer"&gt;PostgreSQL connection docs&lt;/a&gt; cover SSL, IAM auth, and other options.&lt;/p&gt;

&lt;h1&gt;
  
  
  Configuring the MotherDuck Target
&lt;/h1&gt;

&lt;p&gt;A MotherDuck connection needs the database name and a service token. You can generate a token from the &lt;a href="https://motherduck.com/docs/authenticating-to-motherduck#authentication-using-a-service-token" rel="noopener noreferrer"&gt;MotherDuck UI&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;set &lt;/span&gt;MOTHERDUCK &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;motherduck &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nv"&gt;database&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;my_db &lt;span class="nv"&gt;motherduck_token&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;eyJhbGciOi...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or the URL form:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;set &lt;/span&gt;MOTHERDUCK &lt;span class="nv"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"motherduck://my_db?motherduck_token=eyJhbGciOi..."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or in &lt;code&gt;~/.sling/env.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;connections&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;MOTHERDUCK&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;motherduck&lt;/span&gt;
    &lt;span class="na"&gt;database&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my_db&lt;/span&gt;
    &lt;span class="na"&gt;motherduck_token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;eyJhbGciOi...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Test it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;test &lt;/span&gt;MOTHERDUCK
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Full options (attach modes, copy method, DuckDB version pinning) are in the &lt;a href="https://docs.slingdata.io/connections/database-connections/motherduck" rel="noopener noreferrer"&gt;MotherDuck connection docs&lt;/a&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  A Full-Refresh Replication
&lt;/h1&gt;

&lt;p&gt;For this run the PostgreSQL source has three tables in a &lt;code&gt;demo_pg_motherduck&lt;/code&gt; schema:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;customers&lt;/code&gt; — 5,000 rows&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;orders&lt;/code&gt; — 30,000 rows, with an &lt;code&gt;updated_at&lt;/code&gt; timestamp&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;events&lt;/code&gt; — 60,000 rows, with an &lt;code&gt;occurred_at&lt;/code&gt; timestamp&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The replication file lives next to wherever you want to run Sling from:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# replication.yaml&lt;/span&gt;
&lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PG_SOURCE&lt;/span&gt;
&lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MOTHERDUCK&lt;/span&gt;

&lt;span class="na"&gt;defaults&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;full-refresh&lt;/span&gt;
  &lt;span class="na"&gt;object&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;demo_pg_motherduck.{stream_table}&lt;/span&gt;

&lt;span class="na"&gt;streams&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;demo_pg_motherduck.customers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;primary_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;customer_id&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

  &lt;span class="na"&gt;demo_pg_motherduck.orders&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;primary_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;order_id&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;update_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;updated_at&lt;/span&gt;

  &lt;span class="na"&gt;demo_pg_motherduck.events&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;primary_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;event_id&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;update_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;occurred_at&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few things to point out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;object: demo_pg_motherduck.{stream_table}&lt;/code&gt; is a &lt;a href="https://docs.slingdata.io/concepts/replication/runtime-variables" rel="noopener noreferrer"&gt;runtime variable&lt;/a&gt;. Sling substitutes the source table name into the target object, so you don't repeat yourself per stream.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;primary_key&lt;/code&gt; and &lt;code&gt;update_key&lt;/code&gt; are set even though the mode here is &lt;code&gt;full-refresh&lt;/code&gt;. The next section flips to incremental without touching those declarations; only the mode changes.&lt;/li&gt;
&lt;li&gt;The target schema gets created automatically by Sling on the first run. No manual &lt;code&gt;CREATE SCHEMA&lt;/code&gt; needed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Run it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling run &lt;span class="nt"&gt;-r&lt;/span&gt; replication.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Real output, trimmed for readability:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INF Sling Replication [3 streams] | PG_SOURCE -&amp;gt; MOTHERDUCK

INF [1 / 3] running stream demo_pg_motherduck.customers
INF reading from source database
INF writing to target database [mode: full-refresh]
INF created table "demo_pg_motherduck"."customers"
INF inserted 5000 rows into "demo_pg_motherduck"."customers" in 11 secs [425 r/s] [390 kB]
INF execution succeeded

INF [2 / 3] running stream demo_pg_motherduck.orders
INF created table "demo_pg_motherduck"."orders"
INF inserted 30000 rows into "demo_pg_motherduck"."orders" in 14 secs [2,131 r/s] [2.6 MB]
INF execution succeeded

INF [3 / 3] running stream demo_pg_motherduck.events
INF created table "demo_pg_motherduck"."events"
INF inserted 60000 rows into "demo_pg_motherduck"."events" in 9 secs [6,036 r/s] [3.3 MB]
INF execution succeeded

INF Sling Replication Completed in 40s | PG_SOURCE -&amp;gt; MOTHERDUCK | 3 Successes | 0 Failures
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;95,000 rows across three tables, end to end, in 40 seconds. The &lt;code&gt;_tmp&lt;/code&gt; tables that show up in the full log are Sling's staging step before it swaps the data into the final target. They get cleaned up automatically.&lt;/p&gt;

&lt;h1&gt;
  
  
  Verification
&lt;/h1&gt;

&lt;p&gt;A &lt;code&gt;count(*)&lt;/code&gt; from MotherDuck right after the run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="s1"&gt;'customers'&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;demo_pg_motherduck&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customers&lt;/span&gt;
&lt;span class="k"&gt;union&lt;/span&gt; &lt;span class="k"&gt;all&lt;/span&gt; &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="s1"&gt;'orders'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;demo_pg_motherduck&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;union&lt;/span&gt; &lt;span class="k"&gt;all&lt;/span&gt; &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="s1"&gt;'events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;demo_pg_motherduck&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+-----------+-------+
| T         |     C |
+-----------+-------+
| customers |  5000 |
| orders    | 30000 |
| events    | 60000 |
+-----------+-------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A small sample to confirm the data made the trip with types intact:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;occurred_at&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;demo_pg_motherduck&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;
&lt;span class="k"&gt;order&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;event_id&lt;/span&gt; &lt;span class="k"&gt;limit&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+----------+-------------+------------+--------+-------------------------------+
| EVENT_ID | CUSTOMER_ID | EVENT_TYPE | REGION | OCCURRED_AT                   |
+----------+-------------+------------+--------+-------------------------------+
|        1 |           2 | click      | us-2   | 2025-01-01 00:00:01 +0000 UTC |
|        2 |           3 | signup     | us-3   | 2025-01-01 00:00:02 +0000 UTC |
|        3 |           4 | purchase   | us-4   | 2025-01-01 00:00:03 +0000 UTC |
|        4 |           5 | page_view  | us-5   | 2025-01-01 00:00:04 +0000 UTC |
|        5 |           6 | click      | us-6   | 2025-01-01 00:00:05 +0000 UTC |
+----------+-------------+------------+--------+-------------------------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Numeric, varchar, and timestamp columns round-tripped cleanly. Nullable columns (&lt;code&gt;region&lt;/code&gt; is null on every seventh row in the source) are preserved as nulls, not as the string &lt;code&gt;"NULL"&lt;/code&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  Switching to Incremental
&lt;/h1&gt;

&lt;p&gt;Full-refreshing a 60,000-row table every day is fine. Full-refreshing a 600-million-row event table every day is not. Sling's &lt;a href="https://docs.slingdata.io/concepts/replication/modes#incremental-mode" rel="noopener noreferrer"&gt;incremental mode&lt;/a&gt; reads only the rows newer than the highest &lt;code&gt;update_key&lt;/code&gt; already in the target.&lt;/p&gt;

&lt;p&gt;Drop &lt;code&gt;customers&lt;/code&gt; from the streams (it changes slowly enough to keep on full-refresh in a separate run, or rebuild weekly) and switch the mode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# replication-incremental.yaml&lt;/span&gt;
&lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PG_SOURCE&lt;/span&gt;
&lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MOTHERDUCK&lt;/span&gt;

&lt;span class="na"&gt;defaults&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;incremental&lt;/span&gt;
  &lt;span class="na"&gt;object&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;demo_pg_motherduck.{stream_table}&lt;/span&gt;

&lt;span class="na"&gt;streams&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;demo_pg_motherduck.orders&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;primary_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;order_id&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;update_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;updated_at&lt;/span&gt;

  &lt;span class="na"&gt;demo_pg_motherduck.events&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;primary_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;event_id&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;update_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;occurred_at&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Insert 1,000 new orders and 2,500 new events on the source (this simulates a day's worth of data flowing in), then run again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling run &lt;span class="nt"&gt;-r&lt;/span&gt; replication-incremental.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INF Sling Replication [2 streams] | PG_SOURCE -&amp;gt; MOTHERDUCK

INF [1 / 2] running stream demo_pg_motherduck.orders
INF getting checkpoint value (updated_at)
INF writing to target database [mode: incremental]
INF inserted 1000 rows into "demo_pg_motherduck"."orders" in 9 secs [104 r/s] [86 kB]
INF execution succeeded

INF [2 / 2] running stream demo_pg_motherduck.events
INF getting checkpoint value (occurred_at)
INF writing to target database [mode: incremental]
INF inserted 2500 rows into "demo_pg_motherduck"."events" in 6 secs [358 r/s] [137 kB]
INF execution succeeded

INF Sling Replication Completed in 20s | PG_SOURCE -&amp;gt; MOTHERDUCK | 2 Successes | 0 Failures
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;getting checkpoint value&lt;/code&gt; line is where Sling looks at the target, finds the largest &lt;code&gt;updated_at&lt;/code&gt; already present, and uses that as the lower bound on the source query. Only the new rows come across:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="s1"&gt;'orders'&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;demo_pg_motherduck&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;union&lt;/span&gt; &lt;span class="k"&gt;all&lt;/span&gt; &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="s1"&gt;'events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;demo_pg_motherduck&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+--------+-------+
| T      |     C |
+--------+-------+
| orders | 31000 |
| events | 62500 |
+--------+-------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Orders went from 30,000 to 31,000. Events went from 60,000 to 62,500. Matches what was inserted on the source.&lt;/p&gt;

&lt;p&gt;If you need updates as well as inserts (a row's &lt;code&gt;updated_at&lt;/code&gt; changes and the existing row should be replaced rather than duplicated), keep &lt;code&gt;mode: incremental&lt;/code&gt; and make sure &lt;code&gt;primary_key&lt;/code&gt; is set. Sling will upsert against the primary key instead of appending. The &lt;a href="https://docs.slingdata.io/concepts/replication/modes" rel="noopener noreferrer"&gt;replication modes docs&lt;/a&gt; cover the trade-offs.&lt;/p&gt;

&lt;h1&gt;
  
  
  Common Tweaks
&lt;/h1&gt;

&lt;p&gt;A few options you'll reach for once the basics are in place:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Schema and column casing.&lt;/strong&gt; MotherDuck (DuckDB) is case-sensitive, and Sling defaults to keeping the source casing. Add &lt;code&gt;target_options: { column_casing: snake }&lt;/code&gt; under &lt;code&gt;defaults&lt;/code&gt; if your Postgres source has mixed-case identifiers and you want a clean snake_case target.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add new columns automatically.&lt;/strong&gt; When the source schema changes, set &lt;code&gt;target_options: { add_new_columns: true }&lt;/code&gt; so Sling alters the MotherDuck table on the next run. Without it, new source columns get dropped at the boundary.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pick a copy method.&lt;/strong&gt; The default for MotherDuck is &lt;code&gt;csv_http&lt;/code&gt;. For very wide rows or large text values, switch to &lt;code&gt;arrow_http&lt;/code&gt; via &lt;code&gt;copy_method: arrow_http&lt;/code&gt; in the connection config. It's usually faster and avoids CSV escaping edge cases.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filter at the source.&lt;/strong&gt; Use a custom &lt;code&gt;sql:&lt;/code&gt; block in a stream to project columns or filter rows before they leave Postgres. Cheaper than dragging unused columns to MotherDuck.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Where to Go Next
&lt;/h1&gt;

&lt;p&gt;The same replication pattern works for any of &lt;a href="https://docs.slingdata.io/connections/database-connections" rel="noopener noreferrer"&gt;Sling's 30+ database sources&lt;/a&gt; into MotherDuck: MySQL, SQL Server, Snowflake, BigQuery, and the rest. Swap the source connection and leave the target alone.&lt;/p&gt;

&lt;p&gt;If you'd rather store flat files than warehouse tables, see &lt;a href="https://slingdata.io/articles/postgres-to-s3-parquet-with-sling/" rel="noopener noreferrer"&gt;PostgreSQL to S3 as Parquet&lt;/a&gt;, which uses the same replication file shape with a file-system target. For a local DuckDB setup instead of a managed MotherDuck one, see &lt;a href="https://slingdata.io/articles/postgres-to-duckdb-with-sling/" rel="noopener noreferrer"&gt;PostgreSQL to DuckDB&lt;/a&gt;. For team workflows with scheduling and alerting on top of the same CLI, look at the &lt;a href="https://docs.slingdata.io/sling-platform/getting-started" rel="noopener noreferrer"&gt;Sling Platform&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Questions go to &lt;a href="https://discord.gg/q5xtaSNDvp" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; or &lt;a href="https://github.com/slingdata-io/sling-cli/issues" rel="noopener noreferrer"&gt;GitHub Issues&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>duckdb</category>
      <category>dataengineering</category>
      <category>etl</category>
    </item>
    <item>
      <title>Replicate MySQL to ClickHouse with Sling</title>
      <dc:creator>Fritz Larco</dc:creator>
      <pubDate>Tue, 26 May 2026 13:29:56 +0000</pubDate>
      <link>https://dev.to/flarco/replicate-mysql-to-clickhouse-with-sling-3ghj</link>
      <guid>https://dev.to/flarco/replicate-mysql-to-clickhouse-with-sling-3ghj</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://clickhouse.com/" rel="noopener noreferrer"&gt;ClickHouse&lt;/a&gt; is a columnar OLAP database. It runs aggregate queries across billions of rows in seconds. MySQL is what most apps run on for transactional reads and writes. Different jobs, different storage shapes, which is why people end up running them side by side: MySQL for the app, ClickHouse for analytics on top of the app's data.&lt;/p&gt;

&lt;p&gt;The piece in the middle, the bit that copies tables from MySQL into ClickHouse and keeps them current, is what &lt;a href="https://slingdata.io" rel="noopener noreferrer"&gt;Sling&lt;/a&gt; does.&lt;/p&gt;

&lt;p&gt;This guide replicates a MySQL schema into ClickHouse with Sling, in both full-refresh and incremental modes. The CLI output, row counts, and timings below all come from an actual run against a Docker MySQL on the source side and a self-hosted ClickHouse 25.4 on the target side. The same configuration works against &lt;a href="https://clickhouse.com/cloud" rel="noopener noreferrer"&gt;ClickHouse Cloud&lt;/a&gt;; only the connection URL changes.&lt;/p&gt;

&lt;h1&gt;
  
  
  Installing Sling
&lt;/h1&gt;

&lt;p&gt;Sling is a single binary. Pick whichever install method fits your environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# macOS / Linux&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://slingdata.io/install.sh | bash

&lt;span class="c"&gt;# Windows&lt;/span&gt;
irm https://slingdata.io/install.ps1 | iex

&lt;span class="c"&gt;# Python&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;sling
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Confirm the install:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Installation notes for every platform are in the &lt;a href="https://docs.slingdata.io/sling-cli/getting-started" rel="noopener noreferrer"&gt;Sling CLI Getting Started Guide&lt;/a&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  Configuring the MySQL Source
&lt;/h1&gt;

&lt;p&gt;Sling reads connection details from &lt;code&gt;~/.sling/env.yaml&lt;/code&gt;, environment variables, or &lt;code&gt;sling conns set&lt;/code&gt;. For MySQL you need host, port, database, user, and password.&lt;/p&gt;

&lt;p&gt;A read-only Sling user is the right shape for replication:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;USER&lt;/span&gt; &lt;span class="s1"&gt;'sling'&lt;/span&gt;&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="s1"&gt;'%'&lt;/span&gt; &lt;span class="n"&gt;IDENTIFIED&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="s1"&gt;'&amp;lt;password&amp;gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;source_schema&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="s1"&gt;'sling'&lt;/span&gt;&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="s1"&gt;'%'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using &lt;code&gt;sling conns set&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;set &lt;/span&gt;MYSQL_SOURCE &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;mysql &lt;span class="nv"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;host.ip &lt;span class="nv"&gt;user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sling &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nv"&gt;database&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;mydb &lt;span class="nv"&gt;password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;mypass &lt;span class="nv"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3306
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or in &lt;code&gt;~/.sling/env.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;connections&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;MYSQL_SOURCE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mysql&lt;/span&gt;
    &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;host.ip&lt;/span&gt;
    &lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sling&lt;/span&gt;
    &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mypass&lt;/span&gt;
    &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3306&lt;/span&gt;
    &lt;span class="na"&gt;database&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mydb&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your MySQL requires TLS, append &lt;code&gt;?tls=skip-verify&lt;/code&gt; to the URL form, or set &lt;code&gt;tls: skip-verify&lt;/code&gt; in the YAML. Test it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;test &lt;/span&gt;MYSQL_SOURCE
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;a href="https://docs.slingdata.io/connections/database-connections/mysql" rel="noopener noreferrer"&gt;MySQL connection docs&lt;/a&gt; cover SSL, IAM auth, and the rest of the options.&lt;/p&gt;

&lt;h1&gt;
  
  
  Configuring the ClickHouse Target
&lt;/h1&gt;

&lt;p&gt;ClickHouse speaks two protocols: native (port 9000) and HTTP (port 8123 / 8443 with TLS). Sling supports both. For self-hosted clusters the native protocol is usually the fastest path; for ClickHouse Cloud, the HTTPS endpoint is the supported one.&lt;/p&gt;

&lt;p&gt;Self-hosted, native protocol:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;set &lt;/span&gt;CLICKHOUSE &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;clickhouse &lt;span class="nv"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;host.ip &lt;span class="nv"&gt;user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;default &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nv"&gt;password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;mypass &lt;span class="nv"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;9000 &lt;span class="nv"&gt;database&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;default
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;ClickHouse Cloud over HTTPS:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;set &lt;/span&gt;CLICKHOUSE &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nv"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"https://default:mypass@xxxxxx.us-east-1.aws.clickhouse.cloud:8443/default"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or in &lt;code&gt;~/.sling/env.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;connections&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;CLICKHOUSE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;clickhouse&lt;/span&gt;
    &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;host.ip&lt;/span&gt;
    &lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
    &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mypass&lt;/span&gt;
    &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;9000&lt;/span&gt;
    &lt;span class="na"&gt;database&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Test it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;test &lt;/span&gt;CLICKHOUSE
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;a href="https://docs.slingdata.io/connections/database-connections/clickhouse" rel="noopener noreferrer"&gt;ClickHouse connection docs&lt;/a&gt; list every option, including the HTTP URL form and the &lt;code&gt;export_stream_format&lt;/code&gt; setting for tuning the staging file format.&lt;/p&gt;

&lt;h1&gt;
  
  
  A Full-Refresh Replication
&lt;/h1&gt;

&lt;p&gt;For this run the MySQL source has three tables in a &lt;code&gt;demo_mysql_clickhouse&lt;/code&gt; database:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;customers&lt;/code&gt; — 5,000 rows&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;orders&lt;/code&gt; — 30,000 rows, with an &lt;code&gt;updated_at&lt;/code&gt; timestamp&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;events&lt;/code&gt; — 60,000 rows, with an &lt;code&gt;occurred_at&lt;/code&gt; timestamp&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The replication file lives next to wherever you run Sling from:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# replication.yaml&lt;/span&gt;
&lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MYSQL_SOURCE&lt;/span&gt;
&lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CLICKHOUSE&lt;/span&gt;

&lt;span class="na"&gt;defaults&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;full-refresh&lt;/span&gt;
  &lt;span class="na"&gt;object&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;demo_mysql_clickhouse.{stream_table}&lt;/span&gt;

&lt;span class="na"&gt;streams&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;demo_mysql_clickhouse.customers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;primary_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;customer_id&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

  &lt;span class="na"&gt;demo_mysql_clickhouse.orders&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;primary_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;order_id&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;update_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;updated_at&lt;/span&gt;

  &lt;span class="na"&gt;demo_mysql_clickhouse.events&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;primary_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;event_id&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;update_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;occurred_at&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few things worth pointing out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;object: demo_mysql_clickhouse.{stream_table}&lt;/code&gt; is a &lt;a href="https://docs.slingdata.io/concepts/replication/runtime-variables" rel="noopener noreferrer"&gt;runtime variable&lt;/a&gt;. Sling substitutes the source table name into the target object, so you don't repeat yourself per stream.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;primary_key&lt;/code&gt; and &lt;code&gt;update_key&lt;/code&gt; are set even though the mode is &lt;code&gt;full-refresh&lt;/code&gt;. The next section flips to incremental without touching those declarations; only the mode changes.&lt;/li&gt;
&lt;li&gt;The target database (&lt;code&gt;demo_mysql_clickhouse&lt;/code&gt; on ClickHouse) gets created automatically by Sling on the first run. No manual &lt;code&gt;CREATE DATABASE&lt;/code&gt; needed on the target side.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Run it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling run &lt;span class="nt"&gt;-r&lt;/span&gt; replication.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Real output, trimmed for readability:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INF Sling Replication [3 streams] | MYSQL_SOURCE -&amp;gt; CLICKHOUSE

INF [1 / 3] running stream demo_mysql_clickhouse.customers
INF reading from source database
INF writing to target database [mode: full-refresh]
INF created table `demo_mysql_clickhouse`.`customers`
INF inserted 5000 rows into `demo_mysql_clickhouse`.`customers` in 0 secs [8,853 r/s] [396 kB]
INF execution succeeded

INF [2 / 3] running stream demo_mysql_clickhouse.orders
INF created table `demo_mysql_clickhouse`.`orders`
INF inserted 30000 rows into `demo_mysql_clickhouse`.`orders` in 1 secs [29,381 r/s] [2.8 MB]
INF execution succeeded

INF [3 / 3] running stream demo_mysql_clickhouse.events
INF created table `demo_mysql_clickhouse`.`events`
INF inserted 60000 rows into `demo_mysql_clickhouse`.`events` in 0 secs [81,559 r/s] [3.2 MB]
INF execution succeeded

INF Sling Replication Completed in 4s | MYSQL_SOURCE -&amp;gt; CLICKHOUSE | 3 Successes | 0 Failures
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;95,000 rows across three tables, end to end, in 4 seconds. The &lt;code&gt;_tmp&lt;/code&gt; tables that show up in the full log are Sling's staging step before it swaps the data into the final target. They get cleaned up automatically.&lt;/p&gt;

&lt;p&gt;When Sling creates the table, it asks for &lt;code&gt;MergeTree&lt;/code&gt; with the primary key columns as the sorting key. That's a fine baseline for analytical queries. The "Common Tweaks" section below covers how to override it when you need partitioning, replication, or a different engine.&lt;/p&gt;

&lt;h1&gt;
  
  
  Verification
&lt;/h1&gt;

&lt;p&gt;A &lt;code&gt;count()&lt;/code&gt; from ClickHouse right after the run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="s1"&gt;'customers'&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;demo_mysql_clickhouse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customers&lt;/span&gt;
&lt;span class="k"&gt;UNION&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="s1"&gt;'orders'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;   &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;demo_mysql_clickhouse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;UNION&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="s1"&gt;'events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;   &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;demo_mysql_clickhouse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+-----------+-------+
| T         | C     |
+-----------+-------+
| customers |  5000 |
| orders    | 30000 |
| events    | 60000 |
+-----------+-------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A small sample to confirm the data made the trip with types intact:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;occurred_at&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;demo_mysql_clickhouse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;event_id&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+----------+-------------+------------+--------+-------------------------------+
| EVENT_ID | CUSTOMER_ID | EVENT_TYPE | REGION | OCCURRED_AT                   |
+----------+-------------+------------+--------+-------------------------------+
|        1 |           2 | signup     | us-2   | 2025-01-01 00:00:01 +0000 UTC |
|        2 |           3 | purchase   | us-3   | 2025-01-01 00:00:02 +0000 UTC |
|        3 |           4 | logout     | us-4   | 2025-01-01 00:00:03 +0000 UTC |
|        4 |           5 | page_view  | us-1   | 2025-01-01 00:00:04 +0000 UTC |
|        5 |           6 | click      | us-2   | 2025-01-01 00:00:05 +0000 UTC |
+----------+-------------+------------+--------+-------------------------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Numeric, varchar, and timestamp columns round-tripped cleanly. The nullable &lt;code&gt;region&lt;/code&gt; column (every seventh row in the source is null) lands as ClickHouse &lt;code&gt;Nullable(String)&lt;/code&gt; and preserves nulls as nulls, not as the literal string &lt;code&gt;"NULL"&lt;/code&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  Switching to Incremental
&lt;/h1&gt;

&lt;p&gt;Full-refreshing a 60,000-row event table every day is fine. Full-refreshing a 600-million-row event table every day is not. Sling's &lt;a href="https://docs.slingdata.io/concepts/replication/modes#incremental-mode" rel="noopener noreferrer"&gt;incremental mode&lt;/a&gt; reads only the rows newer than the highest &lt;code&gt;update_key&lt;/code&gt; already in the target.&lt;/p&gt;

&lt;p&gt;Drop &lt;code&gt;customers&lt;/code&gt; from the streams (it changes slowly enough to keep on full-refresh in a separate run, or rebuild weekly) and switch the mode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# replication-incremental.yaml&lt;/span&gt;
&lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MYSQL_SOURCE&lt;/span&gt;
&lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CLICKHOUSE&lt;/span&gt;

&lt;span class="na"&gt;defaults&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;incremental&lt;/span&gt;
  &lt;span class="na"&gt;object&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;demo_mysql_clickhouse.{stream_table}&lt;/span&gt;

&lt;span class="na"&gt;streams&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;demo_mysql_clickhouse.orders&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;primary_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;order_id&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;update_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;updated_at&lt;/span&gt;

  &lt;span class="na"&gt;demo_mysql_clickhouse.events&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;primary_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;event_id&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;update_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;occurred_at&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Insert 1,000 new orders and 2,500 new events on the source (a stand-in for a day of fresh data), then run again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling run &lt;span class="nt"&gt;-r&lt;/span&gt; replication-incremental.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INF Sling Replication [2 streams] | MYSQL_SOURCE -&amp;gt; CLICKHOUSE

INF [1 / 2] running stream demo_mysql_clickhouse.orders
INF getting checkpoint value (updated_at)
INF writing to target database [mode: incremental]
INF inserted 1000 rows into `demo_mysql_clickhouse`.`orders` in 0 secs [1,926 r/s] [93 kB]
INF execution succeeded

INF [2 / 2] running stream demo_mysql_clickhouse.events
INF getting checkpoint value (occurred_at)
INF writing to target database [mode: incremental]
INF inserted 2500 rows into `demo_mysql_clickhouse`.`events` in 0 secs [4,040 r/s] [134 kB]
INF execution succeeded

INF Sling Replication Completed in 2s | MYSQL_SOURCE -&amp;gt; CLICKHOUSE | 2 Successes | 0 Failures
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;getting checkpoint value&lt;/code&gt; line is where Sling looks at the target, finds the largest &lt;code&gt;updated_at&lt;/code&gt; already present, and uses that as the lower bound on the source query. Only the new rows come across:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="s1"&gt;'orders'&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;demo_mysql_clickhouse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;UNION&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="s1"&gt;'events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;   &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;demo_mysql_clickhouse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+--------+-------+
| T      | C     |
+--------+-------+
| orders | 31000 |
| events | 62500 |
+--------+-------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Orders went from 30,000 to 31,000. Events went from 60,000 to 62,500. Matches what was inserted on the source.&lt;/p&gt;

&lt;p&gt;ClickHouse's &lt;code&gt;MergeTree&lt;/code&gt; family is append-friendly. In incremental mode Sling inserts the new rows directly into the main table without rewriting partitions. If you also need updates (a row's &lt;code&gt;updated_at&lt;/code&gt; changes and you want the existing target row replaced rather than duplicated), keep &lt;code&gt;mode: incremental&lt;/code&gt; and make sure &lt;code&gt;primary_key&lt;/code&gt; is set. Sling will use a &lt;code&gt;ReplacingMergeTree&lt;/code&gt;-style upsert path against that key. The &lt;a href="https://docs.slingdata.io/concepts/replication/modes" rel="noopener noreferrer"&gt;replication modes docs&lt;/a&gt; cover the trade-offs.&lt;/p&gt;

&lt;h1&gt;
  
  
  Common Tweaks
&lt;/h1&gt;

&lt;p&gt;A few options worth reaching for once the basics are in place:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pick a table engine.&lt;/strong&gt; ClickHouse's default &lt;code&gt;MergeTree&lt;/code&gt; is a fine baseline, but for high-write or replicated clusters you'll want &lt;code&gt;ReplicatedMergeTree&lt;/code&gt;, partitioning by month, and a TTL. Set &lt;code&gt;target_options.table_ddl&lt;/code&gt; per stream with the full &lt;code&gt;CREATE TABLE&lt;/code&gt; you want; Sling will use it instead of generating its own. Example: &lt;code&gt;engine = MergeTree() ORDER BY (customer_id, occurred_at) PARTITION BY toYYYYMM(occurred_at)&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add new columns automatically.&lt;/strong&gt; When the source schema changes, set &lt;code&gt;target_options: { add_new_columns: true }&lt;/code&gt; so Sling alters the ClickHouse table on the next run. Without it, new source columns get dropped at the boundary.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tune the staging format.&lt;/strong&gt; Sling stages data as a file before bulk-loading into ClickHouse. The default is &lt;code&gt;CSVWithNames&lt;/code&gt;, which is robust but verbose. For wide rows or large text values, set &lt;code&gt;export_stream_format: Parquet&lt;/code&gt; on the ClickHouse connection. Usually faster and more compact on the wire.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filter at the source.&lt;/strong&gt; Use a custom &lt;code&gt;sql:&lt;/code&gt; block in a stream to project columns or filter rows before they leave MySQL. Cheaper than dragging unused columns to ClickHouse, and it keeps row payloads small for the network hop.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Where to Go Next
&lt;/h1&gt;

&lt;p&gt;The same replication pattern works for any of &lt;a href="https://docs.slingdata.io/connections/database-connections" rel="noopener noreferrer"&gt;Sling's 30+ database sources&lt;/a&gt; into ClickHouse: PostgreSQL, SQL Server, Snowflake, BigQuery, and the rest. Swap the source connection and leave the target alone. For the equivalent flow from a Postgres source, see &lt;a href="https://slingdata.io/articles/postgres-to-clickhouse-sling/" rel="noopener noreferrer"&gt;PostgreSQL to ClickHouse&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If your downstream is more cloud-warehouse than columnar engine, &lt;a href="https://slingdata.io/articles/motherduck-from-mysql-sling/" rel="noopener noreferrer"&gt;MySQL to MotherDuck&lt;/a&gt; covers the same setup with DuckDB-on-the-cloud as the target. For team workflows with scheduling and alerting on top of the same CLI, look at the &lt;a href="https://docs.slingdata.io/sling-platform/getting-started" rel="noopener noreferrer"&gt;Sling Platform&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Questions go to &lt;a href="https://discord.gg/q5xtaSNDvp" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; or &lt;a href="https://github.com/slingdata-io/sling-cli/issues" rel="noopener noreferrer"&gt;GitHub Issues&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>mysql</category>
      <category>clickhouse</category>
      <category>dataengineering</category>
      <category>etl</category>
    </item>
    <item>
      <title>Load PostgreSQL into Apache Iceberg with Sling</title>
      <dc:creator>Fritz Larco</dc:creator>
      <pubDate>Mon, 18 May 2026 13:31:11 +0000</pubDate>
      <link>https://dev.to/flarco/load-postgresql-into-apache-iceberg-with-sling-1cm8</link>
      <guid>https://dev.to/flarco/load-postgresql-into-apache-iceberg-with-sling-1cm8</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://iceberg.apache.org/" rel="noopener noreferrer"&gt;Apache Iceberg&lt;/a&gt; is the table format that turns a pile of Parquet files in object storage into something that behaves like a warehouse table. You get schema evolution, hidden partitioning, time travel, and consistent reads from whichever engine you point at the table. PostgreSQL is where most operational data starts. Moving it into Iceberg gives you an analytics copy that DuckDB, Spark, Trino, Snowflake, and Athena can all read without anyone needing to agree on a single warehouse vendor first.&lt;/p&gt;

&lt;p&gt;Sling speaks the Iceberg &lt;a href="https://iceberg.apache.org/spec/#rest-catalog-spec" rel="noopener noreferrer"&gt;REST catalog&lt;/a&gt; directly. From the configuration side an Iceberg target is just another database connection: point Sling at the catalog URL and the underlying object store, then declare your streams. No JVM, no Spark, no manual manifest writing.&lt;/p&gt;

&lt;p&gt;This guide replicates a Postgres schema into Iceberg using &lt;a href="https://slingdata.io" rel="noopener noreferrer"&gt;Sling&lt;/a&gt;. The catalog is &lt;a href="https://developers.cloudflare.com/r2/data-catalog/" rel="noopener noreferrer"&gt;Cloudflare R2's managed Iceberg REST catalog&lt;/a&gt; and the storage layer underneath is R2. Every CLI line, row count, and timing below comes from an actual run against those endpoints.&lt;/p&gt;

&lt;h1&gt;
  
  
  Installing Sling
&lt;/h1&gt;

&lt;p&gt;Sling is a single binary. Pick whichever install fits:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# macOS / Linux&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://slingdata.io/install.sh | bash

&lt;span class="c"&gt;# Windows&lt;/span&gt;
irm https://slingdata.io/install.ps1 | iex

&lt;span class="c"&gt;# Python&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;sling
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Confirm:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Full install notes are in the &lt;a href="https://docs.slingdata.io/sling-cli/getting-started" rel="noopener noreferrer"&gt;Sling CLI Getting Started Guide&lt;/a&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  Configuring the Postgres Source
&lt;/h1&gt;

&lt;p&gt;Sling reads connection details from &lt;code&gt;~/.sling/env.yaml&lt;/code&gt;, environment variables, or &lt;code&gt;sling conns set&lt;/code&gt;. A read-only user is enough:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;USER&lt;/span&gt; &lt;span class="n"&gt;sling&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;PASSWORD&lt;/span&gt; &lt;span class="s1"&gt;'&amp;lt;password&amp;gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;CONNECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;DATABASE&lt;/span&gt; &lt;span class="n"&gt;mydb&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;sling&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;USAGE&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;sling&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt; &lt;span class="n"&gt;TABLES&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;sling&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;PRIVILEGES&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;TABLES&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;sling&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then register the connection:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;set &lt;/span&gt;POSTGRES &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;postgres &lt;span class="nv"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;host.ip &lt;span class="nv"&gt;user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sling &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nv"&gt;database&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;mydb &lt;span class="nv"&gt;password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;mypass &lt;span class="nv"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;5432
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or in &lt;code&gt;~/.sling/env.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;connections&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;POSTGRES&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres&lt;/span&gt;
    &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;host.ip&lt;/span&gt;
    &lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sling&lt;/span&gt;
    &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mypass&lt;/span&gt;
    &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5432&lt;/span&gt;
    &lt;span class="na"&gt;database&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mydb&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your Postgres requires SSL, append &lt;code&gt;sslmode: require&lt;/code&gt;. Test it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;test &lt;/span&gt;POSTGRES
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;a href="https://docs.slingdata.io/connections/database-connections/postgres" rel="noopener noreferrer"&gt;Postgres connection docs&lt;/a&gt; cover SSL, IAM, and the rest.&lt;/p&gt;

&lt;h1&gt;
  
  
  Configuring the Iceberg Target
&lt;/h1&gt;

&lt;p&gt;Sling treats Iceberg as a database-class target. The connection captures two things: the catalog, which stores table metadata, and the warehouse, which stores the actual Parquet data files. Sling supports REST, AWS Glue, and SQL catalogs. This guide uses REST.&lt;/p&gt;

&lt;p&gt;For Cloudflare R2's Iceberg catalog you need the catalog URL, an API token, the warehouse identifier (account-id + bucket name), and S3-compatible credentials for the R2 bucket underneath. All four come from the R2 dashboard.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;connections&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ICEBERG&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;iceberg&lt;/span&gt;
    &lt;span class="na"&gt;catalog_type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rest&lt;/span&gt;
    &lt;span class="na"&gt;rest_uri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://catalog.cloudflarestorage.com/&amp;lt;accountid&amp;gt;/&amp;lt;bucket&amp;gt;&lt;/span&gt;
    &lt;span class="na"&gt;rest_token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;r2_catalog_api_token&amp;gt;&lt;/span&gt;
    &lt;span class="na"&gt;rest_warehouse&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;accountid&amp;gt;_&amp;lt;bucket&amp;gt;&lt;/span&gt;
    &lt;span class="na"&gt;s3_access_key_id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;r2_access_key_id&amp;gt;&lt;/span&gt;
    &lt;span class="na"&gt;s3_secret_access_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;r2_secret_access_key&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a self-hosted &lt;a href="https://github.com/lakekeeper/lakekeeper" rel="noopener noreferrer"&gt;Lakekeeper&lt;/a&gt; or &lt;a href="https://projectnessie.org/" rel="noopener noreferrer"&gt;Nessie&lt;/a&gt; catalog, the shape is the same; only the &lt;code&gt;rest_uri&lt;/code&gt; and &lt;code&gt;rest_warehouse&lt;/code&gt; change. For AWS Glue, set &lt;code&gt;catalog_type: glue&lt;/code&gt; and &lt;code&gt;glue_warehouse: s3://my-bucket/warehouse&lt;/code&gt;. The &lt;a href="https://docs.slingdata.io/connections/database-connections/iceberg" rel="noopener noreferrer"&gt;Iceberg connection docs&lt;/a&gt; walk through each catalog type.&lt;/p&gt;

&lt;p&gt;Test it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;test &lt;/span&gt;ICEBERG
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  A Full-Refresh Replication
&lt;/h1&gt;

&lt;p&gt;For this run the Postgres source has three tables in a &lt;code&gt;demo_postgres_iceberg&lt;/code&gt; schema:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;users&lt;/code&gt; — 8,000 rows&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;orders&lt;/code&gt; — 35,000 rows&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;events&lt;/code&gt; — 60,000 rows, with an &lt;code&gt;occurred_at&lt;/code&gt; timestamp&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The replication file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# replication.yaml&lt;/span&gt;
&lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;POSTGRES&lt;/span&gt;
&lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ICEBERG&lt;/span&gt;

&lt;span class="na"&gt;defaults&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;full-refresh&lt;/span&gt;
  &lt;span class="na"&gt;object&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;demo_postgres_iceberg.{stream_table}&lt;/span&gt;

&lt;span class="na"&gt;streams&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;demo_postgres_iceberg.users&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;demo_postgres_iceberg.orders&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;demo_postgres_iceberg.events&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;incremental&lt;/span&gt;
    &lt;span class="na"&gt;primary_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;event_id&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;update_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;occurred_at&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few notes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;object:&lt;/code&gt; follows the usual &lt;code&gt;&amp;lt;namespace&amp;gt;.&amp;lt;table&amp;gt;&lt;/code&gt; shape. Sling creates the Iceberg namespace if it doesn't already exist in the catalog.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;{stream_table}&lt;/code&gt; is a &lt;a href="https://docs.slingdata.io/concepts/replication/runtime-variables" rel="noopener noreferrer"&gt;runtime variable&lt;/a&gt;. Sling substitutes the source table name so you don't repeat yourself.&lt;/li&gt;
&lt;li&gt;The third stream switches to &lt;code&gt;mode: incremental&lt;/code&gt; with an &lt;code&gt;update_key&lt;/code&gt;. That's the only diff between a one-shot bulk load and an ongoing append flow.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Run it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling run &lt;span class="nt"&gt;-r&lt;/span&gt; replication.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Real output, trimmed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INF Sling CLI | https://slingdata.io
WRN for mode 'incremental' with iceberg target, primary-key is ineffective,
    incremental merge is not yet supported (only appends)
INF Sling Replication [3 streams] | POSTGRES -&amp;gt; ICEBERG

INF [1 / 3] running stream demo_postgres_iceberg.users
INF created table "demo_postgres_iceberg"."users"
INF streaming data (direct insert)
INF inserted 8000 rows into "demo_postgres_iceberg"."users" in 11 secs [713 r/s] [519 kB]

INF [2 / 3] running stream demo_postgres_iceberg.orders
INF created table "demo_postgres_iceberg"."orders"
INF inserted 35000 rows into "demo_postgres_iceberg"."orders" in 9 secs [3,721 r/s] [2.1 MB]

INF [3 / 3] running stream demo_postgres_iceberg.events
INF getting checkpoint value (occurred_at)
INF writing to target database [mode: incremental]
INF created table "demo_postgres_iceberg"."events"
INF inserted 60000 rows into "demo_postgres_iceberg"."events" in 7 secs [8,190 r/s] [4.5 MB]

INF Sling Replication Completed in 29s | POSTGRES -&amp;gt; ICEBERG | 3 Successes | 0 Failures
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;103,000 rows across three tables, 29 seconds end-to-end. The warning at the top deserves a real answer; see the section on incremental modes further down.&lt;/p&gt;

&lt;h1&gt;
  
  
  Verification
&lt;/h1&gt;

&lt;p&gt;Sling can query Iceberg tables directly through its DuckDB-backed reader. Tables are addressed as &lt;code&gt;iceberg_catalog.&amp;lt;namespace&amp;gt;.&amp;lt;table&amp;gt;&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;exec &lt;/span&gt;ICEBERG &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"select 'users' as t, count(*) as c
     from iceberg_catalog.demo_postgres_iceberg.users
   union all
   select 'orders', count(*) from iceberg_catalog.demo_postgres_iceberg.orders
   union all
   select 'events', count(*) from iceberg_catalog.demo_postgres_iceberg.events"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+--------+-------+
| T      |     C |
+--------+-------+
| users  |  8000 |
| orders | 35000 |
| events | 60000 |
+--------+-------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Row counts match the source. A sample of &lt;code&gt;users&lt;/code&gt; confirms columns and types survived the trip:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;exec &lt;/span&gt;ICEBERG &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"select user_id, email, country, signup_at
     from iceberg_catalog.demo_postgres_iceberg.users
    order by user_id limit 5"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+---------+-------------------+---------+-------------------------------+
| USER_ID | EMAIL             | COUNTRY | SIGNUP_AT                     |
+---------+-------------------+---------+-------------------------------+
|       1 | user1@example.com | BR      | 2025-01-01 00:14:00 -0300 -03 |
|       2 | user2@example.com | DE      | 2025-01-01 00:28:00 -0300 -03 |
|       3 | user3@example.com | FR      | 2025-01-01 00:42:00 -0300 -03 |
|       4 | user4@example.com | JP      | 2025-01-01 00:56:00 -0300 -03 |
|       5 | user5@example.com | UK      | 2025-01-01 01:10:00 -0300 -03 |
+---------+-------------------+---------+-------------------------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Postgres &lt;code&gt;jsonb&lt;/code&gt; lands as a structured column too. Sampling &lt;code&gt;events&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+----------+---------+------------+----------------------+----------------------+
| EVENT_ID | USER_ID | EVENT_TYPE | PAYLOAD              | OCCURRED_AT          |
+----------+---------+------------+----------------------+----------------------+
|    60001 |       2 | click      | {"v": 1, "utm": "x"} | 2026-05-11 ...       |
|    60002 |       3 | signup     | {"v": 2, "utm": "x"} | 2026-05-11 ...       |
|    60003 |       4 | purchase   | {"v": 3, "utm": "x"} | 2026-05-11 ...       |
+----------+---------+------------+----------------------+----------------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Any other Iceberg reader sees the same data: DuckDB with the &lt;code&gt;iceberg&lt;/code&gt; extension, Spark, Trino, Athena, Snowflake's catalog-linked databases. That portability is the reason for the catalog in the first place.&lt;/p&gt;

&lt;h1&gt;
  
  
  Running an Incremental Append
&lt;/h1&gt;

&lt;p&gt;After the bulk load, the day-to-day shape is: every few minutes (or hours, or once a day), pick up the new rows since the last run and append them to the Iceberg table. Sling's &lt;a href="https://docs.slingdata.io/concepts/replication/modes#incremental-mode" rel="noopener noreferrer"&gt;incremental mode&lt;/a&gt; does this. The state (the last seen value of the &lt;code&gt;update_key&lt;/code&gt;) is tracked by Sling itself, so you don't need to manage a state file the way you would for a file-based target.&lt;/p&gt;

&lt;p&gt;Insert 2,500 new events on the source (a stand-in for fresh activity):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;insert&lt;/span&gt; &lt;span class="k"&gt;into&lt;/span&gt; &lt;span class="n"&gt;demo_postgres_iceberg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;occurred_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="mi"&gt;60000&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;8000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s1"&gt;'click'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;jsonb_build_object&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'utm'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'x'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'v'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
       &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;interval&lt;/span&gt; &lt;span class="s1"&gt;'1 second'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;generate_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;g&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run a single-stream replication that touches only &lt;code&gt;events&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# replication-incremental.yaml&lt;/span&gt;
&lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;POSTGRES&lt;/span&gt;
&lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ICEBERG&lt;/span&gt;

&lt;span class="na"&gt;defaults&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;object&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;demo_postgres_iceberg.{stream_table}&lt;/span&gt;

&lt;span class="na"&gt;streams&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;demo_postgres_iceberg.events&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;incremental&lt;/span&gt;
    &lt;span class="na"&gt;update_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;occurred_at&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling run &lt;span class="nt"&gt;-r&lt;/span&gt; replication-incremental.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INF Sling Replication | POSTGRES -&amp;gt; ICEBERG | demo_postgres_iceberg.events
INF getting checkpoint value (occurred_at)
INF reading from source database
INF writing to target database [mode: incremental]
INF streaming data (direct insert)
INF inserted 2500 rows into "demo_postgres_iceberg"."events" in 8 secs [294 r/s] [178 kB]
INF execution succeeded
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sling read the saved checkpoint, pulled only rows newer than the last &lt;code&gt;occurred_at&lt;/code&gt; it saw, and appended exactly the 2,500 new rows. A readback confirms the new total:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;exec &lt;/span&gt;ICEBERG &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"select min(occurred_at), max(occurred_at), count(*)
     from iceberg_catalog.demo_postgres_iceberg.events"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+-------------------------------+--------------------------------------+--------+
| MIN_OCCURRED_AT               | MAX_OCCURRED_AT                      | COUNT  |
+-------------------------------+--------------------------------------+--------+
| 2025-03-01 00:00:40 -0300 -03 | 2026-05-11 08:42:59.533692 -0300 -03 |  62500 |
+-------------------------------+--------------------------------------+--------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;60,000 + 2,500 = 62,500. The new high-water mark on &lt;code&gt;occurred_at&lt;/code&gt; is the timestamp of the freshest insert. The next scheduled run will start from there.&lt;/p&gt;

&lt;h1&gt;
  
  
  Append-incremental vs merge-incremental
&lt;/h1&gt;

&lt;p&gt;That warning Sling printed on the first run matters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WRN for mode 'incremental' with iceberg target, primary-key is ineffective,
    incremental merge is not yet supported (only appends)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For database targets like Postgres or Snowflake, Sling's &lt;code&gt;incremental&lt;/code&gt; mode is a merge: a row whose &lt;code&gt;primary_key&lt;/code&gt; already exists in the target gets updated in place. For an Iceberg target today, &lt;code&gt;incremental&lt;/code&gt; means append only. New rows go in, existing rows stay as-is, and a &lt;code&gt;primary_key&lt;/code&gt; declared on the stream is parsed but not enforced.&lt;/p&gt;

&lt;p&gt;That is fine when your source is append-only: events, immutable transactions, log data. It is the wrong default if your source has mutable rows you need reflected on the lake side. Until merge lands, two patterns work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Snapshot replays. Run &lt;code&gt;mode: full-refresh&lt;/code&gt; on a cadence that matches your freshness budget. Iceberg's snapshot model means readers always see a consistent table; the old snapshot is replaced atomically. For tables in the low millions this is faster than it sounds.&lt;/li&gt;
&lt;li&gt;CDC-style append plus downstream resolution. Append every Postgres change to Iceberg as-is (using a logical-replication tool or trigger-based capture) and resolve the latest-state view at read time with something like &lt;code&gt;qualify row_number() over (partition by pk order by event_ts desc) = 1&lt;/code&gt;. A bit more work at query time, very cheap at write time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Track the &lt;a href="https://docs.slingdata.io/connections/database-connections/iceberg" rel="noopener noreferrer"&gt;Iceberg connector docs&lt;/a&gt; for when full merge mode ships.&lt;/p&gt;

&lt;h1&gt;
  
  
  Common tweaks
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Choose the right catalog.&lt;/strong&gt; REST is the most portable: the same connection shape works for Cloudflare R2, Lakekeeper, Nessie, Polaris, and any other REST-compatible catalog. Glue is the simplest in AWS-native shops. SQL catalog is fine for local dev. Avoid wiring a different catalog per environment if you can help it; the table layout doesn't care, but the metadata location does.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Namespace organization.&lt;/strong&gt; Treat namespaces (&lt;code&gt;demo_postgres_iceberg.users&lt;/code&gt;) the way you treat warehouse schemas: one per source system, or one per data domain. Don't dump everything into &lt;code&gt;default&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filter at the source.&lt;/strong&gt; Use a &lt;code&gt;sql:&lt;/code&gt; block per stream to project columns or filter rows before they leave Postgres. Smaller Parquet files, smaller manifests, cheaper queries downstream.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time travel for free.&lt;/strong&gt; Every replication produces a new Iceberg snapshot. Readers can time-travel to a previous snapshot, which is useful for "what did this table look like before yesterday's run?" without storing your own backups.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maintain the table.&lt;/strong&gt; Like any Iceberg table, periodic compaction and snapshot expiration keep the file count and metadata size from growing without bound. Set this up on a separate schedule from the replication itself.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Where to go next
&lt;/h1&gt;

&lt;p&gt;The same pattern works for any of &lt;a href="https://docs.slingdata.io/connections/database-connections" rel="noopener noreferrer"&gt;Sling's 30+ database sources&lt;/a&gt; into Iceberg: MySQL, SQL Server, Snowflake, BigQuery, MongoDB, and the rest. Swap the source and leave the target alone.&lt;/p&gt;

&lt;p&gt;If the underlying R2 storage is what brought you here, the &lt;a href="https://slingdata.io/articles/r2-from-postgres-parquet-sling/" rel="noopener noreferrer"&gt;Postgres → R2 as Parquet&lt;/a&gt; walkthrough shows the same source landing as raw Parquet files instead of an Iceberg table, which is useful when downstream readers don't need a catalog. For a deeper comparison of file-format targets, see &lt;a href="https://slingdata.io/articles/postgres-to-s3-parquet-with-sling/" rel="noopener noreferrer"&gt;Postgres → S3 as Parquet&lt;/a&gt; and &lt;a href="https://slingdata.io/articles/postgres-to-duckdb-with-sling/" rel="noopener noreferrer"&gt;Postgres → DuckDB&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For team workflows with scheduling, alerting, and audit trails on top of the same CLI, look at the &lt;a href="https://docs.slingdata.io/sling-platform/getting-started" rel="noopener noreferrer"&gt;Sling Platform&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Questions go to &lt;a href="https://discord.gg/q5xtaSNDvp" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; or &lt;a href="https://github.com/slingdata-io/sling-cli/issues" rel="noopener noreferrer"&gt;GitHub Issues&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>iceberg</category>
      <category>dataengineering</category>
      <category>etl</category>
    </item>
  </channel>
</rss>
