<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Fritz Larco</title>
    <description>The latest articles on DEV Community by Fritz Larco (@flarco).</description>
    <link>https://dev.to/flarco</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3283560%2F494dc7b5-96e2-48f0-958f-a9ebf6523eef.jpeg</url>
      <title>DEV Community: Fritz Larco</title>
      <link>https://dev.to/flarco</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/flarco"/>
    <language>en</language>
    <item>
      <title>Load PostgreSQL into Apache Iceberg with Sling</title>
      <dc:creator>Fritz Larco</dc:creator>
      <pubDate>Mon, 18 May 2026 13:31:11 +0000</pubDate>
      <link>https://dev.to/flarco/load-postgresql-into-apache-iceberg-with-sling-1cm8</link>
      <guid>https://dev.to/flarco/load-postgresql-into-apache-iceberg-with-sling-1cm8</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://iceberg.apache.org/" rel="noopener noreferrer"&gt;Apache Iceberg&lt;/a&gt; is the table format that turns a pile of Parquet files in object storage into something that behaves like a warehouse table. You get schema evolution, hidden partitioning, time travel, and consistent reads from whichever engine you point at the table. PostgreSQL is where most operational data starts. Moving it into Iceberg gives you an analytics copy that DuckDB, Spark, Trino, Snowflake, and Athena can all read without anyone needing to agree on a single warehouse vendor first.&lt;/p&gt;

&lt;p&gt;Sling speaks the Iceberg &lt;a href="https://iceberg.apache.org/spec/#rest-catalog-spec" rel="noopener noreferrer"&gt;REST catalog&lt;/a&gt; directly. From the configuration side an Iceberg target is just another database connection: point Sling at the catalog URL and the underlying object store, then declare your streams. No JVM, no Spark, no manual manifest writing.&lt;/p&gt;

&lt;p&gt;This guide replicates a Postgres schema into Iceberg using &lt;a href="https://slingdata.io" rel="noopener noreferrer"&gt;Sling&lt;/a&gt;. The catalog is &lt;a href="https://developers.cloudflare.com/r2/data-catalog/" rel="noopener noreferrer"&gt;Cloudflare R2's managed Iceberg REST catalog&lt;/a&gt; and the storage layer underneath is R2. Every CLI line, row count, and timing below comes from an actual run against those endpoints.&lt;/p&gt;

&lt;h1&gt;
  
  
  Installing Sling
&lt;/h1&gt;

&lt;p&gt;Sling is a single binary. Pick whichever install fits:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# macOS / Linux&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://slingdata.io/install.sh | bash

&lt;span class="c"&gt;# Windows&lt;/span&gt;
irm https://slingdata.io/install.ps1 | iex

&lt;span class="c"&gt;# Python&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;sling
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Confirm:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Full install notes are in the &lt;a href="https://docs.slingdata.io/sling-cli/getting-started" rel="noopener noreferrer"&gt;Sling CLI Getting Started Guide&lt;/a&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  Configuring the Postgres Source
&lt;/h1&gt;

&lt;p&gt;Sling reads connection details from &lt;code&gt;~/.sling/env.yaml&lt;/code&gt;, environment variables, or &lt;code&gt;sling conns set&lt;/code&gt;. A read-only user is enough:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;USER&lt;/span&gt; &lt;span class="n"&gt;sling&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;PASSWORD&lt;/span&gt; &lt;span class="s1"&gt;'&amp;lt;password&amp;gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;CONNECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;DATABASE&lt;/span&gt; &lt;span class="n"&gt;mydb&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;sling&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;USAGE&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;sling&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt; &lt;span class="n"&gt;TABLES&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;sling&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;PRIVILEGES&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;TABLES&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;sling&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then register the connection:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;set &lt;/span&gt;POSTGRES &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;postgres &lt;span class="nv"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;host.ip &lt;span class="nv"&gt;user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sling &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nv"&gt;database&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;mydb &lt;span class="nv"&gt;password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;mypass &lt;span class="nv"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;5432
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or in &lt;code&gt;~/.sling/env.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;connections&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;POSTGRES&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres&lt;/span&gt;
    &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;host.ip&lt;/span&gt;
    &lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sling&lt;/span&gt;
    &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mypass&lt;/span&gt;
    &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5432&lt;/span&gt;
    &lt;span class="na"&gt;database&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mydb&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your Postgres requires SSL, append &lt;code&gt;sslmode: require&lt;/code&gt;. Test it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;test &lt;/span&gt;POSTGRES
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;a href="https://docs.slingdata.io/connections/database-connections/postgres" rel="noopener noreferrer"&gt;Postgres connection docs&lt;/a&gt; cover SSL, IAM, and the rest.&lt;/p&gt;

&lt;h1&gt;
  
  
  Configuring the Iceberg Target
&lt;/h1&gt;

&lt;p&gt;Sling treats Iceberg as a database-class target. The connection captures two things: the catalog, which stores table metadata, and the warehouse, which stores the actual Parquet data files. Sling supports REST, AWS Glue, and SQL catalogs. This guide uses REST.&lt;/p&gt;

&lt;p&gt;For Cloudflare R2's Iceberg catalog you need the catalog URL, an API token, the warehouse identifier (account-id + bucket name), and S3-compatible credentials for the R2 bucket underneath. All four come from the R2 dashboard.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;connections&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ICEBERG&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;iceberg&lt;/span&gt;
    &lt;span class="na"&gt;catalog_type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rest&lt;/span&gt;
    &lt;span class="na"&gt;rest_uri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://catalog.cloudflarestorage.com/&amp;lt;accountid&amp;gt;/&amp;lt;bucket&amp;gt;&lt;/span&gt;
    &lt;span class="na"&gt;rest_token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;r2_catalog_api_token&amp;gt;&lt;/span&gt;
    &lt;span class="na"&gt;rest_warehouse&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;accountid&amp;gt;_&amp;lt;bucket&amp;gt;&lt;/span&gt;
    &lt;span class="na"&gt;s3_access_key_id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;r2_access_key_id&amp;gt;&lt;/span&gt;
    &lt;span class="na"&gt;s3_secret_access_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;r2_secret_access_key&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a self-hosted &lt;a href="https://github.com/lakekeeper/lakekeeper" rel="noopener noreferrer"&gt;Lakekeeper&lt;/a&gt; or &lt;a href="https://projectnessie.org/" rel="noopener noreferrer"&gt;Nessie&lt;/a&gt; catalog, the shape is the same; only the &lt;code&gt;rest_uri&lt;/code&gt; and &lt;code&gt;rest_warehouse&lt;/code&gt; change. For AWS Glue, set &lt;code&gt;catalog_type: glue&lt;/code&gt; and &lt;code&gt;glue_warehouse: s3://my-bucket/warehouse&lt;/code&gt;. The &lt;a href="https://docs.slingdata.io/connections/database-connections/iceberg" rel="noopener noreferrer"&gt;Iceberg connection docs&lt;/a&gt; walk through each catalog type.&lt;/p&gt;

&lt;p&gt;Test it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;test &lt;/span&gt;ICEBERG
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  A Full-Refresh Replication
&lt;/h1&gt;

&lt;p&gt;For this run the Postgres source has three tables in a &lt;code&gt;demo_postgres_iceberg&lt;/code&gt; schema:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;users&lt;/code&gt; — 8,000 rows&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;orders&lt;/code&gt; — 35,000 rows&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;events&lt;/code&gt; — 60,000 rows, with an &lt;code&gt;occurred_at&lt;/code&gt; timestamp&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The replication file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# replication.yaml&lt;/span&gt;
&lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;POSTGRES&lt;/span&gt;
&lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ICEBERG&lt;/span&gt;

&lt;span class="na"&gt;defaults&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;full-refresh&lt;/span&gt;
  &lt;span class="na"&gt;object&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;demo_postgres_iceberg.{stream_table}&lt;/span&gt;

&lt;span class="na"&gt;streams&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;demo_postgres_iceberg.users&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;demo_postgres_iceberg.orders&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;demo_postgres_iceberg.events&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;incremental&lt;/span&gt;
    &lt;span class="na"&gt;primary_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;event_id&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;update_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;occurred_at&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few notes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;object:&lt;/code&gt; follows the usual &lt;code&gt;&amp;lt;namespace&amp;gt;.&amp;lt;table&amp;gt;&lt;/code&gt; shape. Sling creates the Iceberg namespace if it doesn't already exist in the catalog.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;{stream_table}&lt;/code&gt; is a &lt;a href="https://docs.slingdata.io/concepts/replication/runtime-variables" rel="noopener noreferrer"&gt;runtime variable&lt;/a&gt;. Sling substitutes the source table name so you don't repeat yourself.&lt;/li&gt;
&lt;li&gt;The third stream switches to &lt;code&gt;mode: incremental&lt;/code&gt; with an &lt;code&gt;update_key&lt;/code&gt;. That's the only diff between a one-shot bulk load and an ongoing append flow.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Run it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling run &lt;span class="nt"&gt;-r&lt;/span&gt; replication.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Real output, trimmed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INF Sling CLI | https://slingdata.io
WRN for mode 'incremental' with iceberg target, primary-key is ineffective,
    incremental merge is not yet supported (only appends)
INF Sling Replication [3 streams] | POSTGRES -&amp;gt; ICEBERG

INF [1 / 3] running stream demo_postgres_iceberg.users
INF created table "demo_postgres_iceberg"."users"
INF streaming data (direct insert)
INF inserted 8000 rows into "demo_postgres_iceberg"."users" in 11 secs [713 r/s] [519 kB]

INF [2 / 3] running stream demo_postgres_iceberg.orders
INF created table "demo_postgres_iceberg"."orders"
INF inserted 35000 rows into "demo_postgres_iceberg"."orders" in 9 secs [3,721 r/s] [2.1 MB]

INF [3 / 3] running stream demo_postgres_iceberg.events
INF getting checkpoint value (occurred_at)
INF writing to target database [mode: incremental]
INF created table "demo_postgres_iceberg"."events"
INF inserted 60000 rows into "demo_postgres_iceberg"."events" in 7 secs [8,190 r/s] [4.5 MB]

INF Sling Replication Completed in 29s | POSTGRES -&amp;gt; ICEBERG | 3 Successes | 0 Failures
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;103,000 rows across three tables, 29 seconds end-to-end. The warning at the top deserves a real answer; see the section on incremental modes further down.&lt;/p&gt;

&lt;h1&gt;
  
  
  Verification
&lt;/h1&gt;

&lt;p&gt;Sling can query Iceberg tables directly through its DuckDB-backed reader. Tables are addressed as &lt;code&gt;iceberg_catalog.&amp;lt;namespace&amp;gt;.&amp;lt;table&amp;gt;&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;exec &lt;/span&gt;ICEBERG &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"select 'users' as t, count(*) as c
     from iceberg_catalog.demo_postgres_iceberg.users
   union all
   select 'orders', count(*) from iceberg_catalog.demo_postgres_iceberg.orders
   union all
   select 'events', count(*) from iceberg_catalog.demo_postgres_iceberg.events"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+--------+-------+
| T      |     C |
+--------+-------+
| users  |  8000 |
| orders | 35000 |
| events | 60000 |
+--------+-------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Row counts match the source. A sample of &lt;code&gt;users&lt;/code&gt; confirms columns and types survived the trip:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;exec &lt;/span&gt;ICEBERG &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"select user_id, email, country, signup_at
     from iceberg_catalog.demo_postgres_iceberg.users
    order by user_id limit 5"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+---------+-------------------+---------+-------------------------------+
| USER_ID | EMAIL             | COUNTRY | SIGNUP_AT                     |
+---------+-------------------+---------+-------------------------------+
|       1 | user1@example.com | BR      | 2025-01-01 00:14:00 -0300 -03 |
|       2 | user2@example.com | DE      | 2025-01-01 00:28:00 -0300 -03 |
|       3 | user3@example.com | FR      | 2025-01-01 00:42:00 -0300 -03 |
|       4 | user4@example.com | JP      | 2025-01-01 00:56:00 -0300 -03 |
|       5 | user5@example.com | UK      | 2025-01-01 01:10:00 -0300 -03 |
+---------+-------------------+---------+-------------------------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Postgres &lt;code&gt;jsonb&lt;/code&gt; lands as a structured column too. Sampling &lt;code&gt;events&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+----------+---------+------------+----------------------+----------------------+
| EVENT_ID | USER_ID | EVENT_TYPE | PAYLOAD              | OCCURRED_AT          |
+----------+---------+------------+----------------------+----------------------+
|    60001 |       2 | click      | {"v": 1, "utm": "x"} | 2026-05-11 ...       |
|    60002 |       3 | signup     | {"v": 2, "utm": "x"} | 2026-05-11 ...       |
|    60003 |       4 | purchase   | {"v": 3, "utm": "x"} | 2026-05-11 ...       |
+----------+---------+------------+----------------------+----------------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Any other Iceberg reader sees the same data: DuckDB with the &lt;code&gt;iceberg&lt;/code&gt; extension, Spark, Trino, Athena, Snowflake's catalog-linked databases. That portability is the reason for the catalog in the first place.&lt;/p&gt;

&lt;h1&gt;
  
  
  Running an Incremental Append
&lt;/h1&gt;

&lt;p&gt;After the bulk load, the day-to-day shape is: every few minutes (or hours, or once a day), pick up the new rows since the last run and append them to the Iceberg table. Sling's &lt;a href="https://docs.slingdata.io/concepts/replication/modes#incremental-mode" rel="noopener noreferrer"&gt;incremental mode&lt;/a&gt; does this. The state (the last seen value of the &lt;code&gt;update_key&lt;/code&gt;) is tracked by Sling itself, so you don't need to manage a state file the way you would for a file-based target.&lt;/p&gt;

&lt;p&gt;Insert 2,500 new events on the source (a stand-in for fresh activity):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;insert&lt;/span&gt; &lt;span class="k"&gt;into&lt;/span&gt; &lt;span class="n"&gt;demo_postgres_iceberg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;occurred_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="mi"&gt;60000&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;8000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s1"&gt;'click'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;jsonb_build_object&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'utm'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'x'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'v'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
       &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;interval&lt;/span&gt; &lt;span class="s1"&gt;'1 second'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;generate_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;g&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run a single-stream replication that touches only &lt;code&gt;events&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# replication-incremental.yaml&lt;/span&gt;
&lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;POSTGRES&lt;/span&gt;
&lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ICEBERG&lt;/span&gt;

&lt;span class="na"&gt;defaults&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;object&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;demo_postgres_iceberg.{stream_table}&lt;/span&gt;

&lt;span class="na"&gt;streams&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;demo_postgres_iceberg.events&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;incremental&lt;/span&gt;
    &lt;span class="na"&gt;update_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;occurred_at&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling run &lt;span class="nt"&gt;-r&lt;/span&gt; replication-incremental.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INF Sling Replication | POSTGRES -&amp;gt; ICEBERG | demo_postgres_iceberg.events
INF getting checkpoint value (occurred_at)
INF reading from source database
INF writing to target database [mode: incremental]
INF streaming data (direct insert)
INF inserted 2500 rows into "demo_postgres_iceberg"."events" in 8 secs [294 r/s] [178 kB]
INF execution succeeded
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sling read the saved checkpoint, pulled only rows newer than the last &lt;code&gt;occurred_at&lt;/code&gt; it saw, and appended exactly the 2,500 new rows. A readback confirms the new total:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;exec &lt;/span&gt;ICEBERG &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"select min(occurred_at), max(occurred_at), count(*)
     from iceberg_catalog.demo_postgres_iceberg.events"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+-------------------------------+--------------------------------------+--------+
| MIN_OCCURRED_AT               | MAX_OCCURRED_AT                      | COUNT  |
+-------------------------------+--------------------------------------+--------+
| 2025-03-01 00:00:40 -0300 -03 | 2026-05-11 08:42:59.533692 -0300 -03 |  62500 |
+-------------------------------+--------------------------------------+--------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;60,000 + 2,500 = 62,500. The new high-water mark on &lt;code&gt;occurred_at&lt;/code&gt; is the timestamp of the freshest insert. The next scheduled run will start from there.&lt;/p&gt;

&lt;h1&gt;
  
  
  Append-incremental vs merge-incremental
&lt;/h1&gt;

&lt;p&gt;That warning Sling printed on the first run matters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WRN for mode 'incremental' with iceberg target, primary-key is ineffective,
    incremental merge is not yet supported (only appends)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For database targets like Postgres or Snowflake, Sling's &lt;code&gt;incremental&lt;/code&gt; mode is a merge: a row whose &lt;code&gt;primary_key&lt;/code&gt; already exists in the target gets updated in place. For an Iceberg target today, &lt;code&gt;incremental&lt;/code&gt; means append only. New rows go in, existing rows stay as-is, and a &lt;code&gt;primary_key&lt;/code&gt; declared on the stream is parsed but not enforced.&lt;/p&gt;

&lt;p&gt;That is fine when your source is append-only: events, immutable transactions, log data. It is the wrong default if your source has mutable rows you need reflected on the lake side. Until merge lands, two patterns work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Snapshot replays. Run &lt;code&gt;mode: full-refresh&lt;/code&gt; on a cadence that matches your freshness budget. Iceberg's snapshot model means readers always see a consistent table; the old snapshot is replaced atomically. For tables in the low millions this is faster than it sounds.&lt;/li&gt;
&lt;li&gt;CDC-style append plus downstream resolution. Append every Postgres change to Iceberg as-is (using a logical-replication tool or trigger-based capture) and resolve the latest-state view at read time with something like &lt;code&gt;qualify row_number() over (partition by pk order by event_ts desc) = 1&lt;/code&gt;. A bit more work at query time, very cheap at write time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Track the &lt;a href="https://docs.slingdata.io/connections/database-connections/iceberg" rel="noopener noreferrer"&gt;Iceberg connector docs&lt;/a&gt; for when full merge mode ships.&lt;/p&gt;

&lt;h1&gt;
  
  
  Common tweaks
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Choose the right catalog.&lt;/strong&gt; REST is the most portable: the same connection shape works for Cloudflare R2, Lakekeeper, Nessie, Polaris, and any other REST-compatible catalog. Glue is the simplest in AWS-native shops. SQL catalog is fine for local dev. Avoid wiring a different catalog per environment if you can help it; the table layout doesn't care, but the metadata location does.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Namespace organization.&lt;/strong&gt; Treat namespaces (&lt;code&gt;demo_postgres_iceberg.users&lt;/code&gt;) the way you treat warehouse schemas: one per source system, or one per data domain. Don't dump everything into &lt;code&gt;default&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filter at the source.&lt;/strong&gt; Use a &lt;code&gt;sql:&lt;/code&gt; block per stream to project columns or filter rows before they leave Postgres. Smaller Parquet files, smaller manifests, cheaper queries downstream.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time travel for free.&lt;/strong&gt; Every replication produces a new Iceberg snapshot. Readers can time-travel to a previous snapshot, which is useful for "what did this table look like before yesterday's run?" without storing your own backups.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maintain the table.&lt;/strong&gt; Like any Iceberg table, periodic compaction and snapshot expiration keep the file count and metadata size from growing without bound. Set this up on a separate schedule from the replication itself.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Where to go next
&lt;/h1&gt;

&lt;p&gt;The same pattern works for any of &lt;a href="https://docs.slingdata.io/connections/database-connections" rel="noopener noreferrer"&gt;Sling's 30+ database sources&lt;/a&gt; into Iceberg: MySQL, SQL Server, Snowflake, BigQuery, MongoDB, and the rest. Swap the source and leave the target alone.&lt;/p&gt;

&lt;p&gt;If the underlying R2 storage is what brought you here, the &lt;a href="https://slingdata.io/articles/r2-from-postgres-parquet-sling/" rel="noopener noreferrer"&gt;Postgres → R2 as Parquet&lt;/a&gt; walkthrough shows the same source landing as raw Parquet files instead of an Iceberg table, which is useful when downstream readers don't need a catalog. For a deeper comparison of file-format targets, see &lt;a href="https://slingdata.io/articles/postgres-to-s3-parquet-with-sling/" rel="noopener noreferrer"&gt;Postgres → S3 as Parquet&lt;/a&gt; and &lt;a href="https://slingdata.io/articles/postgres-to-duckdb-with-sling/" rel="noopener noreferrer"&gt;Postgres → DuckDB&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For team workflows with scheduling, alerting, and audit trails on top of the same CLI, look at the &lt;a href="https://docs.slingdata.io/sling-platform/getting-started" rel="noopener noreferrer"&gt;Sling Platform&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Questions go to &lt;a href="https://discord.gg/q5xtaSNDvp" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; or &lt;a href="https://github.com/slingdata-io/sling-cli/issues" rel="noopener noreferrer"&gt;GitHub Issues&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>iceberg</category>
      <category>dataengineering</category>
      <category>etl</category>
    </item>
  </channel>
</rss>
