<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Fritz Larco</title>
    <description>The latest articles on DEV Community by Fritz Larco (@flarco).</description>
    <link>https://dev.to/flarco</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3283560%2F494dc7b5-96e2-48f0-958f-a9ebf6523eef.jpeg</url>
      <title>DEV Community: Fritz Larco</title>
      <link>https://dev.to/flarco</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/flarco"/>
    <language>en</language>
    <item>
      <title>Exporting Snowflake to BigQuery Using Sling</title>
      <dc:creator>Fritz Larco</dc:creator>
      <pubDate>Fri, 26 Jun 2026 00:49:25 +0000</pubDate>
      <link>https://dev.to/flarco/exporting-snowflake-to-bigquery-using-sling-4ek2</link>
      <guid>https://dev.to/flarco/exporting-snowflake-to-bigquery-using-sling-4ek2</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Last updated: May 2026&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  The Challenge of Snowflake to BigQuery Data Migration
&lt;/h1&gt;

&lt;p&gt;Moving data between cloud data warehouses like Snowflake and BigQuery traditionally involves complex ETL processes, custom scripts, and significant engineering effort. Common challenges include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Setting up and maintaining data extraction processes from Snowflake&lt;/li&gt;
&lt;li&gt;Managing data type compatibility between platforms&lt;/li&gt;
&lt;li&gt;Implementing efficient data loading into BigQuery&lt;/li&gt;
&lt;li&gt;Monitoring and maintaining the data pipeline&lt;/li&gt;
&lt;li&gt;Handling incremental updates and schema changes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Sling simplifies this entire process by providing a streamlined, configuration-based approach that eliminates the need for custom code and complex infrastructure setup.&lt;/p&gt;

&lt;h1&gt;
  
  
  Installing Sling
&lt;/h1&gt;

&lt;p&gt;Getting started with Sling is straightforward. You can install the CLI tool using various package managers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# macOS / Linux&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://slingdata.io/install.sh | bash

&lt;span class="c"&gt;# Windows&lt;/span&gt;
irm https://slingdata.io/install.ps1 | iex

&lt;span class="c"&gt;# Python&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;sling
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For more detailed installation instructions, visit the &lt;a href="https://docs.slingdata.io/sling-cli/getting-started#installation" rel="noopener noreferrer"&gt;official documentation&lt;/a&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  Setting Up Connections
&lt;/h1&gt;

&lt;p&gt;Before we can start replicating data, we need to configure our Snowflake and BigQuery connections. Sling makes this process simple with its connection management system.&lt;/p&gt;

&lt;p&gt;First, let's set up our &lt;a href="https://docs.slingdata.io/connections/database-connections/snowflake" rel="noopener noreferrer"&gt;Snowflake connection&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Set up Snowflake connection&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;SNOWFLAKE_SOURCE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"snowflake://&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;SNOWFLAKE_USER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;SNOWFLAKE_PASSWORD&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;@&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;SNOWFLAKE_ACCOUNT&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;SNOWFLAKE_DATABASE&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;?warehouse=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;SNOWFLAKE_WAREHOUSE&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;amp;role=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;SNOWFLAKE_ROLE&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# we should be able to test our connection now&lt;/span&gt;
sling conns &lt;span class="nb"&gt;test &lt;/span&gt;snowflake_source
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, let's configure the &lt;a href="https://docs.slingdata.io/connections/database-connections/bigquery" rel="noopener noreferrer"&gt;BigQuery connection&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Set up BigQuery connection&lt;/span&gt;
sling conns &lt;span class="nb"&gt;set &lt;/span&gt;bigquery_target &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;bigquery &lt;span class="nv"&gt;project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;project&amp;gt; &lt;span class="nv"&gt;dataset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;dataset&amp;gt; &lt;span class="nv"&gt;key_file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/path/to/service.account.json

&lt;span class="c"&gt;# we should be able to test our connection now&lt;/span&gt;
sling conns &lt;span class="nb"&gt;test &lt;/span&gt;bigquery_target
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Creating a Snowflake to BigQuery Replication
&lt;/h1&gt;

&lt;p&gt;Now that our connections are set up, we can create a replication configuration. Create a file named &lt;code&gt;snowflake_to_bigquery.yaml&lt;/code&gt; with the following content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Define source and target connections&lt;/span&gt;
&lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;snowflake_source&lt;/span&gt;
&lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bigquery_target&lt;/span&gt;

&lt;span class="c1"&gt;# Set default options for all streams&lt;/span&gt;
&lt;span class="na"&gt;defaults&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;full-refresh&lt;/span&gt;

&lt;span class="c1"&gt;# Define the tables to replicate&lt;/span&gt;
&lt;span class="na"&gt;streams&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# Replicate a single table&lt;/span&gt;
  &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SALES.ORDERS"&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;object&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sales_dataset.orders"&lt;/span&gt;
    &lt;span class="na"&gt;primary_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_id"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

  &lt;span class="c1"&gt;# Replicate multiple tables using wildcards&lt;/span&gt;
&lt;span class="err"&gt;  &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SALES.*"&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;object&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sales_dataset.{stream_table}"&lt;/span&gt;
    &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;incremental&lt;/span&gt;
    &lt;span class="na"&gt;update_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;last_modified_at"&lt;/span&gt;
    &lt;span class="na"&gt;target_options&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="c1"&gt;# Use BigQuery's bulk loading for better performance&lt;/span&gt;
      &lt;span class="na"&gt;use_bulk&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For more detailed configuration options, refer to the &lt;a href="https://docs.slingdata.io/concepts/replication" rel="noopener noreferrer"&gt;replication documentation&lt;/a&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  Running the Replication
&lt;/h1&gt;

&lt;p&gt;With our configuration in place, we can now run the replication using the Sling CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Run the replication&lt;/span&gt;
sling run &lt;span class="nt"&gt;-r&lt;/span&gt; snowflake_to_bigquery.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  The Sling Platform
&lt;/h1&gt;

&lt;p&gt;While the CLI provides powerful functionality for data replication, the Sling Platform offers a comprehensive UI-based solution for managing your data pipelines at scale.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fslingdata.io%2Fassets%2Fimages%2Fscreenshots%2Fui.editor.light.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fslingdata.io%2Fassets%2Fimages%2Fscreenshots%2Fui.editor.light.png" alt="Sling Platform Editor" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The platform provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Visual replication configuration&lt;/li&gt;
&lt;li&gt;Real-time monitoring and logging&lt;/li&gt;
&lt;li&gt;Team collaboration features&lt;/li&gt;
&lt;li&gt;Scheduled executions&lt;/li&gt;
&lt;li&gt;Agent management for distributed workloads&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Best Practices and Tips
&lt;/h1&gt;

&lt;p&gt;To get the most out of your Snowflake to BigQuery replications:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Use incremental mode for large tables that update frequently&lt;/li&gt;
&lt;li&gt;Implement appropriate primary keys for data integrity&lt;/li&gt;
&lt;li&gt;Leverage bulk loading for better performance&lt;/li&gt;
&lt;li&gt;Monitor replication logs regularly&lt;/li&gt;
&lt;li&gt;Use runtime variables for flexible configurations&lt;/li&gt;
&lt;/ol&gt;

&lt;h1&gt;
  
  
  Next Steps
&lt;/h1&gt;

&lt;p&gt;To learn more about Sling's capabilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Explore &lt;a href="https://docs.slingdata.io/examples/database-to-database" rel="noopener noreferrer"&gt;database-to-database examples&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Read about &lt;a href="https://docs.slingdata.io/concepts/replication/modes" rel="noopener noreferrer"&gt;replication modes&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Learn about &lt;a href="https://docs.slingdata.io/concepts/replication/runtime-variables" rel="noopener noreferrer"&gt;runtime variables&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Check out the &lt;a href="https://docs.slingdata.io/sling-platform/getting-started" rel="noopener noreferrer"&gt;Sling Platform documentation&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Related Guides
&lt;/h2&gt;

&lt;p&gt;For more Snowflake and BigQuery workflows, these articles cover related paths:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://slingdata.io/articles/exporting-sqlserver-to-bigquery-using-sling" rel="noopener noreferrer"&gt;SQL Server to BigQuery with Sling&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://slingdata.io/articles/export-postgres-to-bigquery-database" rel="noopener noreferrer"&gt;Postgres to BigQuery with Sling&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://slingdata.io/articles/mysql-bigquery-sling-data-transfer" rel="noopener noreferrer"&gt;MySQL to BigQuery with Sling&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://slingdata.io/articles/bigquery-to-snowflake-sling" rel="noopener noreferrer"&gt;BigQuery to Snowflake with Sling&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://slingdata.io/articles/sling-snowflake-to-postgres" rel="noopener noreferrer"&gt;Snowflake to Postgres with Sling&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Frequently Asked Questions
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Does Sling pull data from Snowflake using &lt;code&gt;UNLOAD&lt;/code&gt; to a stage, or does it stream rows over the wire?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Sling streams rows over the standard Snowflake driver and buffers them in batches before pushing to BigQuery. There's no Snowflake stage or external table involved, which makes the setup simpler but means very large tables benefit from running on hardware close to the BigQuery region.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does Sling map Snowflake's &lt;code&gt;VARIANT&lt;/code&gt; and &lt;code&gt;OBJECT&lt;/code&gt; columns to BigQuery?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Variant, object, and array columns are serialized to JSON strings during extraction and land in BigQuery as &lt;code&gt;STRING&lt;/code&gt; by default. If you want them as &lt;code&gt;JSON&lt;/code&gt; in BigQuery, run a post-load SQL step that casts the column with &lt;code&gt;SAFE.PARSE_JSON()&lt;/code&gt; into a new column or view.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I replicate a Snowflake share without copying data into my own database first?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes. As long as the Snowflake role on the connection has &lt;code&gt;IMPORTED PRIVILEGES&lt;/code&gt; on the share, you can address the shared database and schema directly in your stream names. Sling reads from the share the same way it reads from any other database.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's the right approach for handling case-sensitive Snowflake identifiers?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Snowflake stores unquoted identifiers in uppercase. Sling preserves the source casing by default, so streams like &lt;code&gt;SALES.ORDERS&lt;/code&gt; keep their uppercase form. Set &lt;code&gt;target_options.column_casing: snake&lt;/code&gt; if you want lower_snake_case columns on the BigQuery side, which is the BigQuery convention.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Will Sling create the target dataset in BigQuery automatically?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Sling will create the target tables, but the dataset itself must exist before the run starts. This is by design because dataset creation involves location and billing decisions that Sling shouldn't make for you. Create the dataset once, then point your replications at it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How can I throttle the load on Snowflake during a large initial backfill?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Use &lt;code&gt;source_options.batch_limit&lt;/code&gt; to cap rows per batch and run streams sequentially by leaving the default parallelism. You can also point the replication at a smaller Snowflake warehouse so it auto-suspends quickly if the run pauses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does the &lt;code&gt;use_bulk: true&lt;/code&gt; option actually change anything for BigQuery targets?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;BigQuery loads are already done via the bulk load API by default, so &lt;code&gt;use_bulk: true&lt;/code&gt; is effectively a no-op for this target. You can safely omit it in BigQuery replications; it's only meaningful for targets that have both row-by-row and bulk paths.&lt;/p&gt;

</description>
      <category>snowflake</category>
      <category>bigquery</category>
      <category>dataengineering</category>
      <category>etl</category>
    </item>
    <item>
      <title>Effortless Data Migration: How to Export from PostgreSQL and Load into S3 as Parquet with Sling</title>
      <dc:creator>Fritz Larco</dc:creator>
      <pubDate>Wed, 17 Jun 2026 16:45:36 +0000</pubDate>
      <link>https://dev.to/flarco/effortless-data-migration-how-to-export-from-postgresql-and-load-into-s3-as-parquet-with-sling-3mh</link>
      <guid>https://dev.to/flarco/effortless-data-migration-how-to-export-from-postgresql-and-load-into-s3-as-parquet-with-sling-3mh</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Last updated: June 2026&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;In today's data-driven landscape, efficiently moving data from PostgreSQL databases to cloud storage solutions like Amazon S3 is a critical requirement for many organizations. When combined with the Parquet file format's superior compression and query performance capabilities, this creates a powerful solution for data warehousing and analytics. However, setting up and maintaining such a data pipeline traditionally involves multiple tools, complex configurations, and significant overhead.&lt;/p&gt;

&lt;p&gt;Enter Sling, a modern data movement tool that dramatically simplifies this process. In this guide, we'll explore how to use Sling to efficiently transfer data from PostgreSQL to S3, storing it in the Parquet format for optimal performance and cost efficiency. We'll cover everything from installation and setup to advanced configuration options, making your data pipeline both powerful and maintainable.&lt;/p&gt;

&lt;h1&gt;
  
  
  Sling: A Modern Solution
&lt;/h1&gt;

&lt;p&gt;Sling is a modern data movement platform designed to simplify data operations between various sources and destinations. It provides both a powerful CLI tool and a comprehensive platform for managing data workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Features
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Efficient Data Transfer&lt;/strong&gt;: Optimized for performance with built-in parallelization and streaming capabilities&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Native Parquet Support&lt;/strong&gt;: Direct conversion to Parquet format without intermediate steps&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Schema Handling&lt;/strong&gt;: Automatic schema detection and evolution support&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Incremental Updates&lt;/strong&gt;: Built-in support for incremental data loading&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security&lt;/strong&gt;: Secure credential management for both PostgreSQL and S3&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Getting Started with Sling
&lt;/h1&gt;

&lt;p&gt;Let's begin by installing Sling on your system. Sling provides multiple installation methods to suit your environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# macOS / Linux&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://slingdata.io/install.sh | bash

&lt;span class="c"&gt;# Windows&lt;/span&gt;
irm https://slingdata.io/install.ps1 | iex

&lt;span class="c"&gt;# Python&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;sling
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After installation, verify that Sling is properly installed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check Sling version&lt;/span&gt;
sling &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For more detailed installation instructions, visit the &lt;a href="https://docs.slingdata.io/sling-cli/getting-started#installation" rel="noopener noreferrer"&gt;Sling CLI Getting Started Guide&lt;/a&gt;. &lt;/p&gt;

&lt;h1&gt;
  
  
  Setting Up Connections
&lt;/h1&gt;

&lt;p&gt;Before we can transfer data, we need to configure our source (PostgreSQL) and target (S3) connections. Sling provides multiple ways to set up and manage connections securely.&lt;/p&gt;

&lt;h2&gt;
  
  
  PostgreSQL Connection Setup
&lt;/h2&gt;

&lt;p&gt;You can set up a PostgreSQL connection using one of these methods:&lt;/p&gt;

&lt;h3&gt;
  
  
  Using Environment Variables
&lt;/h3&gt;

&lt;p&gt;The simplest way is to use environment variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Set PostgreSQL connection using environment variable&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;POSTGRES&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'postgresql://myuser:mypassword@localhost:5432/mydatabase'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Using the Sling CLI
&lt;/h3&gt;

&lt;p&gt;Alternatively, use the &lt;code&gt;sling conns set&lt;/code&gt; command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Set up PostgreSQL connection with individual parameters&lt;/span&gt;
sling conns &lt;span class="nb"&gt;set &lt;/span&gt;POSTGRES &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;postgres &lt;span class="nv"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;localhost &lt;span class="nv"&gt;user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;myuser &lt;span class="nv"&gt;database&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;mydatabase &lt;span class="nv"&gt;password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;mypassword &lt;span class="nv"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;5432

&lt;span class="c"&gt;# Or use a connection URL&lt;/span&gt;
sling conns &lt;span class="nb"&gt;set &lt;/span&gt;POSTGRES &lt;span class="nv"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"postgresql://myuser:mypassword@localhost:5432/mydatabase"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Using the Sling Environment File
&lt;/h3&gt;

&lt;p&gt;You can also add the connection details to your &lt;code&gt;~/.sling/env.yaml&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;connections&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;POSTGRES&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres&lt;/span&gt;
    &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;localhost&lt;/span&gt;
    &lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;myuser&lt;/span&gt;
    &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mypassword&lt;/span&gt;
    &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5432&lt;/span&gt;
    &lt;span class="na"&gt;database&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mydatabase&lt;/span&gt;
    &lt;span class="na"&gt;schema&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;public&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  S3 Connection Setup
&lt;/h2&gt;

&lt;p&gt;For Amazon S3, you'll need to configure AWS credentials. Here are the available methods:&lt;/p&gt;

&lt;h3&gt;
  
  
  Using Environment Variables
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Set AWS credentials using environment variables&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;AWS_ACCESS_KEY_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'your_access_key'&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;AWS_SECRET_ACCESS_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'your_secret_key'&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;AWS_REGION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'us-west-2'&lt;/span&gt;  &lt;span class="c"&gt;# optional, defaults to us-east-1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Using the Sling CLI
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Set up S3 connection with credentials&lt;/span&gt;
sling conns &lt;span class="nb"&gt;set &lt;/span&gt;S3 &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;s3 &lt;span class="nv"&gt;access_key_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_access_key &lt;span class="nv"&gt;secret_access_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_secret_key &lt;span class="nv"&gt;region&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;us-west-2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Using the Sling Environment File
&lt;/h3&gt;

&lt;p&gt;Add the S3 connection to your &lt;code&gt;~/.sling/env.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;connections&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;S3&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;s3&lt;/span&gt;
    &lt;span class="na"&gt;access_key_id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;your_access_key&lt;/span&gt;
    &lt;span class="na"&gt;secret_access_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;your_secret_key&lt;/span&gt;
    &lt;span class="na"&gt;region&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;us-west-2&lt;/span&gt;  &lt;span class="c1"&gt;# optional, defaults to us-east-1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Testing Connections
&lt;/h2&gt;

&lt;p&gt;After setting up your connections, it's important to verify they work correctly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Test the PostgreSQL connection&lt;/span&gt;
sling conns &lt;span class="nb"&gt;test &lt;/span&gt;POSTGRES

&lt;span class="c"&gt;# Test the S3 connection&lt;/span&gt;
sling conns &lt;span class="nb"&gt;test &lt;/span&gt;S3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also explore the PostgreSQL database schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# List available tables in the public schema&lt;/span&gt;
sling conns discover POSTGRES &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s1"&gt;'public.*'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For more details about connection configuration and options, refer to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.slingdata.io/connections/database-connections/postgres" rel="noopener noreferrer"&gt;PostgreSQL Connection Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.slingdata.io/connections/file-connections/s3" rel="noopener noreferrer"&gt;S3 Connection Documentation&lt;/a&gt; &lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Basic Data Transfer with CLI Flags
&lt;/h1&gt;

&lt;p&gt;Once you have your connections set up, you can start transferring data from PostgreSQL to S3 using Sling's CLI flags. Let's look at some common usage patterns.&lt;/p&gt;

&lt;h2&gt;
  
  
  Simple Transfer Example
&lt;/h2&gt;

&lt;p&gt;The most basic way to transfer data is using the &lt;code&gt;sling run&lt;/code&gt; command with source and target specifications:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Export a single table to S3 as Parquet&lt;/span&gt;
sling run &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--src-conn&lt;/span&gt; POSTGRES &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--src-stream&lt;/span&gt; &lt;span class="s2"&gt;"public.users"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tgt-conn&lt;/span&gt; S3 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tgt-object&lt;/span&gt; &lt;span class="s2"&gt;"s3://my-bucket/data/users.parquet"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Understanding CLI Flag Options
&lt;/h2&gt;

&lt;p&gt;Sling provides various CLI flags to customize your transfer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Export with specific columns and where clause&lt;/span&gt;
sling run &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--src-conn&lt;/span&gt; POSTGRES &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--src-stream&lt;/span&gt; &lt;span class="s2"&gt;"SELECT id, name, email FROM users WHERE created_at &amp;gt; '2024-01-01'"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tgt-conn&lt;/span&gt; S3 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tgt-object&lt;/span&gt; &lt;span class="s2"&gt;"s3://my-bucket/data/filtered_users.parquet"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tgt-options&lt;/span&gt; &lt;span class="s1"&gt;'{ "compression": "snappy", "row_group_size": 100000 }'&lt;/span&gt;

&lt;span class="c"&gt;# Export with custom Parquet options and table keys&lt;/span&gt;
sling run &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--src-conn&lt;/span&gt; POSTGRES &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--src-stream&lt;/span&gt; &lt;span class="s2"&gt;"public.orders"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tgt-conn&lt;/span&gt; S3 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tgt-object&lt;/span&gt; &lt;span class="s2"&gt;"s3://my-bucket/data/orders.parquet"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tgt-options&lt;/span&gt; &lt;span class="s1"&gt;'{ "file_max_bytes": 100000000, "compression": "snappy" }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Using Runtime Variables
&lt;/h2&gt;

&lt;p&gt;Sling supports runtime variables that can be used in your object paths and queries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Export multiple tables with runtime variables&lt;/span&gt;
sling run &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--src-conn&lt;/span&gt; POSTGRES &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--src-stream&lt;/span&gt; &lt;span class="s2"&gt;"public.sales_*"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tgt-conn&lt;/span&gt; S3 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tgt-object&lt;/span&gt; &lt;span class="s2"&gt;"s3://my-bucket/data/{stream_table}/{date_yyyy_mm_dd}.parquet"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tgt-options&lt;/span&gt; &lt;span class="s1"&gt;'{ "file_max_bytes": 100000000 }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a complete list of available CLI flags and runtime variables, refer to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.slingdata.io/sling-cli/run#cli-flags-overview" rel="noopener noreferrer"&gt;CLI Flags Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.slingdata.io/concepts/replication/runtime-variables" rel="noopener noreferrer"&gt;Runtime Variables Documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Advanced Data Transfer with Replication YAML
&lt;/h1&gt;

&lt;p&gt;While CLI flags are great for simple transfers, YAML configuration files provide more flexibility and reusability for complex data transfer scenarios. Let's explore how to use YAML configurations with Sling.&lt;/p&gt;

&lt;h2&gt;
  
  
  Basic Multi-Stream Example
&lt;/h2&gt;

&lt;p&gt;Create a file named &lt;code&gt;postgres_to_s3.yaml&lt;/code&gt; with the following content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Basic configuration for exporting multiple tables&lt;/span&gt;
&lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;POSTGRES&lt;/span&gt;
&lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;S3&lt;/span&gt;

&lt;span class="na"&gt;defaults&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;full-refresh&lt;/span&gt;
  &lt;span class="na"&gt;target_options&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;format&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;parquet&lt;/span&gt;
    &lt;span class="na"&gt;compression&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;snappy&lt;/span&gt;
    &lt;span class="na"&gt;file_max_bytes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;100000000&lt;/span&gt;

&lt;span class="na"&gt;streams&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# Export users table with specific columns&lt;/span&gt;
  &lt;span class="na"&gt;public.users&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;object&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;s3://my-bucket/data/users/{YYYY}_{MM}_{DD}.parquet&lt;/span&gt;
    &lt;span class="na"&gt;select&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;email&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;created_at&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

  &lt;span class="c1"&gt;# Export orders table with primary key and column selection&lt;/span&gt;
  &lt;span class="na"&gt;public.orders&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;object&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;s3://my-bucket/data/orders/{YYYY}_{MM}_{DD}.parquet&lt;/span&gt;
    &lt;span class="na"&gt;target_options&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;format&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;parquet&lt;/span&gt;
      &lt;span class="na"&gt;compression&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gzip&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Advanced Configuration Example
&lt;/h2&gt;

&lt;p&gt;Here's a more complex example with multiple streams and advanced options:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;POSTGRES&lt;/span&gt;
&lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;S3&lt;/span&gt;

&lt;span class="na"&gt;defaults&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;incremental&lt;/span&gt;
  &lt;span class="na"&gt;source_options&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;add_new_columns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;target_options&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;format&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;parquet&lt;/span&gt;
    &lt;span class="na"&gt;compression&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;snappy&lt;/span&gt;
    &lt;span class="na"&gt;row_group_size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;100000&lt;/span&gt;
    &lt;span class="na"&gt;file_max_bytes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;100000000&lt;/span&gt;

&lt;span class="na"&gt;streams&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# Export all tables in sales schema&lt;/span&gt;
  &lt;span class="na"&gt;sales.*&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;object&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;s3://my-bucket/data/{stream_schema}/{stream_table}.parquet&lt;/span&gt;
    &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;full-refresh&lt;/span&gt;
    &lt;span class="na"&gt;target_options&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;format&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;parquet&lt;/span&gt;
      &lt;span class="na"&gt;compression&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;snappy&lt;/span&gt;
      &lt;span class="na"&gt;file_max_bytes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;500000000&lt;/span&gt;

  &lt;span class="c1"&gt;# Incremental export of customer transactions (partitioning)&lt;/span&gt;
  &lt;span class="na"&gt;public.transactions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;object&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;s3://my-bucket/data/transactions/{part_year}/{part_month}&lt;/span&gt;
    &lt;span class="na"&gt;sql&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
      &lt;span class="s"&gt;select transaction_id, customer_id, amount, status, created_at&lt;/span&gt;
      &lt;span class="s"&gt;from public.transactions&lt;/span&gt;
      &lt;span class="s"&gt;where created_at &amp;gt; coalesce({incremental_val}, '2001-01-01)&lt;/span&gt;

  &lt;span class="c1"&gt;# Export specific customer data with custom query&lt;/span&gt;
  &lt;span class="na"&gt;public.customers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;object&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;s3://my-bucket/data/customers.parquet&lt;/span&gt;
    &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;full-refresh&lt;/span&gt;
    &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
      &lt;span class="s"&gt;SELECT &lt;/span&gt;
        &lt;span class="s"&gt;c.customer_id,&lt;/span&gt;
        &lt;span class="s"&gt;c.name,&lt;/span&gt;
        &lt;span class="s"&gt;c.email,&lt;/span&gt;
        &lt;span class="s"&gt;COUNT(o.order_id) as total_orders,&lt;/span&gt;
        &lt;span class="s"&gt;SUM(o.total_amount) as lifetime_value&lt;/span&gt;
      &lt;span class="s"&gt;FROM customers c&lt;/span&gt;
      &lt;span class="s"&gt;LEFT JOIN orders o ON c.customer_id = o.customer_id&lt;/span&gt;
      &lt;span class="s"&gt;GROUP BY c.customer_id, c.name, c.email&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To run a replication configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Execute the replication configuration&lt;/span&gt;
sling run &lt;span class="nt"&gt;-r&lt;/span&gt; postgres_to_s3.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For more details about replication configuration options, refer to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.slingdata.io/concepts/replication" rel="noopener noreferrer"&gt;Replication Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.slingdata.io/concepts/replication/source-options" rel="noopener noreferrer"&gt;Source Options Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.slingdata.io/concepts/replication/target-options" rel="noopener noreferrer"&gt;Target Options Documentation&lt;/a&gt; &lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Using the Sling Platform UI
&lt;/h1&gt;

&lt;p&gt;While the CLI is powerful for automation and scripting, the Sling Platform provides a user-friendly web interface for managing and monitoring your data transfers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Platform Features
&lt;/h2&gt;

&lt;p&gt;The Sling Platform offers several advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Visual Replication Editor&lt;/strong&gt;: Create and edit replication configurations with a user-friendly interface&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time Monitoring&lt;/strong&gt;: Track the progress of your data transfers in real-time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;History and Logs&lt;/strong&gt;: View detailed execution history and logs for troubleshooting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Team Collaboration&lt;/strong&gt;: Share connections and configurations with team members&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scheduling&lt;/strong&gt;: Set up recurring transfers with flexible scheduling options&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Getting Started with the Platform
&lt;/h2&gt;

&lt;p&gt;To get started with the Sling Platform:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Visit &lt;a href="https://app.slingdata.io" rel="noopener noreferrer"&gt;app.slingdata.io&lt;/a&gt; to create an account&lt;/li&gt;
&lt;li&gt;Follow the onboarding process to set up your workspace&lt;/li&gt;
&lt;li&gt;Create your PostgreSQL and S3 connections&lt;/li&gt;
&lt;li&gt;Create your first replication using the visual editor&lt;/li&gt;
&lt;li&gt;Monitor your transfers in real-time&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For more information about the Sling Platform, visit the &lt;a href="https://docs.slingdata.io/sling-platform/getting-started" rel="noopener noreferrer"&gt;Platform Documentation&lt;/a&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  Getting Started and Next Steps
&lt;/h1&gt;

&lt;p&gt;Now that you understand how to use Sling for transferring data from PostgreSQL to S3 in Parquet format, here are some next steps to explore:&lt;/p&gt;

&lt;h2&gt;
  
  
  Additional Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.slingdata.io/examples/database-to-file" rel="noopener noreferrer"&gt;Database to File Examples&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.slingdata.io/concepts/replication/modes" rel="noopener noreferrer"&gt;Replication Modes Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.slingdata.io/concepts/replication/source-options" rel="noopener noreferrer"&gt;Source Options Reference&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.slingdata.io/concepts/replication/target-options" rel="noopener noreferrer"&gt;Target Options Reference&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Best Practices
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Start Small&lt;/strong&gt;: Begin with a single table and simple configuration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test Thoroughly&lt;/strong&gt;: Use the &lt;code&gt;--dry-run&lt;/code&gt; flag to validate your configuration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor Performance&lt;/strong&gt;: Use the platform's monitoring features to optimize your transfers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use Version Control&lt;/strong&gt;: Store your replication YAML files in version control&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implement Security&lt;/strong&gt;: Follow AWS best practices for S3 bucket policies and IAM roles&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Next Steps
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Set up your first PostgreSQL to S3 transfer using the CLI&lt;/li&gt;
&lt;li&gt;Create a more complex replication using YAML configuration&lt;/li&gt;
&lt;li&gt;Explore the Sling Platform for visual configuration and monitoring&lt;/li&gt;
&lt;li&gt;Join the Sling community to share experiences and get help&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;With Sling, you can efficiently manage your data pipeline needs while maintaining flexibility and control over your data movement processes.&lt;/p&gt;

&lt;h1&gt;
  
  
  Related Guides
&lt;/h1&gt;

&lt;p&gt;Parquet is the analytics-friendly choice, but Sling can write other formats from the same PostgreSQL source:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://dev.to/articles/postgres-to-s3-csv-export-with-sling"&gt;exporting PostgreSQL to S3 as CSV&lt;/a&gt; for a flat, widely-readable format&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/articles/postgres-to-s3-json-with-sling"&gt;exporting PostgreSQL to S3 as JSON&lt;/a&gt; when you need a nested, schema-flexible format&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/articles/postgres-to-parquet-with-sling"&gt;exporting PostgreSQL to local Parquet files&lt;/a&gt; when the target is a filesystem instead of S3&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/articles/export-mysql-to-s3-parquet-files"&gt;exporting MySQL to S3 as Parquet&lt;/a&gt; for the same workflow from a MySQL source&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  FAQ
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Why choose Parquet over CSV or JSON for PostgreSQL exports to S3?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Parquet is columnar and compressed, so files are smaller and analytical queries that read a subset of columns run much faster than over row-based CSV or JSON. It also carries column types, which avoids the type-guessing that text formats require downstream.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Which Parquet compression codecs does Sling support?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Sling supports &lt;code&gt;snappy&lt;/code&gt;, &lt;code&gt;gzip&lt;/code&gt;, and &lt;code&gt;zstd&lt;/code&gt; among others, set via the &lt;code&gt;compression&lt;/code&gt; property under &lt;code&gt;target_options&lt;/code&gt;. Snappy is a good default balancing speed and size, while zstd compresses more tightly for cold storage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is &lt;code&gt;row_group_size&lt;/code&gt; and how should I set it?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;row_group_size&lt;/code&gt; controls how many rows go into each Parquet row group, which affects read parallelism and memory use. Larger groups compress better, while smaller groups let query engines skip data more granularly. The default works for most workloads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does Sling preserve PostgreSQL data types in the Parquet schema?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes. Sling maps PostgreSQL types to Parquet logical types, so numerics, timestamps, and booleans stay typed instead of being coerced to strings the way they would in CSV.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do I partition the Parquet output by date in S3?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Use partition runtime variables such as &lt;code&gt;{part_year}&lt;/code&gt; and &lt;code&gt;{part_month}&lt;/code&gt; in the object path. Sling routes each row to the correct prefix, producing a Hive-style partitioned layout that query engines can prune.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I add new columns to existing exports without a full reload?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes. Enable &lt;code&gt;add_new_columns&lt;/code&gt; under &lt;code&gt;target_options&lt;/code&gt; so that when the source picks up a new column, Sling adds it to the schema on the next run rather than failing or requiring a manual reload.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How large should each Parquet file be?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Aim for roughly 128 MB to 512 MB per file for good query-engine performance, and control it with &lt;code&gt;file_max_bytes&lt;/code&gt; under &lt;code&gt;target_options&lt;/code&gt;. Many tiny files hurt read throughput, while a few huge files limit parallelism.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>s3</category>
      <category>parquet</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>Extract data from Databases into DuckLake</title>
      <dc:creator>Fritz Larco</dc:creator>
      <pubDate>Mon, 08 Jun 2026 13:22:41 +0000</pubDate>
      <link>https://dev.to/flarco/extract-data-from-databases-into-ducklake-ma6</link>
      <guid>https://dev.to/flarco/extract-data-from-databases-into-ducklake-ma6</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In the evolving landscape of data engineering, DuckLake is emerging as a powerful solution for building data lakes with ACID transactions, versioning, and a flexible catalog backend. It combines the speed and efficiency of DuckDB with the scalability of cloud storage, making it an attractive choice for modern data platforms.&lt;/p&gt;

&lt;p&gt;A common requirement is to populate a data lake by extracting data from various transactional or analytical databases. This is where &lt;a href="https://slingdata.io" rel="noopener noreferrer"&gt;Sling&lt;/a&gt; comes in, offering a simple and powerful command-line interface (CLI) to move data between different sources and destinations.&lt;/p&gt;

&lt;p&gt;In this article, we'll walk through how to use Sling to extract data from a PostgreSQL database and load it into DuckLake. The same principles can be applied to other databases that Sling supports, such as MySQL, SQL Server, Oracle, and more.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is DuckLake?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://ducklake.select/" rel="noopener noreferrer"&gt;DuckLake&lt;/a&gt; is a data lake format that brings the power of DuckDB to a data lake architecture. It provides a transactional layer over your data files (like Parquet) stored in object storage (e.g., AWS S3, Google Cloud Storage, Azure Blob Storage, or local files). It uses a catalog database (like DuckDB, SQLite, PostgreSQL, or MySQL) to manage metadata, schemas, and versions.&lt;/p&gt;

&lt;p&gt;This setup allows you to query your data lake using standard SQL with the performance benefits of DuckDB, while ensuring data consistency and reliability.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting up the Environment
&lt;/h2&gt;

&lt;p&gt;Before we begin, make sure you have &lt;a href="https://docs.slingdata.io/sling-cli/installation" rel="noopener noreferrer"&gt;Sling CLI installed&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Configuring the Connections
&lt;/h3&gt;

&lt;p&gt;We need to configure two connections in Sling: one for our source database (PostgreSQL) and one for our target (DuckLake).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Source Database: PostgreSQL&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let's set up a connection to a PostgreSQL database. You can do this by setting an environment variable or by using the &lt;code&gt;sling conns&lt;/code&gt; command. See &lt;a href="https://docs.slingdata.io/connections/database-connections/postgres" rel="noopener noreferrer"&gt;here&lt;/a&gt; for more details.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;POSTGRES_CONN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"postgres://user:pass@host:5432/dbname"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Target: DuckLake&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For DuckLake, we need to specify the catalog type, the connection string for the catalog, and the path where the data will be stored. For this example, we'll use a local DuckDB file as our catalog and a local directory for our data. See &lt;a href="https://docs.slingdata.io/connections/datalake-connections/ducklake" rel="noopener noreferrer"&gt;here&lt;/a&gt; for more details.&lt;/p&gt;

&lt;p&gt;Here's how to set up the DuckLake connection using &lt;code&gt;sling conns set&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;set &lt;/span&gt;MY_DUCKLAKE &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ducklake &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nv"&gt;catalog_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;duckdb &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nv"&gt;catalog_conn_string&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;my_catalog.db &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nv"&gt;data_path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;./ducklake_data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command creates a DuckLake connection named &lt;code&gt;MY_DUCKLAKE&lt;/code&gt; that uses a local DuckDB file &lt;code&gt;my_catalog.db&lt;/code&gt; for the catalog and stores data in the &lt;code&gt;./ducklake_data&lt;/code&gt; directory.&lt;/p&gt;

&lt;p&gt;You can verify that your connections are set up correctly by running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Extracting Data from PostgreSQL to DuckLake
&lt;/h2&gt;

&lt;p&gt;With our connections configured, extracting data is straightforward. We can use a simple &lt;code&gt;sling run&lt;/code&gt; command.&lt;/p&gt;

&lt;p&gt;Let's say we want to extract the &lt;code&gt;customers&lt;/code&gt; table from the &lt;code&gt;public&lt;/code&gt; schema in our PostgreSQL database and load it into a table named &lt;code&gt;customers&lt;/code&gt; in the &lt;code&gt;main&lt;/code&gt; schema of our DuckLake.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling run &lt;span class="nt"&gt;--src-conn&lt;/span&gt; POSTGRES_CONN &lt;span class="nt"&gt;--src-stream&lt;/span&gt; public.customers &lt;span class="se"&gt;\&lt;/span&gt;
          &lt;span class="nt"&gt;--tgt-conn&lt;/span&gt; MY_DUCKLAKE &lt;span class="nt"&gt;--tgt-object&lt;/span&gt; main.customers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Alternatively, you can use a YAML configuration file for more control:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# extract.yaml&lt;/span&gt;
&lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;POSTGRES_CONN&lt;/span&gt;
&lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MY_DUCKLAKE&lt;/span&gt;

&lt;span class="na"&gt;defaults&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;object&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;main.{stream_table}&lt;/span&gt;

&lt;span class="na"&gt;streams&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# load all tables&lt;/span&gt;
  &lt;span class="na"&gt;public.*&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;finance.customers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;object&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;main.finance_customers&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling run &lt;span class="nt"&gt;-r&lt;/span&gt; extract.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it! Sling will handle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Reading the data from all the tables in the &lt;code&gt;public&lt;/code&gt; schema, and the &lt;code&gt;finance.customers&lt;/code&gt; table.&lt;/li&gt;
&lt;li&gt;  Creating the &lt;code&gt;main.finance_customers&lt;/code&gt; table in DuckLake if it doesn't exist, as well as all respective tables from the source &lt;code&gt;public&lt;/code&gt; schema.&lt;/li&gt;
&lt;li&gt;  Writing the data into Parquet files in the &lt;code&gt;ducklake_data&lt;/code&gt; directory.&lt;/li&gt;
&lt;li&gt;  Updating the DuckLake catalog (&lt;code&gt;my_catalog.db&lt;/code&gt;) with the new table information.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Incremental Loads
&lt;/h2&gt;

&lt;p&gt;One of the powerful features of Sling is its ability to handle incremental loads easily. This is crucial for keeping your data lake up-to-date without having to re-extract all the data every time.&lt;/p&gt;

&lt;p&gt;To perform an incremental load, you need a key column in your source table that indicates the order of records, such as a timestamp or an auto-incrementing ID. Let's assume our &lt;code&gt;customers&lt;/code&gt; table has a &lt;code&gt;updated_at&lt;/code&gt; column.&lt;/p&gt;

&lt;p&gt;We can use the &lt;code&gt;replication&lt;/code&gt; mode in Sling to manage the state of our incremental loads automatically. Here's how you would structure the command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# replication.yaml&lt;/span&gt;
&lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;POSTGRES_CONN&lt;/span&gt;
&lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MY_DUCKLAKE&lt;/span&gt;

&lt;span class="na"&gt;defaults&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;incremental&lt;/span&gt;
  &lt;span class="na"&gt;primary_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;customer_id&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;update_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;updated_at&lt;/span&gt;

&lt;span class="na"&gt;streams&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;public.customers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;object&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;main.customers&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can then run this replication with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling run &lt;span class="nt"&gt;-r&lt;/span&gt; replication.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;incremental&lt;/code&gt; mode will merge new or updated records to the target table and ensure there are no duplicates based on the &lt;code&gt;primary_key&lt;/code&gt;. Sling will automatically track the last &lt;code&gt;updated_at&lt;/code&gt; value it processed and only fetch newer records on subsequent runs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;DuckLake offers a compelling solution for building modern, transactional data lakes, and Sling makes it incredibly simple to populate it from any database. With just a few commands, you can perform full extracts or set up robust incremental pipelines to keep your DuckLake synchronized with your source systems.&lt;/p&gt;

&lt;p&gt;To learn more about what you can do with Sling, check out the official &lt;a href="https://docs.slingdata.io" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;. Happy slinging!&lt;/p&gt;

</description>
      <category>duckdb</category>
      <category>ducklake</category>
      <category>dataengineering</category>
      <category>etl</category>
    </item>
    <item>
      <title>Sync PostgreSQL to MotherDuck with Sling</title>
      <dc:creator>Fritz Larco</dc:creator>
      <pubDate>Mon, 01 Jun 2026 17:02:53 +0000</pubDate>
      <link>https://dev.to/flarco/sync-postgresql-to-motherduck-with-sling-11oo</link>
      <guid>https://dev.to/flarco/sync-postgresql-to-motherduck-with-sling-11oo</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://motherduck.com/" rel="noopener noreferrer"&gt;MotherDuck&lt;/a&gt; is a serverless analytics service built on DuckDB. It hosts DuckDB databases in the cloud and keeps the same SQL surface you'd use locally. PostgreSQL is what most apps run on for transactional data.&lt;/p&gt;

&lt;p&gt;So you usually want both: Postgres for the app, MotherDuck for analytics. The part in the middle that copies tables across is what &lt;a href="https://slingdata.io" rel="noopener noreferrer"&gt;Sling&lt;/a&gt; does.&lt;/p&gt;

&lt;p&gt;This guide replicates a PostgreSQL schema into MotherDuck with Sling, in both full-refresh and incremental modes. The CLI output and row counts below come from an actual run, not a fabricated one.&lt;/p&gt;

&lt;h1&gt;
  
  
  Installing Sling
&lt;/h1&gt;

&lt;p&gt;Sling is a single binary. Pick whichever install method fits your environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# macOS / Linux&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://slingdata.io/install.sh | bash

&lt;span class="c"&gt;# Windows&lt;/span&gt;
irm https://slingdata.io/install.ps1 | iex

&lt;span class="c"&gt;# Python&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;sling
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Confirm the install:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Full installation notes are in the &lt;a href="https://docs.slingdata.io/sling-cli/getting-started" rel="noopener noreferrer"&gt;Sling CLI Getting Started Guide&lt;/a&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  Configuring the PostgreSQL Source
&lt;/h1&gt;

&lt;p&gt;Sling reads connection details from &lt;code&gt;~/.sling/env.yaml&lt;/code&gt;, environment variables, or &lt;code&gt;sling conns set&lt;/code&gt;. For PostgreSQL you'll need host, port, database, user, and password.&lt;/p&gt;

&lt;p&gt;Using &lt;code&gt;sling conns set&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;set &lt;/span&gt;PG_SOURCE &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;postgres &lt;span class="nv"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;host.ip &lt;span class="nv"&gt;user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;myuser &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nv"&gt;database&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;mydb &lt;span class="nv"&gt;password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;mypass &lt;span class="nv"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;5432
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or in &lt;code&gt;~/.sling/env.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;connections&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;PG_SOURCE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres&lt;/span&gt;
    &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;host.ip&lt;/span&gt;
    &lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;myuser&lt;/span&gt;
    &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mypass&lt;/span&gt;
    &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5432&lt;/span&gt;
    &lt;span class="na"&gt;database&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mydb&lt;/span&gt;
    &lt;span class="na"&gt;sslmode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;require&lt;/span&gt;
    &lt;span class="na"&gt;schema&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;public&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Test it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;test &lt;/span&gt;PG_SOURCE
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;a href="https://docs.slingdata.io/connections/database-connections/postgres" rel="noopener noreferrer"&gt;PostgreSQL connection docs&lt;/a&gt; cover SSL, IAM auth, and other options.&lt;/p&gt;

&lt;h1&gt;
  
  
  Configuring the MotherDuck Target
&lt;/h1&gt;

&lt;p&gt;A MotherDuck connection needs the database name and a service token. You can generate a token from the &lt;a href="https://motherduck.com/docs/authenticating-to-motherduck#authentication-using-a-service-token" rel="noopener noreferrer"&gt;MotherDuck UI&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;set &lt;/span&gt;MOTHERDUCK &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;motherduck &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nv"&gt;database&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;my_db &lt;span class="nv"&gt;motherduck_token&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;eyJhbGciOi...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or the URL form:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;set &lt;/span&gt;MOTHERDUCK &lt;span class="nv"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"motherduck://my_db?motherduck_token=eyJhbGciOi..."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or in &lt;code&gt;~/.sling/env.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;connections&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;MOTHERDUCK&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;motherduck&lt;/span&gt;
    &lt;span class="na"&gt;database&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my_db&lt;/span&gt;
    &lt;span class="na"&gt;motherduck_token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;eyJhbGciOi...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Test it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;test &lt;/span&gt;MOTHERDUCK
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Full options (attach modes, copy method, DuckDB version pinning) are in the &lt;a href="https://docs.slingdata.io/connections/database-connections/motherduck" rel="noopener noreferrer"&gt;MotherDuck connection docs&lt;/a&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  A Full-Refresh Replication
&lt;/h1&gt;

&lt;p&gt;For this run the PostgreSQL source has three tables in a &lt;code&gt;demo_pg_motherduck&lt;/code&gt; schema:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;customers&lt;/code&gt; — 5,000 rows&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;orders&lt;/code&gt; — 30,000 rows, with an &lt;code&gt;updated_at&lt;/code&gt; timestamp&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;events&lt;/code&gt; — 60,000 rows, with an &lt;code&gt;occurred_at&lt;/code&gt; timestamp&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The replication file lives next to wherever you want to run Sling from:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# replication.yaml&lt;/span&gt;
&lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PG_SOURCE&lt;/span&gt;
&lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MOTHERDUCK&lt;/span&gt;

&lt;span class="na"&gt;defaults&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;full-refresh&lt;/span&gt;
  &lt;span class="na"&gt;object&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;demo_pg_motherduck.{stream_table}&lt;/span&gt;

&lt;span class="na"&gt;streams&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;demo_pg_motherduck.customers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;primary_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;customer_id&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

  &lt;span class="na"&gt;demo_pg_motherduck.orders&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;primary_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;order_id&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;update_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;updated_at&lt;/span&gt;

  &lt;span class="na"&gt;demo_pg_motherduck.events&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;primary_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;event_id&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;update_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;occurred_at&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few things to point out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;object: demo_pg_motherduck.{stream_table}&lt;/code&gt; is a &lt;a href="https://docs.slingdata.io/concepts/replication/runtime-variables" rel="noopener noreferrer"&gt;runtime variable&lt;/a&gt;. Sling substitutes the source table name into the target object, so you don't repeat yourself per stream.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;primary_key&lt;/code&gt; and &lt;code&gt;update_key&lt;/code&gt; are set even though the mode here is &lt;code&gt;full-refresh&lt;/code&gt;. The next section flips to incremental without touching those declarations; only the mode changes.&lt;/li&gt;
&lt;li&gt;The target schema gets created automatically by Sling on the first run. No manual &lt;code&gt;CREATE SCHEMA&lt;/code&gt; needed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Run it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling run &lt;span class="nt"&gt;-r&lt;/span&gt; replication.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Real output, trimmed for readability:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INF Sling Replication [3 streams] | PG_SOURCE -&amp;gt; MOTHERDUCK

INF [1 / 3] running stream demo_pg_motherduck.customers
INF reading from source database
INF writing to target database [mode: full-refresh]
INF created table "demo_pg_motherduck"."customers"
INF inserted 5000 rows into "demo_pg_motherduck"."customers" in 11 secs [425 r/s] [390 kB]
INF execution succeeded

INF [2 / 3] running stream demo_pg_motherduck.orders
INF created table "demo_pg_motherduck"."orders"
INF inserted 30000 rows into "demo_pg_motherduck"."orders" in 14 secs [2,131 r/s] [2.6 MB]
INF execution succeeded

INF [3 / 3] running stream demo_pg_motherduck.events
INF created table "demo_pg_motherduck"."events"
INF inserted 60000 rows into "demo_pg_motherduck"."events" in 9 secs [6,036 r/s] [3.3 MB]
INF execution succeeded

INF Sling Replication Completed in 40s | PG_SOURCE -&amp;gt; MOTHERDUCK | 3 Successes | 0 Failures
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;95,000 rows across three tables, end to end, in 40 seconds. The &lt;code&gt;_tmp&lt;/code&gt; tables that show up in the full log are Sling's staging step before it swaps the data into the final target. They get cleaned up automatically.&lt;/p&gt;

&lt;h1&gt;
  
  
  Verification
&lt;/h1&gt;

&lt;p&gt;A &lt;code&gt;count(*)&lt;/code&gt; from MotherDuck right after the run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="s1"&gt;'customers'&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;demo_pg_motherduck&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customers&lt;/span&gt;
&lt;span class="k"&gt;union&lt;/span&gt; &lt;span class="k"&gt;all&lt;/span&gt; &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="s1"&gt;'orders'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;demo_pg_motherduck&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;union&lt;/span&gt; &lt;span class="k"&gt;all&lt;/span&gt; &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="s1"&gt;'events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;demo_pg_motherduck&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+-----------+-------+
| T         |     C |
+-----------+-------+
| customers |  5000 |
| orders    | 30000 |
| events    | 60000 |
+-----------+-------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A small sample to confirm the data made the trip with types intact:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;occurred_at&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;demo_pg_motherduck&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;
&lt;span class="k"&gt;order&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;event_id&lt;/span&gt; &lt;span class="k"&gt;limit&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+----------+-------------+------------+--------+-------------------------------+
| EVENT_ID | CUSTOMER_ID | EVENT_TYPE | REGION | OCCURRED_AT                   |
+----------+-------------+------------+--------+-------------------------------+
|        1 |           2 | click      | us-2   | 2025-01-01 00:00:01 +0000 UTC |
|        2 |           3 | signup     | us-3   | 2025-01-01 00:00:02 +0000 UTC |
|        3 |           4 | purchase   | us-4   | 2025-01-01 00:00:03 +0000 UTC |
|        4 |           5 | page_view  | us-5   | 2025-01-01 00:00:04 +0000 UTC |
|        5 |           6 | click      | us-6   | 2025-01-01 00:00:05 +0000 UTC |
+----------+-------------+------------+--------+-------------------------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Numeric, varchar, and timestamp columns round-tripped cleanly. Nullable columns (&lt;code&gt;region&lt;/code&gt; is null on every seventh row in the source) are preserved as nulls, not as the string &lt;code&gt;"NULL"&lt;/code&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  Switching to Incremental
&lt;/h1&gt;

&lt;p&gt;Full-refreshing a 60,000-row table every day is fine. Full-refreshing a 600-million-row event table every day is not. Sling's &lt;a href="https://docs.slingdata.io/concepts/replication/modes#incremental-mode" rel="noopener noreferrer"&gt;incremental mode&lt;/a&gt; reads only the rows newer than the highest &lt;code&gt;update_key&lt;/code&gt; already in the target.&lt;/p&gt;

&lt;p&gt;Drop &lt;code&gt;customers&lt;/code&gt; from the streams (it changes slowly enough to keep on full-refresh in a separate run, or rebuild weekly) and switch the mode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# replication-incremental.yaml&lt;/span&gt;
&lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PG_SOURCE&lt;/span&gt;
&lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MOTHERDUCK&lt;/span&gt;

&lt;span class="na"&gt;defaults&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;incremental&lt;/span&gt;
  &lt;span class="na"&gt;object&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;demo_pg_motherduck.{stream_table}&lt;/span&gt;

&lt;span class="na"&gt;streams&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;demo_pg_motherduck.orders&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;primary_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;order_id&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;update_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;updated_at&lt;/span&gt;

  &lt;span class="na"&gt;demo_pg_motherduck.events&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;primary_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;event_id&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;update_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;occurred_at&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Insert 1,000 new orders and 2,500 new events on the source (this simulates a day's worth of data flowing in), then run again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling run &lt;span class="nt"&gt;-r&lt;/span&gt; replication-incremental.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INF Sling Replication [2 streams] | PG_SOURCE -&amp;gt; MOTHERDUCK

INF [1 / 2] running stream demo_pg_motherduck.orders
INF getting checkpoint value (updated_at)
INF writing to target database [mode: incremental]
INF inserted 1000 rows into "demo_pg_motherduck"."orders" in 9 secs [104 r/s] [86 kB]
INF execution succeeded

INF [2 / 2] running stream demo_pg_motherduck.events
INF getting checkpoint value (occurred_at)
INF writing to target database [mode: incremental]
INF inserted 2500 rows into "demo_pg_motherduck"."events" in 6 secs [358 r/s] [137 kB]
INF execution succeeded

INF Sling Replication Completed in 20s | PG_SOURCE -&amp;gt; MOTHERDUCK | 2 Successes | 0 Failures
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;getting checkpoint value&lt;/code&gt; line is where Sling looks at the target, finds the largest &lt;code&gt;updated_at&lt;/code&gt; already present, and uses that as the lower bound on the source query. Only the new rows come across:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="s1"&gt;'orders'&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;demo_pg_motherduck&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;union&lt;/span&gt; &lt;span class="k"&gt;all&lt;/span&gt; &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="s1"&gt;'events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;demo_pg_motherduck&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+--------+-------+
| T      |     C |
+--------+-------+
| orders | 31000 |
| events | 62500 |
+--------+-------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Orders went from 30,000 to 31,000. Events went from 60,000 to 62,500. Matches what was inserted on the source.&lt;/p&gt;

&lt;p&gt;If you need updates as well as inserts (a row's &lt;code&gt;updated_at&lt;/code&gt; changes and the existing row should be replaced rather than duplicated), keep &lt;code&gt;mode: incremental&lt;/code&gt; and make sure &lt;code&gt;primary_key&lt;/code&gt; is set. Sling will upsert against the primary key instead of appending. The &lt;a href="https://docs.slingdata.io/concepts/replication/modes" rel="noopener noreferrer"&gt;replication modes docs&lt;/a&gt; cover the trade-offs.&lt;/p&gt;

&lt;h1&gt;
  
  
  Common Tweaks
&lt;/h1&gt;

&lt;p&gt;A few options you'll reach for once the basics are in place:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Schema and column casing.&lt;/strong&gt; MotherDuck (DuckDB) is case-sensitive, and Sling defaults to keeping the source casing. Add &lt;code&gt;target_options: { column_casing: snake }&lt;/code&gt; under &lt;code&gt;defaults&lt;/code&gt; if your Postgres source has mixed-case identifiers and you want a clean snake_case target.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add new columns automatically.&lt;/strong&gt; When the source schema changes, set &lt;code&gt;target_options: { add_new_columns: true }&lt;/code&gt; so Sling alters the MotherDuck table on the next run. Without it, new source columns get dropped at the boundary.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pick a copy method.&lt;/strong&gt; The default for MotherDuck is &lt;code&gt;csv_http&lt;/code&gt;. For very wide rows or large text values, switch to &lt;code&gt;arrow_http&lt;/code&gt; via &lt;code&gt;copy_method: arrow_http&lt;/code&gt; in the connection config. It's usually faster and avoids CSV escaping edge cases.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filter at the source.&lt;/strong&gt; Use a custom &lt;code&gt;sql:&lt;/code&gt; block in a stream to project columns or filter rows before they leave Postgres. Cheaper than dragging unused columns to MotherDuck.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Where to Go Next
&lt;/h1&gt;

&lt;p&gt;The same replication pattern works for any of &lt;a href="https://docs.slingdata.io/connections/database-connections" rel="noopener noreferrer"&gt;Sling's 30+ database sources&lt;/a&gt; into MotherDuck: MySQL, SQL Server, Snowflake, BigQuery, and the rest. Swap the source connection and leave the target alone.&lt;/p&gt;

&lt;p&gt;If you'd rather store flat files than warehouse tables, see &lt;a href="https://slingdata.io/articles/postgres-to-s3-parquet-with-sling/" rel="noopener noreferrer"&gt;PostgreSQL to S3 as Parquet&lt;/a&gt;, which uses the same replication file shape with a file-system target. For a local DuckDB setup instead of a managed MotherDuck one, see &lt;a href="https://slingdata.io/articles/postgres-to-duckdb-with-sling/" rel="noopener noreferrer"&gt;PostgreSQL to DuckDB&lt;/a&gt;. For team workflows with scheduling and alerting on top of the same CLI, look at the &lt;a href="https://docs.slingdata.io/sling-platform/getting-started" rel="noopener noreferrer"&gt;Sling Platform&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Questions go to &lt;a href="https://discord.gg/q5xtaSNDvp" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; or &lt;a href="https://github.com/slingdata-io/sling-cli/issues" rel="noopener noreferrer"&gt;GitHub Issues&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>duckdb</category>
      <category>dataengineering</category>
      <category>etl</category>
    </item>
    <item>
      <title>Replicate MySQL to ClickHouse with Sling</title>
      <dc:creator>Fritz Larco</dc:creator>
      <pubDate>Tue, 26 May 2026 13:29:56 +0000</pubDate>
      <link>https://dev.to/flarco/replicate-mysql-to-clickhouse-with-sling-3ghj</link>
      <guid>https://dev.to/flarco/replicate-mysql-to-clickhouse-with-sling-3ghj</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://clickhouse.com/" rel="noopener noreferrer"&gt;ClickHouse&lt;/a&gt; is a columnar OLAP database. It runs aggregate queries across billions of rows in seconds. MySQL is what most apps run on for transactional reads and writes. Different jobs, different storage shapes, which is why people end up running them side by side: MySQL for the app, ClickHouse for analytics on top of the app's data.&lt;/p&gt;

&lt;p&gt;The piece in the middle, the bit that copies tables from MySQL into ClickHouse and keeps them current, is what &lt;a href="https://slingdata.io" rel="noopener noreferrer"&gt;Sling&lt;/a&gt; does.&lt;/p&gt;

&lt;p&gt;This guide replicates a MySQL schema into ClickHouse with Sling, in both full-refresh and incremental modes. The CLI output, row counts, and timings below all come from an actual run against a Docker MySQL on the source side and a self-hosted ClickHouse 25.4 on the target side. The same configuration works against &lt;a href="https://clickhouse.com/cloud" rel="noopener noreferrer"&gt;ClickHouse Cloud&lt;/a&gt;; only the connection URL changes.&lt;/p&gt;

&lt;h1&gt;
  
  
  Installing Sling
&lt;/h1&gt;

&lt;p&gt;Sling is a single binary. Pick whichever install method fits your environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# macOS / Linux&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://slingdata.io/install.sh | bash

&lt;span class="c"&gt;# Windows&lt;/span&gt;
irm https://slingdata.io/install.ps1 | iex

&lt;span class="c"&gt;# Python&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;sling
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Confirm the install:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Installation notes for every platform are in the &lt;a href="https://docs.slingdata.io/sling-cli/getting-started" rel="noopener noreferrer"&gt;Sling CLI Getting Started Guide&lt;/a&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  Configuring the MySQL Source
&lt;/h1&gt;

&lt;p&gt;Sling reads connection details from &lt;code&gt;~/.sling/env.yaml&lt;/code&gt;, environment variables, or &lt;code&gt;sling conns set&lt;/code&gt;. For MySQL you need host, port, database, user, and password.&lt;/p&gt;

&lt;p&gt;A read-only Sling user is the right shape for replication:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;USER&lt;/span&gt; &lt;span class="s1"&gt;'sling'&lt;/span&gt;&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="s1"&gt;'%'&lt;/span&gt; &lt;span class="n"&gt;IDENTIFIED&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="s1"&gt;'&amp;lt;password&amp;gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;source_schema&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="s1"&gt;'sling'&lt;/span&gt;&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="s1"&gt;'%'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using &lt;code&gt;sling conns set&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;set &lt;/span&gt;MYSQL_SOURCE &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;mysql &lt;span class="nv"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;host.ip &lt;span class="nv"&gt;user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sling &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nv"&gt;database&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;mydb &lt;span class="nv"&gt;password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;mypass &lt;span class="nv"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3306
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or in &lt;code&gt;~/.sling/env.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;connections&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;MYSQL_SOURCE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mysql&lt;/span&gt;
    &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;host.ip&lt;/span&gt;
    &lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sling&lt;/span&gt;
    &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mypass&lt;/span&gt;
    &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3306&lt;/span&gt;
    &lt;span class="na"&gt;database&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mydb&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your MySQL requires TLS, append &lt;code&gt;?tls=skip-verify&lt;/code&gt; to the URL form, or set &lt;code&gt;tls: skip-verify&lt;/code&gt; in the YAML. Test it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;test &lt;/span&gt;MYSQL_SOURCE
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;a href="https://docs.slingdata.io/connections/database-connections/mysql" rel="noopener noreferrer"&gt;MySQL connection docs&lt;/a&gt; cover SSL, IAM auth, and the rest of the options.&lt;/p&gt;

&lt;h1&gt;
  
  
  Configuring the ClickHouse Target
&lt;/h1&gt;

&lt;p&gt;ClickHouse speaks two protocols: native (port 9000) and HTTP (port 8123 / 8443 with TLS). Sling supports both. For self-hosted clusters the native protocol is usually the fastest path; for ClickHouse Cloud, the HTTPS endpoint is the supported one.&lt;/p&gt;

&lt;p&gt;Self-hosted, native protocol:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;set &lt;/span&gt;CLICKHOUSE &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;clickhouse &lt;span class="nv"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;host.ip &lt;span class="nv"&gt;user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;default &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nv"&gt;password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;mypass &lt;span class="nv"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;9000 &lt;span class="nv"&gt;database&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;default
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;ClickHouse Cloud over HTTPS:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;set &lt;/span&gt;CLICKHOUSE &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nv"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"https://default:mypass@xxxxxx.us-east-1.aws.clickhouse.cloud:8443/default"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or in &lt;code&gt;~/.sling/env.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;connections&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;CLICKHOUSE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;clickhouse&lt;/span&gt;
    &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;host.ip&lt;/span&gt;
    &lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
    &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mypass&lt;/span&gt;
    &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;9000&lt;/span&gt;
    &lt;span class="na"&gt;database&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Test it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;test &lt;/span&gt;CLICKHOUSE
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;a href="https://docs.slingdata.io/connections/database-connections/clickhouse" rel="noopener noreferrer"&gt;ClickHouse connection docs&lt;/a&gt; list every option, including the HTTP URL form and the &lt;code&gt;export_stream_format&lt;/code&gt; setting for tuning the staging file format.&lt;/p&gt;

&lt;h1&gt;
  
  
  A Full-Refresh Replication
&lt;/h1&gt;

&lt;p&gt;For this run the MySQL source has three tables in a &lt;code&gt;demo_mysql_clickhouse&lt;/code&gt; database:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;customers&lt;/code&gt; — 5,000 rows&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;orders&lt;/code&gt; — 30,000 rows, with an &lt;code&gt;updated_at&lt;/code&gt; timestamp&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;events&lt;/code&gt; — 60,000 rows, with an &lt;code&gt;occurred_at&lt;/code&gt; timestamp&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The replication file lives next to wherever you run Sling from:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# replication.yaml&lt;/span&gt;
&lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MYSQL_SOURCE&lt;/span&gt;
&lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CLICKHOUSE&lt;/span&gt;

&lt;span class="na"&gt;defaults&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;full-refresh&lt;/span&gt;
  &lt;span class="na"&gt;object&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;demo_mysql_clickhouse.{stream_table}&lt;/span&gt;

&lt;span class="na"&gt;streams&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;demo_mysql_clickhouse.customers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;primary_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;customer_id&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

  &lt;span class="na"&gt;demo_mysql_clickhouse.orders&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;primary_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;order_id&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;update_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;updated_at&lt;/span&gt;

  &lt;span class="na"&gt;demo_mysql_clickhouse.events&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;primary_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;event_id&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;update_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;occurred_at&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few things worth pointing out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;object: demo_mysql_clickhouse.{stream_table}&lt;/code&gt; is a &lt;a href="https://docs.slingdata.io/concepts/replication/runtime-variables" rel="noopener noreferrer"&gt;runtime variable&lt;/a&gt;. Sling substitutes the source table name into the target object, so you don't repeat yourself per stream.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;primary_key&lt;/code&gt; and &lt;code&gt;update_key&lt;/code&gt; are set even though the mode is &lt;code&gt;full-refresh&lt;/code&gt;. The next section flips to incremental without touching those declarations; only the mode changes.&lt;/li&gt;
&lt;li&gt;The target database (&lt;code&gt;demo_mysql_clickhouse&lt;/code&gt; on ClickHouse) gets created automatically by Sling on the first run. No manual &lt;code&gt;CREATE DATABASE&lt;/code&gt; needed on the target side.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Run it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling run &lt;span class="nt"&gt;-r&lt;/span&gt; replication.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Real output, trimmed for readability:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INF Sling Replication [3 streams] | MYSQL_SOURCE -&amp;gt; CLICKHOUSE

INF [1 / 3] running stream demo_mysql_clickhouse.customers
INF reading from source database
INF writing to target database [mode: full-refresh]
INF created table `demo_mysql_clickhouse`.`customers`
INF inserted 5000 rows into `demo_mysql_clickhouse`.`customers` in 0 secs [8,853 r/s] [396 kB]
INF execution succeeded

INF [2 / 3] running stream demo_mysql_clickhouse.orders
INF created table `demo_mysql_clickhouse`.`orders`
INF inserted 30000 rows into `demo_mysql_clickhouse`.`orders` in 1 secs [29,381 r/s] [2.8 MB]
INF execution succeeded

INF [3 / 3] running stream demo_mysql_clickhouse.events
INF created table `demo_mysql_clickhouse`.`events`
INF inserted 60000 rows into `demo_mysql_clickhouse`.`events` in 0 secs [81,559 r/s] [3.2 MB]
INF execution succeeded

INF Sling Replication Completed in 4s | MYSQL_SOURCE -&amp;gt; CLICKHOUSE | 3 Successes | 0 Failures
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;95,000 rows across three tables, end to end, in 4 seconds. The &lt;code&gt;_tmp&lt;/code&gt; tables that show up in the full log are Sling's staging step before it swaps the data into the final target. They get cleaned up automatically.&lt;/p&gt;

&lt;p&gt;When Sling creates the table, it asks for &lt;code&gt;MergeTree&lt;/code&gt; with the primary key columns as the sorting key. That's a fine baseline for analytical queries. The "Common Tweaks" section below covers how to override it when you need partitioning, replication, or a different engine.&lt;/p&gt;

&lt;h1&gt;
  
  
  Verification
&lt;/h1&gt;

&lt;p&gt;A &lt;code&gt;count()&lt;/code&gt; from ClickHouse right after the run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="s1"&gt;'customers'&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;demo_mysql_clickhouse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customers&lt;/span&gt;
&lt;span class="k"&gt;UNION&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="s1"&gt;'orders'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;   &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;demo_mysql_clickhouse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;UNION&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="s1"&gt;'events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;   &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;demo_mysql_clickhouse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+-----------+-------+
| T         | C     |
+-----------+-------+
| customers |  5000 |
| orders    | 30000 |
| events    | 60000 |
+-----------+-------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A small sample to confirm the data made the trip with types intact:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;occurred_at&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;demo_mysql_clickhouse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;event_id&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+----------+-------------+------------+--------+-------------------------------+
| EVENT_ID | CUSTOMER_ID | EVENT_TYPE | REGION | OCCURRED_AT                   |
+----------+-------------+------------+--------+-------------------------------+
|        1 |           2 | signup     | us-2   | 2025-01-01 00:00:01 +0000 UTC |
|        2 |           3 | purchase   | us-3   | 2025-01-01 00:00:02 +0000 UTC |
|        3 |           4 | logout     | us-4   | 2025-01-01 00:00:03 +0000 UTC |
|        4 |           5 | page_view  | us-1   | 2025-01-01 00:00:04 +0000 UTC |
|        5 |           6 | click      | us-2   | 2025-01-01 00:00:05 +0000 UTC |
+----------+-------------+------------+--------+-------------------------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Numeric, varchar, and timestamp columns round-tripped cleanly. The nullable &lt;code&gt;region&lt;/code&gt; column (every seventh row in the source is null) lands as ClickHouse &lt;code&gt;Nullable(String)&lt;/code&gt; and preserves nulls as nulls, not as the literal string &lt;code&gt;"NULL"&lt;/code&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  Switching to Incremental
&lt;/h1&gt;

&lt;p&gt;Full-refreshing a 60,000-row event table every day is fine. Full-refreshing a 600-million-row event table every day is not. Sling's &lt;a href="https://docs.slingdata.io/concepts/replication/modes#incremental-mode" rel="noopener noreferrer"&gt;incremental mode&lt;/a&gt; reads only the rows newer than the highest &lt;code&gt;update_key&lt;/code&gt; already in the target.&lt;/p&gt;

&lt;p&gt;Drop &lt;code&gt;customers&lt;/code&gt; from the streams (it changes slowly enough to keep on full-refresh in a separate run, or rebuild weekly) and switch the mode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# replication-incremental.yaml&lt;/span&gt;
&lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MYSQL_SOURCE&lt;/span&gt;
&lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CLICKHOUSE&lt;/span&gt;

&lt;span class="na"&gt;defaults&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;incremental&lt;/span&gt;
  &lt;span class="na"&gt;object&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;demo_mysql_clickhouse.{stream_table}&lt;/span&gt;

&lt;span class="na"&gt;streams&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;demo_mysql_clickhouse.orders&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;primary_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;order_id&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;update_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;updated_at&lt;/span&gt;

  &lt;span class="na"&gt;demo_mysql_clickhouse.events&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;primary_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;event_id&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;update_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;occurred_at&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Insert 1,000 new orders and 2,500 new events on the source (a stand-in for a day of fresh data), then run again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling run &lt;span class="nt"&gt;-r&lt;/span&gt; replication-incremental.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INF Sling Replication [2 streams] | MYSQL_SOURCE -&amp;gt; CLICKHOUSE

INF [1 / 2] running stream demo_mysql_clickhouse.orders
INF getting checkpoint value (updated_at)
INF writing to target database [mode: incremental]
INF inserted 1000 rows into `demo_mysql_clickhouse`.`orders` in 0 secs [1,926 r/s] [93 kB]
INF execution succeeded

INF [2 / 2] running stream demo_mysql_clickhouse.events
INF getting checkpoint value (occurred_at)
INF writing to target database [mode: incremental]
INF inserted 2500 rows into `demo_mysql_clickhouse`.`events` in 0 secs [4,040 r/s] [134 kB]
INF execution succeeded

INF Sling Replication Completed in 2s | MYSQL_SOURCE -&amp;gt; CLICKHOUSE | 2 Successes | 0 Failures
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;getting checkpoint value&lt;/code&gt; line is where Sling looks at the target, finds the largest &lt;code&gt;updated_at&lt;/code&gt; already present, and uses that as the lower bound on the source query. Only the new rows come across:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="s1"&gt;'orders'&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;demo_mysql_clickhouse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;UNION&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="s1"&gt;'events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;   &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;demo_mysql_clickhouse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+--------+-------+
| T      | C     |
+--------+-------+
| orders | 31000 |
| events | 62500 |
+--------+-------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Orders went from 30,000 to 31,000. Events went from 60,000 to 62,500. Matches what was inserted on the source.&lt;/p&gt;

&lt;p&gt;ClickHouse's &lt;code&gt;MergeTree&lt;/code&gt; family is append-friendly. In incremental mode Sling inserts the new rows directly into the main table without rewriting partitions. If you also need updates (a row's &lt;code&gt;updated_at&lt;/code&gt; changes and you want the existing target row replaced rather than duplicated), keep &lt;code&gt;mode: incremental&lt;/code&gt; and make sure &lt;code&gt;primary_key&lt;/code&gt; is set. Sling will use a &lt;code&gt;ReplacingMergeTree&lt;/code&gt;-style upsert path against that key. The &lt;a href="https://docs.slingdata.io/concepts/replication/modes" rel="noopener noreferrer"&gt;replication modes docs&lt;/a&gt; cover the trade-offs.&lt;/p&gt;

&lt;h1&gt;
  
  
  Common Tweaks
&lt;/h1&gt;

&lt;p&gt;A few options worth reaching for once the basics are in place:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pick a table engine.&lt;/strong&gt; ClickHouse's default &lt;code&gt;MergeTree&lt;/code&gt; is a fine baseline, but for high-write or replicated clusters you'll want &lt;code&gt;ReplicatedMergeTree&lt;/code&gt;, partitioning by month, and a TTL. Set &lt;code&gt;target_options.table_ddl&lt;/code&gt; per stream with the full &lt;code&gt;CREATE TABLE&lt;/code&gt; you want; Sling will use it instead of generating its own. Example: &lt;code&gt;engine = MergeTree() ORDER BY (customer_id, occurred_at) PARTITION BY toYYYYMM(occurred_at)&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add new columns automatically.&lt;/strong&gt; When the source schema changes, set &lt;code&gt;target_options: { add_new_columns: true }&lt;/code&gt; so Sling alters the ClickHouse table on the next run. Without it, new source columns get dropped at the boundary.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tune the staging format.&lt;/strong&gt; Sling stages data as a file before bulk-loading into ClickHouse. The default is &lt;code&gt;CSVWithNames&lt;/code&gt;, which is robust but verbose. For wide rows or large text values, set &lt;code&gt;export_stream_format: Parquet&lt;/code&gt; on the ClickHouse connection. Usually faster and more compact on the wire.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filter at the source.&lt;/strong&gt; Use a custom &lt;code&gt;sql:&lt;/code&gt; block in a stream to project columns or filter rows before they leave MySQL. Cheaper than dragging unused columns to ClickHouse, and it keeps row payloads small for the network hop.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Where to Go Next
&lt;/h1&gt;

&lt;p&gt;The same replication pattern works for any of &lt;a href="https://docs.slingdata.io/connections/database-connections" rel="noopener noreferrer"&gt;Sling's 30+ database sources&lt;/a&gt; into ClickHouse: PostgreSQL, SQL Server, Snowflake, BigQuery, and the rest. Swap the source connection and leave the target alone. For the equivalent flow from a Postgres source, see &lt;a href="https://slingdata.io/articles/postgres-to-clickhouse-sling/" rel="noopener noreferrer"&gt;PostgreSQL to ClickHouse&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If your downstream is more cloud-warehouse than columnar engine, &lt;a href="https://slingdata.io/articles/motherduck-from-mysql-sling/" rel="noopener noreferrer"&gt;MySQL to MotherDuck&lt;/a&gt; covers the same setup with DuckDB-on-the-cloud as the target. For team workflows with scheduling and alerting on top of the same CLI, look at the &lt;a href="https://docs.slingdata.io/sling-platform/getting-started" rel="noopener noreferrer"&gt;Sling Platform&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Questions go to &lt;a href="https://discord.gg/q5xtaSNDvp" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; or &lt;a href="https://github.com/slingdata-io/sling-cli/issues" rel="noopener noreferrer"&gt;GitHub Issues&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>mysql</category>
      <category>clickhouse</category>
      <category>dataengineering</category>
      <category>etl</category>
    </item>
    <item>
      <title>Load PostgreSQL into Apache Iceberg with Sling</title>
      <dc:creator>Fritz Larco</dc:creator>
      <pubDate>Mon, 18 May 2026 13:31:11 +0000</pubDate>
      <link>https://dev.to/flarco/load-postgresql-into-apache-iceberg-with-sling-1cm8</link>
      <guid>https://dev.to/flarco/load-postgresql-into-apache-iceberg-with-sling-1cm8</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://iceberg.apache.org/" rel="noopener noreferrer"&gt;Apache Iceberg&lt;/a&gt; is the table format that turns a pile of Parquet files in object storage into something that behaves like a warehouse table. You get schema evolution, hidden partitioning, time travel, and consistent reads from whichever engine you point at the table. PostgreSQL is where most operational data starts. Moving it into Iceberg gives you an analytics copy that DuckDB, Spark, Trino, Snowflake, and Athena can all read without anyone needing to agree on a single warehouse vendor first.&lt;/p&gt;

&lt;p&gt;Sling speaks the Iceberg &lt;a href="https://iceberg.apache.org/spec/#rest-catalog-spec" rel="noopener noreferrer"&gt;REST catalog&lt;/a&gt; directly. From the configuration side an Iceberg target is just another database connection: point Sling at the catalog URL and the underlying object store, then declare your streams. No JVM, no Spark, no manual manifest writing.&lt;/p&gt;

&lt;p&gt;This guide replicates a Postgres schema into Iceberg using &lt;a href="https://slingdata.io" rel="noopener noreferrer"&gt;Sling&lt;/a&gt;. The catalog is &lt;a href="https://developers.cloudflare.com/r2/data-catalog/" rel="noopener noreferrer"&gt;Cloudflare R2's managed Iceberg REST catalog&lt;/a&gt; and the storage layer underneath is R2. Every CLI line, row count, and timing below comes from an actual run against those endpoints.&lt;/p&gt;

&lt;h1&gt;
  
  
  Installing Sling
&lt;/h1&gt;

&lt;p&gt;Sling is a single binary. Pick whichever install fits:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# macOS / Linux&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://slingdata.io/install.sh | bash

&lt;span class="c"&gt;# Windows&lt;/span&gt;
irm https://slingdata.io/install.ps1 | iex

&lt;span class="c"&gt;# Python&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;sling
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Confirm:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Full install notes are in the &lt;a href="https://docs.slingdata.io/sling-cli/getting-started" rel="noopener noreferrer"&gt;Sling CLI Getting Started Guide&lt;/a&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  Configuring the Postgres Source
&lt;/h1&gt;

&lt;p&gt;Sling reads connection details from &lt;code&gt;~/.sling/env.yaml&lt;/code&gt;, environment variables, or &lt;code&gt;sling conns set&lt;/code&gt;. A read-only user is enough:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;USER&lt;/span&gt; &lt;span class="n"&gt;sling&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;PASSWORD&lt;/span&gt; &lt;span class="s1"&gt;'&amp;lt;password&amp;gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;CONNECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;DATABASE&lt;/span&gt; &lt;span class="n"&gt;mydb&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;sling&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;USAGE&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;sling&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt; &lt;span class="n"&gt;TABLES&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;sling&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;PRIVILEGES&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;TABLES&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;sling&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then register the connection:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;set &lt;/span&gt;POSTGRES &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;postgres &lt;span class="nv"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;host.ip &lt;span class="nv"&gt;user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sling &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nv"&gt;database&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;mydb &lt;span class="nv"&gt;password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;mypass &lt;span class="nv"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;5432
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or in &lt;code&gt;~/.sling/env.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;connections&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;POSTGRES&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres&lt;/span&gt;
    &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;host.ip&lt;/span&gt;
    &lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sling&lt;/span&gt;
    &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mypass&lt;/span&gt;
    &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5432&lt;/span&gt;
    &lt;span class="na"&gt;database&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mydb&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your Postgres requires SSL, append &lt;code&gt;sslmode: require&lt;/code&gt;. Test it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;test &lt;/span&gt;POSTGRES
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;a href="https://docs.slingdata.io/connections/database-connections/postgres" rel="noopener noreferrer"&gt;Postgres connection docs&lt;/a&gt; cover SSL, IAM, and the rest.&lt;/p&gt;

&lt;h1&gt;
  
  
  Configuring the Iceberg Target
&lt;/h1&gt;

&lt;p&gt;Sling treats Iceberg as a database-class target. The connection captures two things: the catalog, which stores table metadata, and the warehouse, which stores the actual Parquet data files. Sling supports REST, AWS Glue, and SQL catalogs. This guide uses REST.&lt;/p&gt;

&lt;p&gt;For Cloudflare R2's Iceberg catalog you need the catalog URL, an API token, the warehouse identifier (account-id + bucket name), and S3-compatible credentials for the R2 bucket underneath. All four come from the R2 dashboard.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;connections&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ICEBERG&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;iceberg&lt;/span&gt;
    &lt;span class="na"&gt;catalog_type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rest&lt;/span&gt;
    &lt;span class="na"&gt;rest_uri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://catalog.cloudflarestorage.com/&amp;lt;accountid&amp;gt;/&amp;lt;bucket&amp;gt;&lt;/span&gt;
    &lt;span class="na"&gt;rest_token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;r2_catalog_api_token&amp;gt;&lt;/span&gt;
    &lt;span class="na"&gt;rest_warehouse&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;accountid&amp;gt;_&amp;lt;bucket&amp;gt;&lt;/span&gt;
    &lt;span class="na"&gt;s3_access_key_id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;r2_access_key_id&amp;gt;&lt;/span&gt;
    &lt;span class="na"&gt;s3_secret_access_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;r2_secret_access_key&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a self-hosted &lt;a href="https://github.com/lakekeeper/lakekeeper" rel="noopener noreferrer"&gt;Lakekeeper&lt;/a&gt; or &lt;a href="https://projectnessie.org/" rel="noopener noreferrer"&gt;Nessie&lt;/a&gt; catalog, the shape is the same; only the &lt;code&gt;rest_uri&lt;/code&gt; and &lt;code&gt;rest_warehouse&lt;/code&gt; change. For AWS Glue, set &lt;code&gt;catalog_type: glue&lt;/code&gt; and &lt;code&gt;glue_warehouse: s3://my-bucket/warehouse&lt;/code&gt;. The &lt;a href="https://docs.slingdata.io/connections/database-connections/iceberg" rel="noopener noreferrer"&gt;Iceberg connection docs&lt;/a&gt; walk through each catalog type.&lt;/p&gt;

&lt;p&gt;Test it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;test &lt;/span&gt;ICEBERG
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  A Full-Refresh Replication
&lt;/h1&gt;

&lt;p&gt;For this run the Postgres source has three tables in a &lt;code&gt;demo_postgres_iceberg&lt;/code&gt; schema:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;users&lt;/code&gt; — 8,000 rows&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;orders&lt;/code&gt; — 35,000 rows&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;events&lt;/code&gt; — 60,000 rows, with an &lt;code&gt;occurred_at&lt;/code&gt; timestamp&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The replication file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# replication.yaml&lt;/span&gt;
&lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;POSTGRES&lt;/span&gt;
&lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ICEBERG&lt;/span&gt;

&lt;span class="na"&gt;defaults&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;full-refresh&lt;/span&gt;
  &lt;span class="na"&gt;object&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;demo_postgres_iceberg.{stream_table}&lt;/span&gt;

&lt;span class="na"&gt;streams&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;demo_postgres_iceberg.users&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;demo_postgres_iceberg.orders&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;demo_postgres_iceberg.events&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;incremental&lt;/span&gt;
    &lt;span class="na"&gt;primary_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;event_id&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;update_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;occurred_at&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few notes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;object:&lt;/code&gt; follows the usual &lt;code&gt;&amp;lt;namespace&amp;gt;.&amp;lt;table&amp;gt;&lt;/code&gt; shape. Sling creates the Iceberg namespace if it doesn't already exist in the catalog.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;{stream_table}&lt;/code&gt; is a &lt;a href="https://docs.slingdata.io/concepts/replication/runtime-variables" rel="noopener noreferrer"&gt;runtime variable&lt;/a&gt;. Sling substitutes the source table name so you don't repeat yourself.&lt;/li&gt;
&lt;li&gt;The third stream switches to &lt;code&gt;mode: incremental&lt;/code&gt; with an &lt;code&gt;update_key&lt;/code&gt;. That's the only diff between a one-shot bulk load and an ongoing append flow.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Run it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling run &lt;span class="nt"&gt;-r&lt;/span&gt; replication.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Real output, trimmed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INF Sling CLI | https://slingdata.io
WRN for mode 'incremental' with iceberg target, primary-key is ineffective,
    incremental merge is not yet supported (only appends)
INF Sling Replication [3 streams] | POSTGRES -&amp;gt; ICEBERG

INF [1 / 3] running stream demo_postgres_iceberg.users
INF created table "demo_postgres_iceberg"."users"
INF streaming data (direct insert)
INF inserted 8000 rows into "demo_postgres_iceberg"."users" in 11 secs [713 r/s] [519 kB]

INF [2 / 3] running stream demo_postgres_iceberg.orders
INF created table "demo_postgres_iceberg"."orders"
INF inserted 35000 rows into "demo_postgres_iceberg"."orders" in 9 secs [3,721 r/s] [2.1 MB]

INF [3 / 3] running stream demo_postgres_iceberg.events
INF getting checkpoint value (occurred_at)
INF writing to target database [mode: incremental]
INF created table "demo_postgres_iceberg"."events"
INF inserted 60000 rows into "demo_postgres_iceberg"."events" in 7 secs [8,190 r/s] [4.5 MB]

INF Sling Replication Completed in 29s | POSTGRES -&amp;gt; ICEBERG | 3 Successes | 0 Failures
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;103,000 rows across three tables, 29 seconds end-to-end. The warning at the top deserves a real answer; see the section on incremental modes further down.&lt;/p&gt;

&lt;h1&gt;
  
  
  Verification
&lt;/h1&gt;

&lt;p&gt;Sling can query Iceberg tables directly through its DuckDB-backed reader. Tables are addressed as &lt;code&gt;iceberg_catalog.&amp;lt;namespace&amp;gt;.&amp;lt;table&amp;gt;&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;exec &lt;/span&gt;ICEBERG &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"select 'users' as t, count(*) as c
     from iceberg_catalog.demo_postgres_iceberg.users
   union all
   select 'orders', count(*) from iceberg_catalog.demo_postgres_iceberg.orders
   union all
   select 'events', count(*) from iceberg_catalog.demo_postgres_iceberg.events"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+--------+-------+
| T      |     C |
+--------+-------+
| users  |  8000 |
| orders | 35000 |
| events | 60000 |
+--------+-------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Row counts match the source. A sample of &lt;code&gt;users&lt;/code&gt; confirms columns and types survived the trip:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;exec &lt;/span&gt;ICEBERG &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"select user_id, email, country, signup_at
     from iceberg_catalog.demo_postgres_iceberg.users
    order by user_id limit 5"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+---------+-------------------+---------+-------------------------------+
| USER_ID | EMAIL             | COUNTRY | SIGNUP_AT                     |
+---------+-------------------+---------+-------------------------------+
|       1 | user1@example.com | BR      | 2025-01-01 00:14:00 -0300 -03 |
|       2 | user2@example.com | DE      | 2025-01-01 00:28:00 -0300 -03 |
|       3 | user3@example.com | FR      | 2025-01-01 00:42:00 -0300 -03 |
|       4 | user4@example.com | JP      | 2025-01-01 00:56:00 -0300 -03 |
|       5 | user5@example.com | UK      | 2025-01-01 01:10:00 -0300 -03 |
+---------+-------------------+---------+-------------------------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Postgres &lt;code&gt;jsonb&lt;/code&gt; lands as a structured column too. Sampling &lt;code&gt;events&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+----------+---------+------------+----------------------+----------------------+
| EVENT_ID | USER_ID | EVENT_TYPE | PAYLOAD              | OCCURRED_AT          |
+----------+---------+------------+----------------------+----------------------+
|    60001 |       2 | click      | {"v": 1, "utm": "x"} | 2026-05-11 ...       |
|    60002 |       3 | signup     | {"v": 2, "utm": "x"} | 2026-05-11 ...       |
|    60003 |       4 | purchase   | {"v": 3, "utm": "x"} | 2026-05-11 ...       |
+----------+---------+------------+----------------------+----------------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Any other Iceberg reader sees the same data: DuckDB with the &lt;code&gt;iceberg&lt;/code&gt; extension, Spark, Trino, Athena, Snowflake's catalog-linked databases. That portability is the reason for the catalog in the first place.&lt;/p&gt;

&lt;h1&gt;
  
  
  Running an Incremental Append
&lt;/h1&gt;

&lt;p&gt;After the bulk load, the day-to-day shape is: every few minutes (or hours, or once a day), pick up the new rows since the last run and append them to the Iceberg table. Sling's &lt;a href="https://docs.slingdata.io/concepts/replication/modes#incremental-mode" rel="noopener noreferrer"&gt;incremental mode&lt;/a&gt; does this. The state (the last seen value of the &lt;code&gt;update_key&lt;/code&gt;) is tracked by Sling itself, so you don't need to manage a state file the way you would for a file-based target.&lt;/p&gt;

&lt;p&gt;Insert 2,500 new events on the source (a stand-in for fresh activity):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;insert&lt;/span&gt; &lt;span class="k"&gt;into&lt;/span&gt; &lt;span class="n"&gt;demo_postgres_iceberg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;occurred_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="mi"&gt;60000&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;8000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s1"&gt;'click'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;jsonb_build_object&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'utm'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'x'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'v'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
       &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;interval&lt;/span&gt; &lt;span class="s1"&gt;'1 second'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;generate_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;g&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run a single-stream replication that touches only &lt;code&gt;events&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# replication-incremental.yaml&lt;/span&gt;
&lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;POSTGRES&lt;/span&gt;
&lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ICEBERG&lt;/span&gt;

&lt;span class="na"&gt;defaults&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;object&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;demo_postgres_iceberg.{stream_table}&lt;/span&gt;

&lt;span class="na"&gt;streams&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;demo_postgres_iceberg.events&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;incremental&lt;/span&gt;
    &lt;span class="na"&gt;update_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;occurred_at&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling run &lt;span class="nt"&gt;-r&lt;/span&gt; replication-incremental.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INF Sling Replication | POSTGRES -&amp;gt; ICEBERG | demo_postgres_iceberg.events
INF getting checkpoint value (occurred_at)
INF reading from source database
INF writing to target database [mode: incremental]
INF streaming data (direct insert)
INF inserted 2500 rows into "demo_postgres_iceberg"."events" in 8 secs [294 r/s] [178 kB]
INF execution succeeded
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sling read the saved checkpoint, pulled only rows newer than the last &lt;code&gt;occurred_at&lt;/code&gt; it saw, and appended exactly the 2,500 new rows. A readback confirms the new total:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sling conns &lt;span class="nb"&gt;exec &lt;/span&gt;ICEBERG &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"select min(occurred_at), max(occurred_at), count(*)
     from iceberg_catalog.demo_postgres_iceberg.events"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+-------------------------------+--------------------------------------+--------+
| MIN_OCCURRED_AT               | MAX_OCCURRED_AT                      | COUNT  |
+-------------------------------+--------------------------------------+--------+
| 2025-03-01 00:00:40 -0300 -03 | 2026-05-11 08:42:59.533692 -0300 -03 |  62500 |
+-------------------------------+--------------------------------------+--------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;60,000 + 2,500 = 62,500. The new high-water mark on &lt;code&gt;occurred_at&lt;/code&gt; is the timestamp of the freshest insert. The next scheduled run will start from there.&lt;/p&gt;

&lt;h1&gt;
  
  
  Append-incremental vs merge-incremental
&lt;/h1&gt;

&lt;p&gt;That warning Sling printed on the first run matters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WRN for mode 'incremental' with iceberg target, primary-key is ineffective,
    incremental merge is not yet supported (only appends)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For database targets like Postgres or Snowflake, Sling's &lt;code&gt;incremental&lt;/code&gt; mode is a merge: a row whose &lt;code&gt;primary_key&lt;/code&gt; already exists in the target gets updated in place. For an Iceberg target today, &lt;code&gt;incremental&lt;/code&gt; means append only. New rows go in, existing rows stay as-is, and a &lt;code&gt;primary_key&lt;/code&gt; declared on the stream is parsed but not enforced.&lt;/p&gt;

&lt;p&gt;That is fine when your source is append-only: events, immutable transactions, log data. It is the wrong default if your source has mutable rows you need reflected on the lake side. Until merge lands, two patterns work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Snapshot replays. Run &lt;code&gt;mode: full-refresh&lt;/code&gt; on a cadence that matches your freshness budget. Iceberg's snapshot model means readers always see a consistent table; the old snapshot is replaced atomically. For tables in the low millions this is faster than it sounds.&lt;/li&gt;
&lt;li&gt;CDC-style append plus downstream resolution. Append every Postgres change to Iceberg as-is (using a logical-replication tool or trigger-based capture) and resolve the latest-state view at read time with something like &lt;code&gt;qualify row_number() over (partition by pk order by event_ts desc) = 1&lt;/code&gt;. A bit more work at query time, very cheap at write time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Track the &lt;a href="https://docs.slingdata.io/connections/database-connections/iceberg" rel="noopener noreferrer"&gt;Iceberg connector docs&lt;/a&gt; for when full merge mode ships.&lt;/p&gt;

&lt;h1&gt;
  
  
  Common tweaks
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Choose the right catalog.&lt;/strong&gt; REST is the most portable: the same connection shape works for Cloudflare R2, Lakekeeper, Nessie, Polaris, and any other REST-compatible catalog. Glue is the simplest in AWS-native shops. SQL catalog is fine for local dev. Avoid wiring a different catalog per environment if you can help it; the table layout doesn't care, but the metadata location does.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Namespace organization.&lt;/strong&gt; Treat namespaces (&lt;code&gt;demo_postgres_iceberg.users&lt;/code&gt;) the way you treat warehouse schemas: one per source system, or one per data domain. Don't dump everything into &lt;code&gt;default&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filter at the source.&lt;/strong&gt; Use a &lt;code&gt;sql:&lt;/code&gt; block per stream to project columns or filter rows before they leave Postgres. Smaller Parquet files, smaller manifests, cheaper queries downstream.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time travel for free.&lt;/strong&gt; Every replication produces a new Iceberg snapshot. Readers can time-travel to a previous snapshot, which is useful for "what did this table look like before yesterday's run?" without storing your own backups.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maintain the table.&lt;/strong&gt; Like any Iceberg table, periodic compaction and snapshot expiration keep the file count and metadata size from growing without bound. Set this up on a separate schedule from the replication itself.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Where to go next
&lt;/h1&gt;

&lt;p&gt;The same pattern works for any of &lt;a href="https://docs.slingdata.io/connections/database-connections" rel="noopener noreferrer"&gt;Sling's 30+ database sources&lt;/a&gt; into Iceberg: MySQL, SQL Server, Snowflake, BigQuery, MongoDB, and the rest. Swap the source and leave the target alone.&lt;/p&gt;

&lt;p&gt;If the underlying R2 storage is what brought you here, the &lt;a href="https://slingdata.io/articles/r2-from-postgres-parquet-sling/" rel="noopener noreferrer"&gt;Postgres → R2 as Parquet&lt;/a&gt; walkthrough shows the same source landing as raw Parquet files instead of an Iceberg table, which is useful when downstream readers don't need a catalog. For a deeper comparison of file-format targets, see &lt;a href="https://slingdata.io/articles/postgres-to-s3-parquet-with-sling/" rel="noopener noreferrer"&gt;Postgres → S3 as Parquet&lt;/a&gt; and &lt;a href="https://slingdata.io/articles/postgres-to-duckdb-with-sling/" rel="noopener noreferrer"&gt;Postgres → DuckDB&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For team workflows with scheduling, alerting, and audit trails on top of the same CLI, look at the &lt;a href="https://docs.slingdata.io/sling-platform/getting-started" rel="noopener noreferrer"&gt;Sling Platform&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Questions go to &lt;a href="https://discord.gg/q5xtaSNDvp" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; or &lt;a href="https://github.com/slingdata-io/sling-cli/issues" rel="noopener noreferrer"&gt;GitHub Issues&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>iceberg</category>
      <category>dataengineering</category>
      <category>etl</category>
    </item>
  </channel>
</rss>
