Sourabh Gupta for Estuary

From OLTP to OLAP: Streaming Databases into MotherDuck with Estuary

Introduction

DuckDB is quickly becoming one of the most talked-about analytical databases. It is fast, lightweight, and designed to run inside your applications, which is why it is often described as SQLite for analytics. While it works great on a laptop for local analysis, production workflows need something more scalable.

That is where MotherDuck comes in. MotherDuck takes the power of DuckDB and brings it to the cloud. It adds collaboration features, secure storage, and a serverless model that lets teams use DuckDB at scale without worrying about infrastructure.

In this guide, you will learn how to stream data from an OLTP system into MotherDuck using Estuary Flow. This approach lets you run analytical queries on fresh data without putting extra load on your production database.

🎥 Prefer watching instead of reading? Check out the short walkthrough below.

Why DuckDB Is Gaining Traction

DuckDB is an open source analytical database designed with a clear goal: to make complex queries fast and simple without heavy infrastructure. Instead of being a traditional client-server database, DuckDB is embedded. It runs inside the host process, which reduces overhead and makes it easy to integrate directly into applications, notebooks, or scripts.

Several features stand out:

  • In-process operation: Similar to SQLite, DuckDB runs where your code runs. This avoids network calls and gives you low-latency access to data.
  • Columnar and vectorized execution: DuckDB is optimized for analytical queries. Its execution model speeds up heavy operations such as aggregations, filtering, and joins on large tables.
  • Portability and extensibility: It has a very small footprint and no external dependencies. At the same time, extensions support advanced data types and file formats, including Parquet, JSON, and geospatial data.
  • Seamless file access: DuckDB can query local files directly without requiring an ETL pipeline. For example, you can run SQL queries on CSV or Parquet files straight from disk (see the short example after this list).
  • Integration with data science tools: DuckDB connects smoothly with Python, R, and Jupyter notebooks, which makes it a favorite among data scientists.

Because of this balance of speed, flexibility, and simplicity, DuckDB is increasingly used as the analytical layer in modern data pipelines, as well as for ad hoc analysis by engineers and analysts.
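To make the file-access point concrete, here is a minimal sketch of querying files straight from disk with DuckDB SQL. The file names and the status column are placeholders; any local CSV or Parquet files will do.

SELECT *
FROM read_csv_auto('events.csv')   -- infer the schema directly from a CSV on disk
LIMIT 5;

SELECT status, COUNT(*) AS cnt
FROM 'data/*.parquet'              -- glob over Parquet files, no ETL step required
GROUP BY status;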

MotherDuck: DuckDB in the Cloud

DuckDB is excellent for local analysis, but production environments often require more than a local embedded database. Teams need collaboration, security, and scalability. That is where MotherDuck comes in.

MotherDuck is a managed cloud service built on top of DuckDB. It extends the same fast and lightweight query engine into a serverless environment while adding features that make it practical for organizations:

  • Serverless architecture: No servers to manage and no infrastructure overhead. MotherDuck scales automatically with your workloads.
  • Collaboration: Share queries, results, and datasets with teammates in real time. This makes it easier for teams to work from the same source of truth.
  • Secure secret storage: Manage credentials and connections safely in the cloud.
  • Integration with pipelines: Platforms like Estuary can write directly into MotherDuck, which means your data is always fresh and ready for analysis.

In practice, MotherDuck gives teams the best of both worlds: the performance and simplicity of DuckDB combined with the scalability and ease of use of a modern cloud service.
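Because MotherDuck speaks the same SQL as DuckDB, you can reach it from a local DuckDB session with a single ATTACH statement. The snippet below is a minimal sketch and assumes a valid service token is available (for example, exported as the MOTHERDUCK_TOKEN environment variable); see MotherDuck's documentation for the current connection details.

-- Attach your MotherDuck account from a local DuckDB session.
-- Assumes the MOTHERDUCK_TOKEN environment variable holds a valid service token.
ATTACH 'md:';
SHOW DATABASES;   -- cloud databases now appear alongside local ones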

OLTP → OLAP: The Core Use Case

Most production applications run on OLTP databases such as PostgreSQL, MySQL, or MongoDB. These systems are designed for fast inserts, updates, and deletes. They keep applications responsive but are not optimized for running heavy analytical queries.

Running aggregations, joins, or reports directly on an OLTP database can:

  • Slow down your application performance.
  • Increase operational risk by adding load to your production environment.
  • Limit the ability of analysts and data scientists to explore data freely.

This is why organizations separate OLTP (transactional) systems from OLAP (analytical) systems. The OLTP database handles day-to-day transactions, while an OLAP database is dedicated to complex queries and reporting.

DuckDB, and by extension MotherDuck, fits perfectly as an OLAP layer. With Estuary Flow, you can capture real-time changes from your OLTP source and stream them into MotherDuck. This way, analysts always have up-to-date data to query without touching the production database.

Setting Up Estuary with MotherDuck

In this section, we’ll walk through the process of connecting your OLTP source to MotherDuck using Estuary Flow. The setup is straightforward and only takes a few steps.

Step 1: Prepare Your Source in Estuary

Before you can send data to MotherDuck, you need a source system connected in Estuary. A source could be any OLTP database such as PostgreSQL, MySQL, or MongoDB. Estuary also supports SaaS applications, event streams, and file-based sources.

To prepare a source:

  1. Go to the Captures tab in the Estuary dashboard.
  2. Create a new capture and select the connector for your source system.
  3. Provide the connection details (for example, host, port, database name, and credentials).
  4. Save and publish the capture.

Once this is done, Estuary begins ingesting data from your source and continuously tracks new changes. This stream of data is stored in an internal collection, which you will later connect to MotherDuck.

Tip: If you are new to Estuary, start with a simple source (such as a small PostgreSQL database or a CSV file) before moving on to production-scale sources.
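For a PostgreSQL source, change data capture relies on logical replication, so the database usually needs some one-time preparation: a capture role, a publication, and wal_level set to logical. The statements below are an illustrative sketch only; the role and publication names are placeholders, and the exact requirements are spelled out in Estuary's PostgreSQL connector documentation.

-- Illustrative PostgreSQL prep for CDC; names and password are placeholders.
CREATE USER estuary_capture WITH REPLICATION PASSWORD 'strong-password-here';
GRANT SELECT ON ALL TABLES IN SCHEMA public TO estuary_capture;
CREATE PUBLICATION estuary_publication FOR ALL TABLES;
-- wal_level must also be set to 'logical' in postgresql.conf (requires a restart).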

Step 2: Create a MotherDuck Materialization

With your source capture running, the next step is to set up MotherDuck as the destination for your data. In Estuary, this is called a materialization.

Searching for the MotherDuck connector in Estuary's connector catalog.

To create one:

  1. Go to the Destinations tab in the Estuary dashboard.
  2. Click New Materialization.
  3. Search for MotherDuck in the connector catalog and select it.
  4. Give the materialization a descriptive name so you can easily identify it later.

At this point, you will see the configuration screen for the MotherDuck connector. This is where you provide the details that allow Estuary to stage data and deliver it into your MotherDuck database.

In the next step, you’ll configure AWS S3 staging, which Estuary uses as a temporary storage location for data loads.

Step 3: Configure AWS S3 Staging

The MotherDuck connector in Estuary uses an Amazon S3 bucket as a staging area. Data is first written to S3, then loaded into MotherDuck. This design ensures high reliability and scalability for large datasets.

Example IAM users in AWS for Estuary and MotherDuck. Each user should have S3 read and write permissions.

Here’s what you need to set up:

  1. Create or choose an S3 bucket

    • Note down the bucket name and its region.
    • Optionally, you can define a prefix if you want Estuary to organize staged files under a specific folder.
  2. Set up IAM permissions

    • Create or use an IAM user that has read and write access to the S3 bucket.
    • Attach a policy with at least the following actions (a minimal example policy is sketched after this list):
      • s3:PutObject
      • s3:GetObject
      • s3:ListBucket
  3. Generate access keys

    • In the AWS console, go to the IAM user’s Security Credentials tab.
    • Create an access key and secret key.
    • Copy these values into the Estuary dashboard when configuring the MotherDuck connector.
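As a reference point, a minimal IAM policy granting those actions might look like the following. This is a sketch rather than an exact policy from Estuary's documentation: the bucket name is a placeholder, and you should scope the resources to your own bucket (and prefix, if you use one).

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject"],
      "Resource": "arn:aws:s3:::your-staging-bucket/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::your-staging-bucket"
    }
  ]
}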

At this point, Estuary knows where to stage data and has the permissions needed to write into your S3 bucket.

Tip: For production, avoid using a root account. Always generate access keys from an IAM user with the least privileges necessary.

Step 4: Set Up MotherDuck

Now that AWS S3 staging is ready, it’s time to configure the MotherDuck side of the connection. This step makes sure MotherDuck can pull the staged data into your chosen database.

Example of the MotherDuck connector configuration in Estuary, with service token, database, and S3 staging details filled in.

  1. Generate an access token

    • Log in to your MotherDuck account.
    • Open the Settings menu and go to Access Tokens.
    • Create a new token and copy it into the Estuary connector configuration.
  2. Provide AWS credentials to MotherDuck

    • MotherDuck needs permission to read the staged files from your S3 bucket.
    • You can provide these credentials in one of two ways:

    a. By creating an S3 secret with SQL inside MotherDuck, using DuckDB's CREATE SECRET syntax:

     CREATE SECRET (TYPE S3,
         KEY_ID '<ACCESS_KEY>',
         SECRET '<SECRET_KEY>');


    b. Or by entering them through the MotherDuck UI.

  3. Choose a target database

    • Select an existing database in your MotherDuck account, or create a new one.
    • Copy its name into the Estuary configuration.
  4. Decide on delete behavior

    • Soft deletes: Mark a record as deleted but keep it in the table for historical analysis.
    • Hard deletes: Remove the record entirely.
    • Choose the option that best matches your analytics or compliance needs.
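If you opt for soft deletes, downstream queries usually need to filter the flagged rows out. The query below is a purely illustrative sketch: the marker column name is hypothetical, so check the MotherDuck materialization connector's documentation for the field it actually writes.

-- Hypothetical marker column; the real field name comes from the connector docs.
SELECT *
FROM orders
WHERE NOT is_deleted;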

Step 5: Publish and Stream Data

Once your MotherDuck materialization is configured, the final step is to publish it and start the data flow.

  1. Select your source data

    • Link an entire capture (for example, your PostgreSQL database), or
    • Choose specific collections you want to replicate.
  2. Review the configuration

    • Double-check that your S3 credentials, MotherDuck token, and database name are correct.
    • Make sure you selected the right delete behavior (soft or hard).
  3. Save and publish

    • Click Next, then Save & Publish.
    • Estuary Flow will immediately begin streaming data from your OLTP source into MotherDuck.

From here, data updates in your source will flow continuously into your MotherDuck database. This gives you a near real-time OLAP environment for analytics, without adding load to your production system.

Step 6: Query in MotherDuck

With the connector published, your data is now flowing into MotherDuck. The final step is to start exploring it.

  1. Open the MotherDuck dashboard and go to Notebooks.
  2. Select the database you configured as the destination.
  3. Run queries using DuckDB’s familiar SQL syntax.

For example, if you replicated an orders table from your OLTP database, you could analyze top customers like this:

SELECT customer_id, COUNT(*) AS order_count
FROM orders
GROUP BY customer_id
ORDER BY order_count DESC
LIMIT 10;

Running a SQL query in MotherDuck to explore the replicated dataset streamed through Estuary.
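You are not limited to simple counts. Any DuckDB SQL works against the replicated tables; for instance, a daily revenue rollup might look like the sketch below, which assumes the orders table also carries order_date and amount columns (your own schema will differ).

SELECT date_trunc('day', order_date) AS day,
       SUM(amount) AS revenue
FROM orders
GROUP BY day
ORDER BY day DESC
LIMIT 30;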

Wrap-Up

By combining Estuary Flow and MotherDuck, you can build a modern pipeline that keeps analytics separate from your production workload without adding extra complexity.

  • Estuary captures real-time changes from your OLTP databases.
  • Data is staged in S3 for reliability.
  • MotherDuck provides a cloud-native DuckDB environment where your team can query and collaborate.

This setup is fast to configure, easy to maintain, and scales with your needs. Instead of managing batch jobs or writing custom scripts, you can focus on analysis and insights.


✅ Key Takeaways

  • DuckDB is lightweight and powerful for analytics, while MotherDuck brings it to the cloud for collaboration and scalability.
  • Estuary Flow makes it simple to stream data from OLTP systems into MotherDuck in real time.
  • AWS S3 is used as a staging layer, requiring IAM permissions and credentials.
  • Once published, you can query fresh data in MotherDuck notebooks using DuckDB SQL.

👉 Ready to try it yourself? Explore Estuary Flow and see how quickly you can start streaming data into MotherDuck.
