Got a bunch of raw data sitting in Amazon S3 and need to get it into Snowflake for analytics — fast? You’re not alone.
Maybe it’s JSON logs, CSV exports, or event data piling up in your S3 bucket. Maybe you’ve tried batch pipelines or custom scripts but ran into delays, duplicates, or schema chaos. What you actually need is a clean, reliable way to load that S3 data to Snowflake, without spending weeks building and maintaining it.
That’s exactly what Estuary Flow is built for.
Flow makes it easy to build real-time S3 to Snowflake data pipelines with no code, no ops overhead, and no latency headaches. It connects directly to your S3 bucket, picks up new files as they arrive, and keeps your Snowflake warehouse in sync continuously.
In this walkthrough, we’ll show you how to set up an Amazon S3 to Snowflake pipeline using Estuary Flow from start to finish. You’ll go from raw files to live Snowflake tables in just a few steps.
TL;DR: If you're looking to stream data from Amazon S3 to Snowflake, you're in the right place — and Flow makes it a breeze.
Why Stream S3 Data to Snowflake in Real Time?
Let’s be honest — batch processing worked fine back when dashboards only needed to update once a day. But today, teams expect real-time answers: marketing needs up-to-the-minute campaign performance, operations teams need live inventory data, and product managers want to react to user behavior as it happens.
That’s where streaming data from S3 to Snowflake changes the game.
If you’re storing raw files — like logs, events, or exports — in Amazon S3, you’re already halfway there. The missing piece is a low-latency pipeline that gets that data into Snowflake the moment it arrives. No waiting for hourly jobs. No stale reports. Just fresh, query-ready data flowing in 24/7.
Here are a few reasons real-time sync matters:
- Analytics that actually keep up – Get real-time insights instead of reading yesterday’s data.
- Automation that reacts fast – Trigger workflows in Snowflake based on live S3 updates.
- Simplified ops – Eliminate brittle scripts, manual backfills, and sync delays.
Note: Amazon S3 isn’t a streaming source the way a database with change data capture is, so Flow polls your bucket every few minutes to detect new files, then streams them to Snowflake immediately. It’s batch under the hood, but near-real-time in effect.
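You don’t have to build any of that polling yourself, but if you’re curious what it amounts to, here’s a minimal sketch in Python using boto3. The bucket name, prefix, and interval are placeholders; Flow’s connector handles this for you, along with state tracking and exactly-once processing.

```python
# Rough illustration of polling an S3 bucket for recently added objects.
# Estuary Flow's connector does this for you; names below are placeholders.
from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client("s3", region_name="us-east-1")

def new_objects_since(bucket: str, prefix: str, since: datetime):
    """Yield (key, last_modified) for objects under `prefix` modified after `since`."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if obj["LastModified"] > since:
                yield obj["Key"], obj["LastModified"]

# Example: list files added in roughly the last five minutes.
cutoff = datetime.now(timezone.utc) - timedelta(minutes=5)
for key, modified in new_objects_since("my-raw-data-bucket", "events/", cutoff):
    print(key, modified)
```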
Why Use Estuary Flow Instead of Traditional ETL Tools?
If you’ve tried to move data from Amazon S3 to Snowflake before, you probably know the drill: patch together an ETL tool, deal with scheduling, wrestle with schema mismatches, and hope the job doesn’t break halfway through.
The thing is, most ETL tools were built for a different era — one where “real time” meant “hourly,” and everything ran in batches. Estuary Flow flips that on its head.
Here’s how Flow makes your S3 to Snowflake pipeline way easier:
- Real-Time by Default: Flow isn’t just fast — it’s built for continuous streaming. Once you connect your S3 bucket, Flow automatically picks up new files as they land and streams the data directly into Snowflake.
- No Code Required: Set up everything — capture, schema, and materialization — through a clean UI. You don’t need to write Python, wrangle Airflow, or babysit cron jobs.
- Schema-Aware + Smart: Flow infers the structure of your S3 data and helps you map it to Snowflake tables. You can tighten up schemas, apply transformations, and evolve structure over time without breaking your pipeline.
- Exactly-Once Delivery: No duplicates. No reprocessing. Flow’s transactional runtime ensures data lands in Snowflake exactly once, even when connections drop or jobs restart.
- Built to Scale: Whether you're syncing a few JSON files or streaming terabytes a day, Flow scales automatically without locking you into complex infrastructure.
Estuary Flow takes the friction out of real-time data integration from S3 to Snowflake, so you can focus on using the data, not moving it.
What You Need to Get Started
You don’t need much to build an Amazon S3 to Snowflake pipeline with Estuary Flow — just a few basics:
Estuary Flow Account
Sign up for free to access the Flow web app — no downloads or setup required.
Amazon S3 Bucket
This is your data source. You’ll need:
- Bucket name & region
- Either public access or your AWS access key + secret key
- (Optional) A folder path, called a “prefix”
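This step isn’t required, but if you want to confirm those credentials can actually see the bucket before handing them to Flow, a quick check like the one below surfaces permission problems early. The bucket, region, prefix, and keys are all placeholders for your own values.

```python
# Sanity check: can these AWS credentials list the bucket and prefix?
# All names below are placeholders.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client(
    "s3",
    region_name="us-east-1",
    aws_access_key_id="YOUR_ACCESS_KEY_ID",
    aws_secret_access_key="YOUR_SECRET_ACCESS_KEY",
)

try:
    resp = s3.list_objects_v2(Bucket="my-raw-data-bucket", Prefix="events/", MaxKeys=5)
    for obj in resp.get("Contents", []):
        print(obj["Key"])
except ClientError as err:
    print("Access problem:", err.response["Error"]["Code"])
```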
Snowflake Account
Your destination for the data. Make sure you have:
- A database, schema, and virtual warehouse
- A user with permission to create and write tables in that schema
- Your account URL + login credentials
- (Optional) warehouse name and role
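If you’d like to double-check those Snowflake details before entering them in Flow, a short connectivity test with the snowflake-connector-python package works well. Every identifier below is a placeholder for your own account, user, and objects.

```python
# Verify the Snowflake account, user, database, schema, warehouse, and role
# before plugging them into Flow. All values are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="myorg-account",          # the part of your account URL before .snowflakecomputing.com
    user="ESTUARY_USER",
    password="YOUR_PASSWORD",
    warehouse="COMPUTE_WH",
    database="ANALYTICS",
    schema="RAW",
    role="ESTUARY_ROLE",
)

with conn.cursor() as cur:
    cur.execute("SELECT CURRENT_ACCOUNT(), CURRENT_WAREHOUSE(), CURRENT_SCHEMA()")
    print(cur.fetchone())

conn.close()
```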
That’s it. With these in place, you’re ready to connect the pieces and start streaming.
Step 1: Capture Data from Amazon S3
First up, you’ll connect Estuary Flow to your S3 bucket — this step is called a capture. It’s how Flow knows where to pull your data from.
Here’s how to set it up:
- Log into Estuary Flow at dashboard.estuary.dev.
- Click the Sources tab and select New Capture.
- Choose Amazon S3 from the list of connectors.
You’ll see a form where you enter your S3 details:
- Capture name – Something like myorg/s3-orders
- AWS credentials – Only needed if your bucket isn’t public
- Bucket name & region – From your S3 console
- Prefix (optional) – To pull from a specific folder
- Match keys (optional) – For filtering files, like *.json
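The Match keys field filters which object keys get captured (in current versions of the connector it accepts a regular expression rather than a shell-style glob). Here’s a purely illustrative sketch of how such a pattern narrows things down; Flow applies its own filter, this just shows the idea:

```python
# Illustration only: how a key-matching pattern narrows which files are captured.
import re

keys = [
    "events/2024-06-01/orders.json",
    "events/2024-06-01/orders.csv",
    "exports/summary.json",
]

pattern = re.compile(r".*\.json$")   # keep only .json files
print([k for k in keys if pattern.match(k)])
# ['events/2024-06-01/orders.json', 'exports/summary.json']
```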
Once you click Next, Flow will connect to your bucket and auto-generate a schema based on your data. You’ll see a preview of your Flow collection — this acts as a live copy of your S3 data inside Flow.
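That inferred schema is JSON Schema under the hood. As a hypothetical example (your actual fields will differ), a simple order event might end up with a shape like the one below; the jsonschema package lets you check a sample record against it locally.

```python
# Hypothetical example of an inferred JSON schema for a simple order event.
# Your collection's schema will reflect the fields in your own files.
from jsonschema import validate

inferred_schema = {
    "type": "object",
    "properties": {
        "order_id": {"type": "string"},
        "amount": {"type": "number"},
        "placed_at": {"type": "string", "format": "date-time"},
    },
    "required": ["order_id"],
}

sample_record = {"order_id": "A-1001", "amount": 42.5, "placed_at": "2024-06-01T12:00:00Z"}
validate(instance=sample_record, schema=inferred_schema)  # raises if the record doesn't fit
print("sample record matches the inferred schema")
```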
Click Save and Publish to finish the capture.
Behind the scenes, Flow checks your S3 bucket on a 5-minute schedule (by default) to pick up new or updated files. That’s how it delivers near-real-time sync, even though S3 itself isn’t a native streaming source.
Next, let’s connect this to Snowflake.
Step 2: Materialize to Snowflake
Now that your data is flowing into Estuary, it’s time to materialize it to Snowflake — in other words, stream it directly into a Snowflake table.
Here’s how to set it up:
- After saving your S3 capture, click Materialize Collections.
- Choose the Snowflake connector from the destination list.
You’ll fill out a simple form with your Snowflake details:
- Materialization name – e.g., myorg/s3-to-snowflake
- Account URL – Like myorg-account.snowflakecomputing.com
- User + Password – A Snowflake user with the right permissions
- Database & Schema – Where the table will live
- Warehouse – Optional, but recommended
- Role – Optional if already assigned to the user
Once Flow connects, you’ll see your captured collection (from S3) listed.
From here, you can:
- Rename the output table
- Enable delta updates (to append incremental change documents rather than merging each update into existing rows)
- Use Schema Inference to map your semi-structured S3 data into Snowflake’s tabular format
To do that:
- Click the Collection tab
- Select Schema Inference
- Review the suggested schema → Click Apply
Finally, hit Save and Publish.
✅ That’s it — you’ve now got a fully working, real-time S3 to Snowflake pipeline. Flow will continuously sync new files from your bucket straight into your Snowflake warehouse.
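To confirm rows are actually landing, you can spot-check the destination table with a quick query. The connection details and table name below are placeholders, and any SQL client or the Snowflake worksheet works just as well.

```python
# Spot-check the destination table after the pipeline is published.
# Connection details and table name are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="myorg-account",
    user="ESTUARY_USER",
    password="YOUR_PASSWORD",
    warehouse="COMPUTE_WH",
    database="ANALYTICS",
    schema="RAW",
)

with conn.cursor() as cur:
    cur.execute("SELECT COUNT(*) FROM S3_ORDERS")
    print("rows loaded so far:", cur.fetchone()[0])

conn.close()
```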
What’s Next? Supercharge Your S3 to Snowflake Pipeline
You now have a fully operational, real-time pipeline from Amazon S3 to Snowflake — and it runs continuously, no scripts or schedulers required.
But that’s just the beginning.
With Estuary Flow, you can take things even further:
Add Transformations (a.k.a. Derivations)
Want to clean, filter, or join your data before it lands in Snowflake? Use derivations to apply real-time transformations using SQL or TypeScript, right inside Flow.
You can enrich JSON objects, flatten nested structures, or create entirely new views.
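Derivations themselves are written in SQL or TypeScript inside Flow, but the kind of reshaping they do is easy to picture. Purely as an illustration (in Python, not Flow’s derivation syntax), flattening a nested event into top-level fields might look like this:

```python
# Illustration of the kind of reshaping a derivation can do: flattening a nested
# event into flat columns. Real Flow derivations are written in SQL or TypeScript.
def flatten_event(event: dict) -> dict:
    user = event.get("user", {})
    return {
        "event_id": event.get("id"),
        "event_type": event.get("type"),
        "user_id": user.get("id"),
        "user_country": user.get("country"),
    }

raw = {"id": "e-1", "type": "page_view", "user": {"id": "u-9", "country": "DE"}}
print(flatten_event(raw))
# {'event_id': 'e-1', 'event_type': 'page_view', 'user_id': 'u-9', 'user_country': 'DE'}
```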
Plug into More Systems
Need to send the same S3 data to BigQuery, Kafka, or a dashboard tool? Just add another materialization — Flow supports multi-destination sync out of the box.
Monitor + Optimize
Use Flow’s built-in observability tools or plug into OpenMetrics to monitor throughput, schema evolution, and pipeline health in real time.
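If you go the OpenMetrics route, any Prometheus-compatible scraper can consume the endpoint. As a rough sketch only (the URL and metric names here are placeholders, not Flow’s actual ones, so check the docs for the real endpoint), parsing a metrics payload in Python looks like this:

```python
# Sketch of consuming an OpenMetrics/Prometheus-format endpoint.
# The URL and metric names are placeholders; consult Flow's docs for the real ones.
import requests
from prometheus_client.parser import text_string_to_metric_families

resp = requests.get("https://metrics.example.com/metrics", timeout=10)

for family in text_string_to_metric_families(resp.text):
    for sample in family.samples:
        print(sample.name, sample.labels, sample.value)
```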
Start Streaming S3 Data to Snowflake Today
The old way — batch jobs, manual scripts, clunky ETL — just can’t keep up with today’s speed of data.
With Estuary Flow, you can:
- Sync Amazon S3 to Snowflake in real time
- Handle schema changes effortlessly
- Scale without infrastructure headaches
Ready to go from raw files to real-time insights?
Try Estuary Flow for free and build your first streaming data pipeline today.