Got a bunch of raw data sitting in Amazon S3 and need to get it into Snowflake for analytics — fast? You’re not alone.
Maybe it’s JSON logs, CSV exports, or event data piling up in your S3 bucket. Maybe you’ve tried batch pipelines or custom scripts but ran into delays, duplicates, or schema chaos. What you actually need is a clean, reliable way to load that S3 data to Snowflake, without spending weeks building and maintaining it.
That’s exactly what Estuary Flow is built for.
Flow makes it easy to build real-time S3 to Snowflake data pipelines with no code, no ops overhead, and no latency headaches. It connects directly to your S3 bucket, picks up new files as they arrive, and keeps your Snowflake warehouse in sync continuously.
In this walkthrough, we’ll show you how to set up an Amazon S3 to Snowflake pipeline using Estuary Flow from start to finish. You’ll go from raw files to live Snowflake tables in just a few steps.
TL;DR: If you're looking to stream data from Amazon S3 to Snowflake, you're in the right place — and Flow makes it a breeze.
Why Stream S3 Data to Snowflake in Real Time?
Let’s be honest — batch processing worked fine back when dashboards only needed to update once a day. But today, teams expect real-time answers: marketing needs up-to-the-minute campaign performance, operations teams need live inventory data, and product managers want to react to user behavior as it happens.
That’s where streaming data from S3 to Snowflake changes the game.
If you’re storing raw files — like logs, events, or exports — in Amazon S3, you’re already halfway there. The missing piece is a low-latency pipeline that gets that data into Snowflake the moment it arrives. No waiting for hourly jobs. No stale reports. Just fresh, query-ready data flowing in 24/7.
Here are a few reasons real-time sync matters:
- Analytics that actually keep up – Get real-time insights instead of reading yesterday’s data.
- Automation that reacts fast – Trigger workflows in Snowflake based on live S3 updates.
- Simplified ops – Eliminate brittle scripts, manual backfills, and sync delays.
Note: Amazon S3 isn’t a streaming source the way a database with change data capture is, so Flow polls your bucket every few minutes to detect new files, then streams them to Snowflake immediately. It’s batch under the hood, but near-real-time in effect.
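You don’t have to build any of that polling yourself, but if you’re curious what it amounts to, here’s a minimal sketch in Python using boto3. The bucket name, prefix, and interval are placeholders; Flow’s connector handles this for you, along with state tracking and exactly-once processing.

```python
# Rough illustration of polling an S3 bucket for recently added objects.
# Estuary Flow's connector does this for you; names below are placeholders.
from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client("s3", region_name="us-east-1")

def new_objects_since(bucket: str, prefix: str, since: datetime):
    """Yield (key, last_modified) for objects under `prefix` modified after `since`."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if obj["LastModified"] > since:
                yield obj["Key"], obj["LastModified"]

# Example: list files added in roughly the last five minutes.
cutoff = datetime.now(timezone.utc) - timedelta(minutes=5)
for key, modified in new_objects_since("my-raw-data-bucket", "events/", cutoff):
    print(key, modified)
```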
Why Use Estuary Flow Instead of Traditional ETL Tools?
If you’ve tried to move data from Amazon S3 to Snowflake before, you probably know the drill: patch together an ETL tool, deal with scheduling, wrestle with schema mismatches, and hope the job doesn’t break halfway through.
The thing is, most ETL tools were built for a different era — one where “real time” meant “hourly,” and everything ran in batches. Estuary Flow flips that on its head.
Here’s how Flow makes your S3 to Snowflake pipeline way easier:
- Real-Time by Default: Flow isn’t just fast — it’s built for continuous streaming. Once you connect your S3 bucket, Flow automatically picks up new files as they land and streams the data directly into Snowflake.
- No Code Required: Set up everything — capture, schema, and materialization — through a clean UI. You don’t need to write Python, wrangle Airflow, or babysit cron jobs.
- Schema-Aware + Smart: Flow infers the structure of your S3 data and helps you map it to Snowflake tables. You can tighten up schemas, apply transformations, and evolve structure over time without breaking your pipeline.
- Exactly-Once Delivery: No duplicates. No reprocessing. Flow’s transactional runtime ensures data lands in Snowflake exactly once, even when connections drop or jobs restart.
- Built to Scale: Whether you're syncing a few JSON files or streaming terabytes a day, Flow scales automatically without locking you into complex infrastructure.
Estuary Flow takes the friction out of real-time data integration from S3 to Snowflake, so you can focus on using the data, not moving it.
What You Need to Get Started
You don’t need much to build an Amazon S3 to Snowflake pipeline with Estuary Flow — just a few basics:
Estuary Flow Account
Sign up for free to access the Flow web app — no downloads or setup required.
Amazon S3 Bucket
This is your data source. You’ll need:
- Bucket name & region
- Either public access or your AWS access key + secret key
- (Optional) A folder path, called a “prefix”
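This step isn’t required, but if you want to confirm those credentials can actually see the bucket before handing them to Flow, a quick check like the one below surfaces permission problems early. The bucket, region, prefix, and keys are all placeholders for your own values.

```python
# Sanity check: can these AWS credentials list the bucket and prefix?
# All names below are placeholders.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client(
    "s3",
    region_name="us-east-1",
    aws_access_key_id="YOUR_ACCESS_KEY_ID",
    aws_secret_access_key="YOUR_SECRET_ACCESS_KEY",
)

try:
    resp = s3.list_objects_v2(Bucket="my-raw-data-bucket", Prefix="events/", MaxKeys=5)
    for obj in resp.get("Contents", []):
        print(obj["Key"])
except ClientError as err:
    print("Access problem:", err.response["Error"]["Code"])
```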
Snowflake Account
Your destination for the data. Make sure you have:
- A database, schema, and virtual warehouse
- A user with permission to create and write tables in that schema
- Your account URL + login credentials
- (Optional) warehouse name and role
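If you’d like to double-check those Snowflake details before entering them in Flow, a short connectivity test with the snowflake-connector-python package works well. Every identifier below is a placeholder for your own account, user, and objects.

```python
# Verify the Snowflake account, user, database, schema, warehouse, and role
# before plugging them into Flow. All values are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="myorg-account",          # the part of your account URL before .snowflakecomputing.com
    user="ESTUARY_USER",
    password="YOUR_PASSWORD",
    warehouse="COMPUTE_WH",
    database="ANALYTICS",
    schema="RAW",
    role="ESTUARY_ROLE",
)

with conn.cursor() as cur:
    cur.execute("SELECT CURRENT_ACCOUNT(), CURRENT_WAREHOUSE(), CURRENT_SCHEMA()")
    print(cur.fetchone())

conn.close()
```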
That’s it. With these in place, you’re ready to connect the pieces and start streaming.
Step 1: Capture Data from Amazon S3
First up, you’ll connect Estuary Flow to your S3 bucket — this step is called a capture. It’s how Flow knows where to pull your data from.
Here’s how to set it up:
- Log into Estuary Flow at dashboard.estuary.dev.
- Click the Sources tab and select New Capture.
- Choose Amazon S3 from the list of connectors.
You’ll see a form where you enter your S3 details:
- Capture name – Something like myorg/s3-orders
- AWS credentials – Only needed if your bucket isn’t public
- Bucket name & region – From your S3 console
- Prefix (optional) – To pull from a specific folder
- Match keys (optional) – For filtering files, like *.json
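The Match keys field filters which object keys get captured (in current versions of the connector it accepts a regular expression rather than a shell-style glob). Here’s a purely illustrative sketch of how such a pattern narrows things down; Flow applies its own filter, this just shows the idea:

```python
# Illustration only: how a key-matching pattern narrows which files are captured.
import re

keys = [
    "events/2024-06-01/orders.json",
    "events/2024-06-01/orders.csv",
    "exports/summary.json",
]

pattern = re.compile(r".*\.json$")   # keep only .json files
print([k for k in keys if pattern.match(k)])
# ['events/2024-06-01/orders.json', 'exports/summary.json']
```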
Once you click Next, Flow will connect to your bucket and auto-generate a schema based on your data. You’ll see a preview of your Flow collection — this acts as a live copy of your S3 data inside Flow.
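That inferred schema is JSON Schema under the hood. As a hypothetical example (your actual fields will differ), a simple order event might end up with a shape like the one below; the jsonschema package lets you check a sample record against it locally.

```python
# Hypothetical example of an inferred JSON schema for a simple order event.
# Your collection's schema will reflect the fields in your own files.
from jsonschema import validate

inferred_schema = {
    "type": "object",
    "properties": {
        "order_id": {"type": "string"},
        "amount": {"type": "number"},
        "placed_at": {"type": "string", "format": "date-time"},
    },
    "required": ["order_id"],
}

sample_record = {"order_id": "A-1001", "amount": 42.5, "placed_at": "2024-06-01T12:00:00Z"}
validate(instance=sample_record, schema=inferred_schema)  # raises if the record doesn't fit
print("sample record matches the inferred schema")
```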
Click Save and Publish to finish the capture.
Behind the scenes, Flow checks your S3 bucket on a 5-minute schedule (by default) to pick up new or updated files. That’s how it delivers near-real-time sync, even though S3 itself isn’t a native streaming source.
Next, let’s connect this to Snowflake.
Step 2: Materialize to Snowflake
Now that your data is flowing into Estuary, it’s time to materialize it to Snowflake — in other words, stream it directly into a Snowflake table.
Here’s how to set it up:
- After saving your S3 capture, click Materialize Collections.
- Choose the Snowflake connector from the destination list.
You’ll fill out a simple form with your Snowflake details:
- Materialization name – e.g., myorg/s3-to-snowflake
- Account URL – Like myorg-account.snowflakecomputing.com
- User + Password – A Snowflake user with the right permissions
- Database & Schema – Where the table will live
- Warehouse – Optional, but recommended
- Role – Optional if already assigned to the user
Once Flow connects, you’ll see your captured collection (from S3) listed.
From here, you can:
- Rename the output table
- Enable delta updates (to append incremental change documents rather than merging each update into existing rows)
- Use Schema Inference to map your semi-structured S3 data into Snowflake’s tabular format
To do that:
- Click the Collection tab
- Select Schema Inference
- Review the suggested schema → Click Apply
Finally, hit Save and Publish.
✅ That’s it — you’ve now got a fully working, real-time S3 to Snowflake pipeline. Flow will continuously sync new files from your bucket straight into your Snowflake warehouse.
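To confirm rows are actually landing, you can spot-check the destination table with a quick query. The connection details and table name below are placeholders, and any SQL client or the Snowflake worksheet works just as well.

```python
# Spot-check the destination table after the pipeline is published.
# Connection details and table name are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="myorg-account",
    user="ESTUARY_USER",
    password="YOUR_PASSWORD",
    warehouse="COMPUTE_WH",
    database="ANALYTICS",
    schema="RAW",
)

with conn.cursor() as cur:
    cur.execute("SELECT COUNT(*) FROM S3_ORDERS")
    print("rows loaded so far:", cur.fetchone()[0])

conn.close()
```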
What’s Next? Supercharge Your S3 to Snowflake Pipeline
You now have a fully operational, real-time pipeline from Amazon S3 to Snowflake — and it runs continuously, no scripts or schedulers required.
But that’s just the beginning.
With Estuary Flow, you can take things even further:
Add Transformations (a.k.a. Derivations)
Want to clean, filter, or join your data before it lands in Snowflake? Use derivations to apply real-time transformations using SQL or TypeScript, right inside Flow.
You can enrich JSON objects, flatten nested structures, or create entirely new views.
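Derivations themselves are written in SQL or TypeScript inside Flow, but the kind of reshaping they do is easy to picture. Purely as an illustration (in Python, not Flow’s derivation syntax), flattening a nested event into top-level fields might look like this:

```python
# Illustration of the kind of reshaping a derivation can do: flattening a nested
# event into flat columns. Real Flow derivations are written in SQL or TypeScript.
def flatten_event(event: dict) -> dict:
    user = event.get("user", {})
    return {
        "event_id": event.get("id"),
        "event_type": event.get("type"),
        "user_id": user.get("id"),
        "user_country": user.get("country"),
    }

raw = {"id": "e-1", "type": "page_view", "user": {"id": "u-9", "country": "DE"}}
print(flatten_event(raw))
# {'event_id': 'e-1', 'event_type': 'page_view', 'user_id': 'u-9', 'user_country': 'DE'}
```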
Plug into More Systems
Need to send the same S3 data to BigQuery, Kafka, or a dashboard tool? Just add another materialization — Flow supports multi-destination sync out of the box.
Monitor + Optimize
Use Flow’s built-in observability tools or plug into OpenMetrics to monitor throughput, schema evolution, and pipeline health in real time.
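If you go the OpenMetrics route, any Prometheus-compatible scraper can consume the endpoint. As a rough sketch only (the URL and metric names here are placeholders, not Flow’s actual ones, so check the docs for the real endpoint), parsing a metrics payload in Python looks like this:

```python
# Sketch of consuming an OpenMetrics/Prometheus-format endpoint.
# The URL and metric names are placeholders; consult Flow's docs for the real ones.
import requests
from prometheus_client.parser import text_string_to_metric_families

resp = requests.get("https://metrics.example.com/metrics", timeout=10)

for family in text_string_to_metric_families(resp.text):
    for sample in family.samples:
        print(sample.name, sample.labels, sample.value)
```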
Start Streaming S3 Data to Snowflake Today
The old way — batch jobs, manual scripts, clunky ETL — just can’t keep up with today’s speed of data.
With Estuary Flow, you can:
- Sync Amazon S3 to Snowflake in real time
- Handle schema changes effortlessly
- Scale without infrastructure headaches
Ready to go from raw files to real-time insights?
Try Estuary Flow for free and build your first streaming data pipeline today.