DEV Community

Dipojjal Chakrabarti

Posted on • Originally published at salesforcedictionary.com

Salesforce Data 360 Zero Copy: How It Actually Works

If you've spent any time in Salesforce conversations this year, you've probably heard "Zero Copy" thrown around like everyone already knows what it means. I've sat in meetings where architects nodded along confidently, only to corner me afterward and ask, "wait, so are we still copying the data or not?"

Fair question. The marketing language around Data 360 (the new name for Data Cloud) doesn't always make it obvious what's happening under the hood. So in this post, I want to walk through what Zero Copy actually does, when it makes sense to use it, and a few gotchas I've run into when integrating Data 360 with Snowflake and Databricks.

If you stumble on a term you don't recognize while reading this, salesforcedictionary.com has a pretty solid glossary that I bookmark for quick lookups during projects.

What Zero Copy Actually Means

Let's get the definition out of the way. Zero Copy is a federation pattern that lets Data 360 query data sitting in external warehouses like Snowflake, Databricks, Google BigQuery, or Amazon Redshift without first ingesting it into Data 360 storage.

That's the technical version. The plain version: instead of running a nightly ETL job to drag a 200-million-row orders table from Snowflake into Data 360, you just point Data 360 at the table. When someone runs a segmentation or insight that needs that data, Data 360 reaches across, pulls what it needs in real time, and returns the result.

A lot of folks ask if "Zero" really means zero. Mostly yes, but not always. Some Zero Copy patterns push the query down to the source warehouse and bring back only the result set. Others materialize a small subset on the Data 360 side for performance. The federation method you pick determines which one happens, and that decision matters more than the marketing makes it sound.
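The difference between pushing the query down and pulling rows across is easy to see in miniature. Here's a sketch using an in-memory SQLite table as a stand-in for the source warehouse (the real source would be Snowflake, Databricks, BigQuery, or Redshift, and the table and column names are hypothetical):

```python
import sqlite3

# Stand-in "source warehouse": an in-memory SQLite orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, region TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 120.0, "EU"), (2, 80.0, "US"), (3, 45.0, "EU"), (4, 300.0, "US")],
)

def copy_then_aggregate(conn):
    """Ingestion-style: pull every row across, then aggregate locally."""
    rows = conn.execute("SELECT id, amount, region FROM orders").fetchall()
    total = sum(amount for _, amount, region in rows if region == "EU")
    return total, len(rows)  # all 4 rows crossed the wire

def pushdown_aggregate(conn):
    """Federation-style: the source runs the query, only the result returns."""
    (total,) = conn.execute(
        "SELECT SUM(amount) FROM orders WHERE region = 'EU'"
    ).fetchone()
    return total, 1  # one result row crossed the wire

print(copy_then_aggregate(conn))  # (165.0, 4)
print(pushdown_aggregate(conn))   # (165.0, 1)
```

Same answer either way; what changes is how much data moves and whose compute does the work. That trade is the heart of every federation decision below.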

The Three Federation Patterns

Salesforce documents three flavors of Zero Copy, and they each trade off freshness, performance, and cost differently. I've used all three on different projects, and the choice usually comes down to how the source data behaves.

The first is direct query federation. Data 360 sends the query straight to Snowflake or Databricks, the source warehouse runs it, and the results come back. No data lands in Data 360 at all. This is great for low-volume, low-latency use cases like an operational dashboard that needs to reflect what happened ten seconds ago. The downside is you're paying compute on the source warehouse every single time the query runs. If you have a popular dashboard, that bill adds up quickly.

The second is Apache Iceberg-based federation. Data 360 reads Iceberg tables directly from cloud storage, and either the source warehouse or Data 360 itself can serve as the query engine. This is the sweet spot for analytical workloads where you want the data fresh-ish (maybe a few minutes old) but you don't want to keep paying source warehouse compute for every query.

The third is outbound sharing, which flips the direction. You expose enriched Data 360 tables back to Snowflake or Databricks so analytics teams can query unified profiles or calculated insights without copying that data into their own warehouse. I've seen this used a lot when a customer's data science team lives entirely in Databricks and just wants access to clean, harmonized customer profiles built in Data 360.

When Zero Copy Is Actually The Right Move

Here's where I'll push back a little against the hype. Zero Copy isn't always the right pattern. I've watched teams default to it because it sounds modern, then end up with worse performance and higher costs than a traditional ingestion would have given them.

Zero Copy shines when:

The source data is huge and updated constantly. If you have a clickstream table in Databricks that gets a million rows an hour, copying it into Data 360 makes no sense. Federate it and let Databricks handle the heavy lifting.

The data needs to stay where it is for governance reasons. Some companies have data residency rules that won't let certain data leave specific regions or platforms. Zero Copy keeps it in place while still making it usable.

You only need a slice of the data. If your segmentation only ever touches purchases from the last 90 days, federating beats ingesting five years of history you'll never use.

You want to avoid duplicate truth. The classic problem with copying data is that the moment you copy it, both copies start drifting. Different teams update different versions, and reconciling becomes a nightmare. Zero Copy sidesteps that.

On the other hand, traditional ingestion still wins when:

The source data is slow or unreliable. Federating against a warehouse that takes 30 seconds to return basic queries will make every Agentforce action that depends on it feel broken.

You need the data offline or for backup-style use cases. Zero Copy is live-only. Lose connectivity to Snowflake and your federated data is unreachable until the connection comes back.

You're running tight loops. If a Flow needs to read the same record fifty times in a single execution, ingesting once and caching is way faster than fifty federated queries.
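The tight-loop point is easy to demonstrate with a cache in front of a lookup. This is a stand-in sketch, not Data 360's actual caching behavior; the lookup function, record ID, and call counter are all hypothetical:

```python
import functools

call_count = 0

def federated_lookup(record_id):
    """Stand-in for a federated query; each call would hit the
    source warehouse."""
    global call_count
    call_count += 1
    return {"id": record_id, "status": "active"}

@functools.lru_cache(maxsize=None)
def cached_lookup(record_id):
    return federated_lookup(record_id)

# A tight loop reading the same record fifty times:
for _ in range(50):
    cached_lookup("0031x00000AbCdE")

print(call_count)  # 1 -- only the first call crossed to the warehouse
```

Fifty federated round trips collapse into one. Ingested data gives you that behavior for free; federated data makes you think about it.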

How To Actually Set It Up

The setup is more straightforward than I expected the first time I did it, though there are a few steps that aren't obvious.

For Snowflake, you'll start by creating a Snowflake connection in Data Cloud Setup, then authenticating with either a service account or OAuth. Once that's connected, Data 360 can browse the Snowflake schemas you've granted it access to. You pick the tables you want available and Data 360 creates a Data Lake Object pointing at each one. From there you map fields, set the primary key, and the table is queryable from Data 360 like any native object.

Databricks works similarly but uses Unity Catalog under the hood. You'll authenticate, point Data 360 at the catalogs and schemas you want, and Data 360 will read the Delta tables (which are Iceberg-compatible thanks to the UniForm feature). One thing worth knowing: if the Databricks tables aren't already exposed through Unity Catalog with proper permissions, you'll be debugging access errors for a while. Get the Databricks admin involved early.

Google BigQuery and Amazon Redshift follow the same shape, though Redshift has some quirks with how it handles federated queries that can make some operations slower than you'd expect.

Real Gotchas From The Field

A few things I wish someone had told me before my first Zero Copy project.

Latency is real and not always small. A federated query that takes 800 milliseconds in isolation might take 4 seconds when it's part of an Agentforce action that chains three other lookups. Test the user-facing experience, not just the query timing.
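The arithmetic behind that is simple but easy to forget: sequential lookups add, they don't average. A tiny sketch, with illustrative (not measured) latency numbers:

```python
def end_to_end_latency(step_latencies_ms):
    """Sequential chain: the user waits for the sum, not the max.
    All latency numbers here are illustrative."""
    return sum(step_latencies_ms)

# One federated query that benchmarks at 800 ms in isolation,
# chained after three other lookups inside one Agentforce action:
steps = [600, 350, 450, 800]  # ms, hypothetical
print(end_to_end_latency(steps))  # 2200 ms, before orchestration overhead
```

An 800 ms query is fine on its own and painful as the fourth link in a chain, which is why you test the whole user-facing flow.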

Cost attribution gets murky. When Data 360 federates to Snowflake, the compute hits the Snowflake bill, not the Salesforce one. Your Salesforce admin sees fast queries; your data team sees their warehouse bill creep up. Have that conversation before launch, not after.

Calculated insights and segments behave differently. Some Data 360 features work with federated data exactly like they would with ingested data. Others have restrictions. Check the documentation for whatever feature you're using - I've been burned by assuming a feature works the same and finding out it doesn't support federated sources at all.

Schema changes upstream will break you. If your Snowflake team renames a column, your Data 360 mappings will silently fail. Build a process to communicate schema changes across both teams. This isn't a Zero Copy problem specifically, but federation makes it more painful because there's no ingestion job that fails loudly to tip you off.
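One cheap mitigation is a drift check that compares the columns your mappings expect against what the source actually exposes. Here's a minimal sketch using SQLite as the stand-in source; in practice you'd read column metadata from Snowflake's INFORMATION_SCHEMA or Unity Catalog, and the table, column names, and "expected" set are all hypothetical:

```python
import sqlite3

# Stand-in source table where upstream renamed order_total -> total_amt.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, total_amt REAL)")

# Columns your Data 360 field mappings expect (hypothetical mapping).
expected = {"order_id", "order_total"}

# PRAGMA table_info returns (cid, name, type, ...); index 1 is the name.
actual = {row[1] for row in conn.execute("PRAGMA table_info(orders)")}
missing = expected - actual

if missing:
    print(f"Mapping drift detected, missing columns: {sorted(missing)}")
```

Run something like this on a schedule and the rename fails loudly in your pipeline instead of silently in a segment.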

If terms like Data Lake Object, Calculated Insight, or Unified Profile are tripping you up, salesforcedictionary.com has plain-language definitions for most of them. I find it useful when onboarding new people to a Data 360 project.

What This Means For The Average Salesforce Team

If you're an admin or developer who hasn't touched Data 360 yet, Zero Copy probably isn't the first thing you need to understand. Get comfortable with the basics first - Data Streams, DMOs, Identity Resolution, that kind of thing. But once your team starts asking how to bring in data from the warehouse without doubling your storage costs, Zero Copy is the answer you'll want in your back pocket.

The trend across the platform is clear: data should live where it makes the most sense and be accessible from wherever it's needed. Zero Copy is Salesforce's bet that nobody wants to keep moving petabytes around. So far, that bet looks right.

I'm curious what other folks are running into with Zero Copy in production. Are you seeing the latency hit I mentioned, or has your experience been smoother? Drop a comment and let me know what your setup looks like. And if there are other Data 360 concepts you'd like me to break down, I'm taking requests.

Want a quick reference for any term in this post? salesforcedictionary.com keeps the definitions short and project-focused, which I appreciate when I'm in the middle of something and just need a refresher.