DEV Community

Sujitha Rasamsetty


AWS Clean Rooms: End-to-End Analytics Collaboration Across Two AWS Accounts

Why AWS Clean Rooms, and why this PoC?

Most cross-company data collaborations fail at the same point: data sharing.

Even when both sides want the same outcome (insights), sharing raw data introduces compliance, security, and trust barriers. AWS Clean Rooms addresses that gap by letting multiple parties run approved analytics across their datasets while preventing raw data exposure.

In this post, I’ll show a simple, reproducible PoC where:

  • Two AWS accounts participate in a Clean Rooms collaboration
  • Each account contributes a small S3 dataset
  • Query access is governed by analysis rules
  • The analyst gets aggregated results only

This is intentionally a small PoC, designed to be easy to understand and easy to reproduce: exactly the kind of “teachable build” that works well for community learning.


What we are building

The simplest useful story is: count customers by region, without letting anyone download or view raw rows.

The roles in this PoC:

  • Account A: “Data owner” — hosts a dataset in S3 and allows limited analysis through rules.
  • Account B: “Second participant” — joins the collaboration and contributes its dataset (or can be an analyst member depending on your setup).
  • AWS Clean Rooms: enforces collaboration boundaries and query-level privacy controls.

Real-world setups are more complex, but this PoC covers the fundamentals.


Prerequisites

  • Two AWS accounts (Account A and Account B)
  • Same region in both accounts (I used us-east-1 for simplicity)
  • S3 access in both accounts
  • AWS Clean Rooms access in both accounts
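  • AWS Glue Data Catalog tables for the S3 data (with S3 as the data source, Clean Rooms reads the files through Glue and runs the SQL itself; Athena is optional, useful for sanity-checking your own tables)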

IAM quick checklist (minimum for a PoC)

In both accounts, ensure the user you’re logged in with has:

  • AWSCleanRoomsFullAccess
  • S3 permissions (at least to your PoC buckets)
  • Athena + Glue permissions (for catalog + querying)

For a PoC, using AWS managed policies is acceptable and helps you move quickly.


Step 1: Create the datasets (one per account)

Create two small CSV files.

Account A dataset (example: company_a_customers.csv)

customer_id,region
A001,India
A002,India
A003,USA
A004,UK
A005,USA


Account B dataset (example: company_b_customers.csv)

customer_id,region
B101,India
B102,USA
B103,USA
B104,Canada
B105,UK

Upload them to S3:

  • Account A bucket example: aws-clean-rooms-poc-org-a
  • Account B bucket example: aws-clean-rooms-poc-org-b
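If you'd rather not hand-type the files, here is a minimal Python sketch that writes both CSVs with a header row. It runs locally before the upload. I chose Unix line endings (`\n`) on purpose: with Windows-style endings, a trailing `\r` can end up attached to the last column value (`region`) when the file is read back.

```python
import csv
from pathlib import Path

# Rows copied from the two sample datasets above.
COMPANY_A = [
    ("A001", "India"),
    ("A002", "India"),
    ("A003", "USA"),
    ("A004", "UK"),
    ("A005", "USA"),
]
COMPANY_B = [
    ("B101", "India"),
    ("B102", "USA"),
    ("B103", "USA"),
    ("B104", "Canada"),
    ("B105", "UK"),
]


def write_dataset(path: Path, rows) -> Path:
    """Write a customer_id,region CSV with a header row and Unix line endings."""
    with path.open("w", newline="") as f:
        # lineterminator="\n" avoids a stray "\r" ending up in the region column.
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(["customer_id", "region"])
        writer.writerows(rows)
    return path


if __name__ == "__main__":
    write_dataset(Path("company_a_customers.csv"), COMPANY_A)
    write_dataset(Path("company_b_customers.csv"), COMPANY_B)
```

Then upload each file from its own account. For example, run `aws s3 cp company_a_customers.csv s3://aws-clean-rooms-poc-org-a/` in Account A and the matching command in Account B.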


Step 2: Create the AWS Clean Rooms collaboration (Account A)

  1. Sign in to Account A.

  2. Navigate to:

    AWS Console → AWS Clean Rooms → Collaborations → Create collaboration

  3. Enter collaboration details:

    • Collaboration name: aws-clean-rooms-collaboration-poc
    • Description: Short, goal-oriented description

  4. Add members:

    • Member 1: Account A (automatically added as the creator)
    • Member 2: Account B (must be a different AWS account ID)

    Each collaboration member must be a unique AWS account; duplicate account IDs are not allowed.

  5. Configure collaboration abilities:

    • Enable Queries (SQL)
    • Do not enable Jobs or ML workflows for this PoC

  6. Configure query cost responsibility:

    • Pay for queries: Account A

  7. Review and create:

    • Click Create collaboration

Then join the collaboration immediately by creating Account A's own membership. A member cannot associate tables or run queries without one.

Why the “two accounts” requirement matters

Clean Rooms enforces collaboration boundaries at the account level, so each member must be in a unique AWS account. This is not optional; it’s the security boundary.
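The console steps above map onto the Clean Rooms CreateCollaboration API. Below is a sketch that builds the keyword arguments for `boto3.client("cleanrooms").create_collaboration(**request)` and rejects duplicate or malformed account IDs up front. The field names reflect my reading of the API reference and should be checked before use. The display names, description, and account IDs are placeholders. I'm also assuming that an empty ability list is accepted for a member that only contributes data.

```python
def build_collaboration_request(creator_account_id: str, member_account_id: str) -> dict:
    """Keyword arguments for create_collaboration (field names assumed from the API reference).

    creator_account_id is only used for validation; the creator is the caller's account.
    """
    for account_id in (creator_account_id, member_account_id):
        if not (len(account_id) == 12 and account_id.isdigit()):
            raise ValueError(f"not a 12-digit AWS account ID: {account_id!r}")
    if creator_account_id == member_account_id:
        # Each collaboration member must be a unique AWS account.
        raise ValueError("creator and member must be different AWS accounts")
    return {
        "name": "aws-clean-rooms-collaboration-poc",
        "description": "Count customers by region without sharing raw rows",
        # Account A (the creator) runs queries, receives results, and pays.
        "creatorDisplayName": "Account A",
        "creatorMemberAbilities": ["CAN_QUERY", "CAN_RECEIVE_RESULTS"],
        "creatorPaymentConfiguration": {"queryCompute": {"isResponsible": True}},
        "members": [
            {
                "accountId": member_account_id,
                "displayName": "Account B",
                # Assumption: an empty list makes Account B a data contributor only.
                "memberAbilities": [],
            }
        ],
        "queryLogStatus": "DISABLED",
    }


if __name__ == "__main__":
    # Placeholder account IDs; replace with your own.
    print(build_collaboration_request("111111111111", "222222222222"))
```

Account B gets no abilities here, which matches this PoC: Account A queries and pays, and Account B only contributes its table.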


Step 3: Accept invitation and create membership (Account B)

In Account B:

  1. Open AWS Clean Rooms in the same region
  2. You should see a pending invitation under Collaborations
  3. Click Create membership to join

If you see "Access denied" or "needs subscription"

This usually means one of two things:

  • AWS Clean Rooms is not enabled/initialized in the account (first-time access)
  • Your IAM user does not have AWSCleanRoomsFullAccess

Fix: attach the policy to the IAM user you are logged in with (creating a role alone won’t help unless you assume it).
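Account B can also accept the invitation from code. The sketch below lists collaborations where this account's status is `INVITED` and creates a membership for the one with the given name. It takes the client as a parameter, so you can pass `boto3.client("cleanrooms", region_name="us-east-1")` (use the region from Step 2) or a stub for offline testing. The response keys (`collaborationList`, `membership`) are assumptions based on my reading of the API reference.

```python
def join_pending_collaboration(client, collaboration_name: str) -> str:
    """Create a membership for a pending invitation and return the membership ID.

    `client` is expected to behave like boto3.client("cleanrooms"). Only the
    first page of results is read, which is enough for a PoC.
    """
    # memberStatus="INVITED" narrows the list to invitations not yet accepted.
    # Assumed response shape: {"collaborationList": [{"id": ..., "name": ...}, ...]}
    response = client.list_collaborations(memberStatus="INVITED")
    for summary in response.get("collaborationList", []):
        if summary["name"] == collaboration_name:
            created = client.create_membership(
                collaborationIdentifier=summary["id"],
                queryLogStatus="DISABLED",
            )
            return created["membership"]["id"]
    raise LookupError(
        f"no pending invitation named {collaboration_name!r}; "
        "check the region and this user's Clean Rooms permissions"
    )
```

If this raises `LookupError` while the invitation is visible in Account A's console, the cause is usually the same as above: the wrong region, or missing permissions for the calling user.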


Step 4: Catalog S3 data (Glue) so Clean Rooms can query it

This step confuses many people at first, so I’ll explain clearly:

Do I need AWS Glue?

Yes, if your data source is S3.

A configured table (Step 5) points at a table in the AWS Glue Data Catalog, not at raw S3 files. The Glue table tells Clean Rooms where the files live and what the schema is, and Clean Rooms then runs the SQL on its own compute. So the S3 files must be represented as tables in Glue, and the schema matters.

You can create the Glue table either by:

  • creating a Glue table manually (fast for PoC), or
  • using a crawler (overkill for this PoC)

Create the Glue table in Account A (AWS Glue → Databases / Tables):

  • Create database: cleanrooms_poc_db
  • Create table: company_a_customers
  • Location: s3://aws-clean-rooms-poc-org-a/
  • Format: CSV
  • Schema (must match your CSV):
    • customer_id (string)
    • region (string)

Repeat the same in Account B:

  • Database: cleanrooms_poc_db
  • Table: company_b_customers
  • Location: s3://aws-clean-rooms-poc-org-b/
  • Format: CSV
  • Schema:
    • customer_id (string)
    • region (string)
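For reference, the manual Glue table above corresponds roughly to the following `TableInput` for the Glue `create_table` API. Two details are my additions and not something the console guarantees. The first is the `skip.header.line.count` table property, because the sample CSVs have a header row. The second is the LazySimpleSerDe with a comma delimiter. I'm assuming Clean Rooms honors the header-skip property; if you're unsure, upload the files without the header row instead.

```python
def glue_csv_table_input(table_name: str, s3_location: str) -> dict:
    """Glue TableInput for a headered customer_id,region CSV (a sketch; verify before use)."""
    return {
        "Name": table_name,
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {
            "classification": "csv",
            # Assumption: skip the header row so it is not read as data.
            "skip.header.line.count": "1",
        },
        "StorageDescriptor": {
            # Must match the CSV columns, in order.
            "Columns": [
                {"Name": "customer_id", "Type": "string"},
                {"Name": "region", "Type": "string"},
            ],
            "Location": s3_location,
            "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
                "Parameters": {"field.delim": ","},
            },
        },
    }
```

Usage in Account A: with `glue = boto3.client("glue")`, call `glue.create_database(DatabaseInput={"Name": "cleanrooms_poc_db"})`, then `glue.create_table(DatabaseName="cleanrooms_poc_db", TableInput=glue_csv_table_input("company_a_customers", "s3://aws-clean-rooms-poc-org-a/"))`. Repeat in Account B with its own table name and bucket.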

Important lesson: if a column is missing from the Glue schema, you will get errors like:

Column 'customer_id' not found in any table

This is not a Clean Rooms bug; it's a catalog/schema mismatch.
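A cheap way to catch that mismatch before Clean Rooms does is to compare the CSV header with the column names you typed into Glue. This helper is hypothetical (it is not part of any AWS tool) and compares names case-insensitively.

```python
import csv
import io


def missing_columns(csv_text: str, glue_columns: list) -> list:
    """Return header columns from csv_text that the Glue schema does not declare."""
    header = next(csv.reader(io.StringIO(csv_text)))
    # Compare case-insensitively so capitalization alone doesn't raise an alarm.
    declared = {name.strip().lower() for name in glue_columns}
    return [column for column in header if column.strip().lower() not in declared]


if __name__ == "__main__":
    sample = "customer_id,region\nA001,India\n"
    print(missing_columns(sample, ["region"]))  # ['customer_id']
```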


Step 5: Create configured tables (Account A and Account B)

Configured tables are how Clean Rooms exposes a dataset to the collaboration.

In Account A:

  1. AWS Clean Rooms → Tables → Configure new table
  2. Data source: Amazon S3
  3. Choose the Glue database/table you created:
    • cleanrooms_poc_db.company_a_customers
  4. Columns available to the collaboration (for this PoC, allow both):
    • customer_id
    • region
  5. Allowed analysis methods:
    • Direct query only
    • Do not enable jobs
  6. Create the configured table.

In Account B, repeat the same steps for:

  • cleanrooms_poc_db.company_b_customers
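If you script this step, the console choices above line up with the CreateConfiguredTable API. Here is a sketch that assumes the field names as I understand them from the API reference (`tableReference.glue`, `allowedColumns`, `analysisMethod`).

```python
def configured_table_request(table_name: str, database_name: str = "cleanrooms_poc_db") -> dict:
    """Keyword arguments for create_configured_table (field names assumed from the API reference)."""
    return {
        "name": table_name,
        # Points Clean Rooms at the Glue table from Step 4.
        "tableReference": {"glue": {"databaseName": database_name, "tableName": table_name}},
        # Only these columns are exposed to the collaboration.
        "allowedColumns": ["customer_id", "region"],
        # "Direct query only": no jobs.
        "analysisMethod": "DIRECT_QUERY",
    }
```

In Account A, call `boto3.client("cleanrooms").create_configured_table(**configured_table_request("company_a_customers"))`. Do the same with `company_b_customers` in Account B, and keep the returned configured table ID for the association step.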


Step 6: Associate configured tables with the collaboration

Still in each account:

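  • Go to the configured table
  • Choose Associate to collaboration and select the collaboration
  • Give the association a name: queries in the collaboration refer to the table by this name
  • For service access, choose or create an IAM service role that Clean Rooms can assume to read your Glue table and S3 data
  • Skip data access budget for now (optional feature; not needed for a basic PoC)

When you return to the collaboration view, you should see tables associated with both members.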

If you only see Account A’s table, it means Account B hasn’t associated its table yet.
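Behind the console, an association needs a membership ID, the configured table ID, a name, and an IAM role that Clean Rooms assumes to read your Glue table and S3 objects. Queries refer to the table by the association name, so the sketch below also rejects names that would need quoting in SQL. That identifier check is my own convention for this PoC, not an API rule. The field names are assumed from the API reference, and the role ARN in the test is hypothetical.

```python
import re

# My own convention for this PoC: a plain SQL identifier needs no quoting in queries.
_PLAIN_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def association_request(name: str, membership_id: str, configured_table_id: str, role_arn: str) -> dict:
    """Keyword arguments for create_configured_table_association (assumed field names)."""
    if not _PLAIN_IDENTIFIER.fullmatch(name):
        raise ValueError(f"{name!r} would need quoting in SQL; pick a plain identifier")
    return {
        # Queries in the collaboration refer to the table by this name.
        "name": name,
        "membershipIdentifier": membership_id,
        "configuredTableIdentifier": configured_table_id,
        # IAM role Clean Rooms assumes to read the Glue table and the S3 objects.
        "roleArn": role_arn,
    }
```

Pass the result to `boto3.client("cleanrooms").create_configured_table_association(**request)` in each account, using that account's own membership and configured table IDs.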


Step 7: Configure the analysis rule (this is the core of Clean Rooms)

Analysis rules define what is allowed.
For a community-friendly PoC, use Aggregation rules.
Recommended aggregation rule design

Goal: allow results like “count customers by region” while blocking raw output.

In the analysis rule UI:

  1. Choose Aggregation
  2. Query controls:
    • Allow COUNT DISTINCT on customer_id (the example query in Step 9 uses COUNT(DISTINCT ...); also allow COUNT if you want plain counts)
  3. Dimension controls:
    • Allow region as a dimension column
    • This is what lets region appear in SELECT/GROUP BY
  4. Join controls:
    • If you want standalone aggregation, allow “queried by itself = Yes”
    • If you want overlap-only analytics, choose overlap-only (stricter)
  5. Output constraints (aggregation thresholds):
    • Set a minimum threshold (e.g., 2). This suppresses result rows built from very small groups, which is the privacy guardrail.
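The same aggregation rule can be expressed as an analysis rule policy for the `create_configured_table_analysis_rule` API, with `analysisRuleType="AGGREGATION"`. The sketch mirrors the console choices above: COUNT DISTINCT on `customer_id`, `region` as the only dimension, and at least 2 distinct customers per output row. The field names and the `joinRequired="QUERY_RUNNER"` value for overlap-only mode are my reading of the API reference, so double-check them.

```python
import json


def aggregation_rule_policy(min_distinct_customers: int = 2, overlap_only: bool = False) -> dict:
    """Aggregation analysis rule policy (field names assumed from the API reference)."""
    aggregation = {
        # Query controls: customer_id may only appear inside COUNT(DISTINCT ...).
        "aggregateColumns": [{"columnNames": ["customer_id"], "function": "COUNT_DISTINCT"}],
        # Join controls: customer_id is the only column allowed in joins.
        "joinColumns": ["customer_id"],
        # Dimension controls: region may be selected and grouped by.
        "dimensionColumns": ["region"],
        "scalarFunctions": [],
        # Output constraint: drop any result row built from too few distinct customers.
        "outputConstraints": [
            {"columnName": "customer_id", "minimum": min_distinct_customers, "type": "COUNT_DISTINCT"}
        ],
    }
    if overlap_only:
        # Stricter mode: the table cannot be queried by itself, only joined.
        aggregation["joinRequired"] = "QUERY_RUNNER"
    return {"v1": {"aggregation": aggregation}}


if __name__ == "__main__":
    print(json.dumps(aggregation_rule_policy(), indent=2))
```

Leaving out `joinRequired` corresponds to “queried by itself = Yes”. Passing `overlap_only=True` gives the stricter variant.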

Common error and the fix:
If you see an error like:

columns prohibited by aggregation analysis rule

It means your analysis rule doesn’t allow that column in the SELECT list (the projection).

Fix it by ensuring:

  • region is included as a dimension column
  • customer_id appears only inside an aggregate function the rule allows, e.g. COUNT(DISTINCT customer_id)

This is exactly how Clean Rooms enforces safe query structure.
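To make the error above concrete, here is a deliberately simplified model of the projection check. Bare columns must be dimension columns, and aggregated columns must use a function the rule allows. It is a teaching toy with hypothetical names, not how Clean Rooms actually parses queries.

```python
def check_select_list(select_items, dimension_columns, aggregate_functions):
    """Toy model of an aggregation rule's projection check (not the real validator).

    select_items: (function, column) pairs; function is None for a bare column.
    aggregate_functions: column -> set of allowed functions,
        e.g. {"customer_id": {"COUNT_DISTINCT"}}.
    Returns a list of problems; an empty list means the SELECT list passes this model.
    """
    problems = []
    for function, column in select_items:
        if function is None:
            if column not in dimension_columns:
                problems.append(f"{column} is not a dimension column")
        elif function not in aggregate_functions.get(column, set()):
            problems.append(f"{function}({column}) is not an allowed aggregate")
    return problems


if __name__ == "__main__":
    dims = {"region"}
    aggs = {"customer_id": {"COUNT_DISTINCT"}}
    # SELECT region, COUNT(DISTINCT customer_id) ... passes.
    print(check_select_list([(None, "region"), ("COUNT_DISTINCT", "customer_id")], dims, aggs))
    # SELECT customer_id ... is rejected, like the "columns prohibited" error.
    print(check_select_list([(None, "customer_id")], dims, aggs))
```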


Step 8: Where to run queries

Queries run inside AWS Clean Rooms (the collaboration's analysis editor or the API), not directly in Athena.

In the collaboration:

  1. AWS Clean Rooms → Collaborations → select your collaboration
  2. Go to Analysis (or “Run analysis / Queries”, depending on console)
  3. If prompted, set a query results destination (an S3 bucket in the querying account)
  4. Choose the tables and run SQL there

This ensures the query is evaluated against Clean Rooms analysis rules and constraints.
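The console editor is the easiest path, but the same query can also be started through the API. The sketch below calls `start_protected_query` and polls `get_protected_query` until the query reaches a terminal status. It assumes a results bucket in the querying account (the bucket name is yours to choose). It takes the client as a parameter, so a stub works for offline testing. Status names and response keys follow my reading of the API reference.

```python
import time

# Statuses after which polling stops (assumed from the API reference).
TERMINAL_STATUSES = {"SUCCESS", "FAILED", "CANCELLED", "TIMED_OUT"}


def run_protected_query(client, membership_id, sql, results_bucket, poll_seconds=5.0):
    """Start a Clean Rooms SQL query and poll until it finishes; return the final query dict."""
    started = client.start_protected_query(
        type="SQL",
        membershipIdentifier=membership_id,
        sqlParameters={"queryString": sql},
        resultConfiguration={
            "outputConfiguration": {
                "s3": {"resultFormat": "CSV", "bucket": results_bucket, "keyPrefix": "cleanrooms-poc/"}
            }
        },
    )
    query_id = started["protectedQuery"]["id"]
    while True:
        query = client.get_protected_query(
            membershipIdentifier=membership_id,
            protectedQueryIdentifier=query_id,
        )["protectedQuery"]
        if query["status"] in TERMINAL_STATUSES:
            return query
        time.sleep(poll_seconds)
```

On success, the returned dict should point at the result files in the results bucket. Either way, the analysis rules are applied by Clean Rooms, not by this client code.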


Step 9: Example query

Once both tables are associated and analysis rules allow it, run:

SELECT
  region,
  COUNT(DISTINCT customer_id) AS customer_count
FROM company_a_customers  -- use your configured table association name here
GROUP BY region;

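Or, if you designed the collaboration to require overlap joins, your query will need to join the Account A and Account B tables on approved join columns. With the sample data, note that the customer IDs never overlap (A001 to A005 versus B101 to B105), so such a join returns no rows.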

Either way, the result should be aggregated and controlled by thresholds.
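Before running anything in Clean Rooms, you can predict what the example query should return by applying the same logic locally to Account A's sample rows: count distinct customers per region, then drop regions below the threshold of 2. This is a local model of the output constraint, not Clean Rooms itself.

```python
from collections import defaultdict

# Account A's sample rows from Step 1.
COMPANY_A = [("A001", "India"), ("A002", "India"), ("A003", "USA"), ("A004", "UK"), ("A005", "USA")]
# Account B's sample rows from Step 1.
COMPANY_B = [("B101", "India"), ("B102", "USA"), ("B103", "USA"), ("B104", "Canada"), ("B105", "UK")]


def count_by_region(rows, min_distinct=2):
    """COUNT(DISTINCT customer_id) GROUP BY region, dropping rows under the threshold."""
    customers = defaultdict(set)
    for customer_id, region in rows:
        customers[region].add(customer_id)
    return {
        region: len(ids)
        for region, ids in sorted(customers.items())
        if len(ids) >= min_distinct
    }


if __name__ == "__main__":
    print(count_by_region(COMPANY_A))  # {'India': 2, 'USA': 2}; UK (1 customer) is suppressed
    # An overlap join on customer_id finds nothing: the two ID sets are disjoint.
    print({c for c, _ in COMPANY_A} & {c for c, _ in COMPANY_B})  # set()
```

So with a threshold of 2, expect India and USA with 2 customers each, and no UK row. If UK shows up, the threshold is not configured the way you intended.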

What this PoC demonstrates

This simple PoC proves several important points:

  • Collaboration is enforced at the AWS account boundary
  • Data stays in each owner’s S3 bucket; what changes is what can be computed
  • Glue/Athena catalog integration matters (schema correctness is critical)
  • Analysis rules are the real power:
    • They control columns
    • They control query structure
    • They control privacy thresholds

This is not “sharing data safely.”
It is “designing a system where raw data sharing is not required.”


Closing

If you’re learning or teaching privacy-first data collaboration patterns on AWS, Clean Rooms is worth investing time in. The service is less about analytics features and more about architectural trust boundaries.

If you build this PoC, feel free to adapt the dataset and analysis rules to your domain (ad measurement, fraud, healthcare research). The core idea remains the same.
