[Databricks on AWS #2] RBAC on Databricks: Function-Role Groups, Workspace Assignment, and Why USER/ADMIN Isn't the Whole Story

#databricks #aws #rbac #terraform

📚 Series: Databricks on AWS (Part 2)

1. Building a Databricks AI Platform on AWS
2. RBAC with Function-Role Groups ← you are here
3. Compute Governance: Pools, Policies, Clusters
4. The BOOTSTRAP_TIMEOUT Mystery
5. Fixing It with AWS PrivateLink
6. How We Structure the Terraform

Part 1 built the environment. Now we hand out the keys — three account-level groups, two workspaces, and a permission model that's mostly not something you invent.

Here's the trap most Databricks RBAC posts fall into: they treat access control as something you design from scratch. You don't. Databricks already gives you workspace-level USER and ADMIN, entitlements, object ACLs, and Unity Catalog grants, all built in. The only piece you actually create is the groups. Get that mental split right and RBAC stops feeling like a maze.

If you're standing up a fresh Databricks account and wondering where to draw the first lines, this is the layer to get right before you touch a single catalog.

The model in one line

Everything flows through groups:

User ──▶ Function-Role group ──▶ workspace (USER / ADMIN) + (later) data grants

A user gets nothing directly. They land in a function-role group, and the group carries the permissions. That indirection is the whole point — it means "who is this person" and "what can this role do" are two separate problems you can solve independently.

We started with the smallest set that still maps to real jobs:

| Group | Intent | Workspace level |
|---|---|---|
| `ai_admin` | Platform admins: run the place | ADMIN |
| `ai_engineer` | ML / data engineers: build things | USER |
| `ai_analyst` | Analysts: read and query | USER |

Three groups. Not thirty. You can add ai_scientist, ai_guest, or anything else later; each is one line of YAML plus a workspace assignment. Resist the urge to pre-build a role for every hypothetical persona: people and teams churn fast enough that speculative roles go stale before anyone uses them.

These are account-level groups, not workspace-local ones. That matters: one group definition can be assigned to many workspaces, which is exactly what you want when you have more than one.
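To make "one line of YAML" concrete, here is a minimal sketch of what a groups stack could look like. It assumes the official `databricks/databricks` Terraform provider pointed at the AWS account console (not a workspace), so `databricks_group` creates account-level groups, with credentials supplied through environment variables. The inline YAML list, the `databricks_account_id` variable, and the `group_ids` output name are illustrative assumptions, not our exact files.

```hcl
terraform {
  required_providers {
    databricks = {
      source = "databricks/databricks"
    }
  }
}

# Assumption: the account ID is passed in as a variable.
variable "databricks_account_id" {
  type = string
}

# Account-level provider: groups created here live in the account,
# so one definition can be assigned to many workspaces.
provider "databricks" {
  host       = "https://accounts.cloud.databricks.com"
  account_id = var.databricks_account_id
}

locals {
  # Hypothetical inline YAML: one list item per function-role group.
  # Adding ai_scientist later means adding one more line here.
  group_names = yamldecode(<<-EOT
    - ai_admin
    - ai_engineer
    - ai_analyst
  EOT
  )
}

# Only the group objects are managed here; membership is deliberately absent.
resource "databricks_group" "function_role" {
  for_each     = toset(local.group_names)
  display_name = each.value
}

# Group IDs keyed by name, for a downstream workspace-assignment stack.
output "group_ids" {
  value = { for name, g in databricks_group.function_role : name => g.id }
}
```

Note what is not in this file: no users and no memberships. The stack owns the structure only.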

Groups are the only thing you create

This is the part worth internalizing. Line up the permission layers and mark who owns each:

| Layer | Values | Who defines it |
|---|---|---|
| Workspace assignment | `USER` / `ADMIN` | Databricks built-in |
| Entitlements | `workspace_access`, `databricks_sql_access`, `allow_cluster_create`, ... | Databricks built-in |
| Object ACLs | `CAN_MANAGE` / `CAN_USE` / `CAN_ATTACH_TO` / ... | Databricks built-in |
| Unity Catalog grants | `USE CATALOG` / `SELECT` / `MODIFY` / `ALL PRIVILEGES` | Databricks built-in |
| Function-role groups | `ai_admin`, `ai_engineer`, ... | You |

Four of the five rows are Databricks primitives. You don't design SELECT or CAN_MANAGE — they exist. What you design is the subjects: the groups those permissions attach to. Everything else in this series (entitlements in Part 3, Unity Catalog grants later) is you handing built-in permissions to the groups you made here.

So your IaC surface for RBAC is genuinely small. You define the groups in Terraform, and you wire them to workspaces. Membership — the actual humans — lives somewhere else. More on that in a second.

USER vs ADMIN: built-in, and you just pick

Once the groups exist, you assign each one to a workspace at a permission level. Databricks gives you exactly two at this layer:

ADMIN — full workspace admin. Manage users, clusters, settings, everything.
USER — can log in and work, no admin surface.

That's it. You're not inventing a permission scheme; you're choosing which of two built-in levels each group gets, per workspace. In Terraform this is databricks_mws_permission_assignment, an account-level resource (the mws prefix stands for multi-workspace) that binds a group to a workspace at one of those two levels.
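A single binding, sketched under assumptions: `ai_admin` at ADMIN on one workspace, using an account-level provider like the one the groups live under. The variable names `landing_workspace_id` and `ai_admin_group_id` are hypothetical placeholders for values you would take from your workspace and group stacks.

```hcl
# Hypothetical inputs: the numeric workspace ID and the account-level group ID.
variable "landing_workspace_id" {
  type = number
}

variable "ai_admin_group_id" {
  type = string
}

# Binds an account-level group to one workspace at a built-in level.
# permissions accepts "USER" or "ADMIN"; those are the only two levels here.
resource "databricks_mws_permission_assignment" "ai_admin_landing" {
  workspace_id = var.landing_workspace_id
  principal_id = var.ai_admin_group_id
  permissions  = ["ADMIN"]
}
```

One caveat worth checking in your own setup: assigning account-level groups to a workspace this way generally requires that workspace to be enabled for identity federation.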

With two workspaces — call them landing and pipeline — the matrix is small and readable:

| Group | landing | pipeline |
|---|---|---|
| `ai_admin` | ADMIN | ADMIN |
| `ai_engineer` | USER | USER |
| `ai_analyst` | USER | USER |

Admins are admins everywhere; engineers and analysts are users everywhere. The point isn't the specific grid — it's that a whole workspace access policy is this compact when the subjects are groups instead of individuals.
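The same grid can live in code as one map, so the Terraform reads like the table. This is a sketch, assuming the assignment stack receives a `group_ids` map (group name → ID) and a `workspace_ids` map (workspace name → numeric ID); both input names are hypothetical. Because it iterates over the group IDs it was handed, an empty `group_ids` input produces zero assignments in this sketch without any error.

```hcl
variable "group_ids" {
  description = "Account-level group IDs keyed by group name (from the groups stack)"
  type        = map(string)
}

variable "workspace_ids" {
  description = "Numeric workspace IDs keyed by workspace name, e.g. landing, pipeline"
  type        = map(number)
}

locals {
  # Group -> workspace -> built-in level, one row per function-role group.
  access_matrix = {
    ai_admin    = { landing = "ADMIN", pipeline = "ADMIN" }
    ai_engineer = { landing = "USER", pipeline = "USER" }
    ai_analyst  = { landing = "USER", pipeline = "USER" }
  }

  # Flatten to one entry per (group, workspace) pair, keyed "group/workspace".
  # Groups missing from access_matrix contribute nothing.
  assignments = merge([
    for group, id in var.group_ids : {
      for ws, level in try(local.access_matrix[group], {}) :
      "${group}/${ws}" => {
        principal_id = id
        workspace_id = var.workspace_ids[ws]
        level        = level
      }
    }
  ]...)
}

resource "databricks_mws_permission_assignment" "this" {
  for_each     = local.assignments
  workspace_id = each.value.workspace_id
  principal_id = each.value.principal_id
  permissions  = [each.value.level]
}

# Readable summary of what was bound, e.g. "ai_admin/landing" => "ADMIN".
output "assignments" {
  value = { for key, a in local.assignments : key => a.level }
}
```

With this shape, adding a group is one new row in `access_matrix` plus its entry in the groups stack.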

Where membership actually lives (hint: not IaC)

Here's the decision that surprises people: user-to-group membership does not go in Terraform.

It's tempting. You've got groups in code, you've got assignments in code — why not list the members in code too? Because joiners and leavers churn constantly, and every one of them would be a pull request, a plan, an apply. You'd be running infrastructure deploys to onboard an intern. That's the wrong tool.

So membership lives in the Account Console / SCIM, managed by ops (or synced from your IdP):

1. Account Console → User management → Users → Add user (the company email is the login ID).
2. Open the target group → Members tab → Add members.

One gotcha worth calling out: add people on the Members tab, not the Roles tab. The Roles tab assigns account-level roles (account admin and the like), which is a completely different thing and easy to click by accident. And if you have SSO/SCIM provisioning, let the IdP own membership; manual additions will conflict with the sync.

The clean split, then:

In IaC (rarely changes): group definitions, workspace assignments, and later the grants.
Out of IaC (changes daily): who's in each group.

Structure in code, people in the console. That line is the single most useful RBAC decision in this whole post.

The apply-order gotcha that eats an afternoon

We run Terraform through Terragrunt (applied with Atlantis), and RBAC is split across two stacks: one that creates the groups and one that creates the workspace assignments. The assignment stack depends on the group stack's output, because it needs the group IDs to bind them to workspaces.

Here's the trap. If you apply the assignment stack before the groups exist, it doesn't error. It quietly resolves the dependency to a mock (empty) output and produces:

assignments = {}

An empty result. No groups, no bindings, no complaint. You think you shipped RBAC; you shipped nothing. Then you spend an hour wondering why nobody can log in.
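Where does the mock come from? In Terragrunt it is the `mock_outputs` attribute on a `dependency` block. Here is a hedged sketch of what the assignment stack's `terragrunt.hcl` might contain, assuming the groups stack exposes a `group_ids` output; the relative path is hypothetical. As we understand Terragrunt's behavior, when the groups stack has no outputs yet and no command restriction is set, the mock is substituted and an empty map flows straight through to the assignments.

```hcl
# terragrunt.hcl for the workspace-assignment stack (illustrative layout).
dependency "groups" {
  # Hypothetical relative path to the groups stack.
  config_path = "../../groups"

  # Substituted when the groups stack has no outputs yet (e.g. never applied).
  # An empty map here is what becomes `assignments = {}` downstream.
  mock_outputs = {
    group_ids = {}
  }
}

inputs = {
  group_ids = dependency.groups.outputs.group_ids
}
```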

The order is non-negotiable:

```
# 1. groups FIRST
atlantis apply -d .../groups

# 2. THEN the assignment (references group IDs)
atlantis apply -d .../workspace/workspace-assignment
```

If you ever see assignments = {} on a plan you expected to be full, this is why: the group output wasn't there yet, the dependency fell back to its mock, and the plan built against thin air. Apply groups, then re-plan. It's the RBAC cousin of "create the table before you grant on it."
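Ordering discipline helps, but a guard makes the mistake loud instead of silent. One option, sketched here assuming the assignment stack takes group IDs as a `group_ids` map variable (a hypothetical name), is Terraform input validation that rejects an empty or incomplete map at plan time.

```hcl
# In the workspace-assignment stack: refuse to plan with missing groups.
variable "group_ids" {
  description = "Account-level group IDs keyed by group name (from the groups stack)"
  type        = map(string)

  # Fails when the dependency resolved to an empty mock.
  validation {
    condition     = length(var.group_ids) > 0
    error_message = "The group_ids map is empty, so apply the groups stack first and then re-plan."
  }

  # Also fails if any expected function-role group is absent.
  validation {
    condition = alltrue([
      for name in ["ai_admin", "ai_engineer", "ai_analyst"] : contains(keys(var.group_ids), name)
    ])
    error_message = "The group_ids map is missing at least one of ai_admin, ai_engineer, ai_analyst."
  }
}
```

On the Terragrunt side, setting `mock_outputs_allowed_terraform_commands = ["validate"]` on the dependency block is another option: Terragrunt then refuses to use the mock for `plan` or `apply` and errors out when the groups stack has no outputs.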

Takeaways

Groups are the only thing you invent. USER/ADMIN, entitlements, ACLs, and Unity Catalog grants are all Databricks built-ins — you attach them, you don't design them.
Function-role groups are account-level, so one definition assigns cleanly to many workspaces. Start with three (admin/engineer/analyst); add more as one-liners when you actually need them.
USER vs ADMIN is a built-in binary at the workspace layer — pick per group, per workspace, and let the group indirection keep the matrix tiny.
Membership belongs in the console/SCIM, not IaC. Joiner/leaver churn would turn onboarding into infrastructure deploys. Structure in code, people out of code.
Create groups before workspace assignment. Do it backwards and the dependency resolves to a mock, assignments = {} ships silently, and nobody can log in.

With the subjects in place and the workspaces handed out, the next question is what those users are actually allowed to do with compute — who can spin up a cluster, which entitlements gate SQL, and how you keep costs from getting out of hand. That's Part 3.

Next: Compute governance — entitlements, cluster policies, and keeping a self-serve platform from becoming a self-serve bill.