In Apache DolphinScheduler, multi-tenancy is not just an “auxiliary permission feature,” but the core execution model of the scheduling system. What it truly solves is not “who can use the system,” but:
Under what identity tasks run, what resources they consume, and how to prevent interference between them
Only by understanding this can we grasp the essence of DolphinScheduler’s multi-tenant design.
What Are Single-Tenant and Multi-Tenant?
First, let’s clarify what single-tenant and multi-tenant mean.
In enterprise scheduling platforms, how different teams or business units share platform resources is a fundamental design concern. Single-tenancy and multi-tenancy are two common models, with clear differences in resource isolation, stability, and scalability. Understanding these differences helps organizations choose the right architecture for efficient and controllable scheduling.
A single-tenant system serves only one team or business unit. All tasks share the same execution environment, resource pool, and permission system.
A multi-tenant system, on the other hand, allows multiple teams to share one platform. Each team is logically isolated as an independent Tenant and mapped to underlying execution identities (Linux users), resource queues (YARN queues), or cloud-native namespaces (Kubernetes namespaces), enabling independent management of tasks and resources.
Compared with single-tenancy, multi-tenancy provides significant advantages in resource isolation, stability, and scalability. While single-tenancy is simple to deploy and manage, resource contention and task interference become inevitable as the number of users grows. Multi-tenancy avoids this by clearly isolating Tenants and assigning dedicated resource pools per team or environment.
Core Mechanism: Tenant-Centric Execution Model
To overcome the limitations of single-tenancy, Apache DolphinScheduler adopts a multi-tenant design.
At the heart of this design is a single concept: Tenant.
However, a Tenant is not just a logical label—it is an execution context container. When a task is scheduled, the system determines three key aspects based on the Tenant:
### 1. Execution Identity
Tasks do not run abstractly on Worker nodes; they must run as a specific OS user. A Tenant is bound to a Linux user, and tasks execute under that identity, inheriting file permissions and system-level isolation.
Example: Executing tasks as a Linux user
```shell
# Switch to the Linux user corresponding to the Tenant
sudo su - team_alpha_user

# Execute the workflow task
spark-submit --class com.example.Job /opt/jobs/job.jar
```
- Description: Tenant is bound to an OS user, and tasks run under this identity on Worker nodes, achieving file permission and environment isolation.
- Tip: Ensure each Tenant has an independent home directory to avoid unauthorized access.

### 2. Resource Ownership
When tasks are submitted to engines like Spark or Flink, they must enter a resource pool. The Tenant determines the target resource queue or namespace, ensuring controlled resource usage.
Example: Create a Tenant and bind a YARN Queue
```shell
curl -X POST http://dolphinscheduler-api:12345/tenants \
  -H "Content-Type: application/json" \
  -d '{
    "name": "team_alpha",
    "queue": "team_alpha_queue",
    "description": "Team Alpha Tenant"
  }'
```
- Description: Each Tenant corresponds to a YARN Queue or K8s Namespace, ensuring exclusive resource usage.
- Tip: After creating a Tenant, remember to configure the queue or namespace in the resource scheduling system.

### 3. Isolation Boundary
Tenant defines a clear boundary for data access, task execution, and resource usage, forming logical isolation between teams.
Together, these three aspects form the foundation of DolphinScheduler’s multi-tenant mechanism.
How Resource Isolation Is Achieved
Multi-tenancy alone at the scheduling layer is not enough. The key design of DolphinScheduler is mapping Tenants to real underlying resource systems.
YARN-Based Isolation
In traditional big data architectures, Tenants are mapped to YARN queues. Each Tenant corresponds to a queue with defined capacity and limits. Tasks are submitted with queue information and scheduled accordingly, preventing resource contention.
YARN Mapping Example:
```xml
<!-- Queue configuration -->
<queue name="team_alpha_queue">
  <capacity>30</capacity>
  <maximum-capacity>50</maximum-capacity>
  <user-limit-factor>1.0</user-limit-factor>
</queue>
```
- Description: Tasks automatically enter the queue when submitted, avoiding resource conflicts between Tenants.
- Tip: Capacity and maximum capacity can be dynamically adjusted based on team workload.
Even if one team submits a large number of tasks, it only consumes resources within its own queue.
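In practice, the Tenant's queue travels with the task at submission time. For a Spark task, this amounts to something like the following sketch (queue name and job artifact are illustrative, carried over from the earlier examples):

```shell
# Submit to the Tenant's dedicated YARN queue (values illustrative)
spark-submit \
  --master yarn \
  --queue team_alpha_queue \
  --class com.example.Job /opt/jobs/job.jar
```

Because the queue is attached by the scheduler based on the Tenant, individual users cannot route their jobs into another team's capacity.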
Kubernetes-Based Isolation
In cloud-native environments, Tenants are mapped to Kubernetes namespaces. Tasks run as Pods, and:
- ResourceQuota limits total resource usage
- LimitRange restricts per-task resource consumption
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team-alpha
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-alpha-quota
  namespace: team-alpha
spec:
  hard:
    cpu: "20"
    memory: "64Gi"
    pods: "50"
```
- Description: Limits total resources and number of Pods to achieve cloud-native isolation.
- Tip: Combine with LimitRange to control per-task resource limits and prevent a single task from monopolizing resources.
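As a sketch, a LimitRange in the same namespace (all values illustrative) caps what any single task Pod may consume:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: team-alpha-limits
  namespace: team-alpha
spec:
  limits:
    - type: Container
      default:          # applied when a task Pod declares no limits
        cpu: "2"
        memory: "4Gi"
      max:              # hard ceiling per container
        cpu: "4"
        memory: "8Gi"
```

Together, ResourceQuota bounds the Tenant as a whole while LimitRange bounds each task inside it.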
This approach isolates not only resources but also runtime environments and networking.
OS-Level Isolation
At the execution layer, Linux users provide the final isolation boundary. Even on the same machine, tasks from different Tenants cannot access each other’s files or scripts.
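A minimal sketch of this boundary, assuming one home directory per Tenant user (paths are illustrative; real deployments use `/home/<tenant_user>`): mode `700` means only the owning Linux user can read, write, or enter the directory, so one Tenant's tasks cannot read another Tenant's scripts.

```shell
# Create a home directory per Tenant user (paths illustrative)
mkdir -p /tmp/tenant_homes/team_alpha_user /tmp/tenant_homes/team_beta_user

# 700: owner-only access; other Tenants' users are denied entirely
chmod 700 /tmp/tenant_homes/team_alpha_user /tmp/tenant_homes/team_beta_user

stat -c '%a' /tmp/tenant_homes/team_alpha_user   # → 700
```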
End-to-End Execution Flow
Putting everything together, the execution flow looks like this:
- A workflow is triggered in DolphinScheduler
- The system determines the Tenant
- The Master assigns tasks to Workers
- Workers switch to the corresponding Linux user
- Tasks are submitted with resource metadata (YARN queue / K8s namespace)
- Tasks run within the assigned resource pool under defined limits

This creates full isolation from scheduling logic to resource execution.
Technical Architecture
The architecture can be understood in three layers:
- Top Layer: DolphinScheduler (Tenant / Workflow)
- Middle Layer: Mapping (Linux User / YARN Queue / K8s Namespace)
- Bottom Layer: Resource systems (Compute nodes / Big data clusters / Kubernetes clusters)
The key idea is:
The scheduling layer does not directly manage resources—it controls them through Tenant mapping
Why This Design Works in Enterprises
This design becomes especially powerful in enterprise environments.
When multiple teams share a platform, resource contention is inevitable. Without Tenant-to-resource mapping, a high-load workload could impact the entire system. With proper isolation, each team operates within its own boundaries.
It also simplifies troubleshooting. Issues can be traced to a specific Tenant and then to its corresponding resource pool, without affecting the entire system.
Most importantly, the design is highly scalable. Adding new teams or integrating new compute engines only requires extending Tenant mappings, without redesigning the scheduling system.
Summary
DolphinScheduler’s multi-tenant design is essentially a way to embed the scheduling system into the resource ecosystem. Instead of relying on complex logic, it leverages operating systems, resource schedulers, and container platforms to build a stable, clear, and controllable execution model.
For engineers, the real focus is not:
“How to create a Tenant”
but rather:
“How to map Tenants to resources effectively to achieve true isolation and stability”
That is the core value of multi-tenant design.
- Previous articles:
  - Part 1 | Scheduling Systems Are More Than Just “Timers”
  - Part 2 | The Core Abstraction Model of Apache DolphinScheduler
  - Part 3 | How Scheduling Actually Runs
  - Part 4 | The State Machine: The Real Soul of Scheduling Systems
- Next article preview: Part 7 | Where Are the Performance Bottlenecks in Scheduling Platforms?

