This article provides an in-depth look at the database schema used by Apache DolphinScheduler, which primarily persists workflow definitions, execution states, scheduling information, and system metadata. It supports a wide range of databases including MySQL, PostgreSQL, and H2. The schema definitions are located in the dolphinscheduler-dao/src/main/resources/sql directory.
Schema Architecture
The DolphinScheduler database schema is organized into seven major functional groups:
| Group | Purpose | Key Tables |
|---|---|---|
| Workflow Management | Stores workflow and task definitions with version control | t_ds_workflow_definition, t_ds_task_definition, t_ds_workflow_task_relation |
| Execution State | Tracks runtime instances and their states | t_ds_workflow_instance, t_ds_task_instance, t_ds_command |
| Scheduling | Manages cron-based schedules via Quartz | t_ds_schedules, QRTZ_* tables |
| Resource Management | Metadata for data sources, files, and UDFs | t_ds_datasource, t_ds_resources, t_ds_udfs |
| Administration | Users, tenants, projects, and permissions | t_ds_user, t_ds_tenant, t_ds_project |
| Alerts | Alert configurations and history | t_ds_alert, t_ds_alertgroup |
| Service Registry | JDBC-based coordination (alternative to ZooKeeper) | t_ds_jdbc_registry_* tables |
Workflow and Task Definition Model
Separation of Definition and Instance
DolphinScheduler strictly separates definitions (templates) from instances (executions), enabling version control, concurrent executions, and audit tracking.
Key Design Principles:
- Code-based Identifier: Workflows and tasks use a bigint code as a stable cross-version identifier.
- Composite Key: Definitions use a composite natural key (code, version).
- Immutable Versions: Each version is immutable; modifications create a new version.
- Instance Reference: Instances reference a specific version of the definition.
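To make the definition/instance split concrete, here is an illustrative SQL sketch (not taken from the project source; the code value is a placeholder) that joins a runtime instance to the exact definition version it locked at creation time:

```sql
-- Join a runtime instance to the immutable definition version it was created from.
-- 1234567890 is a placeholder workflow code.
SELECT i.id      AS instance_id,
       i.state,
       d.name    AS workflow_name,
       d.version AS definition_version
FROM t_ds_workflow_instance i
JOIN t_ds_workflow_definition_log d
  ON d.code = i.workflow_definition_code
 AND d.version = i.workflow_definition_version
WHERE i.workflow_definition_code = 1234567890;
```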
Core Tables Reference
Workflow Definition Table
t_ds_workflow_definition
Primary table for workflow templates.
| Column | Type | Description |
|---|---|---|
| id | int | Auto-increment primary key |
| code | bigint | Unique workflow identifier (stable across versions) |
| version | int | Version number (default 1) |
| name | varchar(255) | Workflow name |
| project_code | bigint | Associated project |
| release_state | tinyint | 0 = offline, 1 = online |
| global_params | text | Global parameters in JSON format |
| execution_type | tinyint | 0 = parallel, 1 = serial wait, 2 = serial discard, 3 = serial priority |
| timeout | int | Timeout in minutes |
| user_id | int | Creator user ID |
Indexes:
- `UNIQUE KEY workflow_unique (name, project_code)`
- `UNIQUE KEY uniq_workflow_definition_code (code)`
- `KEY idx_project_code (project_code)`
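For orientation, the following abridged DDL sketch assembles the columns and indexes listed above into one definition (the shipped SQL scripts contain additional bookkeeping columns such as timestamps and descriptions):

```sql
-- Abridged sketch of t_ds_workflow_definition (MySQL dialect); not the full shipped DDL.
CREATE TABLE t_ds_workflow_definition (
  id             INT          NOT NULL AUTO_INCREMENT,
  code           BIGINT       NOT NULL,
  version        INT          NOT NULL DEFAULT 1,
  name           VARCHAR(255) DEFAULT NULL,
  project_code   BIGINT       NOT NULL,
  release_state  TINYINT      DEFAULT NULL,  -- 0 = offline, 1 = online
  global_params  TEXT,
  execution_type TINYINT      DEFAULT 0,     -- 0 = parallel, 1-3 = serial modes
  timeout        INT          DEFAULT 0,     -- minutes
  user_id        INT          DEFAULT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY workflow_unique (name, project_code),
  UNIQUE KEY uniq_workflow_definition_code (code),
  KEY idx_project_code (project_code)
) ENGINE = InnoDB;
```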
t_ds_workflow_definition_log
Stores audit logs for all workflow definition versions.
Mirrors the t_ds_workflow_definition structure with additional operator and operate_time columns; versions are kept unique by UNIQUE KEY (code, version).
t_ds_task_definition
Reusable task template within workflows.
| Column | Type | Description |
|---|---|---|
| code | bigint | Unique task identifier |
| version | int | Version number |
| task_type | varchar(50) | Shell, SQL, Python, Spark, etc. |
| task_params | longtext | Task configuration in JSON |
| worker_group | varchar(255) | Target worker group |
| fail_retry_times | int | Retry attempts on failure |
| fail_retry_interval | int | Retry interval (minutes) |
| timeout | int | Task timeout (minutes) |
| cpu_quota | int | CPU limit (-1 = unlimited) |
| memory_max | int | Memory limit (MB, -1 = unlimited) |
t_ds_workflow_task_relation
Defines DAG structure by specifying edges between tasks.
| Column | Type | Description |
|---|---|---|
| workflow_definition_code | bigint | Parent workflow |
| workflow_definition_version | int | Workflow version |
| pre_task_code | bigint | Predecessor task (0 for root) |
| post_task_code | bigint | Successor task |
| condition_type | tinyint | 0 = none, 1 = conditional, 2 = delay |
| condition_params | text | Condition configuration in JSON |
Note: pre_task_code = 0 indicates a root node (no predecessor).
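The relation table makes DAG traversal a matter of simple edge queries. An illustrative sketch (placeholder code values) for finding root tasks and direct successors:

```sql
-- Root tasks of one workflow version (edges whose predecessor is 0).
SELECT post_task_code
FROM t_ds_workflow_task_relation
WHERE workflow_definition_code = 1234567890
  AND workflow_definition_version = 3
  AND pre_task_code = 0;

-- Direct successors of a given task within the same workflow version.
SELECT post_task_code
FROM t_ds_workflow_task_relation
WHERE workflow_definition_code = 1234567890
  AND workflow_definition_version = 3
  AND pre_task_code = 111222333;
```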
Execution State Tables
t_ds_workflow_instance
Runtime execution record of workflows.
| Column | Type | Description |
|---|---|---|
| id | int | Primary key |
| workflow_definition_code | bigint | References definition |
| workflow_definition_version | int | Locked version for this run |
| state | tinyint | 0=SUBMITTED,1=RUNNING,2=PAUSING_PREP,3=PAUSED,4=STOPPING_PREP,5=STOPPED,6=FAILURE,7=SUCCESS,8=NEED_FAULT_TOLERANCE,9=TERMINATED,10=WAITING,11=WAITING_DEP |
| state_history | text | State transition log |
| start_time | datetime | Execution start |
| end_time | datetime | Execution end |
| command_type | tinyint | 0=START,1=START_CURRENT,2=RECOVER,3=RECOVER_PAUSE,4=START_FAILURE,5=SUPPLEMENT,6=SCHEDULE,7=RE_RUN,8=PAUSE,9=STOP,10=RESUME_WAIT |
| host | varchar(135) | Master server executing the workflow |
| executor_id | int | User triggering execution |
| tenant_code | varchar(64) | Tenant for resource isolation |
| next_workflow_instance_id | int | For serial execution mode |
Indexes:
- `KEY workflow_instance_index (workflow_definition_code, id)`
- `KEY start_time_index (start_time, end_time)`
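A typical monitoring query shaped to use these indexes might look like this sketch (placeholder code value; state 6 = FAILURE per the table above):

```sql
-- Recent failed instances of one workflow, served by workflow_instance_index
-- and the (start_time, end_time) index.
SELECT id, state, start_time, end_time, host
FROM t_ds_workflow_instance
WHERE workflow_definition_code = 1234567890
  AND state = 6
  AND start_time >= NOW() - INTERVAL 1 DAY
ORDER BY id DESC
LIMIT 100;
```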
t_ds_task_instance
Runtime record for a single task.
| Column | Type | Description |
|---|---|---|
| id | int | Primary key |
| task_code | bigint | References task definition |
| task_definition_version | int | Locked version |
| workflow_instance_id | int | Parent workflow instance |
| state | tinyint | Same state enumeration as t_ds_workflow_instance |
| submit_time | datetime | Queued time |
| start_time | datetime | Actual start time |
| end_time | datetime | End time |
| host | varchar(135) | Worker host executing task |
| execute_path | varchar(200) | Working directory on worker |
| log_path | text | Log file path |
| retry_times | int | Current retry count |
| var_pool | text | Variables for downstream tasks |
Index: KEY idx_task_instance_code_version (task_code, task_definition_version)
Command Model and Workflow Execution
Command Queue
The t_ds_command table implements a queue-based execution model, where commands trigger workflow instances.
t_ds_command Structure
| Column | Type | Description |
|---|---|---|
| command_type | tinyint | 0=START,1=START_CURRENT,2=RECOVER,3=RECOVER_PAUSE,4=START_FAILURE,5=SUPPLEMENT,6=SCHEDULE,7=RE_RUN,8=PAUSE,9=STOP |
| workflow_definition_code | bigint | Target workflow |
| workflow_instance_id | int | For resume/re-run operations |
| workflow_instance_priority | int | 0=highest,1=high,2=medium,3=low,4=lowest |
| command_param | text | Execution parameters in JSON |
| worker_group | varchar(255) | Target worker group |
| tenant_code | varchar(64) | Tenant executing command |
| dry_run | tinyint | 0=normal,1=dry run (no actual execution) |
Process Flow:
- Commands are inserted into t_ds_command via the API, the scheduler, or retry logic.
- MasterSchedulerThread continuously scans the table, sorted by priority and id (see the scan sketch below).
- The master creates t_ds_workflow_instance records.
- The DAG is analyzed, and t_ds_task_instance records are created for ready tasks.
- Successfully processed commands are removed; failed commands move to t_ds_error_command.
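A rough sketch of that scan in SQL terms (illustrative only; the actual implementation goes through the DAO layer and applies additional filtering before consuming a command):

```sql
-- Priority-ordered scan of the command queue, matching priority_id_index.
SELECT id, command_type, workflow_definition_code, command_param, worker_group
FROM t_ds_command
ORDER BY workflow_instance_priority ASC, id ASC
LIMIT 10;
```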
Version Control System
Code-based Version Model
DolphinScheduler supports:
- Concurrent execution of different versions.
- Safe updates without affecting running instances.
- Full change audit tracking.
Version Management Rules
- Current version table: Only the “current” version exists in t_ds_workflow_definition and t_ds_task_definition.
- Log tables: All versions are saved in *_log tables with UNIQUE KEY (code, version).
- Online state: Only one version per code has release_state = 1 (online).
- Instance lock: Workflow instances lock to a specific version upon creation.
- Immutable versions: Once referenced by an instance, log records are immutable.
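At the SQL level, a version bump can be pictured as the following conceptual sketch (in the code base this is handled by the service layer inside a transaction, not by hand-written SQL; all values here are placeholders):

```sql
-- 1. Write the new version into the immutable log table.
INSERT INTO t_ds_workflow_definition_log
       (code, version, name, project_code, release_state, user_id, operator, operate_time)
VALUES (1234567890, 3, 'daily_etl', 9876543210, 0, 1, 1, NOW());

-- 2. Point the "current version" table at the new version.
UPDATE t_ds_workflow_definition
SET version = 3, name = 'daily_etl'
WHERE code = 1234567890;
```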
Scheduling Architecture
Quartz Integration
DolphinScheduler integrates Quartz for cron-based scheduling. The schema includes standard Quartz tables and a mapping table.
t_ds_schedules
| Column | Type | Description |
|---|---|---|
| workflow_definition_code | bigint | Target workflow (unique) |
| start_time | datetime | Schedule start time |
| end_time | datetime | Schedule end time |
| timezone_id | varchar(40) | Timezone for cron expression |
| crontab | varchar(255) | Cron expression |
| release_state | int | 0=offline,1=online |
| failure_strategy | int | Behavior on failure |
| workflow_instance_priority | int | Default instance priority |
Quartz Table Notes:
- QRTZ_TRIGGERS.NEXT_FIRE_TIME: Indexed for efficient scanning.
- QRTZ_CRON_TRIGGERS.CRON_EXPRESSION: Parsed cron definition.
- QRTZ_SCHEDULER_STATE: Tracks Quartz scheduler instances.
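Listing the online schedules for a workflow is then a plain lookup against the mapping table, as in this sketch (placeholder code value; the actual firing is driven by Quartz from the QRTZ_* tables):

```sql
-- Online schedules (release_state = 1) for a given workflow definition.
SELECT workflow_definition_code, crontab, timezone_id, start_time, end_time
FROM t_ds_schedules
WHERE workflow_definition_code = 1234567890
  AND release_state = 1;
```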
Resource and Configuration Tables
Data Source Management
t_ds_datasource
Stores database connection configs for SQL tasks.
| Column | Type | Description |
|---|---|---|
| name | varchar(64) | Data source name |
| type | tinyint | Database type (MySQL, PostgreSQL, Hive, etc.) |
| connection_params | text | JSON configuration (host, port, db, credentials) |
| user_id | int | Owner user |
Constraint: UNIQUE KEY (name, type) prevents duplicate sources.
File Resources
t_ds_resources (Deprecated)
Note: Deprecated; resource metadata is migrating to a separate backend.
| Column | Type | Description |
|---|---|---|
| full_name | varchar(128) | Full path including tenant |
| type | int | File type (file/UDF) |
| size | bigint | File size (bytes) |
| is_directory | boolean | Directory flag |
| pid | int | Parent directory ID |
Multi-Tenant and Administration
Project, User, and Tenant Hierarchy
Key Admin Tables
t_ds_tenant
| Column | Type | Description |
|---|---|---|
| tenant_code | varchar(64) | Unique tenant identifier |
| queue_id | int | Default YARN queue for tasks |
| description | varchar(255) | Tenant description |
Default tenant: the system creates a built-in tenant with id = -1 and tenant_code = 'default'.
t_ds_user
| Column | Type | Description |
|---|---|---|
| user_name | varchar(64) | Login username (unique) |
| user_password | varchar(64) | Hashed password |
| user_type | tinyint | 0=normal,1=admin |
| tenant_id | int | Associated tenant (default -1) |
| email | varchar(64) | Email address |
| state | tinyint | 0=disabled,1=enabled |
t_ds_project
| Column | Type | Description |
|---|---|---|
| code | bigint | Unique project code |
| name | varchar(255) | Project name (unique) |
| user_id | int | Creator/owner |
| description | varchar(255) | Project description |
JDBC Registry
For deployments without ZooKeeper, DolphinScheduler provides a JDBC-based registry for service coordination.
t_ds_jdbc_registry_data
Stores registry entries similar to ZooKeeper nodes.
| Column | Type | Description |
|---|---|---|
| data_key | varchar(256) | Path-like key (unique) |
| data_value | text | Serialized data |
| data_type | varchar(64) | EPHEMERAL or PERSISTENT |
| client_id | bigint | Associated client |
| last_update_time | timestamp | Last modification |
t_ds_jdbc_registry_lock
Implements distributed locks.
| Column | Type | Description |
|---|---|---|
| lock_key | varchar(256) | Lock identifier (unique) |
| lock_owner | varchar(256) | Client holding lock (ip_processId) |
| client_id | bigint | Associated client |
t_ds_jdbc_registry_client_heartbeat
Tracks active clients to clean ephemeral data.
| Column | Type | Description |
|---|---|---|
| id | bigint | Client ID (primary key) |
| client_name | varchar(256) | Client identifier |
| last_heartbeat_time | bigint | Last heartbeat timestamp |
| connection_config | text | Connection metadata |
Cleanup: when a client's heartbeat expires, its ephemeral registry entries and locks are removed automatically.
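Conceptually, acquiring a lock through this table is an insert guarded by the unique key, as in the sketch below (illustrative only; the lock key and owner values are placeholders, and the real registry code also handles retries and heartbeat renewal):

```sql
-- Conceptual lock acquisition: the unique key on lock_key means only one client
-- can hold a row for a given lock at a time; a duplicate-key error means "lock busy".
INSERT INTO t_ds_jdbc_registry_lock (lock_key, lock_owner, client_id)
VALUES ('/lock/failover', '192.168.1.10_12345', 42);

-- Conceptual release.
DELETE FROM t_ds_jdbc_registry_lock
WHERE lock_key = '/lock/failover' AND client_id = 42;
```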
Alert System
Alert Tables
t_ds_alert
Alerts generated on workflow/task failure or completion.
| Column | Type | Description |
|---|---|---|
| title | varchar(512) | Alert title |
| sign | char(40) | SHA1 hash for deduplication |
| content | text | Alert message body |
| alert_status | tinyint | 0=waiting,1=success,2=failure |
| warning_type | tinyint | 1=workflow success,2=workflow/task failure |
| workflow_instance_id | int | Source workflow instance |
| alertgroup_id | int | Target alert group |
Index: KEY idx_sign (sign) for deduplication.
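Before inserting a new alert, the sign column allows a cheap duplicate check along these lines (sketch with a placeholder SHA1 value; alert_status 0 = waiting per the table above):

```sql
-- Is there already a pending alert with the same content hash?
SELECT COUNT(*)
FROM t_ds_alert
WHERE sign = 'c3499c2729730a7f807efb8676a92dcb6f8a3f8f'
  AND alert_status = 0;
```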
t_ds_alertgroup
Alert channel group.
| Column | Type | Description |
|---|---|---|
| group_name | varchar(255) | Unique group name |
| alert_instance_ids | varchar(255) | Comma-separated plugin instance IDs |
| description | varchar(255) | Group description |
Indexes and Query Optimization
Key Indexes
The schema includes indexes for common query patterns:
- Workflow and Task Lookup:
  - Workflow instances by definition: `KEY workflow_instance_index (workflow_definition_code, id)`
  - Task instances by definition: `KEY idx_task_instance_code_version (task_code, task_definition_version)`
  - Time range queries for monitoring: `KEY start_time_index (start_time, end_time)`
- Command Processing:
  - Priority-based command scan: `KEY priority_id_index (workflow_instance_priority, id)`
- DAG Relation Queries:
  - Forward and backward DAG traversal: `KEY idx_pre_task_code_version (pre_task_code, pre_task_version)`, `KEY idx_post_task_code_version (post_task_code, post_task_version)`, `KEY idx_code (project_code, workflow_definition_code)`
Unique Constraints
Database-level enforcement of key business rules:
| Table | Constraint | Purpose |
|---|---|---|
| t_ds_workflow_definition | UNIQUE (name, project_code) | No duplicate workflow names per project |
| t_ds_workflow_definition | UNIQUE (code) | Global workflow identifier |
| t_ds_workflow_definition_log | UNIQUE (code, version) | One record per version |
| t_ds_datasource | UNIQUE (name, type) | No duplicate data sources per type |
| t_ds_schedules | UNIQUE (workflow_definition_code) | One schedule per workflow |
Schema Evolution and Upgrades
DolphinScheduler maintains upgrade scripts in dolphinscheduler-dao/src/main/resources/sql/upgrade for cross-version schema migration.
Recent Schema Changes
Version 3.3.0
- Renamed tables/columns from “process” to “workflow”.
- Removed data quality tables (t_ds_dq_*).
- Added JDBC registry to replace ZooKeeper.
- Removed cache-related columns from task tables.
Version 3.2.0
- Added execution_type to workflow definition (parallel/serial mode).
- Added next_workflow_instance_id for serial execution chain.
- Added tenant_code to command and instance tables.
- Created t_ds_project_parameter and t_ds_project_preference.
Database Interaction Patterns
Service Layer Access
Database access is abstracted through the DAO layer in the dolphinscheduler-dao module.
Key Service Classes:
- ProcessService: CRUD for workflow/task definitions and instances.
- CommandService: Command queue management.
- ProjectService: Project and permission management.
- ResourcesService: Resource metadata operations.
Transaction Management
Most operations use Spring @Transactional for atomicity:
- Create workflow instance and task instances atomically.
- Consume commands and generate instances.
- Synchronize version updates with log tables.
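At the database level, the command-consumption case reduces to a unit of work like the following conceptual sketch (in the code base this is a Spring-managed transaction in the service layer, not raw SQL; all values are placeholders):

```sql
-- Consume a command and create its workflow instance as one atomic unit.
START TRANSACTION;

INSERT INTO t_ds_workflow_instance
       (workflow_definition_code, workflow_definition_version, state, start_time, command_type)
VALUES (1234567890, 3, 0, NOW(), 6);   -- state 0 = SUBMITTED, command_type 6 = SCHEDULE

DELETE FROM t_ds_command WHERE id = 42;   -- remove the consumed command

COMMIT;
```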
Connection Pooling
The system uses HikariCP, configured in application.yaml:
- Default pool size: 50 connections.
- Connection timeout: 30 seconds.
- Idle timeout: 600 seconds.