Chen Debra

【Tips】DolphinScheduler Task Data Cleanup and Backup Strategy to Ensure Page Smoothness

Problem Description

As Apache DolphinScheduler runs over a long period, the number of task records keeps growing. Task data is stored in the t_ds_task_instance and t_ds_process_instance tables of the database, and the continuous growth of these two tables makes the web UI pages slow to load.
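
One way to see how large the two tables have become is to query MySQL's information_schema. This is a minimal sketch that assumes the schema is named dolphinscheduler, as in the statements later in this post. Note that table_rows is only an estimate for InnoDB tables.

```sql
-- Approximate row count and on-disk size of the two instance tables.
-- Assumption: the schema name is `dolphinscheduler`.
SELECT table_name,
       table_rows,
       ROUND((data_length + index_length) / 1024 / 1024, 1) AS size_mb
FROM information_schema.tables
WHERE table_schema = 'dolphinscheduler'
  AND table_name IN ('t_ds_task_instance', 't_ds_process_instance');
```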

Solution

To address this, regularly delete rows that are more than one month old from the t_ds_process_instance and t_ds_task_instance tables.
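
Before deleting anything, it can help to preview how many rows fall outside the one-month window. The queries below are a sketch that assumes a rolling cutoff computed with NOW() - INTERVAL 1 MONTH and the submit_time and end_time columns used in the cleanup step of this post. They only count rows and change nothing.

```sql
-- Count task instances submitted more than one month ago.
SELECT COUNT(*) AS old_task_rows
FROM t_ds_task_instance
WHERE submit_time < NOW() - INTERVAL 1 MONTH;

-- Count process instances that ended more than one month ago.
SELECT COUNT(*) AS old_process_rows
FROM t_ds_process_instance
WHERE end_time < NOW() - INTERVAL 1 MONTH;
```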

Data Backup

Before cleaning up, back up the original tables so that the data can be restored if needed.

use dolphinscheduler;
-- Create backup tables t_ds_process_instance_backup20241120 and t_ds_task_instance_backup20241120
CREATE TABLE t_ds_process_instance_backup20241120 LIKE t_ds_process_instance;
CREATE TABLE t_ds_task_instance_backup20241120 LIKE t_ds_task_instance;
-- Back up the original table data into the corresponding backup tables
INSERT INTO t_ds_process_instance_backup20241120
SELECT * FROM t_ds_process_instance;
INSERT INTO t_ds_task_instance_backup20241120
SELECT * FROM t_ds_task_instance;
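
If copying the full tables is too slow or takes too much space, one variation is to copy only the rows that are about to be deleted. This is a sketch of an alternative, not the approach used above: it would replace the two INSERT statements, and it reuses the cutoff and columns from the cleanup step.

```sql
-- Alternative: back up only the rows that the cleanup will delete.
-- Assumes the backup tables were already created with CREATE TABLE ... LIKE.
INSERT INTO t_ds_process_instance_backup20241120
SELECT * FROM t_ds_process_instance
WHERE end_time < '2024-10-19 23:59:59';

INSERT INTO t_ds_task_instance_backup20241120
SELECT * FROM t_ds_task_instance
WHERE submit_time < '2024-10-19 23:59:59';
```

With this variation, the backup row counts should be compared against the same filtered counts on the original tables rather than against their full row counts.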

Check Backup Status

To confirm that the backup succeeded, compare the row counts of the backup tables with those of the original tables. Each pair should match, provided no new instances were written between the backup and the check.

-- Check the data row count of the backup tables
SELECT COUNT(*) FROM t_ds_process_instance_backup20241120;
SELECT COUNT(*) FROM t_ds_task_instance_backup20241120;
-- Check the data row count of the original tables
SELECT COUNT(*) FROM t_ds_process_instance;
SELECT COUNT(*) FROM t_ds_task_instance;
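
The four counts can also be returned side by side in a single row, which makes a mismatch easier to spot. This is a minimal sketch using scalar subqueries; the column aliases are illustrative.

```sql
-- Original and backup row counts in one result row.
SELECT
  (SELECT COUNT(*) FROM t_ds_process_instance)                AS process_rows,
  (SELECT COUNT(*) FROM t_ds_process_instance_backup20241120) AS process_backup_rows,
  (SELECT COUNT(*) FROM t_ds_task_instance)                   AS task_rows,
  (SELECT COUNT(*) FROM t_ds_task_instance_backup20241120)    AS task_backup_rows;
```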

Data Cleanup

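Once the backup has been completed and verified, delete the data dated before 2024-10-19 23:59:59. The cutoff is applied to submit_time for task instances and to end_time for process instances. Rows whose end_time is NULL, such as process instances that are still running, do not match the comparison and are left in place.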

-- Delete data before October 19, 2024, 23:59:59 in the t_ds_task_instance table
DELETE FROM t_ds_task_instance
WHERE submit_time < '2024-10-19 23:59:59';
-- Delete data before October 19, 2024, 23:59:59 in the t_ds_process_instance table
DELETE FROM t_ds_process_instance
WHERE end_time < '2024-10-19 23:59:59';
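
On very large tables, a single DELETE can run for a long time and hold locks while it does. A common alternative is to delete in batches with MySQL's DELETE ... LIMIT and repeat until no rows are affected. The sketch below assumes a batch size of 10000, which is an arbitrary value to tune for your environment.

```sql
-- Delete old task instances in batches of at most 10000 rows.
-- Rerun this pair of statements until deleted_rows is 0.
DELETE FROM t_ds_task_instance
WHERE submit_time < '2024-10-19 23:59:59'
LIMIT 10000;
SELECT ROW_COUNT() AS deleted_rows;

-- Same pattern for process instances; rerun until deleted_rows is 0.
DELETE FROM t_ds_process_instance
WHERE end_time < '2024-10-19 23:59:59'
LIMIT 10000;
SELECT ROW_COUNT() AS deleted_rows;
```

ROW_COUNT() must be read immediately after each DELETE, because it reports the rows affected by the previous statement only.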
