Thesius Code

Posted on • Originally published at datanest-stores.pages.dev

Databricks Workspace Toolkit

Automate Databricks workspace management — clusters, jobs, secrets, Unity Catalog, and permissions.

Stop clicking through the UI. Manage your entire Databricks workspace programmatically with production-ready Python wrappers around the Databricks REST APIs.


What You Get

  • Workspace management — List, create, delete, import/export notebooks programmatically
  • Cluster automation — Create clusters, resize, manage pools, enforce auto-termination policies
  • Job orchestration — Create multi-task workflows, manage schedules, configure notifications
  • Secret management — Create scopes, store secrets, manage ACLs for secure credential handling
  • Unity Catalog setup — Bootstrap catalogs, schemas, tables, grants, and external locations
  • Permissions manager — Configure RBAC for clusters, jobs, notebooks, and SQL warehouses
  • Cluster policies — Cost control templates with instance type restrictions and spot pricing
  • Job templates — Ready-to-use ETL and ML training job definitions
  • Shell scripts — Bootstrap workspace setup and backup/export notebooks
  • Admin dashboard — Databricks notebook showing cluster usage, job status, and costs
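To make the secret-management piece concrete: Databricks exposes secrets over its REST API, and a wrapper like the toolkit's `secret_manager.py` ultimately boils down to two calls. The sketch below (a hypothetical helper, not the toolkit's actual code) returns the endpoint/body pairs the Secrets API expects for creating a scope and storing a value:

```python
def secret_requests(host: str, scope: str, key: str, value: str) -> list[tuple[str, dict]]:
    """Request sequence for storing a secret via the Databricks Secrets API (REST 2.0).

    Returns (url, json_body) pairs; a real client would POST each with a
    bearer-token header and tolerate RESOURCE_ALREADY_EXISTS on the scope.
    """
    return [
        # 1. Create the secret scope (idempotent in practice: ignore "already exists")
        (f"{host}/api/2.0/secrets/scopes/create", {"scope": scope}),
        # 2. Write (or overwrite) the secret value under that scope
        (f"{host}/api/2.0/secrets/put",
         {"scope": scope, "key": key, "string_value": value}),
    ]
```

Keeping the request construction separate from the HTTP call makes the wrapper easy to unit-test without a live workspace.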

File Tree

databricks-workspace-toolkit/
├── README.md
├── manifest.json
├── LICENSE
├── src/
│   ├── workspace_manager.py       # Notebook CRUD, import/export
│   ├── cluster_manager.py         # Cluster lifecycle, pools, policies
│   ├── job_manager.py             # Jobs API: create, run, notifications
│   ├── secret_manager.py          # Secret scopes and ACLs
│   ├── unity_catalog_setup.py     # UC bootstrap: catalogs, schemas, grants
│   └── permissions_manager.py     # RBAC for workspace resources
├── configs/
│   ├── cluster_policies.json      # Cost control cluster policies
│   ├── workspace_config.yaml      # Environment configuration
│   └── job_templates/
│       ├── etl_job.json           # Multi-task ETL pipeline job
│       └── ml_training_job.json   # ML training with GPU cluster
├── scripts/
│   ├── setup_workspace.sh         # Bootstrap workspace setup
│   └── export_workspace.sh        # Backup notebooks and configs
├── notebooks/
│   └── admin_dashboard.py         # Admin overview dashboard
└── guides/
    └── workspace-management.md    # Best practices guide

Getting Started

1. Configure Your Workspace

# configs/workspace_config.yaml
workspace:
  host: "https://adb-1234567890.12.azuredatabricks.net"
  token_env_var: "DATABRICKS_TOKEN"
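Note that the config stores only the *name* of the environment variable holding your token, never the token itself. A minimal sketch of how a client might resolve it into the auth header the Databricks REST APIs expect (`auth_headers` is a hypothetical helper, not part of the toolkit's published API):

```python
import os

def auth_headers(token_env_var: str = "DATABRICKS_TOKEN") -> dict:
    """Build the bearer-token header from an environment variable.

    Reading the token from the environment keeps credentials out of
    version-controlled config files.
    """
    token = os.environ.get(token_env_var)
    if not token:
        raise RuntimeError(f"Set {token_env_var} before calling the Databricks APIs")
    return {"Authorization": f"Bearer {token}"}
```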

2. Create a Cluster

from src.cluster_manager import ClusterManager

mgr = ClusterManager.from_config("configs/workspace_config.yaml")
cluster_id = mgr.create_cluster(
    name="etl-cluster-prod",
    node_type_id="Standard_DS3_v2",
    num_workers=4,
    auto_terminate_minutes=30,
    spark_conf={"spark.sql.shuffle.partitions": "200"},
)

3. Create a Multi-Task Job

from src.job_manager import JobManager

mgr = JobManager.from_config("configs/workspace_config.yaml")
job_id = mgr.create_job_from_template("configs/job_templates/etl_job.json")
mgr.run_now(job_id)
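The `run_now` call corresponds to the Jobs API 2.1 `run-now` endpoint, which only needs the job ID. A sketch of the request such a wrapper would send (`run_now_request` is a hypothetical helper for illustration):

```python
def run_now_request(host: str, job_id: int) -> tuple[str, dict]:
    """Endpoint and body for triggering a job run via Jobs API 2.1.

    A real client would POST the body with a bearer-token header and read
    the returned run_id to poll for completion.
    """
    return (f"{host}/api/2.1/jobs/run-now", {"job_id": job_id})
```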

Architecture

 ┌──────────────────────────────────────────┐
 │           Databricks Workspace           │
 │  ┌────────┐ ┌────────┐ ┌─────────────┐ │
 │  │Clusters│ │  Jobs  │ │Unity Catalog│ │
 │  └───┬────┘ └───┬────┘ └──────┬──────┘ │
 │      │          │             │         │
 │  ┌───┴──────────┴─────────────┴───┐     │
 │  │     Databricks REST APIs       │     │
 │  └───────────────┬────────────────┘     │
 └──────────────────┼──────────────────────┘
                    │
 ┌──────────────────┼──────────────────────┐
 │     Workspace Toolkit (this product)    │
 │  ┌──────────┐ ┌─────────┐ ┌─────────┐ │
 │  │ Cluster  │ │   Job   │ │  Unity  │ │
 │  │ Manager  │ │ Manager │ │ Catalog │ │
 │  └──────────┘ └─────────┘ └─────────┘ │
 │  ┌──────────┐ ┌─────────┐ ┌─────────┐ │
 │  │Workspace │ │ Secret  │ │ Perms   │ │
 │  │ Manager  │ │ Manager │ │ Manager │ │
 │  └──────────┘ └─────────┘ └─────────┘ │
 └─────────────────────────────────────────┘

Requirements

  • Python 3.10+
  • requests library (pip install requests)
  • pyyaml library (pip install pyyaml)
  • Databricks workspace with admin or workspace-level access
  • Personal access token or service principal credentials

Related Products


This is 1 of 11 resources in the Data Pipeline Pro toolkit. Get the complete Databricks Workspace Toolkit with all files, templates, and documentation for $59.

Get the Full Kit →

Or grab the entire Data Pipeline Pro bundle (11 products) for $169 — save 30%.

Get the Complete Bundle →

