# Databricks Workspace Toolkit

Automate Databricks workspace management — clusters, jobs, secrets, Unity Catalog, and permissions.

Stop clicking through the UI. Manage your entire Databricks workspace programmatically with production-ready Python wrappers around the Databricks REST APIs.

## What You Get
- Workspace management — List, create, delete, import/export notebooks programmatically
- Cluster automation — Create clusters, resize, manage pools, enforce auto-termination policies
- Job orchestration — Create multi-task workflows, manage schedules, configure notifications
- Secret management — Create scopes, store secrets, manage ACLs for secure credential handling
- Unity Catalog setup — Bootstrap catalogs, schemas, tables, grants, and external locations
- Permissions manager — Configure RBAC for clusters, jobs, notebooks, and SQL warehouses
- Cluster policies — Cost control templates with instance type restrictions and spot pricing
- Job templates — Ready-to-use ETL and ML training job definitions
- Shell scripts — Bootstrap workspace setup and backup/export notebooks
- Admin dashboard — Databricks notebook showing cluster usage, job status, and costs
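
To make the secret-management item concrete, here is a sketch of the request shapes it wraps. The endpoint paths are the documented Databricks Secrets API routes; the helper functions themselves are illustrative, not the toolkit's actual interface:

```python
# Illustrative sketch of the payloads behind secret management.
# Endpoint paths follow the public Databricks Secrets API; the
# helpers are hypothetical, not the toolkit's actual API.

def create_scope_request(scope: str) -> tuple[str, dict]:
    """Endpoint and body for POST /api/2.0/secrets/scopes/create."""
    return "/api/2.0/secrets/scopes/create", {"scope": scope}

def put_secret_request(scope: str, key: str, value: str) -> tuple[str, dict]:
    """Endpoint and body for POST /api/2.0/secrets/put (string values)."""
    return "/api/2.0/secrets/put", {
        "scope": scope,
        "key": key,
        "string_value": value,
    }

path, body = put_secret_request("etl-prod", "db-password", "s3cret")
```

A manager class would send these bodies with an authenticated POST; keeping payload construction separate from transport makes the request shapes easy to test.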
## File Tree

```text
databricks-workspace-toolkit/
├── README.md
├── manifest.json
├── LICENSE
├── src/
│   ├── workspace_manager.py      # Notebook CRUD, import/export
│   ├── cluster_manager.py        # Cluster lifecycle, pools, policies
│   ├── job_manager.py            # Jobs API: create, run, notifications
│   ├── secret_manager.py         # Secret scopes and ACLs
│   ├── unity_catalog_setup.py    # UC bootstrap: catalogs, schemas, grants
│   └── permissions_manager.py    # RBAC for workspace resources
├── configs/
│   ├── cluster_policies.json     # Cost control cluster policies
│   ├── workspace_config.yaml     # Environment configuration
│   └── job_templates/
│       ├── etl_job.json          # Multi-task ETL pipeline job
│       └── ml_training_job.json  # ML training with GPU cluster
├── scripts/
│   ├── setup_workspace.sh        # Bootstrap workspace setup
│   └── export_workspace.sh       # Backup notebooks and configs
├── notebooks/
│   └── admin_dashboard.py        # Admin overview dashboard
└── guides/
    └── workspace-management.md   # Best practices guide
```
## Getting Started

### 1. Configure Your Workspace

```yaml
# configs/workspace_config.yaml
workspace:
  host: "https://adb-1234567890.12.azuredatabricks.net"
  token_env_var: "DATABRICKS_TOKEN"
```
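Note that the token is never stored in the file — only the name of the environment variable that holds it. A minimal sketch of how such a config might be resolved at runtime (the `WorkspaceConfig` class here is illustrative, not the toolkit's actual loader):

```python
# Illustrative config resolution: host comes from the YAML file, the
# token from the environment variable the file names. This class is
# a sketch of the pattern, not the toolkit's actual loader.
import os
from dataclasses import dataclass

@dataclass
class WorkspaceConfig:
    host: str
    token_env_var: str

    @property
    def token(self) -> str:
        token = os.environ.get(self.token_env_var)
        if not token:
            raise RuntimeError(f"{self.token_env_var} is not set")
        return token

    def auth_headers(self) -> dict:
        return {"Authorization": f"Bearer {self.token}"}

os.environ["DATABRICKS_TOKEN"] = "dapi-example"  # demo value only
cfg = WorkspaceConfig(
    host="https://adb-1234567890.12.azuredatabricks.net",
    token_env_var="DATABRICKS_TOKEN",
)
```

Failing fast when the variable is unset beats silently sending unauthenticated requests.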
### 2. Create a Cluster

```python
from src.cluster_manager import ClusterManager

mgr = ClusterManager.from_config("configs/workspace_config.yaml")
cluster_id = mgr.create_cluster(
    name="etl-cluster-prod",
    node_type_id="Standard_DS3_v2",
    num_workers=4,
    auto_terminate_minutes=30,
    spark_conf={"spark.sql.shuffle.partitions": "200"},
)
```
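Under the hood, a call like this maps onto a single request to the Clusters API (`POST /api/2.1/clusters/create`). A sketch of that mapping — the payload field names follow the public Clusters API, but the builder function itself is illustrative:

```python
# Illustrative payload builder for POST /api/2.1/clusters/create.
# Field names follow the public Clusters API; the function is a
# sketch, not the toolkit's actual implementation.
def build_cluster_payload(name, node_type_id, num_workers,
                          auto_terminate_minutes, spark_conf=None):
    return {
        "cluster_name": name,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        # The API spells this field "autotermination_minutes".
        "autotermination_minutes": auto_terminate_minutes,
        "spark_conf": spark_conf or {},
        # A real create request also needs "spark_version";
        # a wrapper would fill it from config or a default.
    }

payload = build_cluster_payload(
    name="etl-cluster-prod",
    node_type_id="Standard_DS3_v2",
    num_workers=4,
    auto_terminate_minutes=30,
    spark_conf={"spark.sql.shuffle.partitions": "200"},
)
```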
### 3. Create a Multi-Task Job

```python
from src.job_manager import JobManager

mgr = JobManager.from_config("configs/workspace_config.yaml")
job_id = mgr.create_job_from_template("configs/job_templates/etl_job.json")
mgr.run_now(job_id)
```
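A job template is just a Jobs 2.1 `create` body stored as JSON. The sketch below shows the shape such a template might take and how it pairs with the create endpoint — the inline template stands in for `configs/job_templates/etl_job.json` and is an assumption, not the toolkit's shipped file:

```python
# Illustrative sketch of create_job_from_template: load a multi-task
# job definition and pair it with the Jobs 2.1 create endpoint. The
# inline template is a stand-in, not the toolkit's shipped file.
import json

template = json.loads("""
{
  "name": "etl-pipeline",
  "tasks": [
    {"task_key": "extract",
     "notebook_task": {"notebook_path": "/etl/extract"}},
    {"task_key": "transform",
     "depends_on": [{"task_key": "extract"}],
     "notebook_task": {"notebook_path": "/etl/transform"}}
  ]
}
""")

def create_job_request(job_spec: dict) -> tuple[str, dict]:
    """Endpoint and body for POST /api/2.1/jobs/create."""
    return "/api/2.1/jobs/create", job_spec

endpoint, body = create_job_request(template)
```

`depends_on` is what makes the job multi-task: `transform` only runs after `extract` succeeds. `run_now` then corresponds to `POST /api/2.1/jobs/run-now` with the returned `job_id`.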
## Architecture

```text
┌──────────────────────────────────────────┐
│           Databricks Workspace           │
│  ┌────────┐  ┌────────┐ ┌─────────────┐  │
│  │Clusters│  │  Jobs  │ │Unity Catalog│  │
│  └───┬────┘  └───┬────┘ └──────┬──────┘  │
│      │           │             │         │
│  ┌───┴───────────┴─────────────┴───┐     │
│  │      Databricks REST APIs       │     │
│  └────────────────┬────────────────┘     │
└───────────────────┼──────────────────────┘
                    │
┌───────────────────┼──────────────────────┐
│     Workspace Toolkit (this product)     │
│  ┌──────────┐  ┌─────────┐  ┌─────────┐  │
│  │ Cluster  │  │   Job   │  │  Unity  │  │
│  │ Manager  │  │ Manager │  │ Catalog │  │
│  └──────────┘  └─────────┘  └─────────┘  │
│  ┌──────────┐  ┌─────────┐  ┌─────────┐  │
│  │Workspace │  │ Secret  │  │  Perms  │  │
│  │ Manager  │  │ Manager │  │ Manager │  │
│  └──────────┘  └─────────┘  └─────────┘  │
└──────────────────────────────────────────┘
```
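All six managers sit on the same thin layer: a host, a token, and a helper that builds authenticated requests. A stdlib-only sketch of what that shared base might look like (the class name and methods are assumptions about the design, not the toolkit's actual code; the toolkit itself uses `requests`):

```python
# Illustrative shared base for the manager classes: one place that
# knows the host, the token, and how to build an authenticated POST.
# Stdlib-only sketch; the real toolkit lists `requests` instead.
import json
import urllib.request

class RestClientBase:
    def __init__(self, host: str, token: str):
        self.host = host.rstrip("/")
        self.token = token

    def build_request(self, path: str, payload: dict) -> urllib.request.Request:
        """Build (but do not send) an authenticated JSON POST request."""
        return urllib.request.Request(
            url=f"{self.host}{path}",
            data=json.dumps(payload).encode(),
            method="POST",
            headers={
                "Authorization": f"Bearer {self.token}",
                "Content-Type": "application/json",
            },
        )

client = RestClientBase("https://adb-1234567890.12.azuredatabricks.net/", "dapi-x")
req = client.build_request("/api/2.1/clusters/create", {"num_workers": 4})
```

Centralizing auth and URL handling in one base keeps each manager focused on its own API's payloads.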
## Requirements

- Python 3.10+
- `requests` library (`pip install requests`)
- `pyyaml` library (`pip install pyyaml`)
- Databricks workspace with admin or workspace-level access
- Personal access token or service principal credentials
## Related Products
- Spark ETL Framework — Production Spark ETL pipeline patterns
- Delta Lake Patterns — Delta Lake optimization patterns
- Data Quality Framework — Data quality checks and validation rules
This is 1 of 11 resources in the Data Pipeline Pro toolkit. Get the complete Databricks Workspace Toolkit with all files, templates, and documentation for $59.
Or grab the entire Data Pipeline Pro bundle (11 products) for $169 — save 30%.