Background
Our data warehouse team was scaling up deployment volumes. Based on prior experience (approximately 100 tables in 30 minutes), we knew larger deployments would take considerably longer.
When concerns about execution time arose, I noticed Azure DevOps offered multi-agent deployment capabilities. The idea seemed straightforward: distribute the workload across multiple agents to speed things up.
Before implementing, I investigated what would actually happen at the database level.
The Deployment Pattern
For each table deployment, we execute DDL and DML operations across a typical multi-layered data warehouse architecture:
- Data ingestion and transformation layers — Create tables for staging, integration, and historical tracking
- Presentation layer — Create views for data consumption and abstraction
- ETL orchestration metadata — Update control tables that manage job dependencies and execution tracking
When scaling from ~100 tables to several hundred, this translates to:
- Thousands of DDL statements (`CREATE TABLE` / `CREATE VIEW`)
- Hundreds of DML operations on shared control tables
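To make the pattern concrete, here is a sketch of what one table's deployment bundle might contain. All schema, table, and column names below are hypothetical placeholders, not our actual DDL:

```sql
-- Ingestion and transformation layers (hypothetical names)
CREATE TABLE staging.customer_orders     (order_id BIGINT, customer_id BIGINT, order_ts TIMESTAMP);
CREATE TABLE integration.customer_orders (order_id BIGINT, customer_id BIGINT, order_ts TIMESTAMP);
CREATE TABLE history.customer_orders     (order_id BIGINT, customer_id BIGINT, order_ts TIMESTAMP,
                                          valid_from TIMESTAMP, valid_to TIMESTAMP);

-- Presentation layer abstraction
CREATE VIEW presentation.v_customer_orders AS
SELECT order_id, customer_id, order_ts
FROM integration.customer_orders;

-- ETL orchestration metadata on a shared control table
DELETE FROM metadata_schema.etl_control WHERE table_name = 'customer_orders';
INSERT INTO metadata_schema.etl_control (table_name, load_type, is_active)
VALUES ('customer_orders', 'incremental', TRUE);
```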
Azure DevOps Multi-Agent: Understanding the Mechanism
Azure DevOps supports parallel job execution through two primary interfaces, both achieving the same underlying behavior:
Classic Editor (Release Pipelines)
In Azure DevOps Classic Editor, you can configure parallelism via:
Agent Job > Execution Plan > Parallelism > Multi-agent
This setting duplicates the entire Agent Job across multiple agents.
YAML Pipelines
The equivalent in YAML pipelines uses the parallel strategy:
```yaml
jobs:
- job: DeployTables
  strategy:
    parallel: 10  # This DUPLICATES the job 10 times
  pool:
    vmImage: 'ubuntu-latest'
  steps:
  - script: |
      # WARNING: Without work distribution logic,
      # ALL 10 agents will execute THE SAME steps!
      psql -h redshift-cluster -f deploy_all_tables.sql
```
Both approaches have the same fundamental behavior: They create multiple identical jobs where each agent executes the same tasks unless you explicitly write logic to distribute the work.
How to Actually Distribute Work
In Classic Editor (Release Pipelines)
You need to use predefined variables or custom logic in your tasks to determine which portion of work each agent should handle.
In YAML Pipelines
Azure DevOps provides system variables for manual work distribution:
```yaml
jobs:
- job: DeployTables
  strategy:
    parallel: 10
  steps:
  - bash: |
      # Use system variables to determine which tables THIS agent should handle
      POSITION=$(System.JobPositionInPhase)  # Values: 1, 2, 3, ..., 10
      TOTAL=$(System.TotalJobsInPhase)       # Value: 10

      # Calculate this agent's workload (50 tables each)
      START=$((($POSITION - 1) * 50 + 1))
      END=$(($POSITION * 50))

      echo "Agent $POSITION deploying tables $START to $END"
      for i in $(seq $START $END); do
        psql -f "table_${i}.sql"
      done
```
Even with correct work distribution, DDL operations on Redshift face a fundamental limitation that makes parallelization ineffective.
The Critical Understanding
What developers often assume:
"Azure DevOps will automatically split my tables across 10 agents."
What actually happens (in both Classic Editor and YAML):
Azure DevOps creates 10 identical jobs. Each job runs the exact same steps unless you explicitly write logic to distribute the work.
Redshift Architecture: The Leader Node Constraint
This is where my investigation became crucial. To understand whether multi-agent deployment would help, I needed to understand how Redshift actually handles DDL operations.
How Redshift Handles DDL
Redshift uses a Leader Node + Compute Nodes architecture:
```
        ┌─────────────────────────────┐
        │  Leader Node                │  ← All DDL executes here
        │  - System Catalog Tables    │
        │  - Query Planning           │
        │  - Metadata Management      │
        └──────────────┬──────────────┘
                       │
       ┌─────────┬─────┴────┬──────────┐
       ▼         ▼          ▼          ▼
   [Compute] [Compute]  [Compute]  [Compute]   ← Only for data processing
```
Key finding: All DDL operations execute on the Leader Node only, regardless of how many compute nodes you have.
System Catalog Table Locking
When you execute DDL statements, Redshift must update its system catalog tables. While Redshift wraps PostgreSQL system catalogs with its own views (like PG_CLASS_INFO, PG_TABLE_DEF), the underlying PostgreSQL catalog tables are where the actual locking occurs:
Core PostgreSQL Catalog Tables (where locks occur):

- `pg_class` — stores table/view metadata
- `pg_attribute` — stores column definitions
- `pg_namespace` — stores schema information
- `pg_depend` — stores object dependencies
- `pg_type` — stores data type definitions

Redshift Information Views (built on top):

- `PG_CLASS_INFO`, `PG_ATTRIBUTE_INFO` — Redshift wrappers
- `PG_TABLE_DEF` — Redshift-specific comprehensive view
- `SVV_TABLE_INFO` — system view with distribution information
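These wrapper views can be queried like ordinary tables. As a quick illustration (assuming `metadata_schema` from the control-table example is on the `search_path`, which `PG_TABLE_DEF` requires):

```sql
-- Inspect a table's definition through Redshift's wrapper view (illustrative)
SET search_path TO metadata_schema;

SELECT "column", type, encoding, distkey, sortkey
FROM pg_table_def
WHERE tablename = 'etl_control';
```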
Critical finding: DDL operations require ACCESS EXCLUSIVE locks on the underlying `pg_class` and `pg_attribute` tables, which are global singleton resources on the Leader Node. This forces all DDL statements to execute serially, regardless of:
- How many compute nodes you have (4, 10, or 100)
- How many Azure DevOps agents you use
- How well you distribute the work
```
-- What would happen with 10 parallel agents executing CREATE TABLE:
Agent 1/Session 1:  CREATE TABLE table_001 (...);  -- Executes (holds pg_class lock)
Agent 2/Session 2:  CREATE TABLE table_021 (...);  -- Waiting for pg_class lock ⏳
Agent 3/Session 3:  CREATE TABLE table_041 (...);  -- Waiting for pg_class lock ⏳
Agent 4/Session 4:  CREATE TABLE table_061 (...);  -- Waiting for pg_class lock ⏳
... (Agents 5-10 all waiting) ...
-- Result: Serialized execution despite parallel agents
```
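If you want to watch this queueing happen during a deployment, Redshift's system views expose it. A minimal monitoring sketch (columns as documented for `SVV_TRANSACTIONS`; the filter value is illustrative):

```sql
-- Sessions currently waiting on locks while the deployment runs (illustrative)
SELECT txn_owner,
       pid,
       relation,
       lock_mode,
       granted
FROM svv_transactions
WHERE granted = 'f'            -- still waiting for the lock to be granted
ORDER BY txn_start;
```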
Why This Happens:
Redshift's architecture inherits PostgreSQL's catalog design, where DDL operations must maintain ACID properties across system metadata. The Leader Node must ensure:
- Consistent metadata across all catalog tables
- No concurrent modifications to object definitions
- Proper dependency tracking for views and constraints
This design choice prioritizes data integrity over DDL parallelism.
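To see why the catalog is the chokepoint, consider what a single `CREATE TABLE` leaves behind: new rows in `pg_class` and `pg_attribute` that every other DDL statement must also be able to read consistently. A leader-node-only query such as the following (schema and table names are hypothetical) shows that metadata:

```sql
-- Catalog rows written by one CREATE TABLE (leader-node-only query, names illustrative)
SELECT c.relname  AS table_name,
       a.attname  AS column_name,
       a.atttypid AS type_oid
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
JOIN pg_attribute a ON a.attrelid = c.oid
WHERE n.nspname = 'staging'
  AND c.relname = 'customer_orders'
  AND a.attnum > 0;              -- skip system columns
```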
Control Table Locking
Additionally, our deployment pattern includes updates to shared control tables:
```sql
DELETE FROM metadata_schema.etl_control WHERE table_name = 'example_table';
INSERT INTO metadata_schema.etl_control VALUES (...);
```
These operations acquire table-level exclusive locks, forcing all agents to queue for access to the same tables.
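A minimal sketch of that contention, following the pattern above (the extra control-table columns are hypothetical):

```sql
-- Session 1 (agent 1): updates the shared control table for its table
BEGIN;
DELETE FROM metadata_schema.etl_control WHERE table_name = 'table_001';
INSERT INTO metadata_schema.etl_control (table_name, load_type, is_active)
VALUES ('table_001', 'incremental', TRUE);
COMMIT;  -- the table-level write lock on etl_control is held until here

-- Session 2 (agent 2) runs the same pattern for 'table_002' at the same time.
-- Because the lock is table-level, it queues behind session 1 even though
-- it touches a different row.
```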
Performance Analysis: What Would Actually Happen
Based on our baseline data (~100 tables in 30 minutes) and understanding of Redshift's architecture, I projected what would happen with different approaches:
Scenario 1: Multi-Agent Without Work Distribution
```yaml
strategy:
  parallel: 10
steps:
- script: execute_all_tables.sql  # No work distribution!
```
Projected result:
- Each of 10 agents executes all tables
- Total DDL operations: N_tables × 10 agents
- Database receives 10x the operations
- Outcome: Severe database overload, significantly slower than baseline
Scenario 2: Multi-Agent With Correct Work Distribution
```yaml
strategy:
  parallel: 10
steps:
- bash: |
    # Correctly distribute: each agent handles subset of tables
    TABLES_PER_AGENT=$(( TOTAL_TABLES / 10 ))
    # ... execute only assigned tables
```
Projected result based on architecture analysis:
```
Expected (naive calculation):
  Sequential time: ~300 minutes (extrapolated from baseline)
  With 10 agents:  300 ÷ 10 = 30 minutes ✓

Actual (with Redshift catalog locks):
  Agent 1: [████████████████████] ~300 min (executing)
  Agent 2: [⏳⏳⏳⏳⏳⏳⏳⏳] wait → execute
  Agent 3: [⏳⏳⏳⏳⏳⏳⏳⏳] wait → execute
  ...
  Total time: ~300 minutes (serialized) + orchestration overhead
```
Why the overhead?
- Connection Overhead: 10 simultaneous connections to Redshift (vs. 1)
- Lock Contention Monitoring: Redshift Leader Node processing lock queues
- Transaction Retry Logic: Agents detecting timeouts and retrying
- Network Latency: Multiple agents competing for responses
- Logging Overhead: 10x more connection logs and audit entries
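To put a number on the connection overhead during a release window, a quick check against Redshift's session view could look like this (the service-account name is a hypothetical example):

```sql
-- Count concurrent deployment sessions (illustrative)
SELECT user_name,
       COUNT(*) AS open_sessions
FROM stv_sessions
WHERE user_name = 'deploy_user'   -- hypothetical CI/CD service account
GROUP BY user_name;
```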
Scenario 3: Single-Agent Sequential (Current Approach)
Projected result:
- Clean serial execution
- Predictable linear scaling from baseline
- No orchestration overhead
- Expected time: scales roughly linearly with table count from the 30-minute baseline
Performance Comparison Matrix
| Operation Type | Lock Scope | Without Distribution | With Distribution | Actual Benefit |
|---|---|---|---|---|
| `CREATE TABLE` | `pg_class` (global) | ❌ N×agents duplication | ❌ Serialized | 0% |
| `CREATE VIEW` | `pg_class`, `pg_depend` | ❌ N×agents duplication | ❌ Serialized | 0% |
| Control table DML | Table-level exclusive | ❌ N×agents duplication | ❌ Serialized (same table) | 0% |
| `SELECT` queries | Row-level / data blocks | ⚠️ N×agents queries | ✅ True parallel | 80-90% |
Decision: Stay with Single-Agent
Based on this analysis, the decision was clear: do not implement multi-agent deployment.
Rationale:
- No performance benefit: DDL operations would serialize at the database level regardless of CI/CD parallelism
- Added complexity: Multi-agent requires work distribution logic, monitoring, and coordination
- Potential risks: Connection overhead and lock contention could actually slow things down
- Simpler is better: Single-agent deployment is more predictable and easier to debug
Alternative optimizations considered:

1. Batch Transactions

   ```sql
   BEGIN;
   CREATE TABLE table_001 (...);
   CREATE VIEW view_001 (...);
   -- More statements...
   COMMIT;
   ```

2. Split into Multiple Smaller Deployments
   - Deploy in batches across multiple maintenance windows
   - Reduces single-deployment duration
   - More manageable rollback if issues occur

3. Optimize Individual DDL Statements
   - Remove unnecessary `IF NOT EXISTS` checks
   - Defer `COMMENT` statements to post-deployment (see the sketch below)
   - Minimize column count where possible
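For the third option, deferring `COMMENT` statements might look like this in practice (object names and comment text are illustrative):

```sql
-- Deployment window: create only the objects themselves
CREATE TABLE staging.customer_orders (
    order_id    BIGINT,
    customer_id BIGINT,
    order_ts    TIMESTAMP
);

-- Post-deployment script, outside the critical path
COMMENT ON TABLE staging.customer_orders IS 'Raw order feed, loaded hourly';
COMMENT ON COLUMN staging.customer_orders.order_id IS 'Source system order key';
```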
Lessons Learned
1. Architecture Understanding Matters More Than Tooling
More CI/CD resources don't help if the database architecture doesn't support parallelism. Understanding Redshift's Leader Node serialization was more valuable than adding more Azure DevOps agents.
2. Challenge "Obvious" Solutions
"More parallelism = faster" is true for data processing (SELECT queries on compute nodes), but not for DDL operations on shared catalog tables. The "obvious" solution would have wasted implementation effort with no benefit.
3. Analyze Before Implementing
By investigating the architecture before implementation, we avoided:
- Wasted engineering time building work distribution logic
- Added operational complexity with no performance gain
- Potential performance degradation from connection overhead
This preventive analysis saved resources and kept our deployment process simple and predictable.
4. Simple Can Be Better
Single-agent deployment with optimized batching beat complex multi-agent orchestration. Sometimes the best optimization is recognizing when complexity doesn't add value.
Key Takeaways
For Redshift DDL Deployments:
✅ Single-agent sequential deployment is optimal for DDL-heavy workloads
✅ Batch transactions reduce round-trip overhead
✅ Multiple smaller deployments manage risk better than one large deployment
✅ Optimize individual DDL statements for the most impact
❌ Multi-agent parallelism provides no benefit (serialized at Leader Node)
❌ More compute nodes don't speed up DDL (only data queries benefit)
❌ Complex orchestration adds overhead without performance gain
Universal Lesson:
Understanding system constraints prevents premature optimization.
The best solution isn't always the most sophisticated—it's the one that works with your architecture, not against it.
References
Official Documentation
- AWS Redshift Architecture - System Architecture
- AWS Redshift - System Catalog Tables
- Azure DevOps - Specify Jobs in Your Pipeline
- Azure DevOps - Parallel Jobs Licensing
- PostgreSQL Documentation - System Catalogs
Key Documentation Insights
- Redshift DDL Processing: All DDL operations execute on the Leader Node, not compute nodes
- Azure DevOps Multi-Agent: Both Classic Editor and YAML approaches duplicate jobs; require manual work distribution
- System Catalog Locking: PostgreSQL (and Redshift) use ACCESS EXCLUSIVE locks on metadata tables during DDL operations