Solving the Diamond Problem with a Spacelift Trigger policy

#sre #infrastucture #terraform #devops

The "diamond problem" (sometimes referred to as the "Deadly Diamond of Death") is an ambiguity that arises when two classes B and C inherit from A, and class D inherits from both B and C.

It is called the "diamond problem" because of the shape of the class inheritance diagram in this situation.

Before we solve this problem, let's walk through some building blocks so we can create complex workflows using trigger policies.

Purpose

Frequently, your infrastructure consists of a number of projects (stacks in Spacelift parlance) that are connected in some way - either depend logically on one another, or must be deployed in a particular order for some other reason - for example, a rolling deploy in multiple regions.

Enter trigger policies. Trigger policies are evaluated at the end of each stack-blocking run (which includes tracked runs and tasks) and allow you to decide if some tracked Runs should be triggered. This is a very powerful feature, effectively turning Spacelift into a Turing machine.

All runs triggered - directly or indirectly - by trigger policies as a result of the same initial run are grouped into a so-called workflow. In the trigger policy you can access all other runs in the same workflow as the currently finished run, regardless of their Stack. This lets you coordinate executions of multiple Stacks and build workflows which require multiple runs to finish in order to commence to the next stage (and trigger another Stack).

Data input

The schema of the data input that each policy request will receive can be found here.

Use Cases

Since trigger policies turn Spacelift into a Turing machine, you could probably use them to implement Conway's Game of Life, but there are a few more obvious use cases. Let's have a look at two of them - interdependent Stacks and automated retries.

Interdependent stacks

The purpose here is to create a complex workflow that spans multiple Stacks. We will want to trigger a predefined list of Stacks when a Run finishes successfully. Here's our first take:

package spacelift

trigger["stack-one"]   { finished }
trigger["stack-two"]   { finished }
trigger["stack-three"] { finished }

finished {
  input.run.state == "FINISHED"
  input.run.type == "TRACKED"
}

Here's a minimal example of this rule. But it's far from ideal. We can't be guaranteed that stacks with these IDs still exist in this account. Spacelift will handle that just fine, but you'll likely find if confusing. Also, for any new Stack that appears you will need to explicitly add it to the list. That's annoying.

We can do better, and to do that, we'll use Stack labels. Labels are completely arbitrary strings that you can attach to individual Stacks, and we can use them to do something magical - have "client" Stacks "subscribe" to "parent" ones.

So how's that:

package spacelift

trigger[stack.id] {
  stack := input.stacks[_]
  input.run.state == "FINISHED"
  input.run.type == "TRACKED"
  stack.labels[_] == concat("", ["depends-on:", input.stack.id])
}

The benefit of this policy is that you can attach it to all your stacks, and it will just work for your entire organization.

Can we do better? Sure, we can even have stacks use labels to decide which types of runs or state changes they care about. Here's a mind-bending example:

package spacelift

trigger[stack.id] {
  stack := input.stacks[_]
  input.run.type == "TRACKED"
  stack.labels[_] == concat("", [
    "depends-on:", input.stack.id,
    "|state:", input.run.state],
  )
}

Now, how cool is that?

Automated retries

Here's another use case - sometimes Terraform or Pulumi deployments fail for a reason that has nothing to do with the code - think eventual consistency between various cloud subsystems, transient API errors etc. It would be great if you could restart the failed run. Oh, and let's make sure new runs are not created in a crazy loop - since policy-triggered runs trigger another policy evaluation:

package spacelift

trigger[stack.id] {
  stack := input.stack
  input.run.state == "FAILED"
  input.run.type == "TRACKED"
  is_null(input.run.triggered_by)
}

Note that this will also prevent user-triggered runs from being retried. Which is usually what you want in the first place, because a triggering human is probably already babysitting the Stack anyway.

Diamond Problem

The diamond problem happens when you're stacks and their dependencies form a shape like in the following diagram:

          ┌────┐
       ┌─►│ 2a ├─┐
       │  └────┘ │
┌───┐  │         │   ┌───┐
│ 1 ├──┤         ├──►│ 3 │
└───┘  │         │   └───┘
       │  ┌────┐ │
       └─►│ 2b ├─┘
          └────┘

Which means that Stack 1 trigger both Stack 2a and 2b, and we only want to trigger Stack 3 when both predecessors finish. This can be elegantly solved using workflows.

First we'll have to create a trigger policy for Stack 1:

package spacelift

trigger["stack-2a"] {
  tracked_and_finished
}

trigger["stack-2b"] {
  tracked_and_finished
}

tracked_and_finished {
  input.run.state == "FINISHED"
  input.run.type == "TRACKED"
}

This will trigger both Stack 2a and 2b whenever a run finishes on Stack 1.

Now onto a trigger policy for Stack 2a and 2b:

package spacelift

trigger["stack-3"] {
  run_a := input.workflow[_]
  run_b := input.workflow[_]
  run_a.stack_id == "stack-2a"
  run_b.stack_id == "stack-2b"
  run_a.state == "FINISHED"
  run_b.state == "FINISHED"
}

Here we trigger Stack 3, whenever the runs in Stack 2a and 2b are both finished.

You can also easily extend this to work with a label-based approach, so that you could define Stack 3's dependencies by attaching a depends-on:stack-2a,stack-2blabel to it:

package spacelift

# Helper with stack_id's of workflow runs which have already finished.
already_finished[run.stack_id] {
  run := input.workflow[_]
  run.state == "FINISHED"
}

trigger[stack.id] {
  input.run.state == "FINISHED"
  input.run.type == "TRACKED"

  # For each Stack which has a depends-on label,
  # get a list of its dependencies.
  stack := input.stacks[_]
  label := stack.labels[_]
  startswith(label, "depends-on:")
  dependencies := split(trim_prefix(label, "depends-on:"), ",")

  # The current Stack is one of the dependencies.
  input.stack.id == dependencies[_]

  finished_dependencies := [dependency | 
                                       dependency := dependencies[_]
                                       already_finished[dependency]]

  # Check if all dependencies have finished.
  count(finished_dependencies) == count(dependencies)
}

Are you ready to create complex workflows using trigger policies?

Start your 30 day free trial of Enterprise today or book a demo if you need to dive in deeper with our engineers.

Original Source: https://docs.spacelift.io/concepts/policy/trigger-policy