Muhammad Yahya Muhaimin for Hash Rekayasa Teknologi

Posted on Jul 26, 2022

Lessons Learned from Developing Serverless Workflow Runtime Implementation

#architecture #workflow #go #node

At Hash Rekayasa Teknologi, we've been developing and using MocoBaaS, a Backend-as-a-Service solution.
One of the features for implementing business logic is Custom Script.

This feature has served us well for many use cases.
However, there are some use cases that consist of multiple steps. They can be implemented by "chaining" multiple scripts, one script triggering another. While this can get the job done, it's hard to keep track of the steps that were executed.

Imagine we have a use case like Marketplace Order:

DISCLAIMER: It may not represent a real world use case.

Create Order
Confirm Payment
Confirm Delivery
Confirm Completed

It can be done by defining this flow:

Script: create-order
- Triggered By: HTTP source
- Triggers: create-order-success event
Script: confirm-payment
- Triggered By: Event source
- Triggers: confirm-payment-success event
Script: confirm-delivery
- Triggered By: Event source
- Triggers: confirm-delivery-success event
Script: confirm-completed
- Triggered By: Event source

With the flow above, the scripts are executed as is. There is no centralized mechanism for tracking the executed steps or whether they ran properly.

Serverless Workflow to the rescue

Among the workflow languages out there, we chose Serverless Workflow. It's a vendor-neutral, open-source and community-driven workflow ecosystem.
The workflow definition can be written in JSON or YAML format.
There are also SDKs available in various programming languages, like Java, Go, TypeScript, .NET and Python.

The Marketplace Order use case above can be defined like this:

id: marketplaceorder
version: "1.0"
specVersion: "0.7"
name: Marketplace Order Workflow
description: Create and process orders on the marketplace.
start: CreateOrder
functions:
  - name: createOrderFunction
    operation: mocobaas://marketplace-order#create-order
  - name: confirmPaymentFunction
    operation: mocobaas://marketplace-order#confirm-payment
  - name: confirmDeliveryFunction
    operation: mocobaas://marketplace-order#confirm-delivery
  - name: confirmCompletedFunction
    operation: mocobaas://marketplace-order#confirm-completed
states:
  - name: CreateOrder
    type: operation
    actions:
      - functionRef: createOrderFunction
    transition: ConfirmPayment
  - name: ConfirmPayment
    type: operation
    actions:
      - functionRef: confirmPaymentFunction
    transition: ConfirmDelivery
  - name: ConfirmDelivery
    type: operation
    actions:
      - functionRef: confirmDeliveryFunction
    transition: ConfirmCompleted
  - name: ConfirmCompleted
    type: operation
    actions:
      - functionRef: confirmCompletedFunction
    end: true

And this is the diagram visualization:

If you are new to Serverless Workflow, or workflows in general, you may have many questions about it 😁

I recommend watching this presentation:

And then read the official Serverless Workflow examples and specification:

Version 0.7: examples, specification.
Version 0.8: examples, specification.

Let me continue the story...

What we need to build is a runtime implementation that executes workflows based on the definitions.
Golang has become an important part of our stack at Hash Rekayasa Teknologi, so we simply chose the Go SDK for Serverless Workflow. Although I didn't try the other SDKs, I expect them to differ little from what I'm using here.
The most important question about the SDK: what does it do, and what doesn't it do?

It does:

Parse workflow JSON and YAML definitions.
One workflow definition has a hierarchical structure. Each definition from top level to sub-levels will be represented as a Model, such as Workflow, State, Action, Function, Retry.

It doesn't:

There is no Workflow Instance representation. For the execution, you have to define the unique identifier yourself.
The duration values in ISO 8601 duration format are not parsed.
The workflow expressions in jq format are not parsed.

With those limitations, there doesn't seem to be much we can do with the SDK. Just parse the workflow definition and use the hierarchical structure as a guide for executions.

package sw

import (
    "errors"
    "os"
    "path/filepath"

    "github.com/google/uuid"
    "github.com/serverlessworkflow/sdk-go/v2/model"
    "github.com/serverlessworkflow/sdk-go/v2/parser"
)

type StartWorkflowResult struct {
    InstanceID string `json:"instanceId"`
}

var workflows map[string]*model.Workflow

func LoadWorkflows() error {
    const definitionsDir = "definitions"

    dirEntries, err := os.ReadDir(definitionsDir)
    if err != nil {
        return err
    }

    workflows = make(map[string]*model.Workflow)

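    for _, entry := range dirEntries {
        // Skip subdirectories; only files are parsed as definitions.
        if entry.IsDir() {
            continue
        }

        name := entry.Name()
        path := filepath.Join(definitionsDir, name)
        wf, err := parser.FromFile(path)
        if err != nil {
            return err
        }

        // The map key is the file name, so StartWorkflow expects that name.
        workflows[name] = wf
    }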

    return nil
}

func StartWorkflow(name string, input map[string]interface{}) (*StartWorkflowResult, error) {
    wf, ok := workflows[name]
    if !ok {
        return nil, errors.New("workflow not found: " + name)
    }

    instanceID := uuid.NewString()

    // Start a new instance here, passing instanceID, wf and input.
    // Until that call exists, the blank assignment keeps the compiler
    // from rejecting wf as an unused variable.
    _ = wf

    return &StartWorkflowResult{InstanceID: instanceID}, nil
}

Here we store the Workflow models in a map, so the LoadWorkflows() function only needs to be called once.
And then the StartWorkflow() function will be called in every execution.

Take notes for the implemented features

We may not implement all the features in the specification. One thing we can do is document them. Each feature gets one of these statuses:

implemented according to the spec 🟢🟢
implemented, but not according to the spec or using own standard 🟢🔴
not/not yet implemented 🔴

I took notes on a spreadsheet. You can see it here.
I use my native language, Bahasa Indonesia.
And it's not complete. I take note of a definition only when I start implementing it.

Let's see one example, the Function Definition:

As we know, service calls are defined here.
The workflow runtime is written in Go, while the scripts are written in JavaScript (Node.js).
MocoBaaS already has an internal RPC mechanism, so we want to use the "custom" type.
Spec v0.8 has a "custom" type, but as of this writing the Go SDK only supports spec v0.7.

As you can see, we tried to stick to the spec as far as we could. But sometimes we have to use our own standards.
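
The workflow definitions in this article use operation strings such as mocobaas://marketplace-order#create-order. Below is a minimal sketch of how a runtime might split such a string before making the RPC call. ScriptRef and ParseOperation are hypothetical names, and reading the host part as the workflow directory and the fragment as the script name is an assumption based on the examples in this article, not a documented MocoBaaS convention.

```go
package main

import (
	"fmt"
	"net/url"
)

// ScriptRef is a hypothetical type identifying one script of one workflow.
type ScriptRef struct {
	Workflow string // e.g. "marketplace-order"
	Script   string // e.g. "create-order"
}

// ParseOperation splits "mocobaas://<workflow>#<script>" into its parts.
// The meaning given to each part is an assumption made for this sketch.
func ParseOperation(operation string) (ScriptRef, error) {
	u, err := url.Parse(operation)
	if err != nil {
		return ScriptRef{}, fmt.Errorf("invalid operation %q: %w", operation, err)
	}
	if u.Scheme != "mocobaas" {
		return ScriptRef{}, fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	if u.Host == "" || u.Fragment == "" {
		return ScriptRef{}, fmt.Errorf("operation %q must look like mocobaas://workflow#script", operation)
	}
	return ScriptRef{Workflow: u.Host, Script: u.Fragment}, nil
}

func main() {
	ref, err := ParseOperation("mocobaas://marketplace-order#create-order")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s / %s\n", ref.Workflow, ref.Script)
}
```

Rejecting unknown schemes early keeps a typo in a definition from turning into a confusing RPC failure later.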

Executing workflow

The Marketplace Order Workflow has a linear flow, from create order to confirm completed. This is the directory structure containing workflow definition and scripts:

.
└── marketplace-order
    ├── definition.sw.yaml
    └── scripts
        ├── confirm-completed.js
        ├── confirm-delivery.js
        ├── confirm-payment.js
        └── create-order.js

The end result will be a JSON like this:

{
  "createOrder": true,
  "confirmPayment": true,
  "confirmDelivery": true,
  "confirmCompleted": true
}

When the workflow is executed, starting with create-order.js, data is a new object:

module.exports = async (ctx) => {
  return {
    data: { createOrder: true },
  };
};

Next, confirm-payment.js extends the data from the previous state:

module.exports = async (ctx) => {
  return {
    data: { ...ctx.data, confirmPayment: true },
  };
};

And so on.
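
On the runtime side, this linear hand-off of data can be sketched roughly as follows. This is only an illustration, not the actual MocoBaaS runtime: StateFunc is a hypothetical stand-in for the RPC call that runs a script, and extend imitates what the scripts above do with ctx.data.

```go
package main

import "fmt"

// Data is the JSON-like payload passed from state to state.
type Data map[string]interface{}

// StateFunc is a hypothetical stand-in for invoking one script via RPC.
type StateFunc func(input Data) (Data, error)

// extend copies the input and sets one key, like `{ ...ctx.data, key: true }`.
func extend(key string) StateFunc {
	return func(input Data) (Data, error) {
		out := Data{}
		for k, v := range input {
			out[k] = v
		}
		out[key] = true
		return out, nil
	}
}

// RunLinear executes the states in order.
// Each state's output becomes the next state's input.
func RunLinear(states []StateFunc, input Data) (Data, error) {
	data := input
	for i, run := range states {
		out, err := run(data)
		if err != nil {
			return nil, fmt.Errorf("state %d failed: %w", i, err)
		}
		data = out
	}
	return data, nil
}

func main() {
	states := []StateFunc{
		extend("createOrder"),
		extend("confirmPayment"),
		extend("confirmDelivery"),
		extend("confirmCompleted"),
	}
	out, err := RunLinear(states, Data{})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```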

Tracking workflow execution

As written in the spec:
Depending on their workflow definition, workflow instances can be short-lived or can execute for days, weeks, or years.

There is no recommendation on how to store the tracking information. Any database can be used.
We need to handle these requirements:

One instance can have more than one state.
The state's data input is typically the previous state's data output.
If the state is the workflow starting state, its data input is the workflow data input.
When workflow execution ends, the data output of the last executed state becomes the workflow data output.

For example, we have two tables:

instances
instance_states
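
To make the requirements above concrete, here is a small in-memory sketch of the two record types and the data input/output rules. The field names and the Tracker type are hypothetical and chosen only for illustration; a real runtime would persist these rows in a database of its choice.

```go
package main

import "fmt"

// Data is the JSON-like payload of a workflow or state.
type Data map[string]interface{}

// Instance mirrors one row of a hypothetical "instances" table.
type Instance struct {
	ID       string
	Workflow string
	Input    Data
	Output   Data // set when execution ends
}

// InstanceState mirrors one row of a hypothetical "instance_states" table.
type InstanceState struct {
	InstanceID string
	Name       string
	Input      Data
	Output     Data
}

// Tracker keeps the rows in memory for this sketch.
type Tracker struct {
	Instance Instance
	States   []InstanceState
}

// NextInput returns the workflow input for the starting state,
// otherwise the output of the previously executed state.
func (t *Tracker) NextInput() Data {
	if len(t.States) == 0 {
		return t.Instance.Input
	}
	return t.States[len(t.States)-1].Output
}

// Record stores an executed state together with the input it received.
func (t *Tracker) Record(name string, output Data) {
	t.States = append(t.States, InstanceState{
		InstanceID: t.Instance.ID,
		Name:       name,
		Input:      t.NextInput(),
		Output:     output,
	})
}

// Finish copies the last executed state's output to the workflow output.
// With no executed states it falls back to the workflow input.
func (t *Tracker) Finish() {
	t.Instance.Output = t.NextInput()
}

func main() {
	tr := &Tracker{Instance: Instance{ID: "example-id", Workflow: "marketplaceorder", Input: Data{}}}
	tr.Record("CreateOrder", Data{"createOrder": true})
	tr.Record("ConfirmPayment", Data{"createOrder": true, "confirmPayment": true})
	tr.Finish()
	fmt.Println(len(tr.States), tr.Instance.Output)
}
```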

The Marketplace Order Workflow execution can be stored like this:

Retrying actions

If a state returns an error, we can leave it as the final result or define a retry policy.
For example, we have a Chance of Success Workflow.

Directory structure:

.
└── chance-of-success
    ├── definition.sw.yaml
    └── scripts
        └── chance.js

chance.js will randomize a boolean. If true, returns data. If false, returns error:

const chance = require("chance").Chance();

module.exports = async (ctx) => {
  const isTrue = chance.bool({ likelihood: ctx.data.likelihood });

  if (!isTrue) {
    return {
      error: { message: "failed" },
    };
  }

  return {
    data: { message: "success" },
  };
};

And the workflow definition contains a retry definition:

id: chanceofsuccess
version: "1.0"
specVersion: "0.7"
name: Chance of Success Workflow
description: Try your chance of success. Retry if failed.
start: TakeAChance
functions:
  - name: chanceFunction
    operation: mocobaas://chance-of-success#chance
retries:
  - name: chanceRetryStrategy
    delay: PT10S
    maxAttempts: 3
states:
  - name: TakeAChance
    type: operation
    actions:
      - functionRef: chanceFunction
        retryRef: chanceRetryStrategy
    end: true

With that retry definition, the runtime will perform this mechanism:

The maximum number of attempts is 3.
There is a 10-second delay between retries.
If we get data before maxAttempts is reached, there will be no more retries.
If maxAttempts is reached, there will be no more retries, regardless of the result.
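
The mechanism above can be sketched as a small retry loop. This is an illustration under assumptions, not the actual runtime code: RetryPolicy and RunWithRetry are hypothetical names, and the sketch assumes that maxAttempts counts the first call as well.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errFailed stands in for the error returned by a script.
var errFailed = errors.New("failed")

// RetryPolicy holds the two fields used by the retry definition above.
type RetryPolicy struct {
	MaxAttempts int
	Delay       time.Duration
}

// RunWithRetry calls action until it succeeds or MaxAttempts is reached.
// It returns the number of attempts made and the last error, if any.
func RunWithRetry(p RetryPolicy, action func() error) (int, error) {
	attempts := p.MaxAttempts
	if attempts < 1 {
		attempts = 1 // always try at least once
	}
	var err error
	for attempt := 1; attempt <= attempts; attempt++ {
		if err = action(); err == nil {
			return attempt, nil
		}
		// Only wait when another attempt will follow.
		if attempt < attempts {
			time.Sleep(p.Delay)
		}
	}
	return attempts, err
}

func main() {
	calls := 0
	// A short delay keeps the demo fast; the definition above would use 10 seconds.
	policy := RetryPolicy{MaxAttempts: 3, Delay: 10 * time.Millisecond}
	attempts, err := RunWithRetry(policy, func() error {
		calls++
		if calls < 3 {
			return errFailed
		}
		return nil
	})
	fmt.Println(attempts, err)
}
```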

Before we can use the delay duration, it needs to be parsed. For example, I use sosodev/duration and it works well.
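
To show what this parsing step involves, here is a standard-library-only sketch that handles just the time part of an ISO 8601 duration with integer values, such as PT10S or PT1H30M. ParseSimpleDuration is a hypothetical helper written for this illustration; it is not the API of sosodev/duration and does not cover days, weeks, months, years or fractional values.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"time"
)

// timePart matches "PT" followed by optional integer hours, minutes and seconds.
var timePart = regexp.MustCompile(`^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$`)

// ParseSimpleDuration converts a limited ISO 8601 duration to time.Duration.
func ParseSimpleDuration(s string) (time.Duration, error) {
	m := timePart.FindStringSubmatch(s)
	if m == nil || s == "PT" {
		return 0, fmt.Errorf("unsupported duration %q", s)
	}
	units := []time.Duration{time.Hour, time.Minute, time.Second}
	var total time.Duration
	for i, unit := range units {
		if m[i+1] == "" {
			continue // this component is not present
		}
		n, err := strconv.Atoi(m[i+1])
		if err != nil {
			return 0, fmt.Errorf("invalid number in %q: %w", s, err)
		}
		total += time.Duration(n) * unit
	}
	return total, nil
}

func main() {
	d, err := ParseSimpleDuration("PT10S")
	if err != nil {
		panic(err)
	}
	fmt.Println(d) // prints "10s"
}
```

For anything beyond this narrow subset, a dedicated library remains the safer choice.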

Diagram visualization

Generating a diagram visualization from the workflow definition is really helpful, especially when you have complex workflows.
One way is to use the Web Editor on the official website. It can generate a diagram from JSON or YAML, but the linter in the text editor always expects JSON.

For VS Code users, there's an official extension, but as of this writing it is outdated and only supports spec v0.6.
A better alternative is the extension from Red Hat. It supports spec v0.8 and also works well with spec v0.7. The only requirement is that the definition files are named *.sw.json, *.sw.yaml or *.sw.yml.

Caveat:
It looks like those tools use the same generator, since they produce the same diagram. I noticed that they only visualize the flow and leave out other details, such as functions or retries.

Closing Thoughts

Workflow is quite a big feature. And as you can see, Serverless Workflow offers great flexibility between following the standard and customizing it. But if you need more training wheels when using a workflow system, there may be better solutions out there.

We haven't implemented most of the Serverless Workflow features yet.
For example, the workflow expressions I mentioned above. Using a library like itchyny/gojq looks promising, although I haven't tried it.
But at least this little effort is enough for a minimal functioning system.

Well, hope you enjoyed this article and found it useful 😉
