Amara Graham

Posted on Mar 17 • Edited on Mar 19

What's in a flow?


You can orchestrate anything! And therein lies the problem: can you even imagine what your flow could look like? Where do you start? And what components do you include?

We have a lot of amazing video content on Kestra where you've probably heard us mention how flows look and what they contain.

When you are just getting started, it might be hard to really see what we mean when we say something like "everything has an id and a type", particularly when you are looking at the following Hello World boilerplate:

id: buffalo_875352
namespace: company.team

tasks:
  - id: hello
    type: io.kestra.plugin.core.log.Log
    message: Hello World! 🚀

Like sure, the one runnable task you can see does, indeed, have an id and a type. But is this really important to cover? And why? It's just not really convincing. At least not yet!

So let's talk through some increasingly complex flows, what components they contain, and get you more familiar with what your next workflow with Kestra could look like.

Expression syntax `{{ }}`

Let's step up the complexity from the boilerplate, look at my log_off_flow, and talk about expressions.

Expression syntax is critical to passing around information, making flows more dynamic with inputs, and even referencing a particular iteration during something like a ForEach.
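To make that concrete, here is a minimal sketch of a flow that combines an input with a ForEach loop. The flow id, the `greeting` input, and the list of team names are all hypothetical. Inside a ForEach, `{{ taskrun.value }}` refers to the value of the current iteration.

```yaml
id: expressions_foreach_demo   # hypothetical flow id
namespace: company.team

inputs:
  - id: greeting               # hypothetical input
    type: STRING
    defaults: Hello

tasks:
  - id: each_team
    type: io.kestra.plugin.core.flow.ForEach
    values: ["data", "platform", "security"]   # hypothetical values to iterate over
    tasks:
      - id: greet_team
        type: io.kestra.plugin.core.log.Log
        # inputs.* comes from the flow inputs; taskrun.value is the current ForEach item
        message: "{{ inputs.greeting }}, {{ taskrun.value }} team!"
```

Each iteration logs one message, so this flow would produce three log lines, one per team.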

KV Store values are pretty straightforward: {{ kv('issues_today') }}. This simply references the value stored under the key issues_today. Secrets work similarly, using the secret() function instead of kv().
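Here is a minimal sketch of writing and then reading a KV Store entry in the same flow, assuming the `io.kestra.plugin.core.kv.Set` task. The task ids and the hardcoded count are hypothetical placeholders; in a real flow the value would more likely come from an earlier task's output.

```yaml
tasks:
  - id: set_issue_count          # hypothetical task id
    type: io.kestra.plugin.core.kv.Set
    key: issues_today
    value: "3"                   # placeholder value for illustration

  - id: log_issue_count          # hypothetical task id
    type: io.kestra.plugin.core.log.Log
    # kv() looks up the key in the flow's namespace when this task runs
    message: "Issues today: {{ kv('issues_today') }}"
```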

Outputs might be a little more complicated, depending on the type of output: {{ outputs.get_issues.body }}. This gets the output of the get_issues task, specifically the response body of its API request.
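A sketch of where an output like that comes from, assuming an `io.kestra.plugin.core.http.Request` task named `get_issues`. The endpoint URL is a hypothetical placeholder. Besides `body`, the Request task also exposes outputs such as the HTTP status `code`.

```yaml
tasks:
  - id: get_issues
    type: io.kestra.plugin.core.http.Request
    uri: https://api.example.com/issues   # hypothetical endpoint
    method: GET

  - id: log_issues                        # hypothetical task id
    type: io.kestra.plugin.core.log.Log
    # outputs.<task id>.<output name> reads a value produced by an earlier task
    message: "Status {{ outputs.get_issues.code }}: {{ outputs.get_issues.body }}"
```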

You can also get into some complex filtering and formatting with your expressions: {{ now() | dateAdd(-5, 'DAYS') | date("yyyy-MM-dd'T'07:00:00-06:00") }}. This takes today's date, goes back five days, and formats the result as a timestamp with a fixed time of 07:00:00 and a -06:00 offset.
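One way to see that expression in action is to log its result. This sketch uses a YAML block scalar (`|`) so the double and single quotes inside the expression don't need escaping; inside a double-quoted YAML string, the inner double quotes would have to be written as `\"`. The task id is hypothetical.

```yaml
tasks:
  - id: log_window_start   # hypothetical task id
    type: io.kestra.plugin.core.log.Log
    # 'T' and the digits after it are literal text in the pattern,
    # so only the yyyy-MM-dd part changes from run to run
    message: |
      Window start: {{ now() | dateAdd(-5, 'DAYS') | date("yyyy-MM-dd'T'07:00:00-06:00") }}
```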

Everything really does have an `id` and a `type`

Once you hit a certain level of complexity in your workflows, you start nesting tasks. This could be due to flowable tasks where you want to do some sort of branching like parallelism in this snippet from the airbyte-sync-parallel-with-dbt blueprint:

tasks:
  - id: data_ingestion
    type: io.kestra.plugin.core.flow.Parallel
    tasks:
      - id: salesforce
        type: io.kestra.plugin.airbyte.connections.Sync
        connectionId: e3b1ce92-547c-436f-b1e8-23b6936c12ab

      - id: google_analytics
        type: io.kestra.plugin.airbyte.connections.Sync
        connectionId: e3b1ce92-547c-436f-b1e8-23b6936c12cd

      - id: facebook_ads
        type: io.kestra.plugin.airbyte.connections.Sync
        connectionId: e3b1ce92-547c-436f-b1e8-23b6936c12ef

Whether it's a flowable or runnable task, it's got (say it with me now): an id and a type.

Plugin Defaults really do clean up your flow readability

Let's look at that same snippet from the airbyte-sync-parallel-with-dbt blueprint:

tasks:
  - id: data_ingestion
    type: io.kestra.plugin.core.flow.Parallel
    tasks:
      - id: salesforce
        type: io.kestra.plugin.airbyte.connections.Sync
        connectionId: e3b1ce92-547c-436f-b1e8-23b6936c12ab

      - id: google_analytics
        type: io.kestra.plugin.airbyte.connections.Sync
        connectionId: e3b1ce92-547c-436f-b1e8-23b6936c12cd

      - id: facebook_ads
        type: io.kestra.plugin.airbyte.connections.Sync
        connectionId: e3b1ce92-547c-436f-b1e8-23b6936c12ef

You may not realize it, but this set of 3 tasks is using Plugin Defaults to cover some additional properties, which you can find further down the flow:

pluginDefaults:
  - type: io.kestra.plugin.airbyte.connections.Sync
    values:
      url: http://host.docker.internal:8000/
      username: "{{ secret('AIRBYTE_USERNAME') }}"
      password: "{{ secret('AIRBYTE_PASSWORD') }}"

Without them, that snippet would look more like this:

tasks:
  - id: data_ingestion
    type: io.kestra.plugin.core.flow.Parallel
    tasks:
      - id: salesforce
        type: io.kestra.plugin.airbyte.connections.Sync
        connectionId: e3b1ce92-547c-436f-b1e8-23b6936c12ab
        url: http://host.docker.internal:8000/
        username: "{{ secret('AIRBYTE_USERNAME') }}"
        password: "{{ secret('AIRBYTE_PASSWORD') }}"

      - id: google_analytics
        type: io.kestra.plugin.airbyte.connections.Sync
        connectionId: e3b1ce92-547c-436f-b1e8-23b6936c12cd
        url: http://host.docker.internal:8000/
        username: "{{ secret('AIRBYTE_USERNAME') }}"
        password: "{{ secret('AIRBYTE_PASSWORD') }}"

      - id: facebook_ads
        type: io.kestra.plugin.airbyte.connections.Sync
        connectionId: e3b1ce92-547c-436f-b1e8-23b6936c12ef
        url: http://host.docker.internal:8000/
        username: "{{ secret('AIRBYTE_USERNAME') }}"
        password: "{{ secret('AIRBYTE_PASSWORD') }}"

It's the exact same info three times. It's harder to read, and if we needed to update, say, the URL, we'd be stuck doing it three times. Not great for maintainability, particularly for those of us who typo from time to time.

And nothing is stopping you from using Plugin Defaults when you only use the plugin once. Simply pulling those values away from the rest of the plugin properties can make them easier to read and modify. Consider adding comments and grouping those properties to signal which values could be modified in the future and which ones to leave alone.
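As a sketch of that idea, here is the Airbyte example trimmed down to a single task, with comments marking which values live where. Splitting the connection-specific property from the environment-specific ones this way is a suggestion, not a requirement.

```yaml
tasks:
  - id: salesforce
    type: io.kestra.plugin.airbyte.connections.Sync
    # Identifies this specific Airbyte connection: leave as-is
    connectionId: e3b1ce92-547c-436f-b1e8-23b6936c12ab

pluginDefaults:
  # Environment settings: safe to update if the Airbyte host or credentials change
  - type: io.kestra.plugin.airbyte.connections.Sync
    values:
      url: http://host.docker.internal:8000/
      username: "{{ secret('AIRBYTE_USERNAME') }}"
      password: "{{ secret('AIRBYTE_PASSWORD') }}"
```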

Order doesn't matter, mostly

I like my triggers at the top of my flow code, although by default you may see them toward the bottom. Your topology will show them either at the top or on the left side, depending on how you chose to orient your diagram. Placing them at the top of your flow code makes it easier to read and quickly grasp how this flow can start, but if you have the topology panel open, you can reference that just as easily.
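A minimal sketch of the trigger-first layout, assuming a weekday Schedule trigger. The flow id, trigger id, and cron expression are hypothetical. Because a flow is a YAML mapping, putting the `triggers` key above `tasks` doesn't change how the flow behaves.

```yaml
id: weekday_report          # hypothetical flow id
namespace: company.team

# Triggers first: readers see right away how this flow starts
triggers:
  - id: weekday_morning     # hypothetical trigger id
    type: io.kestra.plugin.core.trigger.Schedule
    cron: "0 7 * * 1-5"     # 07:00, Monday through Friday

tasks:
  - id: hello
    type: io.kestra.plugin.core.log.Log
    message: Good morning!
```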

However, your task order can matter, particularly if you want to pass outputs from one task to the next.
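Here is a small sketch of why that order matters, assuming the `io.kestra.plugin.core.output.OutputValues` task to produce a value. Tasks in a flow's `tasks` list run one after another in the order they are listed, so `use_region` can read the output of `pick_region` only because it comes second. The ids and the region value are hypothetical.

```yaml
tasks:
  # Producer: must come first so its output exists when the next task renders
  - id: pick_region
    type: io.kestra.plugin.core.output.OutputValues
    values:
      region: us-east-1     # hypothetical value

  # Consumer: reads the output produced by the task above
  - id: use_region
    type: io.kestra.plugin.core.log.Log
    message: "Deploying to {{ outputs.pick_region.values.region }}"
```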

You may also want to group tasks so that they share a working directory. This is particularly important for tasks that clone repos or that need to share files across scripts.
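A sketch of that grouping, assuming the Git plugin's `Clone` task and a shell `Commands` task. Tasks inside the same `WorkingDirectory` share a file system, so the files cloned by the first task are visible to the second. The repository URL and task ids are hypothetical placeholders.

```yaml
tasks:
  - id: workspace
    type: io.kestra.plugin.core.flow.WorkingDirectory
    tasks:
      - id: clone_repo
        type: io.kestra.plugin.git.Clone
        url: https://github.com/your-org/your-repo   # hypothetical repository
        branch: main

      # Runs in the same directory, so it can see the cloned files
      - id: list_files
        type: io.kestra.plugin.scripts.shell.Commands
        commands:
          - ls -la
```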

You can decide what order makes sense to you.

Meaningful, short `id`s make your topology a better reference

Again, when you are looking at a minimal flow, meaningful ids don't really click.

It's clear: you don't have to zoom in or move around, and you can see the log icon super clearly.

But then you do something more interesting and it looks more like:

And that's an eye chart. Let me zoom in a little and be less dramatic.

The longer the ids, the harder it is to grasp what's happening in the topology. If you prefer to read your flow in the flow code directly, this may be less of a problem for you. But keep it in mind if you have project stakeholders who rely on the topology, and try to give them the best experience you can.
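As an illustration, here is a hypothetical pair of tasks with short ids that still say what each step does. The endpoint URL is a placeholder, and the long id in the comment is just an example of the kind of label that crowds a topology view.

```yaml
tasks:
  # Short and meaningful, rather than something like
  # fetch_all_open_issues_from_the_issue_tracker_api
  - id: get_issues
    type: io.kestra.plugin.core.http.Request
    uri: https://api.example.com/issues   # hypothetical endpoint

  - id: log_status
    type: io.kestra.plugin.core.log.Log
    message: "Status code: {{ outputs.get_issues.code }}"
```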

Errors

Errors can be handled at the flow level or task level with the errors property (which we sometimes refer to as the errors block). Things happen, and when they do we want the flow to fail gracefully or notify the right people or systems so the failure gets handled properly.

That would look something like this for a flow-level error:

errors:
  - id: rollout_failure_diagnostics
    type: io.kestra.plugin.core.log.Log
    message: |
      Rollout failed for tenant={{ inputs.tenant_id }}.
      execution={{ execution.id }}.
      last_error={{ errorLogs()[0]['message'] }}.
      Review the latest [PRE]/[PROBE]/[POST] logs for the failing application.

You'll probably want to reference any input, {{ inputs.x }}, that may have contributed to the failure, along with the execution id, {{ execution.id }}.

And if you didn't spot it, the task inside the errors property does, in fact, have an id and a type!
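The errors property also works at the task level on flowable tasks. This is a minimal sketch assuming a `Sequential` task with its own `errors` list; the ids are hypothetical, and `exit 1` simply forces a failure so the handler runs. A handler attached this way reacts to failures inside that flowable task rather than anywhere in the flow.

```yaml
tasks:
  - id: rollout               # hypothetical task id
    type: io.kestra.plugin.core.flow.Sequential
    tasks:
      - id: deploy
        type: io.kestra.plugin.scripts.shell.Commands
        commands:
          - exit 1            # simulate a failure for illustration
    # Task-level handler: only runs when a task inside "rollout" fails
    errors:
      - id: rollout_failed
        type: io.kestra.plugin.core.log.Log
        level: ERROR
        message: "Rollout failed in execution {{ execution.id }}"
```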

Flows as Blueprints

All the flows covered in this post are available as blueprints, except for the boilerplate that's included every time you create a new flow from scratch. You can find the blueprints below:

- log_off_flow (available in two options: Notion or Confluence)
  - With Notion
  - With Confluence
- airbyte-sync-parallel-with-dbt
- argocd-single-tenant-wave-rollout

You can also find them in the Blueprints menu directly in the Kestra UI, where you can copy a blueprint or click the "Use" button to try it yourself.

I hope you enjoyed this walkthrough of different flow components and flows with increasing complexity. I added tons of links so you can go a little deeper on certain topics as you see fit.

Let me know what you want to see next with Kestra in the comments or come join our community!