<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: livw</title>
    <description>The latest articles on DEV Community by livw (@livw).</description>
    <link>https://dev.to/livw</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1406342%2F3b20a6eb-5636-4ffb-849a-1062cbdca3c6.jpg</url>
      <title>DEV Community: livw</title>
      <link>https://dev.to/livw</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/livw"/>
    <language>en</language>
    <item>
      <title>Automating Insurance Claims Processing with AI and Conductor</title>
      <dc:creator>livw</dc:creator>
      <pubDate>Fri, 25 Apr 2025 10:12:22 +0000</pubDate>
      <link>https://dev.to/orkes/automating-insurance-claims-processing-with-ai-and-conductor-3nlp</link>
      <guid>https://dev.to/orkes/automating-insurance-claims-processing-with-ai-and-conductor-3nlp</guid>
      <description>&lt;p&gt;One of the biggest bottlenecks for the insurance industry is the sheer number of claims to process. The manual approach is expensive and time-consuming, requiring human operations to verify the documents, cross-check against the insurance policy, and follow up with the claimant, before the claim can be approved. &lt;/p&gt;

&lt;p&gt;These process inefficiencies translate to a poor claims experience for customers, &lt;a href="https://newsroom.accenture.com/news/2022/poor-claims-experiences-could-put-up-to-170b-of-global-insurance-premiums-at-risk-by-2027-according-to-new-accenture-research" rel="noopener noreferrer"&gt;leading to dissatisfaction and ultimately driving customers to switch insurers&lt;/a&gt;. What if there were a way to effectively automate all that work so that claims processing can be completed in mere days or hours, reducing customer churn?&lt;/p&gt;

&lt;p&gt;In this guide, we will explore how to wield LLMs (large language models) and orchestration to streamline manual business processes like insurance claims. Read on to learn the benefits, technical implementation details, and a demo example that you can try out in Orkes Conductor. &lt;/p&gt;

&lt;h2&gt;
  
  
  The solution: AI meets orchestration
&lt;/h2&gt;

&lt;p&gt;The insurance industry is not new to automation. Early rules-based approaches like RPA involved heavy pre-planning and data preparation, which meant that once a process was set in place, changes could not easily be made. These approaches were rigid and limited, and could not scale to handle complex cases or exceptions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Automating complexity&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;By leveraging the natural language capabilities of LLMs, the claims processing pipeline can be automated without having to predefine hundreds of rules across different policy wordings. Using &lt;a href="https://dev.to/orkes/guide-to-prompt-engineering-pof"&gt;prompt engineering&lt;/a&gt;, an LLM can act as an insurance representative to evaluate whether a claim can be approved. Here is an LLM-generated evaluation, which correctly identifies potential issues without any explicit business rules set:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The claim description mentions leukemia, which is a type of cancer. The 
doctor's report indicates tumors detected in blood, which is consistent 
with leukemia. However, the stage of cancer is not explicitly mentioned, 
and the policy covers only stage 3 and 4 cancer. The required documents 
(medical bill and doctor's report) are present, but the stage information 
is missing, which creates some uncertainty. The probability [of approval] 
is relatively high but not certain due to the lack of specific staging 
information.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Automating distributed systems&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Reviewing the claim details is just one step in the insurance processing pipeline. This is where &lt;a href="https://orkes.io/blog/what-is-orchestration" rel="noopener noreferrer"&gt;orchestration&lt;/a&gt; comes into play. Orkes Conductor is a fully-managed orchestration platform that coordinates disparate components and systems into an automated workflow or business process. Even if the supporting documents are uploaded on a system separate from customer details, or if the payment processing service resides on a different platform from the claims portal, an orchestration engine transforms these friction points into a well-oiled flow. &lt;/p&gt;

&lt;p&gt;These are some key highlights of Orkes Conductor: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Integration&lt;/strong&gt;—Connect different services and data sources, including legacy systems, through a cloud-based middleware platform.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Visibility&lt;/strong&gt;—Track the status of your processes, easily recover from failures, and gain in-depth metrics on performance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI capabilities&lt;/strong&gt;—Use in-built features to natively add LLM-powered components to your processes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now, let’s walk through an example insurance claims flow built in Orkes Conductor.&lt;/p&gt;

&lt;h2&gt;
  
  
  Insurance claims processing flow
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fklzwnnahqe8gxo1efuy2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fklzwnnahqe8gxo1efuy2.png" alt="Infographic of insurance claims processing flow: Data extraction from multiple sources, Claims assessment, Outcome processing." width="800" height="420"&gt;&lt;/a&gt;Orkes Conductor can automate any claims processing flow with stateful process orchestration and seamless AI integrations.&lt;/p&gt;

&lt;p&gt;The claims processing flow can be distilled into three main steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Data extraction from multiple sources&lt;/strong&gt;—capture the claim details, supporting documents, and relevant policy documents.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claims assessment&lt;/strong&gt;—review the data to determine if the claim meets the criteria for approval, using AI or human reviewers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Outcome processing&lt;/strong&gt;—execute the approval decision, like kicking off a payments flow and notifying the claimant.&lt;/li&gt;
&lt;/ol&gt;
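
&lt;p&gt;As a rough sketch, the three steps map onto a workflow definition shaped like this (task names and types here are illustrative, not the exact demo workflow shown later):&lt;/p&gt;

```javascript
// Skeleton of the three-step claims flow as a Conductor-style workflow object.
// Task names are placeholders; the full demo JSON appears later in this guide.
const claimsWorkflow = {
  name: "insurance-claim-sketch",
  version: 1,
  tasks: [
    // 1. Data extraction: fetch claim, policy, and documents in parallel
    { name: "collate-details", taskReferenceName: "collate-details_ref", type: "FORK_JOIN" },
    // 2. Claims assessment: an LLM (or human) reviews the collated data
    { name: "assess-claim", taskReferenceName: "assess-claim_ref", type: "SIMPLE" },
    // 3. Outcome processing: route to payout or rejection handling
    { name: "route-outcome", taskReferenceName: "route-outcome_ref", type: "SWITCH" }
  ]
};

// Each entry corresponds to one phase of the pipeline
const phases = claimsWorkflow.tasks.map(function (t) { return t.name; });
```

&lt;p&gt;Each entry expands into the concrete tasks covered in the sections that follow.&lt;/p&gt;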

&lt;p&gt;As the orchestrator unit, Orkes Conductor acts as the execution engine that drives all the core components of the insurance claims process.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fryqtt2te45kauwbmqujz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fryqtt2te45kauwbmqujz.png" alt="Architecture diagram of how Conductor integrates with existing systems to orchestrate an insurance claims processing workflow." width="800" height="573"&gt;&lt;/a&gt;Conductor is the central orchestrator that coordinates between distributed services and systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Data extraction from multiple sources
&lt;/h3&gt;

&lt;p&gt;With a suite of over 20 &lt;a href="https://orkes.io/content/category/reference-docs/system-tasks" rel="noopener noreferrer"&gt;system tasks&lt;/a&gt;, Orkes Conductor supports retrieving data from many sources: through API endpoints, from data lakes, SQL databases, vector databases in a &lt;a href="https://dev.to/orkes/rag-explained-using-retrieval-augmented-generation-to-build-semantic-search-1i32"&gt;RAG system&lt;/a&gt;, and more. This enables you to automatically plug disparate data sources into business processes without any manual imports. &lt;/p&gt;
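
&lt;p&gt;For example, a single HTTP system task for pulling claim data might be configured like this (the endpoint URL is a placeholder for your own claims service):&lt;/p&gt;

```javascript
// Sketch of one HTTP system task that retrieves claim data from an API.
// The URI below is a placeholder; point it at your own claims endpoint.
const getClaimTask = {
  name: "get-claims-details",
  taskReferenceName: "get-claims-details_ref",
  type: "HTTP",
  inputParameters: {
    uri: "https://claims.example.com/api/claims/123", // placeholder endpoint
    method: "GET",
    accept: "application/json",
    contentType: "application/json",
    connectionTimeOut: 3000,
    readTimeOut: 3000
  }
};
```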

&lt;h3&gt;
  
  
  2. AI-powered claim assessment
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The claim description mentions leukemia, which is a type of cancer. The 
doctor's report indicates tumors detected in blood, which is consistent 
with leukemia. However, the stage of cancer is not explicitly mentioned, 
and the policy covers only stage 3 and 4 cancer. The required documents 
(medical bill and doctor's report) are present, but the stage information 
is missing, which creates some uncertainty. The probability [of approval] 
is relatively high but not certain due to the lack of specific staging 
information.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As demonstrated earlier, LLMs can naturally parse text without deep technical work. Leveraging &lt;a href="https://dev.to/orkes/guide-to-prompt-engineering-pof"&gt;prompt engineering&lt;/a&gt;, the LLM can be tuned to output accurate claim assessments. Even if the insurance policy changes, an LLM-powered solution can reason through new claims scenarios without rewriting hundreds of rules. Here is a simplified prompt template that produced the claims assessment above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are an insurance claims assessor who needs to calculate the probability of approving a claim. 

This is the policy statement: "${policy-statement}".

These are the supporting docs: "${docs}".

This is the claims description: "${claims-description}".
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Orkes Conductor makes it easy to scale LLM-driven processes with its &lt;a href="https://orkes.io/content/developer-guides/creating-and-managing-gen-ai-prompt-templates" rel="noopener noreferrer"&gt;&lt;strong&gt;AI Prompt Studio&lt;/strong&gt;&lt;/a&gt; and code-free &lt;strong&gt;AI/LLM integrations and tasks&lt;/strong&gt;. With Orkes’ AI Prompt Studio, you can securely craft and test reusable prompts for the claims assessment workflow. By replacing the prompt variables like &lt;code&gt;${policy-statement}&lt;/code&gt; with the required information extracted in step 1, the LLM can predictably handle complex or edge cases—combining the strengths of AI-powered flexibility with guardrails.&lt;/p&gt;
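
&lt;p&gt;To illustrate the substitution step, here is a minimal sketch of how a variable like &lt;code&gt;${policy-statement}&lt;/code&gt; gets filled in at runtime (Conductor performs this for you; the function below only mimics the idea):&lt;/p&gt;

```javascript
// Mimics how ${...} variables in a prompt template are replaced with
// workflow data before the prompt is sent to the LLM.
function fillTemplate(template, values) {
  return template.replace(/\$\{([\w-]+)\}/g, function (match, key) {
    // Leave unknown variables untouched so gaps are easy to spot
    return values[key] !== undefined ? values[key] : match;
  });
}

const template = 'This is the policy statement: "${policy-statement}".';
const prompt = fillTemplate(template, {
  "policy-statement": "Critical illness cover for stage 3 and 4 cancer."
});
```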

&lt;p&gt;Orkes’ suite of LLM integrations provides access to the latest AI models from all major LLM providers, like OpenAI, &lt;a href="https://aws.amazon.com/blogs/apn/build-production-ready-generative-ai-applications-with-orkes-and-amazon-bedrock/" rel="noopener noreferrer"&gt;Amazon Bedrock&lt;/a&gt;, and Anthropic. With all these choices at your fingertips, Orkes’ modular workflow approach makes it easy to test and switch between any AI model to find the most effective one for your use case — in this case, an insurance claims flow.&lt;/p&gt;

&lt;p&gt;Bring all of these together by using Orkes’ built-in LLM tasks, which provide chat completions, embedding retrievals, and more without needing to add extra code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fouhfego7twg8zv95xjlr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fouhfego7twg8zv95xjlr.png" alt="Infographic showcasing Orkes AI Prompt Studio, AI Integrations, and AI Tasks." width="800" height="1133"&gt;&lt;/a&gt;Orkes' suite of AI features empowers rapid development.&lt;/p&gt;

&lt;h4&gt;
  
  
  Human-in-the-loop
&lt;/h4&gt;

&lt;p&gt;Even with guardrails for AI models, adding human review checkpoints further mitigates the risks of AI miscalculation. After the AI-driven claims assessment is completed, the process can then trigger a human review for final approval of the insurance payout. The AI-generated assessment provides a detailed summary for the human assessor to investigate, speeding up the process.&lt;/p&gt;

&lt;p&gt;In Orkes, our human-in-the-loop features allow you to seamlessly integrate these human touchpoints with both your backend process and your frontend claims review portal. This ensures an audit trail of all actions taken, while guaranteeing that businesses don’t have to migrate to a different portal just to enjoy these benefits.&lt;/p&gt;
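
&lt;p&gt;A human review checkpoint can be sketched as a task that slots in right after the AI assessment (the field names below are illustrative placeholders, not the exact Orkes schema):&lt;/p&gt;

```javascript
// Illustrative: a human-review task inserted after the AI assessment.
// The assignee and input fields are placeholders, not an exact Orkes schema.
const humanReviewTask = {
  name: "review-claim-decision",
  taskReferenceName: "review-claim-decision_ref",
  type: "HUMAN",
  inputParameters: {
    // The AI-generated assessment gives the reviewer a head start;
    // this expression is a placeholder reference to an upstream task output
    assessmentSummary: "${llm_assessment_ref.output.result}",
    assignee: "claims-review-team" // placeholder group name
  }
};
```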

&lt;h3&gt;
  
  
  3. Business decision post-actions
&lt;/h3&gt;

&lt;p&gt;Finally, once the decision has been made, the workflow can route either to an early termination in the case of rejection, or to a post-processing sequence like a payout initiation and outcome notification. Orkes’ orchestration engine facilitates coordination across different microservices, internal services, and other third-party integrations like Stripe and Sendgrid.&lt;/p&gt;
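
&lt;p&gt;This routing can be sketched as a switch task (the case values and downstream task names are illustrative):&lt;/p&gt;

```javascript
// Illustrative outcome routing: approved claims continue to payout and
// notification tasks; anything else terminates early.
const routeOutcome = {
  name: "route-outcome",
  taskReferenceName: "route-outcome_ref",
  type: "SWITCH",
  evaluatorType: "value-param",
  expression: "decision",
  // In practice this value is wired from the assessment task's output
  inputParameters: { decision: "approved" },
  decisionCases: {
    approved: [
      { name: "initiate-payout", taskReferenceName: "initiate-payout_ref", type: "HTTP" },
      { name: "notify-claimant", taskReferenceName: "notify-claimant_ref", type: "HTTP" }
    ]
  },
  defaultCase: [
    { name: "terminate-rejected", taskReferenceName: "terminate-rejected_ref", type: "TERMINATE" }
  ]
};
```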

&lt;h3&gt;
  
  
  Integration with your tech stack
&lt;/h3&gt;

&lt;p&gt;Integrate the workflow into your existing portal for claims processing. Run it as a server job. Trigger the claims process from an event-driven system. Whatever your system architecture requires, Orkes Conductor brings the flexibility to fit with your existing processes rather than force you to migrate.&lt;/p&gt;
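
&lt;p&gt;For instance, an existing portal or event consumer can kick off a run with a single HTTP call. The sketch below builds such a request, assuming Conductor’s start-workflow endpoint (&lt;code&gt;POST /api/workflow/{name}&lt;/code&gt;); the host and input values are placeholders:&lt;/p&gt;

```javascript
// Builds the request to start a workflow run via Conductor's REST API.
// Host and input fields are placeholders for your own environment.
function buildStartRequest(host, workflowName, input) {
  return {
    url: host + "/api/workflow/" + workflowName,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(input)
  };
}

const req = buildStartRequest(
  "https://your-cluster.orkesconductor.io", // placeholder host
  "insurance-claim",
  { claimId: "CLM-123" } // placeholder workflow input
);
```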

&lt;h2&gt;
  
  
  Try it out in Conductor
&lt;/h2&gt;

&lt;p&gt;Now, here’s how you can try out a working AI-powered insurance claims processing workflow for yourself in Conductor. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz2cjcq0otmc79w6i4ihl.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz2cjcq0otmc79w6i4ihl.jpg" alt="High-level diagram versus the actual Conductor workflow for insurance claims processing." width="800" height="965"&gt;&lt;/a&gt;Orkes’ visual workflow editor makes it intuitive to transition from a napkin-top or whiteboard-sketch idea to a production-ready workflow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Create an account in our free &lt;a href="https://developer.orkescloud.com/?utm_campaign=insurance-claims-blog&amp;amp;utm_source=devto-blog&amp;amp;utm_medium=web" rel="noopener noreferrer"&gt;Developer Playground&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Create the insurance claim workflow&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Log in to &lt;a href="https://developer.orkescloud.com/?utm_campaign=insurance-claims-blog&amp;amp;utm_source=devto-blog&amp;amp;utm_medium=web" rel="noopener noreferrer"&gt;Developer Playground&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;In &lt;strong&gt;Definitions&lt;/strong&gt; &amp;gt; &lt;strong&gt;Workflow&lt;/strong&gt;, select &lt;strong&gt;+ Define workflow&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Paste the following workflow JSON into the &lt;strong&gt;Code&lt;/strong&gt; tab on the right-side panel:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"createTime"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1744263984129&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"updateTime"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1744871696379&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"insurance-claim"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Insurance claim workflow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tasks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"collate-details"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"taskReferenceName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"collate-details_ref"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"inputParameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"FORK_JOIN"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"decisionCases"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"defaultCase"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"forkTasks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"get-customer-details"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"taskReferenceName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"get-customer-details_ref"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"inputParameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"uri"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://orkes-api-tester.orkesconductor.com/api"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"method"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"GET"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"connectionTimeOut"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"readTimeOut"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"3000"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"accept"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"application/json"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"contentType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"application/json"&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"HTTP"&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"get-claims-details"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"taskReferenceName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"get-claims-details_ref"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"inputParameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"expression"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"// Function to randomly pick an item from the array&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;(function getRandomItem(arr) {&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;  const items = [&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;    { id: 1, name: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;Jane&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;, claims: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;stage 3 stomach cancer&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; },&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;    { id: 2, name: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;Jen&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;, claims: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;stage 0 breast cancer&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; },&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;    { id: 3, name: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;John&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;, claims: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;stage 4 leukemia&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; },&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;    { id: 4, name: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;Jim&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;, claims: &lt;/span&gt;&lt;span 
class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;hyperthyroidism &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; },&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;    { id: 4, name: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;Jeff&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;, claims: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;leukemia &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; },&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;    { id: 4, name: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;June&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;, claims: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;stage 4 flu &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; }&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;  ];&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;  const randomIndex = Math.floor(Math.random() * items.length);&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;  return items[randomIndex];&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;)();"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"evaluatorType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"graaljs"&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"INLINE"&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"get-policy-details"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"taskReferenceName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"get-policy-details_ref"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"inputParameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"uri"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://orkes-api-tester.orkesconductor.com/api"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"method"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"GET"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"connectionTimeOut"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"readTimeOut"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"3000"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"accept"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"application/json"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"contentType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"application/json"&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"HTTP"&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"get-coverage-statement"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"taskReferenceName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"get-coverage-statement_ref"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"inputParameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"coverageStatement"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Critical illness cover of USD 200,000.00 for stage 3 and 4 cancer. To make a successful claim, the following documents are required: (1) medical bill and (2) doctor's report. If any documents are missing, the claim cannot be processed."&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SET_VARIABLE"&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"get-supporting-docs"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"taskReferenceName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"get-supporting-docs_ref"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"inputParameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"expression"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"// Function to randomly pick an item from the array&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;(function getRandomItem(arr) {&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;  const items = [&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;    { receipt: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;x-ray scan - USD 4000 consult USD 7000 total USD 11000&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;, report: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;Doctor Lee 15/03/25 Tumours detected in blood. Suspected stage 4 cancer. Patient is referred for further treatment.&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; },&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;    { receipt: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;x-ray scan - USD 4000 consult USD 7000 total USD 11000&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;, report: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;Doctor Lee 15/03/25 Tumours detected in blood. Suspected stage 1 cancer. Patient is referred for further treatment.&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; },&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;    { receipt: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;x-ray scan - USD 4000 consult USD 7000 total USD 11000&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;, report: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;Doctor Lee 15/03/25 Tumours detected in blood. 
Patient is referred for follow-up.&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; },&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;    { receipt: &lt;/span&gt;&lt;span class="se"&gt;\"\"&lt;/span&gt;&lt;span class="s2"&gt;, report: &lt;/span&gt;&lt;span class="se"&gt;\"\"&lt;/span&gt;&lt;span class="s2"&gt; }&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;  ];&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;  const randomIndex = Math.floor(Math.random() * items.length);&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;  return items[randomIndex];&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;)();"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"evaluatorType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"graaljs"&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"INLINE"&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"startDelay"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"joinOn"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"join_on_collate"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"taskReferenceName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"join_on_collate_ref"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"inputParameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"JOIN"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"forkTasks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"joinOn"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"get-customer-details_ref"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"get-policy-details_ref"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"optional"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"get-approval-probability"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"taskReferenceName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"get-approval-probability_ref"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"inputParameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"llmProvider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"AnthropicClaude"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"claude-3-5-sonnet-20240620"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"promptName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"determine-insurance-claim-probability"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"promptVariables"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"claims-description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${get-claims-details_ref.output.result.claims}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"policy-statement"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${workflow.variables.coverageStatement}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"docs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${get-supporting-docs_ref.output.result}"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"LLM_TEXT_COMPLETE"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ai-decider"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"taskReferenceName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ai-decider_ref"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"inputParameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"probabilityValue"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${get-approval-probability_ref.output.result.probability}"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SWITCH"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"decisionCases"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"0.0"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"send-refile-claim-notification"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"taskReferenceName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"send-refile-claim-notification_ref"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"inputParameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"uri"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://orkes-api-tester.orkesconductor.com/api"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"method"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"GET"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"accept"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"application/json"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"contentType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"application/json"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"encode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"HTTP"&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"terminate_2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"taskReferenceName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"terminate_ref_2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"inputParameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"terminationStatus"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"COMPLETED"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"terminationReason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Claim did not pass LLM validation check. Need to refile for claims processing."&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"TERMINATE"&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"defaultCase"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"approve-claim-payout"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"taskReferenceName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"approve-claim-payout_ref"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"inputParameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"__humanTaskDefinition"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"assignmentCompletionStrategy"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"LEAVE_OPEN"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"displayName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Approve Claim"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"userFormTemplate"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"InsuranceClaims"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"assignments"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                  &lt;/span&gt;&lt;span class="nl"&gt;"assignee"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ACME"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"userType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"EXTERNAL_GROUP"&lt;/span&gt;&lt;span class="w"&gt;
                  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
                  &lt;/span&gt;&lt;span class="nl"&gt;"slaMinutes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"taskTriggers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"claimDescription"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${get-claims-details_ref.output.result.claims}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"coverageStatement"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${workflow.variables.coverageStatement}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"probability"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${get-approval-probability_ref.output.result.probability}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"aiSummary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${get-approval-probability_ref.output.result.reason}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"approval"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"HUMAN"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"approved"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"taskReferenceName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"approved_ref"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"inputParameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"approval"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${approve-claim-payout_ref.output.approval}"&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SWITCH"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"decisionCases"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"true"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"process-payment"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"taskReferenceName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"process-payment_ref"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"inputParameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                  &lt;/span&gt;&lt;span class="nl"&gt;"uri"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://orkes-api-tester.orkesconductor.com/api"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                  &lt;/span&gt;&lt;span class="nl"&gt;"method"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"GET"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                  &lt;/span&gt;&lt;span class="nl"&gt;"connectionTimeOut"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                  &lt;/span&gt;&lt;span class="nl"&gt;"readTimeOut"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"3000"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                  &lt;/span&gt;&lt;span class="nl"&gt;"accept"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"application/json"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                  &lt;/span&gt;&lt;span class="nl"&gt;"contentType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"application/json"&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"HTTP"&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"send-success-notification"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"taskReferenceName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"send-success-notification_ref"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"inputParameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                  &lt;/span&gt;&lt;span class="nl"&gt;"uri"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://orkes-api-tester.orkesconductor.com/api"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                  &lt;/span&gt;&lt;span class="nl"&gt;"method"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"GET"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                  &lt;/span&gt;&lt;span class="nl"&gt;"connectionTimeOut"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                  &lt;/span&gt;&lt;span class="nl"&gt;"readTimeOut"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"3000"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                  &lt;/span&gt;&lt;span class="nl"&gt;"accept"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"application/json"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                  &lt;/span&gt;&lt;span class="nl"&gt;"contentType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"application/json"&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"HTTP"&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"defaultCase"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"send-not-approved-notification"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"taskReferenceName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"send-not-approved-notification_ref"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"inputParameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"uri"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://orkes-api-tester.orkesconductor.com/api"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"method"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"GET"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"connectionTimeOut"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"readTimeOut"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"3000"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"accept"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"application/json"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"contentType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"application/json"&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"HTTP"&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"terminate_1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"taskReferenceName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"terminate_ref_1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"inputParameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"terminationStatus"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"COMPLETED"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"terminationReason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Claim has not been approved by human agent."&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"TERMINATE"&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"evaluatorType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"value-param"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"expression"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"approval"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"evaluatorType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"value-param"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"expression"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"probabilityValue"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"inputParameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"policyNumber"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"customerId"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"supportingDocs"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"outputParameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"failureWorkflow"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"schemaVersion"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"restartable"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"timeoutPolicy"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ALERT_ONLY"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"timeoutSeconds"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"variables"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"inputTemplate"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"enforceSchema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Save the workflow. &lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Add LLM integrations and models&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;strong&gt;Integrations&lt;/strong&gt; and select &lt;strong&gt;+ New integration&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In the &lt;strong&gt;AI/LLM&lt;/strong&gt; section, add your desired LLM provider.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: In the workflow that you’ve just created, the &lt;code&gt;get-approval-probability&lt;/code&gt; task uses Anthropic’s Claude 3.5 Sonnet, but you can choose any LLM provider and modify the workflow task later.&lt;/em&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Return to &lt;strong&gt;Integrations&lt;/strong&gt; and add your LLM models to your newly-added integration.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For detailed steps on adding LLM integrations, refer to the &lt;a href="https://orkes.io/content/category/integrations/ai-llm" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Add the AI prompt&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;strong&gt;Definitions&lt;/strong&gt; &amp;gt; &lt;strong&gt;AI Prompts&lt;/strong&gt; and select &lt;strong&gt;+ Add AI prompt&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In the &lt;strong&gt;Code&lt;/strong&gt; tab, paste the following JSON:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"createTime"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1744715996941&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"updateTime"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1744906103023&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"determine-insurance-claim-probability"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"template"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"You are an insurance claims assessor who needs to calculate the probability of approving a claim.&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;This is the policy statement: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;${policy-statement}&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;.&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;These are the supporting docs: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;${docs}&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;.&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;This is the claims description: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;${claims-description}&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;.&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;Format your response ONLY as a JSON object with the following structure:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;&amp;lt;valid&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;{&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;  &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;probability&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;A numerical probability between 0.0 - 1.0. 
&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;,&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;  &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;reason&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;A reason for the probability.&lt;/span&gt;&lt;span class="se"&gt;\"\n&lt;/span&gt;&lt;span class="s2"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;&amp;lt;/valid&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;Do not wrap the JSON object in markdown. Do use quotation marks in the JSON object.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;&amp;lt;invalid&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;{probability=0.0, reason=The claim description mentions 'stage 5 cancer', which does not match the policy coverage for stage 3 and 4 cancer. Additionally, the provided documents confirm stage 3 cancer, further disqualifying the claim from approval.}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;&amp;lt;/invalid&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;&amp;lt;invalid&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;```

json&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;{&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;  &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;probability&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: 0.0,&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;  &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;reason&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;The claim description mentions 'stage 5 cancer', which does not match the policy coverage for stage 3 and 4 cancer. Additionally, the provided documents confirm stage 3 cancer, further disqualifying the claim from approval.&lt;/span&gt;&lt;span class="se"&gt;\"\n&lt;/span&gt;&lt;span class="s2"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;

```&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;&amp;lt;/invalid&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Determine the probability of approving a claim based on a policy statement and a claim description"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"variables"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"docs"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"claims-description"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"policy-statement"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"integrations"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tags"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Return to the &lt;strong&gt;Form&lt;/strong&gt; tab, and add any AI models you wish to use with this prompt.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: In the workflow that you’ve just created, the &lt;code&gt;get-approval-probability&lt;/code&gt; task uses Anthropic’s Claude 3.5 Sonnet, but you can add any LLM provider here and modify the workflow task later.&lt;/em&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Save the AI prompt.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Optional: Modify the AI prompt&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The prompt should look something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are an insurance claims assessor who needs to calculate the probability of approving a claim.

This is the policy statement: "${policy-statement}".

These are the supporting docs: "${docs}".

This is the claims description: "${claims-description}".

Format your response ONLY as a JSON object with the following structure:
&amp;lt;valid&amp;gt;
{
  "probability": "A numerical probability between 0.0 - 1.0. ",
  "reason": "A reason for the probability."
}
&amp;lt;/valid&amp;gt;

Do not wrap the JSON object in markdown. Do use quotation marks in the JSON object.
&amp;lt;invalid&amp;gt;
{probability=0.0, reason=The claim description mentions 'stage 5 cancer', which does not match the policy coverage for stage 3 and 4 cancer. Additionally, the provided documents confirm stage 3 cancer, further disqualifying the claim from approval.}
&amp;lt;/invalid&amp;gt;

&amp;lt;invalid&amp;gt;
```json
{
  "probability": 0.0,
  "reason": "The claim description mentions 'stage 5 cancer', which does not match the policy coverage for stage 3 and 4 cancer. Additionally, the provided documents confirm stage 3 cancer, further disqualifying the claim from approval."
}
```
&amp;lt;/invalid&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you modify the prompt, follow these guidelines to steer the LLM toward a well-structured JSON output:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Unambiguously establish the role of the LLM as an insurance claims assessor.&lt;/li&gt;
&lt;li&gt;Precisely describe the expected input and output formats.

&lt;ul&gt;
&lt;li&gt;Make sure the output has the exact fields expected by the subsequent tasks in the workflow.&lt;/li&gt;
&lt;li&gt;Provide clear examples of the output.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Emphasize fairness and adherence to the provided policies.&lt;/li&gt;

&lt;li&gt;Make sure that missing inputs are properly handled.&lt;/li&gt;

&lt;/ul&gt;
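&lt;p&gt;&lt;em&gt;Because downstream tasks consume the LLM response as structured data, it helps to validate the output shape before acting on it. The Python sketch below is illustrative only and not part of the Conductor workflow: the &lt;code&gt;probability&lt;/code&gt; and &lt;code&gt;reason&lt;/code&gt; fields match the prompt above, but the validation logic itself is an assumption.&lt;/em&gt;&lt;/p&gt;

```python
import json

def parse_llm_assessment(raw: str) -> dict:
    """Parse and validate the LLM's claim-assessment response.

    The prompt above demands a bare JSON object with a numeric
    `probability` between 0.0 and 1.0 and a free-text `reason`.
    """
    data = json.loads(raw)  # fails if the LLM wrapped the object in markdown
    probability = float(data["probability"])
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability out of range: {probability}")
    return {"probability": probability, "reason": str(data["reason"])}

# A well-formed response, following the <valid> template in the prompt
raw = '{"probability": "0.7", "reason": "Documents present but evidence is vague."}'
result = parse_llm_assessment(raw)
```

&lt;p&gt;&lt;em&gt;A response wrapped in markdown fences, like the second invalid example in the prompt, would fail the &lt;code&gt;json.loads&lt;/code&gt; call — which is exactly why the prompt forbids it.&lt;/em&gt;&lt;/p&gt;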

&lt;p&gt;&lt;strong&gt;Optional: Modify the workflow&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you wish to use an LLM provider other than Anthropic Claude, make sure to update the &lt;code&gt;get-approval-probability&lt;/code&gt; task accordingly.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Return to your insurance claim workflow and select the &lt;strong&gt;get-approval-probability&lt;/strong&gt; task in the visual diagram.&lt;/li&gt;
&lt;li&gt;In the &lt;strong&gt;Task&lt;/strong&gt; tab, modify the &lt;strong&gt;LLM provider&lt;/strong&gt; and &lt;strong&gt;Model&lt;/strong&gt; fields.&lt;/li&gt;
&lt;li&gt;Save the workflow.&lt;/li&gt;
&lt;/ol&gt;
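&lt;p&gt;&lt;em&gt;Alternatively, you can make the same change directly in the workflow's JSON definition. The sketch below shows roughly where the provider and model are set; the field names and placeholder values are assumptions based on the Orkes LLM task format, so verify them against your Conductor version.&lt;/em&gt;&lt;/p&gt;

```json
{
  "name": "get-approval-probability",
  "taskReferenceName": "get-approval-probability_ref",
  "type": "LLM_TEXT_COMPLETE",
  "inputParameters": {
    "llmProvider": "your-integration-name",
    "model": "your-model-name",
    "promptName": "determine-insurance-claim-probability"
  }
}
```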

&lt;p&gt;&lt;strong&gt;Step 4: Add the user form for the Human task&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The user form serves as the interface for human assessors to evaluate and approve the payout. Orkes Conductor makes it easy to integrate and deploy these user forms on your own user portals through our extensive suite of &lt;a href="https://orkes.io/content/developer-guides/orchestrating-human-tasks#step-4-display-the-form-on-an-external-ui" rel="noopener noreferrer"&gt;APIs and SDKs&lt;/a&gt;.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;strong&gt;Definitions&lt;/strong&gt; &amp;gt; &lt;strong&gt;User Forms&lt;/strong&gt; and select &lt;strong&gt;+ New form&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In the &lt;strong&gt;Code&lt;/strong&gt; tab, paste the following JSON:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"createTime"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1744816361848&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"updateTime"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1744816643214&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"InsuranceClaims"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"jsonSchema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"$schema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://json-schema.org/draft-07/schema"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"claimDescription"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"coverageStatement"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"probability"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"aiSummary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"approval"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"boolean"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"templateUI"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"VerticalLayout"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"elements"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Control"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"scope"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"#/properties/claimDescription"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Claim Description"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"options"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"readonly"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Control"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"scope"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"#/properties/coverageStatement"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Coverage Statement"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"options"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"readonly"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Control"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"scope"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"#/properties/probability"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Probability"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"options"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"readonly"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Control"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"scope"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"#/properties/aiSummary"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"AI Summary"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"options"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"readonly"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Control"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"scope"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"#/properties/approval"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Approve?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"options"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"default"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Save the user form.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;Running the workflow&lt;/h3&gt;

&lt;p&gt;Now that all your Conductor resources are ready, let’s run the workflow to experience the power of AI in automating claims processing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;To run the workflow:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In &lt;strong&gt;Definitions&lt;/strong&gt; &amp;gt; &lt;strong&gt;Workflow&lt;/strong&gt;, select your insurance claim workflow.&lt;/li&gt;
&lt;li&gt;Select the &lt;strong&gt;Run&lt;/strong&gt; tab in the right-side panel and select &lt;strong&gt;Run Workflow&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;
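&lt;p&gt;&lt;em&gt;Workflows can also be started programmatically. The sketch below only builds and checks the request body using the workflow's declared inputs (&lt;code&gt;policyNumber&lt;/code&gt;, &lt;code&gt;customerId&lt;/code&gt;, &lt;code&gt;supportingDocs&lt;/code&gt;); the example values are hypothetical, and the endpoint mentioned in the comment should be checked against the Conductor API reference.&lt;/em&gt;&lt;/p&gt;

```python
import json

# Input keys declared in the workflow definition's "inputParameters"
WORKFLOW_INPUT_KEYS = {"policyNumber", "customerId", "supportingDocs"}

def build_claim_input(policy_number: str, customer_id: str, supporting_docs: list) -> str:
    """Serialize the request body for starting the insurance claim workflow."""
    payload = {
        "policyNumber": policy_number,
        "customerId": customer_id,
        "supportingDocs": supporting_docs,
    }
    missing = WORKFLOW_INPUT_KEYS - payload.keys()
    if missing:
        raise ValueError(f"missing workflow inputs: {sorted(missing)}")
    return json.dumps(payload)

# Hypothetical values; POST the resulting body to your Conductor server's
# Start Workflow endpoint (verify the exact path in the API docs).
body = build_claim_input("POL-1234", "CUST-001", ["medical_report.pdf"])
```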

&lt;p&gt;Upon running the workflow, you will be redirected to the workflow execution details page. Use the visual diagram to follow the workflow progression.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmdakhmhrzxku49lh0335.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmdakhmhrzxku49lh0335.jpg" alt="Screenshot of the workflow execution in Conductor." width="800" height="679"&gt;&lt;/a&gt;In Conductor, easily track workflows as they progress through different stages.&lt;/p&gt;

&lt;h2&gt;Intelligent automation in action&lt;/h2&gt;

&lt;p&gt;In the workflow execution details page, select the &lt;code&gt;get-approval-probability&lt;/code&gt; task and then its &lt;strong&gt;Output&lt;/strong&gt; tab. The LLM will provide an approval probability along with its reasoning.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flinldqst0azv0gvhbqcg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flinldqst0azv0gvhbqcg.png" alt="Screenshot of the LLM get-approval-probability task output, where the probability is 0." width="800" height="445"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this instance, the probability of approval is 0, since the required supporting documents are missing. With workflow orchestration, the process then automatically progresses to notify the claimant to supplement the missing documents without needing additional human review.&lt;/p&gt;

&lt;p&gt;Let’s trigger another execution to examine how a different scenario might play out.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnpt9btpnsp04t3vdauen.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnpt9btpnsp04t3vdauen.png" alt="Screenshot of the LLM get-approval-probability task output, where the probability is 0.7." width="800" height="445"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this instance, the probability is non-zero since all the supporting documents are present. However, the supporting evidence is a little vague and may require further follow-up. &lt;strong&gt;This is where the workflow benefits greatly from human oversight.&lt;/strong&gt; Since there is a chance for approval, the workflow routes to a human-in-the-loop task, where a human assessor can review the AI-generated assessment summary to make the final decision.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsgnzfy7wbmke0i3e11j0.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsgnzfy7wbmke0i3e11j0.jpg" alt="Screenshot of the user form on the Conductor interface." width="800" height="556"&gt;&lt;/a&gt;Through Conductor's Human Task APIs, these user forms can be integrated into existing portals.&lt;/p&gt;

&lt;p&gt;Try it out yourself by going to &lt;strong&gt;Executions&lt;/strong&gt; &amp;gt; &lt;strong&gt;Human Tasks&lt;/strong&gt; and selecting the pending execution. You can claim the task, tick the approve checkbox to approve it (or leave it unchecked to reject it), and complete the task.&lt;/p&gt;

&lt;p&gt;When you return to the workflow execution details page (in &lt;strong&gt;Executions&lt;/strong&gt; &amp;gt; &lt;strong&gt;Workflow&lt;/strong&gt;), the workflow will have progressed to the next step based on your choice. This is the power of Orkes Conductor, combining orchestration and AI capabilities to build automated processes that scale.&lt;/p&gt;
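&lt;p&gt;The routing logic above can be sketched in a few lines of plain Python. This is only an illustration of the decision flow; the function and field names are made up for this sketch and are not Conductor APIs:&lt;/p&gt;

```python
def route_claim(approval_probability, summary):
    """Route an insurance claim based on the LLM-assessed approval probability.

    Mirrors the workflow's switch logic: a zero probability means required
    documents are missing, so the claimant is notified automatically; any
    non-zero probability sends the claim to a human assessor for review.
    """
    if approval_probability == 0:
        return {"next_step": "notify_claimant", "reason": "missing documents"}
    return {"next_step": "human_review", "summary": summary}

# The two executions shown above:
print(route_claim(0, "")["next_step"])    # prints "notify_claimant"
print(route_claim(0.7, "Evidence is vague, may need follow-up")["next_step"])
```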

&lt;p&gt;&lt;strong&gt;Extending the workflow&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;While the data extraction, payment processing, and notification steps are mock tasks, they can be easily replaced with real data sources and services. Go a step further by switching out the mock tasks with real tasks for your own needs, or request a demo for &lt;a href="https://orkes.io/platform?utm_campaign=insurance-claims-blog&amp;amp;utm_source=devto-blog&amp;amp;utm_medium=web" rel="noopener noreferrer"&gt;Orkes Conductor&lt;/a&gt; to learn more about how orchestration can accelerate your business processes. &lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Slow, outdated insurance claims processes can be modernized by investing in the latest technology for AI and orchestration. Using Orkes Conductor to scale AI-automated processes offers a multitude of benefits:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;AI automation and agility&lt;/strong&gt;—Easily add and upgrade AI-driven components to your processes for intelligent automation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration with existing systems&lt;/strong&gt;—Use Orkes as the middleware platform to integrate legacy systems, third-party applications, and other components in your business ecosystem.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enhanced governance and monitoring&lt;/strong&gt;—Use built-in monitoring and state-tracking dashboards for global visibility into and auditability of processes and performance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human-in-the-loop oversight&lt;/strong&gt;—Seamlessly add human oversight to processes and integrate them with frontend interfaces for your intended users.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fast, scalable performance&lt;/strong&gt;—Build and run processes on a performant orchestration engine that powers the biggest players across industries, like Netflix, Tesla, and American Express.&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;&lt;a href="https://orkes.io/platform?utm_campaign=insurance-claims-blog&amp;amp;utm_source=devto-blog&amp;amp;utm_medium=web" rel="noopener noreferrer"&gt;Conductor&lt;/a&gt; is an enterprise-grade Unified Application Platform for process automation, API and microservices orchestration, agentic workflows, and more. Check out the &lt;a href="https://orkes.io/pricing" rel="noopener noreferrer"&gt;full set of features&lt;/a&gt;, or try it yourself using our free &lt;a href="https://developer.orkescloud.com/?utm_campaign=insurance-claims-blog&amp;amp;utm_source=devto-blog&amp;amp;utm_medium=web" rel="noopener noreferrer"&gt;Developer Playground&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>automation</category>
      <category>orchestration</category>
      <category>ai</category>
      <category>distributedsystems</category>
    </item>
    <item>
      <title>Orchestrating Asynchronous Workflows (How Are They Different from Synchronous?)</title>
      <dc:creator>livw</dc:creator>
      <pubDate>Thu, 27 Mar 2025 04:38:48 +0000</pubDate>
      <link>https://dev.to/orkes/orchestrating-asynchronous-workflows-how-are-they-different-from-synchronous-26gc</link>
      <guid>https://dev.to/orkes/orchestrating-asynchronous-workflows-how-are-they-different-from-synchronous-26gc</guid>
      <description>&lt;p&gt;Asynchronous workflows have become a cornerstone of modern software development, enabling applications to handle complex operations without blocking the main execution thread. This approach is how we can design responsive, scalable, and efficient systems involving distributed components. In this blog, let’s explore what asynchronous workflows are, their use cases, challenges, and best practices for implementing them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why use asynchronous workflows?
&lt;/h2&gt;

&lt;p&gt;The modern tech stack is inherently distributed. Your SuperAwesome program might call on a LongProcessing endpoint, receive SuperImportant events streamed from a legacy system, or require SomeRelevant input from a human user. When these processes or events take a long or indefinite time to complete, in a synchronous paradigm, your main execution program is forced to wait, which blocks it from completing anything else. &lt;/p&gt;

&lt;p&gt;An asynchronous workflow approach uses design patterns where indefinite or long-running processes are non-blocking yet durable. In other words, &lt;strong&gt;asynchronous workflows enable responsive and resilient systems even in the face of uncertainty&lt;/strong&gt;, like an unscheduled event message or unknown processing time. &lt;/p&gt;

&lt;p&gt;One of the biggest benefits of asynchronous workflows is the ability to combine both event-driven and &lt;a href="https://orkes.io/blog/what-is-orchestration" rel="noopener noreferrer"&gt;orchestration&lt;/a&gt; patterns into a unified system for &lt;strong&gt;event-driven workflows&lt;/strong&gt;. When workflow orchestration is used in conjunction with event-driven architecture, this flexibility unlocks the best of both worlds: real-time responsivity to events while coordinating complex sequences of interactions. &lt;/p&gt;

&lt;p&gt;But what exactly does asynchronous mean? What even is an asynchronous workflow? You may already be familiar with these concepts, but let’s have a quick recap.&lt;/p&gt;

&lt;h2&gt;
  
  
  Synchronous vs asynchronous programming
&lt;/h2&gt;

&lt;p&gt;In software, &lt;em&gt;synchronous&lt;/em&gt; and &lt;em&gt;asynchronous&lt;/em&gt; are broad concepts that apply across all levels of software architecture. At their core, synchronous tasks or processes are executed in a fashion where the &lt;strong&gt;server actively awaits the response before moving on to the next task&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In contrast, asynchronous tasks or processes are executed &lt;strong&gt;without the server actively waiting for the required response&lt;/strong&gt;. In an asynchronous paradigm, the task is executed in a way that doesn’t block the main thread, such as by being completed on a separate thread, worker, or program. The execution typically follows a &lt;strong&gt;fire-and-forget&lt;/strong&gt; pattern, where the main thread initiates the task without waiting for the response — a simple implementation could be a function that starts the operation and returns immediately, with a callback to deliver the computed result later. The server can then process another task that isn’t dependent on this pending one.&lt;/p&gt;
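&lt;p&gt;As a minimal sketch of the fire-and-forget pattern, here is a callback-based example using Python threads. The slow operation and all names are purely illustrative:&lt;/p&gt;

```python
import threading
import time

def fetch_report(callback):
    """Fire-and-forget: run the slow operation on a worker thread and return
    to the caller immediately; the callback delivers the result later."""
    def work():
        time.sleep(0.1)               # stand-in for a slow remote call
        callback("report ready")
    threading.Thread(target=work).start()

results = []
fetch_report(results.append)          # returns immediately; main thread is not blocked
print("main thread free to do other work")
time.sleep(0.2)                       # give the worker time to finish
print(results)                        # prints ['report ready']
```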

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Area&lt;/th&gt;
&lt;th&gt;Synchronous Programming&lt;/th&gt;
&lt;th&gt;Asynchronous Programming&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Code execution&lt;/td&gt;
&lt;td&gt;Tasks are executed one after the other only after completion (blocking).&lt;/td&gt;
&lt;td&gt;Tasks are executed in a fire-and-forget fashion (can be non-blocking).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Implementation&lt;/td&gt;
&lt;td&gt;Easier to implement.&lt;/td&gt;
&lt;td&gt;More challenging to implement.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Program flow&lt;/td&gt;
&lt;td&gt;Deterministic, easier to reason about and debug.&lt;/td&gt;
&lt;td&gt;Non-deterministic, harder to reason about and debug; potential for race conditions.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scalability&lt;/td&gt;
&lt;td&gt;Limited scalability under high concurrency.&lt;/td&gt;
&lt;td&gt;Higher scalability under high concurrency due to more efficient resource usage.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Uses&lt;/td&gt;
&lt;td&gt;Ideal for straightforward projects without unpredictable interactions.&lt;/td&gt;
&lt;td&gt;Ideal for complex projects involving real-time interactions with multiple distributed components.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Challenges of asynchronous implementation
&lt;/h2&gt;

&lt;p&gt;At the program level, asynchronous programming can be implemented in several ways: callbacks, promises, or async/await. When it comes to communication between different services, it looks more like webhooks, event handlers, or stream processing. For modern-day applications, event-driven architecture has multiple advantages: real-time processing, loose coupling of services, and resilient messaging.&lt;/p&gt;

&lt;p&gt;However, an event-driven approach runs into many complications when services inadvertently still need to be executed in a specific order, resulting in implicit dependencies that are &lt;a href="https://dev.to/orkes/debugging-distributed-systems-52ln"&gt;difficult to trace and debug&lt;/a&gt;. Issues like data consistency, handling &lt;a href="https://dev.to/orkes/compensation-transaction-patterns-the-key-to-handling-failures-in-distributed-applications-nh9"&gt;transaction rollbacks&lt;/a&gt;, and mitigating race conditions end up much more difficult to deal with in a distributed, decoupled event-based architecture. &lt;strong&gt;Without careful implementation, the state of your application becomes impossible to maintain or track&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Another related challenge is ensuring the &lt;a href="https://dev.to/orkes/why-your-applications-need-durable-execution-and-best-practices-to-achieve-it-5ham"&gt;&lt;strong&gt;durability of your program flow&lt;/strong&gt;&lt;/a&gt;. If an asynchronous process is in progress but crashes mid-flight, the work may get lost without retries or retried with duplicate side effects. Getting visibility for troubleshooting and handling failures gracefully becomes even more complicated, taking away precious time from developing the main program flow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using asynchronous workflows
&lt;/h2&gt;

&lt;p&gt;Enter asynchronous workflows.&lt;/p&gt;

&lt;p&gt;A workflow is a sequence of tasks that must be completed in a specific order to achieve a particular goal. On the surface, an “asynchronous workflow” seems oxymoronic: an ordered sequence of tasks can’t possibly be asynchronous. But let’s return to the core of asynchronicity: completing tasks without blocking the program flow. In an asynchronous workflow, asynchronous tasks are triggered and return immediately, with the actual response delivered later on. This is what makes the workflow asynchronous, even if its tasks are executed in sequential order.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkh2dicdea0oqub2bh128.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkh2dicdea0oqub2bh128.jpg" alt="Diagram of async workflows." width="800" height="418"&gt;&lt;/a&gt;Asynchronous workflows contain asynchronous tasks.&lt;/p&gt;

&lt;p&gt;By orchestrating asynchronous workflows, &lt;strong&gt;we can easily overcome the issue of state management and durable execution when tracking various asynchronous tasks or processes.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  State management with asynchronous workflows
&lt;/h3&gt;

&lt;p&gt;Workflows can be orchestrated and managed using unified platforms like Orkes Conductor. Using &lt;a href="https://orkes.io/blog/what-is-orchestration" rel="noopener noreferrer"&gt;workflow orchestration&lt;/a&gt;, a central orchestrator manages the execution of a series of tasks pre-determined or dynamically ordered at runtime. The platform tracks each task’s status, so you get a global view of the application state (scheduled, in progress, completed, failed), even when the tasks occur independently of the main flow.&lt;/p&gt;

&lt;p&gt;State management ensures application consistency, facilitates graceful recovery, and enables rapid debugging in case of terminal failure. Using orchestration, developers no longer need to spend time building plumbing code to get visibility into the execution flow of distributed components.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to use asynchronous workflows?
&lt;/h2&gt;

&lt;p&gt;The choice between a synchronous or asynchronous workflow depends on your project requirements. Asynchronous workflows are often advantageous in situations where completion times are uncertain or unbounded:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;User-facing applications where responsiveness is critical&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;An asynchronous workflow enables the application to process long jobs, like file uploads, in the background while the user continues using the application.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Data processing and analytics pipelines&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Since ingesting, transforming, and analyzing large datasets can take a long time, asynchronous orchestration helps manage the flow between various processing stages, ensuring data integrity and handling errors appropriately.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Long-running business processes that span hours or days&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Asynchronous orchestration provides a way to manage the state of &lt;a href="https://orkes.io/blog/business-process-orchestration-best-practices/" rel="noopener noreferrer"&gt;business processes&lt;/a&gt;, like order fulfillment, approval flows, and claims processing, over an indefinite period.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj4j0wd83tfpxwdwjo4x4.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj4j0wd83tfpxwdwjo4x4.jpg" alt="Use cases for asynchronous workflows: User-facing applications where responsiveness is critical; Data processing and analytics pipelines; and Long-running business processes that span hours or days." width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In these cases where processes take a long or unknown time to complete, using an asynchronous workflow ensures that the process remains durable and persists despite interruptions or failures. &lt;/p&gt;

&lt;h2&gt;
  
  
  Best practices for asynchronous workflows
&lt;/h2&gt;

&lt;p&gt;When using asynchronous workflows for your applications, keep in mind the following practices:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Design for resilience.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Configure proper retry mechanisms and design with the expectation that services may fail temporarily. Use compensation flows to handle terminal failures or the &lt;a href="https://dev.to/orkes/saga-pattern-in-distributed-systems-5bm0"&gt;saga pattern&lt;/a&gt; to safely track reversals or termination in a workflow.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Make sure to use the appropriate error-handling measures.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;An asynchronous workflow that takes a long time to complete will require different timeout, rate limit, or retry strategies compared to fast, synchronous workflows. For example, to avoid unnecessary timeouts, use an appropriate timeout duration based on the recovery requirements.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Embed observability into your workflows.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Get comprehensive tracing, logging, and monitoring across your asynchronous workflows. This is crucial for diagnosing issues and maintaining overall system health.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
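&lt;p&gt;As a rough illustration of the first two practices, here is a retry helper with exponential backoff in plain Python. This is a sketch of the general pattern, not Conductor’s built-in retry configuration, and the flaky service is simulated:&lt;/p&gt;

```python
import time

def call_with_retries(operation, max_attempts=3, base_delay=0.1):
    """Retry a flaky operation with exponential backoff, assuming failures
    are transient; after the final attempt, re-raise so a compensation flow
    (e.g., the saga pattern) can take over."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))

# Simulated service that fails twice before succeeding:
calls = {"n": 0}
def flaky_service():
    calls["n"] += 1
    if calls["n"] >= 3:
        return "ok"
    raise ConnectionError("temporary outage")

print(call_with_retries(flaky_service))   # prints "ok" after two retried failures
```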

&lt;h2&gt;
  
  
  Building an asynchronous workflow using Orkes Conductor
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://orkes.io/platform" rel="noopener noreferrer"&gt;Orkes Conductor&lt;/a&gt; is an enterprise-grade Unified Application Platform for &lt;a href="https://orkes.io/use-cases/process-orchestration" rel="noopener noreferrer"&gt;process automation&lt;/a&gt;, API and microservices orchestration, &lt;a href="https://orkes.io/use-cases/agentic-workflows" rel="noopener noreferrer"&gt;agentic workflows&lt;/a&gt;, and more. With global state tracking, Orkes Conductor’s &lt;strong&gt;core platform architecture makes it easy to build asynchronous and event-driven workflows&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Easily add asynchronous tasks like &lt;a href="https://orkes.io/content/reference-docs/operators/wait" rel="noopener noreferrer"&gt;Wait&lt;/a&gt; tasks and &lt;a href="https://orkes.io/content/reference-docs/operators/human" rel="noopener noreferrer"&gt;Human&lt;/a&gt; tasks.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;These tasks enable you to durably pause the workflow for an extended period while waiting on an external signal, such as a form completion.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Seamlessly consume and publish events with other event-driven systems.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;With in-built integrations with dozens of event systems and the ease of creating and monitoring event handlers, you can quickly build and extend your system’s event-driven architecture.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F84fu8lckm5wkays2y4vi.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F84fu8lckm5wkays2y4vi.jpg" alt="List of event broker integrations in Orkes Conductor: AMQP, AWS SQS, Confluent Kafka, Apache Kafka, NATS Messaging, Amazon MSK, Azure Service Bus, GCP Pub Sub, IBM MQ." width="800" height="480"&gt;&lt;/a&gt;Current message broker integrations available in Orkes Conductor.&lt;/p&gt;

&lt;p&gt;Trigger workflows from an event, publish events from a workflow, or create &lt;a href="https://dev.to/orkes/change-data-capture-cdc-in-event-driven-microservices-13m3"&gt;event-driven CDC systems&lt;/a&gt;—the flexibility is yours to decide.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Get webhook integrations with external systems to send or receive webhook events.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Use webhooks from enterprise systems like Slack, Teams, GitHub, or any custom system you need for asynchronous design patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Trigger workflows asynchronously in response to tasks, signals, events, and webhooks.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Unlike synchronous workflows, all asynchronous workflows must be invoked asynchronously in a fire-and-forget pattern due to their unknown completion time. Use Conductor’s SDKs or APIs to start workflows asynchronously.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Fully native error-handling and observability mechanisms.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Set custom retries, timeouts, rate limits, and more for every task and workflow based on individual needs. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What an asynchronous workflow looks like&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;An example could look like kicking off a subtitling workflow whenever an UploadSuccess message is registered in the video processing pipeline. An event handler can be easily created in Conductor to start the subtitling workflow from an external trigger. Since the subtitling workflow involves processing time and perhaps a step for human review, it may take some time to complete – making it an asynchronous workflow. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo7spcmg2rgfjbk8m0bm2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo7spcmg2rgfjbk8m0bm2.jpg" alt="Example async workflows." width="800" height="571"&gt;&lt;/a&gt;Asynchronous workflows in Orkes Conductor.&lt;/p&gt;

&lt;p&gt;With a fire-and-forget invocation, listeners must be implemented to get notified of an asynchronous workflow completion. In Conductor, a workflow status listener enables you to detect and capture workflow state changes in real time. These updates can be further transmitted to other systems, such as a message broker, to trigger downstream services or processes. For example, once the subtitling workflow is completed, this change can be captured to make the subtitled video available on the video platform for the end user.&lt;/p&gt;
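&lt;p&gt;To make the pattern concrete, here is a toy in-process sketch of an event handler and a status listener in Python. Real Conductor event handlers and status listeners are configured on the platform itself; every name below is illustrative:&lt;/p&gt;

```python
# Toy event bus: an UploadSuccess event starts the subtitling "workflow",
# and a status listener publishes the completion event downstream.
handlers = {}
published = []

def on(event_name, handler):
    handlers.setdefault(event_name, []).append(handler)

def emit(event_name, payload):
    for handler in handlers.get(event_name, []):
        handler(payload)

def subtitling_workflow(payload):
    subtitled = payload["video"] + ".subtitled"
    # Status listener hook: capture the state change and notify downstream
    emit("WorkflowCompleted", {"video": subtitled})

on("UploadSuccess", subtitling_workflow)      # event handler triggers the workflow
on("WorkflowCompleted", published.append)     # listener forwards completions

emit("UploadSuccess", {"video": "intro.mp4"})
print(published)   # prints [{'video': 'intro.mp4.subtitled'}]
```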

&lt;p&gt;This is how asynchronous workflows in Conductor can be used to manage the complex interactions between multiple processes.&lt;/p&gt;




&lt;p&gt;Explore more use cases with our &lt;a href="https://orkes.io/content/category/templates" rel="noopener noreferrer"&gt;Template Library&lt;/a&gt; or try building asynchronous workflows using our online &lt;a href="https://developer.orkescloud.com/?utm_campaign=async-workflows-blog&amp;amp;utm_source=devto-blog&amp;amp;utm_medium=web" rel="noopener noreferrer"&gt;Developer Playground&lt;/a&gt; sandbox.&lt;/p&gt;

</description>
      <category>orchestration</category>
      <category>asynchronous</category>
      <category>distributedsystems</category>
      <category>eventdriven</category>
    </item>
    <item>
      <title>Best Practices for Production-Scale RAG Systems — An Implementation Guide</title>
      <dc:creator>livw</dc:creator>
      <pubDate>Fri, 21 Feb 2025 05:11:59 +0000</pubDate>
      <link>https://dev.to/orkes/best-practices-for-production-scale-rag-systems-an-implementation-guide-13eh</link>
      <guid>https://dev.to/orkes/best-practices-for-production-scale-rag-systems-an-implementation-guide-13eh</guid>
      <description>&lt;p&gt;Knowledge bases can augment AI model responses by providing additional background information. For instance, a financial analyst bot would need access to reports, market prices, and industry news; while a policy advisor bot would need access to hundreds of policy documents. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RAG (retrieval-augmented generation)&lt;/strong&gt; is a popular method for providing AI models access to such background knowledge. At a high level, such knowledge gets chunked and stored in a database, which is later used to retrieve the most relevant information based on the user query. The retrieved information gets appended to the prompt sent to the AI model, thus improving its final response to the user query.&lt;/p&gt;

&lt;p&gt;In theory, it sounds straightforward enough. But to implement a production-ready RAG system, we would need to consider factors like retrieval quality, search speed, and response quality to ensure user satisfaction. Let’s explore some common issues in implementing RAG systems and best practices for resolving them. Afterward, we will demonstrate an implementation example built using an orchestration platform like Orkes Conductor.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;a href="https://dev.to/orkes/rag-explained-using-retrieval-augmented-generation-to-build-semantic-search-1i32"&gt;RAG Explained — Learn the ins and outs of how RAG works, its use cases, and how to build one for yourself&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Common issues in implementing RAG
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Documents lose context when chunked&lt;/strong&gt;, which affects the retrieval quality and subsequent response quality. &lt;/p&gt;

&lt;p&gt;For example, chunks in a financial knowledge base may contain revenue data without specifying the company:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;“Dollars in millions, except per share data  FISCAL 2024  FISCAL 2023 % CHANGE

Revenues    $   38,343      $   38,584      0   %”. 
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;Without the proper context, a search query like “What was the revenue for Acme Inc in 2024?” could pull up dozens of irrelevant revenue figures for the AI model to process and reference. The model could just as well respond with revenue from Nakatomi Trading Corp or Sirius Cybernetics rather than from Acme Inc. &lt;br&gt;&lt;br&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The vector embedding approach to storing and retrieving information is &lt;strong&gt;inherently lossy and may miss out on retrieving chunks with exact lexical matches&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Vector embeddings capture semantic meaning, like lexical relationships (e.g., actor/actress are closely related), intent (e.g., positive/negative), and contextual significance. This approach works well for capturing meaningful information, such that two completely different sentences, “I love cats” and “Cats are the best”, are marked as highly similar due to their conceptual similarity.&lt;/p&gt;

&lt;p&gt;On the flip side, this means that precise and specific wording gets lost in the vectorization process. As such, a typical vector-based RAG approach can sometimes fail to pick up on exact lexical matches. &lt;/p&gt;

&lt;p&gt;For example, if you are trying to search for information about the Cornish Rex, a chunk like:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;“The appearance of the German Rex is reminiscent of the European Shorthair. Both cat breeds are of medium size and rather stocky build. The German Rex is a strong, muscular cat with a round head and a broad forehead, pronounced cheeks and large round eyes. It strolls through its territory on medium-long legs. The German Rex is not a graceful, Oriental-looking cat like its Cornish Rex and Devon Rex counterparts. It has a robust and grounded appearance.” - [Source](https://www.catsbest.eu/cat-breed/german-rex/)
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;could be overlooked by the RAG system because it is primarily about the German Rex, and thus stored further away from chunks about the Cornish Rex in the vector space.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Best practices for RAG
&lt;/h2&gt;

&lt;p&gt;Now, let’s explore some best practices to mitigate the common issues outlined above.&lt;/p&gt;

&lt;h3&gt;
  
  
  Re-introduce context
&lt;/h3&gt;

&lt;p&gt;First: &lt;strong&gt;introduce context back into the chunks&lt;/strong&gt;. This can be as simple as prepending chunks with the document and section titles, a method sometimes known as contextual chunk headers.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Document title: Acme Inc Annual Fiscal Report
Section title: Results of Operation

“Dollars in millions, except per share data  FISCAL 2024  FISCAL 2023 % CHANGE

Revenues    $   38,343      $   38,584      0   %”
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
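&lt;p&gt;A contextual chunk header can be added with a few lines of Python. This is a minimal sketch; the exact header format is an assumption, not a standard:&lt;/p&gt;

```python
def add_chunk_header(chunk, document_title, section_title):
    """Prepend a contextual chunk header so each chunk carries the document
    and section it came from into both the embedding and the retrieved context."""
    return (
        f"Document title: {document_title}\n"
        f"Section title: {section_title}\n\n"
        f"{chunk}"
    )

chunk = "Revenues    $   38,343      $   38,584      0   %"
print(add_chunk_header(chunk, "Acme Inc Annual Fiscal Report", "Results of Operation"))
```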



&lt;p&gt;Or it can be as elaborate as &lt;a href="https://www.anthropic.com/news/contextual-retrieval" rel="noopener noreferrer"&gt;Anthropic’s contextual retrieval method&lt;/a&gt;, where a summary of the chunk’s relation to the entire document is added to the chunk. In this approach, the contextual summaries are generated by an AI model using a prompt like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;document&amp;gt; 
{{WHOLE_DOCUMENT}} 
&amp;lt;/document&amp;gt; 

Here is the chunk we want to situate within the whole document 
&amp;lt;chunk&amp;gt; 
{{CHUNK_CONTENT}} 
&amp;lt;/chunk&amp;gt; 

Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk. Answer only with the succinct context and nothing else. 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Use semantic chunking
&lt;/h3&gt;

&lt;p&gt;Semantic chunking can also help preserve each chunk's context. Rather than fixed-sized chunking, semantic chunking takes meaning and context into account when dividing the text.&lt;/p&gt;

&lt;p&gt;In this approach, the text is split into individual sentences that are then indexed as embeddings. These sentence-level embeddings enable us to compare the semantic similarity of each sentence with neighboring sentences and split the chunks based on a breakpoint threshold value. This is useful for maintaining each chunk’s semantic integrity, which is essential for more accurate retrieval.&lt;/p&gt;
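&lt;p&gt;The following Python sketch illustrates the idea using a toy bag-of-words “embedding”. A production system would use a real sentence-embedding model and a tuned breakpoint threshold; both the threshold and the example sentences here are arbitrary:&lt;/p&gt;

```python
import math
from collections import Counter

def embed(sentence):
    # Toy bag-of-words "embedding"; stands in for a sentence-embedding model
    return Counter(sentence.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def semantic_chunks(sentences, threshold=0.2):
    """Group adjacent sentences; start a new chunk whenever the similarity
    to the previous sentence drops under the breakpoint threshold."""
    chunks = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(cur)) >= threshold:
            chunks[-1].append(cur)
        else:
            chunks.append([cur])
    return [" ".join(c) for c in chunks]

sentences = [
    "The German Rex is a muscular cat.",
    "The German Rex has a round head.",
    "Revenue grew three percent last year.",
]
print(semantic_chunks(sentences))   # splits before the revenue sentence
```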

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fny9jlx7cjduxubmvd5uf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fny9jlx7cjduxubmvd5uf.png" alt="Graph showing the cosine distance between subsequent chunks of text generated by the RecursiveCharacterTextSplitter. Relatively high cosine distances are marked in red, which can serve as breakpoints." width="800" height="380"&gt;&lt;/a&gt;Measuring the cosine similarity of adjacent sentences for semantic chunking. Source: &lt;a href="https://towardsdatascience.com/a-visual-exploration-of-semantic-text-chunking-6bb46f728e30/" rel="noopener noreferrer"&gt;towards data science&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Use hybrid search
&lt;/h3&gt;

&lt;p&gt;Next: use multiple search techniques at once to capitalize on each of their strengths. A hybrid search approach leverages both keyword-based search and vector search techniques, then combines the search results from both methods to provide a final search result.&lt;/p&gt;

&lt;p&gt;BM25 (Best Matching 25) is one of the most popular ranking functions, used across major search engines. It’s a bag-of-words retrieval function that ranks documents based on how frequently the query terms appear in each document’s contents. BM25F is a variant that enables you to modify the weights of different fields, such as making the document body more important than the title.&lt;/p&gt;

&lt;p&gt;These keyword-based functions compensate for the lossy nature of vector search, and using both types of search methods at once covers the major bases in retrieving relevant information.&lt;/p&gt;
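&lt;p&gt;The combination step itself can be as simple as reciprocal rank fusion (RRF), a common technique for merging keyword and vector result lists without having to compare their raw scores. A minimal sketch:&lt;/p&gt;

```python
def reciprocal_rank_fusion(result_lists, k=60):
    # Each result list is ordered best-first; a document earns 1 / (k + rank)
    # per list it appears in, and the scores are summed across lists.
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Merge a BM25 ranking and a vector-search ranking into one final list.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

&lt;p&gt;Documents that rank well in both lists (like &lt;code&gt;doc_b&lt;/code&gt; above) bubble to the top of the fused result.&lt;/p&gt;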

&lt;h3&gt;
  
  
  Rerank retrieved chunks
&lt;/h3&gt;

&lt;p&gt;Reranking can also help to surface more relevant information from the set of retrieved documents. Rerankers are more accurate than embedding models at comparing the query against the knowledge base, but they are also much slower to run.&lt;/p&gt;

&lt;p&gt;The best of both worlds (accuracy and speed) means using a two-stage retrieval process, where an embedding model is used to retrieve a subset of information from the entire knowledge base, and a reranker is used to further pare down and refine the search results.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3y0h5nfp2oyinm6rgqdl.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3y0h5nfp2oyinm6rgqdl.jpg" alt="Diagram of the retrieval step: relevant chunks are retrieved by the embedding model and are reranked before they are passed to an LLM to generate the answer." width="800" height="384"&gt;&lt;/a&gt;Two-step retrieval process involving the embedding model and reranker.&lt;/p&gt;
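&lt;p&gt;The two-stage pattern boils down to a cheap pass over everything followed by an expensive pass over a shortlist. A schematic sketch, with placeholder scoring functions standing in for a real embedding model and reranker:&lt;/p&gt;

```python
def two_stage_retrieve(query, corpus, fast_score, slow_score, first_k=50, final_k=5):
    # Stage 1: cheap embedding-style scoring over the whole corpus.
    shortlist = sorted(corpus, key=lambda doc: fast_score(query, doc), reverse=True)[:first_k]
    # Stage 2: expensive reranker-style scoring over the shortlist only.
    return sorted(shortlist, key=lambda doc: slow_score(query, doc), reverse=True)[:final_k]
```

&lt;p&gt;The reranker only ever sees &lt;code&gt;first_k&lt;/code&gt; candidates, keeping its cost bounded no matter how large the knowledge base grows.&lt;/p&gt;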

&lt;h2&gt;
  
  
  A RAG implementation walkthrough
&lt;/h2&gt;

&lt;p&gt;How can these best practices be implemented? Let’s look at an example of a production-grade RAG system that is efficiently implemented and monitored using an orchestration platform like Orkes Conductor. Using &lt;a href="https://orkes.io/blog/what-is-orchestration" rel="noopener noreferrer"&gt;orchestration&lt;/a&gt;, developers can build and monitor complex flows across distributed components, frameworks, and languages. In our case, there are two key workflows required to build a RAG system:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;An &lt;code&gt;index&lt;/code&gt; workflow&lt;/li&gt;
&lt;li&gt;A &lt;code&gt;search&lt;/code&gt; workflow&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;em&gt;Tip: If you’d like to try building a RAG system yourself, sign up for our free developer sandbox at &lt;a href="https://developer.orkescloud.com/?utm_campaign=rag-best-practices-blog&amp;amp;utm_source=devto-blog&amp;amp;utm_medium=web" rel="noopener noreferrer"&gt;Developer Playground&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Building the index workflow
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;index&lt;/code&gt; workflow consists of several parts:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Load a document from a source&lt;/li&gt;
&lt;li&gt;Chunk your data&lt;/li&gt;
&lt;li&gt;Store the data in your vector and BM25 indexes&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fke806la8z10mrymr39zv.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fke806la8z10mrymr39zv.jpg" alt="The indexing process, which involves loading a document from a source, chunking the document, and storing the chunks into both vector and BM25 indexes." width="800" height="342"&gt;&lt;/a&gt;The index workflow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Part 1: Load a document from a source&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;As an orchestration engine, Conductor facilitates all sorts of implementation choices with its wide variety of tasks. In this example, we’ve used a pre-made &lt;a href="https://orkes.io/content/reference-docs/ai-tasks/llm-get-document" rel="noopener noreferrer"&gt;Get Document&lt;/a&gt; task to retrieve a private policy document stored at an internal URL. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6dz2mppgxwv15uzocth3.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6dz2mppgxwv15uzocth3.jpg" alt="Screenshot of the Get Document task in a Conductor workflow." width="800" height="454"&gt;&lt;/a&gt;Get Document task.&lt;/p&gt;

&lt;p&gt;You could also use an &lt;a href="https://orkes.io/content/reference-docs/system-tasks/http" rel="noopener noreferrer"&gt;HTTP&lt;/a&gt; task to get a document through an API call, or create a custom task for any other implementation you need. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Part 2: Chunk the data&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The chunking task can be implemented using an &lt;a href="https://orkes.io/content/reference-docs/system-tasks/inline" rel="noopener noreferrer"&gt;Inline&lt;/a&gt; task or a custom worker task. Here’s sample Inline task code that uses a straightforward fixed-size chunking method with some overlap to reduce context loss:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;policy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;substring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;indexOf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Policy&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;This chunk is from the policy: &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;policy&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;. &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;paragraphs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;currentChunk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;paragraph&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;paragraphs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;currentChunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;paragraph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;currentChunk&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;currentChunk&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;paragraph&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;currentChunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;currentChunk&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;lastChunk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;overlapText&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;lastChunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="nx"&gt;currentChunk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;overlapText&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;paragraph&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;currentChunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;currentChunk&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;})();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The contextual chunk headers can be created within the same chunking task:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;This chunk is from the policy: &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;policy&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;. &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="c1"&gt;// … code omitted&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;currentChunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;currentChunk&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The more elaborate situated context approach (à la Anthropic) can be completed in a separate task during the final indexing part.&lt;/p&gt;

&lt;p&gt;One major benefit of using Conductor to orchestrate these distributed components is the ease of switching out tasks and managing workflow versions. If we wanted to test whether semantic chunking would be worth the computational cost, it’s as simple as switching out the fixed-size chunking task for a new worker task that runs a different piece of code.&lt;/p&gt;

&lt;p&gt;Using Conductor’s SDKs, you can easily write a worker that carries out semantic splitting with your framework of choice (LlamaIndex, LangChain, and so on).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Part 3: Store the data into your vector and BM25 indexes&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The final part of the &lt;code&gt;index&lt;/code&gt; workflow involves storing the data chunks into indexes. &lt;/p&gt;

&lt;p&gt;Before indexing the chunks, you can create and prepend situated contextual summaries for each chunk. These summaries can be created using generative AI models, paired with prompt caching to reduce the cost of creating these contextual summaries.&lt;/p&gt;

&lt;p&gt;Again, we can use a custom worker task to generate these contextual summaries using your preferred LLM provider. This sample worker code leverages Conductor’s SDK with Anthropic’s prompt caching feature:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;conductor.client.worker.worker_task&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;worker_task&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# reads ANTHROPIC_API_KEY from the environment&lt;/span&gt;

&lt;span class="n"&gt;DOCUMENT_CONTEXT_PROMPT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
&amp;lt;document&amp;gt;
{doc_content}
&amp;lt;/document&amp;gt;
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="n"&gt;CHUNK_CONTEXT_PROMPT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Here is the chunk we want to situate within the whole document
&amp;lt;chunk&amp;gt;
{chunk_content}
&amp;lt;/chunk&amp;gt;

Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk.
Answer only with the succinct context and nothing else.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="nd"&gt;@worker_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_definition_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;get-context&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;situate_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;beta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prompt_caching&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-3-haiku-20240307&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DOCUMENT_CONTEXT_PROMPT&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc_content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache_control&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ephemeral&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="c1"&gt;#we will make use of prompt caching for the full documents
&lt;/span&gt;                    &lt;span class="p"&gt;},&lt;/span&gt;
                    &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;CHUNK_CONTEXT_PROMPT&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk_content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;extra_headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic-beta&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt-caching-2024-07-31&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;  &lt;span class="c1"&gt;# return just the generated context string&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once processed, we can finally index these chunks. Using a hybrid search approach means that the chunks must be indexed in a (i) vector database and (ii) BM25 index. With Orkes Conductor, we can easily use a &lt;a href="https://orkes.io/content/reference-docs/operators/fork-join" rel="noopener noreferrer"&gt;Fork-Join&lt;/a&gt; operator to index the same chunk into both indexes simultaneously, speeding up the process.&lt;/p&gt;
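&lt;p&gt;Conceptually, the Fork-Join fans the same chunk out to both stores and waits for both writes to finish. Outside of Conductor, the same pattern could be sketched with plain Python threads (the two &lt;code&gt;index_into_*&lt;/code&gt; functions below are hypothetical placeholders for real indexing calls):&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor

def index_into_vector_db(chunk):
    # Placeholder: embed the chunk and upsert it into a vector store.
    return ("vector", chunk)

def index_into_bm25(chunk):
    # Placeholder: post the chunk to the BM25 index's API.
    return ("bm25", chunk)

def fork_join_index(chunk):
    # Run both indexing calls concurrently and wait for both to finish,
    # mirroring Conductor's Fork-Join operator.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(index_into_vector_db, chunk),
                   pool.submit(index_into_bm25, chunk)]
        return [f.result() for f in futures]
```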

&lt;p&gt;Here, a pre-made &lt;a href="https://orkes.io/content/reference-docs/ai-tasks/llm-index-text" rel="noopener noreferrer"&gt;Index Text&lt;/a&gt; task is used to store the chunks into a vector database, while an internal API is used to store the chunks into a BM25 database.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcdos3xeeenfbj80c43dj.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcdos3xeeenfbj80c43dj.jpg" alt="Screenshot of the indexing tasks in a Conductor Workflow." width="800" height="454"&gt;&lt;/a&gt;The data are indexed into a vector database and BM25 index in parallel.&lt;/p&gt;

&lt;p&gt;With that, the &lt;code&gt;index&lt;/code&gt; workflow is completed. To build out your knowledge base, run the workflow to index your policy documents.&lt;/p&gt;

&lt;h3&gt;
  
  
  Building the search workflow
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;search&lt;/code&gt; workflow retrieves relevant documents from the knowledge base and answers the user query. In production, a &lt;code&gt;search&lt;/code&gt; workflow would include the following steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Retrieve relevant chunks using a hybrid search approach&lt;/li&gt;
&lt;li&gt;Rerank the search results based on the user query&lt;/li&gt;
&lt;li&gt;Generate the answer to the user query based on the most relevant results&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frt5u4jb6elo5znvnm21v.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frt5u4jb6elo5znvnm21v.jpg" alt="Diagram of the search workflow, which involves retrieving relevant data from both vector and BM25 indexes, reranking the search results, and generating the answer using an LLM." width="800" height="397"&gt;&lt;/a&gt;The search workflow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Part 1: Hybrid search&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Since we are using a hybrid search approach, another &lt;a href="https://orkes.io/content/reference-docs/operators/fork-join" rel="noopener noreferrer"&gt;Fork-Join&lt;/a&gt; operator is used to retrieve information from both indexes at once. Here, a pre-made &lt;a href="https://orkes.io/content/reference-docs/ai-tasks/llm-search-index" rel="noopener noreferrer"&gt;Search Index&lt;/a&gt; task is used to retrieve from the vector database, while an &lt;a href="https://orkes.io/content/reference-docs/system-tasks/http" rel="noopener noreferrer"&gt;HTTP&lt;/a&gt; task calls an internal API that queries the BM25 index.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fso2tmalbya3j987okrbt.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fso2tmalbya3j987okrbt.jpg" alt="Screenshot of the search tasks in a Conductor workflow." width="800" height="454"&gt;&lt;/a&gt;Both vector database and BM25 indexes are searched in parallel based on the user query.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Part 2: Rerank&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once the retrieval stage is completed, we can use a custom worker task to rerank the search results by leveraging rerankers from providers like Cohere or Voyage AI. Here’s sample code that uses Cohere’s reranker:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cohere&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dataclasses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dataclass&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;conductor.client.worker.worker_task&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;worker_task&lt;/span&gt;

&lt;span class="nd"&gt;@dataclass&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RerankInput&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;

&lt;span class="nd"&gt;@worker_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_definition_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;reranker&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;rerank&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;RerankInput&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;co&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cohere&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;COHERE_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;co&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rerank&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rerank-english-v3.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;top_n&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Part 3: Generate answer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Finally, a built-in &lt;a href="https://orkes.io/content/reference-docs/ai-tasks/llm-text-complete" rel="noopener noreferrer"&gt;Text Complete&lt;/a&gt; task is used to interact with an LLM, which generates the answer based on the top reranked information. Using Orkes Conductor to orchestrate the flow, you can easily integrate and interact with any LLM provider, from OpenAI and Anthropic to open-source models on Hugging Face. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhzhbtmr68g0yq8roriqo.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhzhbtmr68g0yq8roriqo.jpg" alt="Screenshot of the Text Complete task in a Conductor workflow." width="800" height="454"&gt;&lt;/a&gt;Generate the answer to the user query using a templatized prompt.&lt;/p&gt;

&lt;p&gt;The Text Complete task sends the LLM a prompt template that is injected with the user query and the RAG-retrieved background knowledge. Orkes’ AI Prompt Studio feature makes it easy for developers to create, manage, and test these prompts, facilitating &lt;a href="https://dev.to/orkes/guide-to-prompt-engineering-pof"&gt;the prompt engineering process&lt;/a&gt; to enhance the LLM output.&lt;/p&gt;

&lt;p&gt;Here is an example prompt used in the RAG system, which applies some common prompt engineering tactics:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Answer the question directly based on the context provided.
Do not repeat the question.
Do not mention the existence of any context provided.

&amp;lt;context&amp;gt;
${context}
&amp;lt;/context&amp;gt;

&amp;lt;question&amp;gt;
${question}
&amp;lt;/question&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
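&lt;p&gt;Outside of Conductor, the template-injection step itself can be sketched in a few lines of Python. This is an illustrative sketch using the standard library&#8217;s &lt;code&gt;string.Template&lt;/code&gt;, not how the Text Complete task is implemented, and the sample context and question are made up; the context/question tag lines from the prompt above are elided for brevity.&lt;/p&gt;

```python
from string import Template

# Sketch of the prompt-injection step: Conductor-style ${} placeholders
# are filled with the user query and the RAG-retrieved background knowledge.
prompt_template = Template(
    "Answer the question directly based on the context provided.\n"
    "Do not repeat the question.\n"
    "Do not mention the existence of any context provided.\n\n"
    "${context}\n\n"
    "${question}"
)

# Hypothetical runtime values standing in for the workflow inputs.
prompt = prompt_template.substitute(
    context="Claims must be filed within 30 days of the incident.",
    question="How long do I have to file a claim?",
)
```

&lt;p&gt;The filled prompt is then what gets sent to the LLM provider as the completion request.&lt;/p&gt;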



&lt;p&gt;Done! The &lt;code&gt;search&lt;/code&gt; workflow is complete. Unlike the &lt;code&gt;index&lt;/code&gt; workflow, the &lt;code&gt;search&lt;/code&gt; workflow runs at system runtime, whenever your users interact with your application to make queries.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd1gf1197ulix7neb90yn.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd1gf1197ulix7neb90yn.jpg" alt="Screenshot of the full search workflow in Conductor." width="800" height="1277"&gt;&lt;/a&gt;Search Workflow in Conductor.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why use orchestration to build AI systems?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://orkes.io/blog/what-is-orchestration" rel="noopener noreferrer"&gt;Orchestration&lt;/a&gt; is an ideal design pattern to follow when it comes to building distributed systems that are &lt;strong&gt;composable, governable, and durable&lt;/strong&gt;. As demonstrated in the RAG example above, the workflows can be easily composed from multiple services, packages, frameworks, and languages. As systems evolve and refine, developers can switch out tasks, use new frameworks, test different AI models, and implement best practices frictionlessly.&lt;/p&gt;

&lt;p&gt;Furthermore, an orchestration platform like &lt;a href="https://orkes.io/platform" rel="noopener noreferrer"&gt;Orkes Conductor&lt;/a&gt; unlocks complete visibility into each step of the workflow, from its task status to its inputs/outputs and even completion duration. For complex AI-driven systems, where multiple layers of AI interactions take place under the hood, the ease of monitoring becomes even more vital for troubleshooting and optimizing these interactions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffgo8nma86lt5uhna0s6z.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffgo8nma86lt5uhna0s6z.jpg" alt="Compilation of screenshots of the Orkes Conductor interface, with various features for monitoring and visbility." width="800" height="533"&gt;&lt;/a&gt;Unlock complete visibility into each step of the workflow, from its task status to its inputs/outputs and even completion duration.&lt;/p&gt;

&lt;p&gt;Most importantly, Conductor is hardened for failures, with comprehensive mechanisms for timeout, retry, idempotency, compensation flows, rate limits, and more. Such orchestration engines ensure the &lt;a href="https://dev.to/orkes/durable-execution-explained-how-conductor-delivers-resilient-systems-out-of-the-box-3i1p"&gt;durable execution&lt;/a&gt; of any workflow, long-running or otherwise.&lt;/p&gt;

&lt;p&gt;—&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/conductor-oss/conductor" rel="noopener noreferrer"&gt;Conductor&lt;/a&gt; is an open-source orchestration platform used widely in mission-critical applications such as &lt;a href="https://orkes.io/use-cases/ai-and-llm-chaining" rel="noopener noreferrer"&gt;AI/LLM systems&lt;/a&gt;, &lt;a href="https://orkes.io/use-cases/event-driven-architecture" rel="noopener noreferrer"&gt;event-driven systems&lt;/a&gt;, and &lt;a href="https://orkes.io/use-cases/microservices-and-api-orchestration" rel="noopener noreferrer"&gt;microservice or API-first systems&lt;/a&gt;. Try it out using our free &lt;a href="https://developer.orkescloud.com/?utm_campaign=rag-best-practices-blog&amp;amp;utm_source=devto-blog&amp;amp;utm_medium=web" rel="noopener noreferrer"&gt;Developer Playground&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>aiops</category>
      <category>orchestration</category>
    </item>
    <item>
      <title>Lowering Latency with Realtime API Orchestration</title>
      <dc:creator>livw</dc:creator>
      <pubDate>Tue, 07 Jan 2025 15:59:00 +0000</pubDate>
      <link>https://dev.to/orkes/lowering-latency-with-realtime-api-orchestration-28l0</link>
      <guid>https://dev.to/orkes/lowering-latency-with-realtime-api-orchestration-28l0</guid>
      <description>&lt;p&gt;Today’s software applications run on the big three: cloud, microservices, and APIs. Although APIs are a lightweight, flexible, and easy-to-consume means of interconnecting multiple services and data sources, the complexity of orchestrating multiple API calls adds up quickly. In production, APIs come with a myriad of implementation concerns: how to route traffic, handle request spikes, eliminate cascading failures, and manage execution flow all at once.&lt;/p&gt;

&lt;p&gt;Enter stage right: realtime API orchestration.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is API orchestration?
&lt;/h2&gt;

&lt;p&gt;API orchestration streamlines complex application flows by coordinating multiple API requests and responses to produce a concerted response for the end client, such as a web browser. This approach is indispensable for building responsive, user-first web applications, as it accelerates loading speeds and safeguards against failure scenarios.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhceacl5xnv9x1evlq2sf.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhceacl5xnv9x1evlq2sf.jpg" alt="Diagram of the API Orchestration layer that interfaces between the client and the backend APIs." width="800" height="496"&gt;&lt;/a&gt;The API orchestration layer enables the centralized flow of multiple APIs, standardized failure handling, granular workflow observability, and more.&lt;/p&gt;



&lt;h3&gt;
  
  
  The need for speed
&lt;/h3&gt;

&lt;p&gt;Consider loading a website application—your preferred investment platform or streaming service. A single page typically fetches data across multiple services, which must be aggregated before it is delivered to the end user.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx2gsq08ob6xw0brar5he.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx2gsq08ob6xw0brar5he.jpg" alt="Screen mockup of a booking user interface and the underlying APIs needed." width="800" height="418"&gt;&lt;/a&gt;A single page fetches data across multiple services, which must be aggregated before it is delivered to the end user.&lt;/p&gt;



&lt;p&gt;These API requests add up, and a longer load time translates into substantial business costs. According to recent surveys, &lt;a href="https://blog.hubspot.com/marketing/page-load-time-conversion-rates" rel="noopener noreferrer"&gt;a B2B site that loads in 1 second has a 3x conversion rate compared to a site that loads in 5 seconds and 5x compared to a site that loads in 10 seconds&lt;/a&gt;. Speed could be the make-or-break criterion for high-intent web pages, like log-in screens, transaction/checkout pages, or demo interfaces.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvczrihyitioago3of85t.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvczrihyitioago3of85t.jpg" alt="Bar chart showing decreasing conversion rate as load time increases." width="800" height="418"&gt;&lt;/a&gt;Source: Portent&lt;/p&gt;



&lt;h2&gt;
  
  
  Strategies for realtime speed in API calls
&lt;/h2&gt;

&lt;p&gt;What are some of the strategies you can use to improve your API performance?&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Taming the tail latency&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Your aggregate API is only as fast (or slow) as its slowest API call. One well-known way to handle tail latency is to hedge your API requests (send duplicate requests) and use whichever response completes first. This ensures that tail latencies don’t impact your overall API performance.&lt;/p&gt;
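&lt;p&gt;As a rough illustration, request hedging can be sketched with a thread pool: fire duplicate calls and take whichever finishes first. The &lt;code&gt;fetch&lt;/code&gt; callable here is a hypothetical stand-in for your actual API call.&lt;/p&gt;

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def hedged_request(fetch, hedges=2):
    # Issue `hedges` identical requests and return whichever response
    # completes first, so one slow backend does not set the overall latency.
    pool = ThreadPoolExecutor(max_workers=hedges)
    futures = [pool.submit(fetch) for _ in range(hedges)]
    done, _ = wait(futures, return_when=FIRST_COMPLETED)
    pool.shutdown(wait=False, cancel_futures=True)  # abandon the stragglers
    return next(iter(done)).result()
```

&lt;p&gt;In production you would also bound the hedging (e.g. only hedge after a latency percentile is exceeded) so the extra requests don&#8217;t double your backend load.&lt;/p&gt;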

&lt;ol start="2"&gt;
&lt;li&gt;&lt;strong&gt;Parallel API requests&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Send requests in parallel wherever possible to improve the throughput of your request volume.&lt;/p&gt;
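&lt;p&gt;For example, independent fetches can be fanned out with a thread pool instead of being awaited one by one (a sketch; the fetch functions are hypothetical stand-ins for real API calls):&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_all(fetchers):
    # Run independent, non-blocking fetches concurrently and return
    # their results in the same order as the input functions.
    with ThreadPoolExecutor(max_workers=len(fetchers)) as pool:
        return list(pool.map(lambda fetch: fetch(), fetchers))
```

&lt;p&gt;With this fan-out, the total wait is roughly the slowest single fetch rather than the sum of all of them.&lt;/p&gt;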

&lt;ol start="3"&gt;
&lt;li&gt;&lt;strong&gt;Caching&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A cache serves as a temporary, high-speed data storage layer for previously retrieved data, so that repeated requests can be served without hitting the backend again. Caching cuts precious processing time for the most frequent API calls, providing speed at scale. &lt;/p&gt;
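&lt;p&gt;A minimal sketch of such a cache, with a time-to-live so stale data eventually forces a fresh fetch (illustrative only; production systems typically reach for Redis or an HTTP cache instead):&lt;/p&gt;

```python
import time

class TTLCache:
    # Minimal time-to-live cache: entries expire after `ttl` seconds,
    # so repeated API calls within the window skip the network entirely.
    def __init__(self, ttl):
        self.ttl = ttl
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # stale entry: evict and force a refetch
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)
```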

&lt;ol start="4"&gt;
&lt;li&gt;&lt;strong&gt;Circuit breakers&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The circuit breaker pattern blocks applications from sending traffic to services that are down until the service has recovered. This enables faulty services to recover and stabilize without being inundated with requests, shortening the downtime.&lt;/p&gt;
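&lt;p&gt;The pattern can be sketched as a small wrapper around a call: after enough consecutive failures the circuit opens and blocks traffic, then allows a probe once a recovery window has passed. This is a minimal illustration, not a production implementation.&lt;/p&gt;

```python
import time

class CircuitBreaker:
    # After `failure_threshold` consecutive failures the circuit opens
    # and calls are blocked until `recovery_timeout` seconds have passed,
    # giving the faulty service room to recover before the next probe.
    def __init__(self, failure_threshold=3, recovery_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at >= self.recovery_timeout:
                self.opened_at = None  # half-open: let one probe through
            else:
                raise RuntimeError("circuit open: request blocked")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result
```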

&lt;ol start="5"&gt;
&lt;li&gt;&lt;strong&gt;API orchestration&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;API orchestration provides a high-speed switchboard that executes, monitors, and governs multiple API calls simultaneously. An orchestration platform comes optimized for low latencies and equipped with features that enable teams to easily implement the various design patterns for realtime speeds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benefits of API orchestration
&lt;/h2&gt;

&lt;p&gt;API orchestration is a full-scale solution for scaling API performance alongside an application’s growth. As application complexity grows, so does the dependency graph of APIs that must work together, making it harder to debug, detect failures, and gain insights into latencies. &lt;/p&gt;

&lt;p&gt;More than just executing an intricate graph of API calls, API orchestration empowers teams to track, debug, and detect failures and performance issues.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Performance at every level&lt;/strong&gt;—Complex API execution at realtime speed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Increased reliability&lt;/strong&gt;—Automated failure handling and fallback mechanisms.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Governance&lt;/strong&gt;—Visibility into execution graphs for debugging and performance metrics for monitoring.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Developer productivity&lt;/strong&gt;—Build and debug more quickly with version control, reusable configs, payload introspection.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  API orchestration in practice: Orkes Conductor
&lt;/h2&gt;

&lt;p&gt;Orkes Conductor—built on the open-source Conductor engine originally created at Netflix—is a well-known platform for orchestrating microservices. It lets you build distributed applications that are resilient and scale easily with both the volume and complexity of the services.&lt;/p&gt;

&lt;p&gt;Let’s explore an example application flow that relies on complex API orchestration, built using Orkes Conductor.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fommvtl8wx17f4zfohjs1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fommvtl8wx17f4zfohjs1.jpg" alt="Screenshot of example Conductor workflow that loads a list of financial assets to be added to a watchlist." width="800" height="986"&gt;&lt;/a&gt;Example Conductor workflow.&lt;/p&gt;



&lt;p&gt;In this example, we are loading a list of financial assets that can be added to a watchlist. This means retrieving the user data, the list of assets, and the current watchlist; followed by posting additional data when new assets are added to the watchlist.&lt;/p&gt;

&lt;p&gt;Using Orkes Conductor, &lt;strong&gt;&lt;a href="https://orkes.io/content/faqs/task-cache-output" rel="noopener noreferrer"&gt;caching behavior can be easily implemented&lt;/a&gt;&lt;/strong&gt;, which cuts the request time from mere milliseconds (10-40ms) to near-zero.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faghp0m3s7mw3hg3z1nv7.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faghp0m3s7mw3hg3z1nv7.jpg" alt="Diagram demonstrating how caching behavior works in Orkes Conductor." width="800" height="418"&gt;&lt;/a&gt;With caching enabled, subsequent requests for commonly-used APIs take almost no time to execute.&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;&lt;a href="https://orkes.io/content/error-handling" rel="noopener noreferrer"&gt;Rate limits, retries, and timeouts can be natively configured&lt;/a&gt;&lt;/strong&gt; on Conductor as well, ensuring that transient failures are automatically handled for every single execution.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftzce68er8cjyjviyptj9.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftzce68er8cjyjviyptj9.jpg" alt="Diagram demonstrating how failure handling works in Orkes Conductor." width="800" height="418"&gt;&lt;/a&gt;The orchestration layer will retry failed task executions based on the failure handling configuration.&lt;/p&gt;



&lt;p&gt;Last but not least, Conductor supports &lt;strong&gt;parallel requests&lt;/strong&gt;, enabling non-blocking API calls to execute simultaneously. &lt;strong&gt;&lt;a href="https://orkes.io/content/reference-docs/operators/fork-join" rel="noopener noreferrer"&gt;Static fork-joins&lt;/a&gt;&lt;/strong&gt; are useful when the number of API calls is predetermined ahead of runtime, while &lt;strong&gt;&lt;a href="https://orkes.io/content/reference-docs/operators/dynamic-fork" rel="noopener noreferrer"&gt;dynamic fork-joins&lt;/a&gt;&lt;/strong&gt; are especially handy when the number of calls is determined at runtime. For example, a dynamic fork is used when a user adds a number of assets to the watchlist, and all the PUT requests are executed in parallel, whether two or twenty assets are added.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fazknojrkhxvrvupefx8c.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fazknojrkhxvrvupefx8c.jpg" alt="Diagram demonstrating how parallel execution works in Orkes Conductor." width="800" height="418"&gt;&lt;/a&gt;With parallel execution, the total execution time only takes as long as the longest request.&lt;/p&gt;



&lt;p&gt;In this demonstration, we have seen how API orchestration works in practice. As shown, API orchestrators yield low latencies through a variety of features and capabilities beyond just coordinating requests.&lt;/p&gt;

&lt;h2&gt;
  
  
  Core capabilities of realtime API orchestrators
&lt;/h2&gt;

&lt;p&gt;When deciding on an API orchestration platform, here are some key requirements to look out for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Near-zero response time&lt;/li&gt;
&lt;li&gt;High throughput&lt;/li&gt;
&lt;li&gt;Customizable caching&lt;/li&gt;
&lt;li&gt;Payload enforcement&lt;/li&gt;
&lt;li&gt;Integration with event streams&lt;/li&gt;
&lt;li&gt;Support for various API protocols and specifications (REST, gRPC, GraphQL)&lt;/li&gt;
&lt;li&gt;Comprehensive failure handling implementation (circuit breakers, request hedging, rate limits, retries)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Wrap up
&lt;/h2&gt;

&lt;p&gt;With the right tool for API orchestration, developers no longer have to spend time writing code for common API design patterns from scratch. This means more time to focus on the core recipe of your business capabilities while ensuring top-notch execution, speed, and reliability. &lt;/p&gt;




&lt;p&gt;&lt;a href="https://github.com/conductor-oss/conductor" rel="noopener noreferrer"&gt;Conductor&lt;/a&gt; is an open-source orchestration platform for executing durable long-running flows, lightning-speed API calls, or any case in between. Check out &lt;a href="https://orkes.io/pricing?utm_campaign=api-orchestration-blog&amp;amp;utm_source=devto-blog&amp;amp;utm_medium=web" rel="noopener noreferrer"&gt;the full list of features&lt;/a&gt; or start building for free with our &lt;a&gt;Developer Playground&lt;/a&gt; today.&lt;/p&gt;

</description>
      <category>api</category>
      <category>orchestration</category>
      <category>architecture</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>Experimenting and Putting Prompt Engineering Tactics into Practice</title>
      <dc:creator>livw</dc:creator>
      <pubDate>Thu, 28 Nov 2024 07:55:22 +0000</pubDate>
      <link>https://dev.to/orkes/experimenting-and-putting-prompt-engineering-tactics-into-practice-2aii</link>
      <guid>https://dev.to/orkes/experimenting-and-putting-prompt-engineering-tactics-into-practice-2aii</guid>
      <description>&lt;p&gt;&lt;em&gt;This is Part 2 of the Prompt Engineering series, which will put prompt engineering tactics into practice, including model choice and LLM parameters like temperature. Check out &lt;a href="https://dev.to/orkes/guide-to-prompt-engineering-pof"&gt;Part 1 on the importance of creating effective prompts.&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Prompt engineering is the practice of enhancing LLM responses by writing optimized prompts and tuning parameters like temperature or topP. There are many strategies and tactics for creating effective prompts, but how do these translate into real-life situations? Which tactics are the most effective? How should I combine prompt writing and parameter tuning? Let’s explore some real examples of prompt engineering put into practice.&lt;/p&gt;

&lt;h2&gt;
  
  
  Engineering better LLM responses
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://platform.openai.com/docs/guides/prompt-engineering" rel="noopener noreferrer"&gt;OpenAI documentation&lt;/a&gt; provides six main strategies for prompt engineering:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Write clear instructions&lt;/li&gt;
&lt;li&gt;Provide reference text&lt;/li&gt;
&lt;li&gt;Split complex tasks into simpler subtasks&lt;/li&gt;
&lt;li&gt;Give the model time to "think"&lt;/li&gt;
&lt;li&gt;Use external tools&lt;/li&gt;
&lt;li&gt;Test changes systematically&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These strategies point to the overarching principle of prompt engineering: &lt;strong&gt;providing clear, specific, well-structured instructions&lt;/strong&gt;. There are many ways to split hairs about how exactly to do that, with numerous guides available online (including our previous &lt;a href="https://dev.to/orkes/guide-to-prompt-engineering-pof"&gt;Guide to Prompt Engineering&lt;/a&gt;, where we introduced the basics of creating clear and effective prompts). What we want to do is to dig deeper into how &lt;strong&gt;model choice, prompt writing tactics, and parameter tuning&lt;/strong&gt; interact and interplay to produce the ideal responses.&lt;/p&gt;

&lt;p&gt;In the following sections, we will explore each facet in turn and finally put it all together, so you have a concrete idea of what to focus on when implementing prompt engineering in your AI projects.&lt;/p&gt;

&lt;h3&gt;
  
  
  Model choice
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Without any prompt engineering, your choice of LLM serves as the ringfence determining the base quality of the response output.&lt;/strong&gt; To illustrate, here is the response by &lt;a href="https://docs.cohere.com/v2/docs/command-beta" rel="noopener noreferrer"&gt;Cohere’s &lt;code&gt;command&lt;/code&gt; model&lt;/a&gt; compared to &lt;a href="https://docs.mistral.ai/getting-started/models/models_overview/" rel="noopener noreferrer"&gt;Mistral’s &lt;code&gt;mistral-small&lt;/code&gt; model&lt;/a&gt;. Both models were asked to summarize an article about NASA’s Europa Clipper mission:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Summarize ${text}
// Assume that ${text} is a dynamic variable that is injected with the
// relevant content at runtime
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;While both models provide a relatively accurate summary of the article, Mistral offers a well-structured response out of the box without additional prompting.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbjk0xkcarqmca85n89kj.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbjk0xkcarqmca85n89kj.jpg" alt="Diagram of Cohere’s command model’s output vs Mistral’s mistral-small-latest’s output, in response to the user query “Summarize text”." width="800" height="442"&gt;&lt;/a&gt;Various LLM models differ in their instinctual, un-engineered responses.&lt;/p&gt;

&lt;p&gt;Depending on your use case and system requirements, Mistral’s responses could be over-engineered or exactly what you need. Nevertheless, using prompt engineering tactics, you can get &lt;code&gt;command&lt;/code&gt; to respond just like &lt;code&gt;mistral-small&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tactics for effective prompts
&lt;/h3&gt;

&lt;p&gt;To provide clear, specific, well-structured instructions, these are some of the tactics employed in the prompts shown below:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Break up the instructions for clarity.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of combining the article and the instructions in a single sentence, the prompt is now divided into distinct sections: one for the content and another for the instructions on how to evaluate the content.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Demarcate your prompt with special characters or tags where appropriate.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The article content is now enclosed in quotation marks to explicitly mark the text that should be summarized. &lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Include details about the desired output. Quantify where possible.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In one variation, the prompt now asks for the response to be “clear and concise” and even specifies the desired format of using bullet points. In another, it explicitly states the desired length of the output.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy33bxr9wjz4t27tbpv2f.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy33bxr9wjz4t27tbpv2f.jpg" alt="Diagram showing three different LLM prompts, with relevant phrases highlighted to demonstrate the prompt tactics." width="800" height="418"&gt;&lt;/a&gt;There are numerous ways to implement prompt engineering tactics depending on what output you need.&lt;/p&gt;

&lt;p&gt;Check out how the responses have improved in each of these prompts.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Prompt: Here is the text from an article: "${text}". Provide a summary 
of this text in a clear and concise manner.

Response: NASA has launched the Europa Clipper mission, a $5.2 billion 
robotic spacecraft, to explore Jupiter's moon, Europa, which is believed 
to have a global ocean of liquid water beneath its icy crust and could 
potentially support extraterrestrial life. The mission aims to determine 
if Europa has the necessary ingredients for life, including water, 
organic compounds, and an energy source. The spacecraft, the largest 
and most expensive planetary science mission in NASA's history, will 
conduct 49 close encounters with Europa, studying its ocean, atmosphere, 
and surface features. Scientists are particularly interested in the 
possibility of hydrothermal vents on Europa's ocean floor, which could 
support simple life forms. The mission also aims to find potential 
landing sites for future lander missions. Europa Clipper's findings will 
shape future exploration of Europa and the search for life beyond Earth.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Prompt: Here is the text from an article: "${text}". Provide a summary 
of this text in a clear and concise manner, starting with the title 
**Summary**, and including the following bullet points:
- **Mission:**
- **Objective:**
- **Spacecraft:**
- **Journey:**
- **Scientific Goals:**
- **Future Missions:**

Response: Summary

- **Mission:** NASA's Europa Clipper mission aims to explore Jupiter's 
moon, Europa, a promising location for finding extraterrestrial life.
- **Objective:** The primary goal is to determine if Europa has the 
necessary ingredients for life, particularly simple, single-celled 
organisms.
- **Spacecraft:** Europa Clipper is the largest and most expensive 
planetary science spacecraft NASA has ever launched. It carries a 
sophisticated suite of nine instruments to study Europa's ocean, 
surface, and atmosphere.
- **Journey:** The spacecraft was launched on a SpaceX Falcon Heavy 
rocket and will take a 5.5-year journey to Jupiter. It will fly by Mars 
and Earth for gravity assists, reaching Jupiter in 2030.
- **Scientific Goals:** Europa Clipper will study Europa's ocean, ice 
shell, atmosphere, and potential plumes to understand its habitability. 
It will search for water, organic compounds, and energy sources 
necessary for life.
- **Future Missions:** A Europa lander is a challenging prospect, but 
a compelling discovery could prompt a follow-up mission. Other high-
priority missions include a Uranus orbiter and a mission to Saturn's 
moon Enceladus.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Prompt: Here is the text from an article: "${text}". Summarize the text 
clearly in 280 characters using this format "Article summary: &amp;lt;your_
summary_here&amp;gt;".

Response: Article summary: NASA's Europa Clipper mission launched on a 
SpaceX Falcon Heavy rocket to explore Jupiter's moon Europa, which may 
harbor life. The spacecraft will study Europa's ocean, atmosphere, and 
surface, seeking signs of habitability and potential life.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Temperature
&lt;/h3&gt;

&lt;p&gt;In some cases, it is also useful to tune the temperature of the responses. This LLM parameter controls how creative or deterministic the output will be — the higher the temperature, the more random and creative the result.&lt;/p&gt;
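&lt;p&gt;Under the hood, temperature divides the model&#8217;s logits before the softmax, sharpening or flattening the probability distribution over the next token. A minimal numeric illustration (the logits here are made up, not from any real model):&lt;/p&gt;

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide logits by the temperature, then normalize with a softmax.
    # Low temperature sharpens the distribution (more deterministic);
    # high temperature flattens it (more random and creative).
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
sharp = softmax_with_temperature(logits, 0.1)  # top token dominates
flat = softmax_with_temperature(logits, 1.0)   # probability spreads out
```

&lt;p&gt;At temperature 0.1 the top token takes nearly all the probability mass, which is why low-temperature output reuses the same high-frequency words; at temperature 1 the alternatives stay in play.&lt;/p&gt;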

&lt;p&gt;While temperature is associated with creative tasks like generating a poem or a story, it can also apply to factual writing, like summaries. Take a look at this response below, where the temperature is set to 0.1. The text output reuses similar words or phrasing, and this repetitiveness makes the summary rather stale and unengaging.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz9y968rx3xz0jiixqyzp.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz9y968rx3xz0jiixqyzp.jpg" alt="Diagram of Cohere’s response to the prompt “Here is the text from an article: " width="800" height="418"&gt;&lt;/a&gt;A low temperature favors more frequent words or phrases, which can make the output repetitive.&lt;/p&gt;

&lt;p&gt;In contrast, a temperature of 1 leads to more dynamic and lively text, but tends to exaggerate or include less important information. This would mean that the summarizer is prone to creating misleading snapshots of the articles.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmkykp0ol7a9neu7td9jn.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmkykp0ol7a9neu7td9jn.jpg" alt="Diagram of Cohere’s response to the prompt “Here is the text from an article: " width="800" height="418"&gt;&lt;/a&gt;A high temperature produces rarer words or phrases, which can lead to a focus on less important information.&lt;/p&gt;

&lt;p&gt;A more moderate temperature of 0.4 could be the sweet spot for an article summarizer, providing relevant information with livelier language.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ppzkuo6fcq6hq3bblcw.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ppzkuo6fcq6hq3bblcw.jpg" alt="Diagram of Cohere’s response to the prompt “Here is the text from an article: " width="800" height="418"&gt;&lt;/a&gt;Tune the temperature to a moderate value to avoid repetitiveness while steering clear of overly exaggerated information.&lt;/p&gt;
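&lt;p&gt;Under the hood, temperature rescales the model’s next-token probabilities before sampling. Here is a rough, self-contained sketch of the idea (toy logits, not any provider’s actual implementation):&lt;/p&gt;

```python
import math
import random

def sample_with_temperature(logits, temperature, rng=random.Random(0)):
    """Sample a token index from logits after temperature scaling.

    Lower temperature sharpens the distribution (frequent tokens dominate);
    higher temperature flattens it (rarer tokens appear more often).
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # subtract max for numeric stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r, acc = rng.random(), 0.0               # inverse-CDF sampling
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1

logits = [4.0, 2.0, 1.0, 0.5]  # toy scores for four candidate tokens
cold = [sample_with_temperature(logits, 0.1) for _ in range(100)]
hot = [sample_with_temperature(logits, 1.0) for _ in range(100)]
# At temperature 0.1 essentially every sample is token 0;
# at temperature 1.0 the other tokens show up as well.
```

&lt;p&gt;The same mechanism explains both failure modes above: near-zero temperature collapses onto the most frequent phrasing, while high temperature hands probability mass to rarer, potentially less relevant tokens.&lt;/p&gt;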

&lt;h2&gt;
  
  
  Evaluating your LLM responses
&lt;/h2&gt;

&lt;p&gt;We’ve seen individually how model choice, prompt writing tactics, and LLM parameters can affect your desired output. But how do these factors stack together to produce optimized results at scale? While prompt writing tactics give you the most mileage toward obtaining a particular result, LLM parameter tuning and model choice can be the difference between an average response and a stellar one.&lt;/p&gt;

&lt;p&gt;Using the following prompt with different models and temperature values:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Here is the text from an article: "${text}". Summarize the text clearly 
in 280 characters using this format "Article summary: &amp;lt;your_summary_
here&amp;gt;".
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We get the following responses:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbbgage1x3zsn6voaa7rp.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbbgage1x3zsn6voaa7rp.jpg" alt="Diagram of different model responses at different temperatures for the prompt “Here is the text from an article: " width="800" height="346"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;With a small sample size, it is easy to pick which answer is more desirable. But in a production setting, it is best practice to test changes systematically against a larger sample size, to distinguish random variation from true improvement.&lt;/p&gt;

&lt;p&gt;One way to systematically track changes would be to evaluate the LLM responses against a benchmark — using relevant criteria or gold-standard answers. In this case, we want summaries that are concise yet comprehensive — answering the 5W1H (what, who, where, when, why, and how).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fys1li76tcskos9gderke.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fys1li76tcskos9gderke.jpg" alt="Annotated diagram of different model responses at different temperatures for the prompt “Here is the text from an article: " width="800" height="402"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;These evaluations can be manually done by humans or AI-assisted. Whichever route you choose, ensure that you can evaluate a large sample set with sufficient variety to cover all the edge cases in your use scenarios.&lt;/p&gt;
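&lt;p&gt;This kind of rubric check can be automated. Below is a minimal sketch of a benchmark evaluation for the summarizer; the criteria, fact list, and candidate summaries are illustrative placeholders, not gold-standard data:&lt;/p&gt;

```python
def score_summary(summary, required_facts, max_chars=280):
    """Score a summary on conciseness (length budget) and coverage of
    key facts (simple stand-ins for the 5W1H criteria)."""
    text = summary.lower()
    covered = [fact for fact in required_facts if fact.lower() in text]
    return {
        "concise": len(summary) <= max_chars,
        "coverage": len(covered) / len(required_facts),
    }

# Illustrative key facts for an article about a rocket launch.
facts = ["europa clipper", "spacex", "kennedy space center", "monday"]

candidates = {
    "modelA, temp 0.1": "Article summary: NASA's Europa Clipper lifted off "
                        "Monday from Kennedy Space Center aboard a SpaceX "
                        "Falcon Heavy rocket.",
    "modelB, temp 1.0": "Article summary: A bold and thrilling voyage begins "
                        "toward Jupiter's mysterious icy moon!",
}

scores = {name: score_summary(text, facts) for name, text in candidates.items()}
# modelA covers all four facts; modelB reads lively but misses them all.
```

&lt;p&gt;Run over a large, varied sample set, aggregate scores like these make it much easier to tell whether a prompt or parameter change actually moved the needle.&lt;/p&gt;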

&lt;h2&gt;
  
  
  Optimizing LLM responses for AI app development
&lt;/h2&gt;

&lt;p&gt;In practice, an application that leverages generative AI is likely far more complex than using a simple one-shot prompt. Multiple prompts can be used across different tasks in a process, or AI techniques like retrieval-augmented generation (RAG), computer vision, and other data pre-processing tasks may come into play.&lt;/p&gt;

&lt;p&gt;An &lt;a href="https://orkes.io/blog/ai-orchestration-explained/" rel="noopener noreferrer"&gt;AI orchestration&lt;/a&gt; engine like Orkes Conductor can streamline such developmental efforts with its built-in AI tasks and feature-rich AI prompt builder, where you can test various prompt engineering strategies. With one-step integrations for a dozen LLM providers (including OpenAI, HuggingFace, and more), Orkes’ AI prompt builder unlocks the convenience of plugging, testing, and playing with prompts across models. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff1hzzfatkpbekvnpbcdv.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff1hzzfatkpbekvnpbcdv.jpg" alt="Testing Interface for prompts in Orkes Conductor." width="800" height="453"&gt;&lt;/a&gt;Orkes Conductor AI prompt builder, as part of its AI Orchestration capabilities.&lt;/p&gt;

&lt;p&gt;Besides testing, these prompts can be templatized, saved, and safely used in a production setting for AI-driven workflows and applications, such as document classification, RAG-based search retrieval, approval journeys, and more.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrap up
&lt;/h2&gt;

&lt;p&gt;We have explored effective ways to combine model choice, prompt writing tactics, and parameter tuning, and how to systematically test these changes.&lt;/p&gt;

&lt;p&gt;Beyond what we have explored here, there are many other ways to improve model outputs, such as model fine-tuning or &lt;a href="https://dev.to/orkes/rag-explained-using-retrieval-augmented-generation-to-build-semantic-search-1i32"&gt;retrieval-augmented generation (RAG)&lt;/a&gt;. A strategic combination of these different techniques and methods will unleash the full potential of generative AI in automating creative, complex, or human-involved tasks.&lt;/p&gt;




&lt;p&gt;Using Orkes Conductor, you can &lt;a href="https://orkes.io/content/developer-guides/quickstart-ai-orchestration" rel="noopener noreferrer"&gt;rapidly build, optimize, and test AI-enabled applications&lt;/a&gt; in a distributed environment. &lt;a href="https://github.com/conductor-oss/conductor" rel="noopener noreferrer"&gt;Conductor&lt;/a&gt; is an open-source orchestration platform used widely in many mission-critical applications for orchestrating microservices, AI services, event handling, and more. Try out a fully-managed and hosted Conductor service with the free &lt;a href="https://developer.orkescloud.com/?utm_campaign=prompt-eng-blog-2&amp;amp;utm_source=devto-blog&amp;amp;utm_medium=web" rel="noopener noreferrer"&gt;Developer Playground&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>promptengineering</category>
      <category>ai</category>
      <category>llm</category>
      <category>genai</category>
    </item>
    <item>
      <title>How to Develop an AI Application: Step-by-Step using Orkes Conductor</title>
      <dc:creator>livw</dc:creator>
      <pubDate>Thu, 21 Nov 2024 05:30:43 +0000</pubDate>
      <link>https://dev.to/orkes/how-to-develop-an-ai-application-step-by-step-using-orkes-conductor-106d</link>
      <guid>https://dev.to/orkes/how-to-develop-an-ai-application-step-by-step-using-orkes-conductor-106d</guid>
      <description>&lt;p&gt;&lt;em&gt;This is Part 1 of the AI App Development series, which will demonstrate how to build a simple AI application using Conductor. Check out Part 2 for an &lt;a href="https://dev.to/orkes/build-an-ai-application-for-document-classification-a-step-by-step-guide-3fcg"&gt;AI-powered document classifier&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;The potential of using AI for enterprise use cases is vast, but building an AI-powered application from scratch involves a complex technical stack. By leveraging an orchestration platform like Orkes Conductor, you can easily coordinate these moving parts into a well-governed flow, be it during development, testing, or production phases.&lt;/p&gt;

&lt;p&gt;This introductory tutorial will demonstrate how to develop enterprise-ready AI applications using Conductor. Let's start with a simple &lt;strong&gt;article summarizer&lt;/strong&gt;. While straightforward, the workflow can be abstracted and adapted for practical use cases across industries, such as generating movie synopses for a streaming platform or extracting key highlights from quarterly earnings reports.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building an AI application with Conductor
&lt;/h2&gt;

&lt;p&gt;As an orchestration engine, Conductor powers code-based flows like cloud infrastructure management, shipping and order tracking, media delivery pipelines, LLM chains, and so on. Conductor oversees the workflow execution and manages the plumbing matters of a distributed environment, such as data flow, timeouts, retries, and compensation flows, so that applications can be more quickly brought to an enterprise-ready state. These capabilities are instrumental for building AI-enabled applications, where velocity and agility are paramount to success.&lt;/p&gt;

&lt;p&gt;At a high level, building with Conductor involves three simple steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Get access to Conductor.&lt;/strong&gt; This is where you will build the application flow.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Build the AI-powered application flow.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Write the application frontend and backend.&lt;/strong&gt; Conductor can easily be integrated with any programming language, allowing you to trigger Conductor flows in your backend using our SDKs.&lt;/li&gt;
&lt;/ol&gt;
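&lt;p&gt;The SDKs wrap Conductor’s REST API, so step 3 can be as simple as one HTTP call from your backend. Here is a rough sketch using only the Python standard library; the server URL and workflow name are assumptions, and Orkes clusters additionally require an authorization header, omitted here:&lt;/p&gt;

```python
import json
from urllib import request

CONDUCTOR_URL = "http://localhost:8080"  # assumption: a local Conductor server

def start_workflow(name, workflow_input):
    """Start a workflow execution via POST /api/workflow/{name} and
    return the new execution's ID."""
    req = request.Request(
        f"{CONDUCTOR_URL}/api/workflow/{name}",
        data=json.dumps(workflow_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

# Usage (hypothetical inputs for the summarizer built in this tutorial):
# workflow_id = start_workflow("studyPartner", {
#     "url": "https://example.com/article",
#     "temperature": "0.4",
#     "topP": "0.9",
# })
```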

&lt;h3&gt;
  
  
  Get access to Conductor
&lt;/h3&gt;

&lt;p&gt;To begin, create an account in an Orkes Conductor cluster. For this tutorial, you can use the free &lt;a href="https://play.orkes.io/?utm_campaign=build-ai-app-blog&amp;amp;utm_source=devto-blog&amp;amp;utm_medium=web" rel="noopener noreferrer"&gt;Orkes Playground&lt;/a&gt; to follow along.&lt;/p&gt;

&lt;h3&gt;
  
  
  Create the LLM-enabled application flow
&lt;/h3&gt;

&lt;p&gt;Conductor provides an &lt;a href="https://orkes.io/content/category/reference-docs/ai-tasks" rel="noopener noreferrer"&gt;out-of-the-box suite of LLM system tasks&lt;/a&gt; that are convenient to use in most cases. For more complex AI tasks, developers can opt to &lt;a href="https://orkes.io/content/category/sdks" rel="noopener noreferrer"&gt;create their own task workers in any language&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 1: Create your application flow using Orkes’ visual workflow editor
&lt;/h4&gt;

&lt;p&gt;In a straightforward article summarizer, the application flow involves two tasks:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Retrieve the article from a given URL.&lt;/li&gt;
&lt;li&gt;Prompt an LLM to provide a summary of the article.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9wpopvi9vnkx1gchuj0u.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9wpopvi9vnkx1gchuj0u.jpg" alt="Diagram of user calling an application frontend that runs on a Conductor backend, which orchestrates the two tasks." width="800" height="276"&gt;&lt;/a&gt;Application flow using Conductor as the orchestration engine.&lt;/p&gt;
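&lt;p&gt;Without an orchestrator, these two steps would look roughly like the plain-Python sketch below, where &lt;code&gt;summarize_with_llm&lt;/code&gt; is a hypothetical stand-in for a provider API call. Conductor’s system tasks replace this code, along with the retry, timeout, and data-flow handling around it:&lt;/p&gt;

```python
from html.parser import HTMLParser
from urllib import request

class TextExtractor(HTMLParser):
    """Crude HTML-to-text extraction: a stand-in for the Get Document task."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside <script>/<style> tags

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

def get_article_text(url):
    """Task 1: fetch a page and strip it down to readable text."""
    with request.urlopen(url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(" ".join(parser.parts).split())

# Task 2 would then pass the text to an LLM with a prompt, e.g.:
# summary = summarize_with_llm(f'Summarize "{text}".')  # hypothetical provider call
```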

&lt;p&gt;For Task 1, we can use the &lt;a href="https://orkes.io/content/reference-docs/ai-tasks/llm-get-document" rel="noopener noreferrer"&gt;Get Document task&lt;/a&gt;, which can retrieve text from various content types; in this case, an HTML file.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdj0riwgf7yxyd21vyhs7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdj0riwgf7yxyd21vyhs7.png" alt="Screenshot of Conductor UI, showing the Get Document task configuration." width="800" height="311"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For Task 2, we can use the &lt;a href="https://orkes.io/content/reference-docs/ai-tasks/llm-text-complete" rel="noopener noreferrer"&gt;Text Complete task&lt;/a&gt; to call an LLM with a prompt. Both tasks are system tasks that eliminate the need to write custom code to integrate with LLM providers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4yzj3g604ikevorljvgh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4yzj3g604ikevorljvgh.png" alt="Screenshot of Conductor UI, showing the Text Complete task configuration." width="800" height="357"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;To create the article summarizer flow:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;a href="https://play.orkes.io/?utm_campaign=build-ai-app-blog&amp;amp;utm_source=devto-blog&amp;amp;utm_medium=web" rel="noopener noreferrer"&gt;Orkes Playground&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;In the left navigation menu, go to &lt;strong&gt;Definitions&lt;/strong&gt; &amp;gt; &lt;strong&gt;Workflow&lt;/strong&gt;.
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2o0pfzmdehdod7ord83a.jpg" alt="Screenshot of Conductor UI, showing the left navigation menu for Definitions &amp;gt; Workflow." width="800" height="418"&gt;
&lt;/li&gt;
&lt;li&gt;Select &lt;strong&gt;Define Workflow&lt;/strong&gt; in the top right. The visual workflow editor appears.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Select the &lt;strong&gt;Code&lt;/strong&gt; tab on the right and paste the following JSON code:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"studyPartner"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"AI application that summarizes an article"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"tasks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"get_article"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"taskReferenceName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"get_article_ref"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"inputParameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="s2"&gt;{workflow.input.url}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"mediaType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"text/html"&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"GET_DOCUMENT"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"cacheConfig"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="s2"&gt;{url}-&lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="s2"&gt;{mediaType}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"ttlInSecond"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;360&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"summarize_article"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"taskReferenceName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"summarize_article_ref"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"inputParameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"promptVariables"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="s2"&gt;{get_article_ref.output.result}"&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"llmProvider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"providerNameHere"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"modelNameHere"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"promptName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"promptNameHere"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"temperature"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="s2"&gt;{workflow.input.temperature}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"topP"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="s2"&gt;{workflow.input.topP}"&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"LLM_TEXT_COMPLETE"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"inputParameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="s2"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="s2"&gt;"temperature"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="s2"&gt;"topP"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"schemaVersion"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"timeoutPolicy"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ALERT_ONLY"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"timeoutSeconds"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Change the workflow &lt;strong&gt;Name&lt;/strong&gt; to something unique.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Select &lt;strong&gt;Save&lt;/strong&gt; &amp;gt; &lt;strong&gt;Confirm&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Your workflow should look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpb5o80gf7xpr0ia8crka.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpb5o80gf7xpr0ia8crka.jpg" alt="Screenshot of article summarizer workflow, containing the Get Document and Text Complete tasks." width="800" height="1124"&gt;&lt;/a&gt;Article summarizer workflow.&lt;/p&gt;

&lt;p&gt;Now that your workflow is ready, it’s time to get it up and running by adding your LLM integration.&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 2: Add your preferred LLM integration
&lt;/h4&gt;

&lt;p&gt;Orkes offers dozens of integrations with all major LLM providers—OpenAI, Anthropic, Google, Amazon, Cohere, Mistral, Hugging Face, and so on.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ikms6hiaieojtwebniv.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ikms6hiaieojtwebniv.jpg" alt="List of AI/LLM and vector database integrations in Orkes Conductor: Azure Open AI, Open AI, Cohere, Google Vertex AI, Google Gemini AI, Anthropic Claude, Hugging Face, AWS Bedrock Anthropic, AWS Bedrock Cohere, AWS Bedrock Llama2, AWS Bedrock Titan, Mistral, Pinecone, Weaviate, Postgres Vector Database, MongoDB." width="800" height="453"&gt;&lt;/a&gt;AI-related integrations in Orkes Conductor.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;To add an integration:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Grab your API key from your LLM provider.&lt;/li&gt;
&lt;li&gt;In the left navigation menu of Orkes Playground, go to &lt;strong&gt;Integrations&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Select &lt;strong&gt;New Integration&lt;/strong&gt; and select your preferred LLM provider.&lt;/li&gt;
&lt;li&gt;Enter the required fields, such as the Integration Name, Description, Access Credentials, and API Endpoint. The required fields differ by LLM provider, so you can refer to the &lt;a href="https://orkes.io/content/category/integrations/ai-llm" rel="noopener noreferrer"&gt;Integration Docs&lt;/a&gt; for guidance.&lt;/li&gt;
&lt;li&gt;Make sure to enter a unique value for the Integration Name, such as “OpenAI_yourNameHere”.&lt;/li&gt;
&lt;li&gt;Select &lt;strong&gt;Save&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;With the LLM integration added, you can start adding the specific models offered by the LLM provider.&lt;/p&gt;

&lt;p&gt;Each model has different capabilities or is tuned for a different use case. Which model you choose depends on your use case — for our article summarizer, a general conversational model with text capabilities will suffice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;To add a model:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In the &lt;strong&gt;Integrations&lt;/strong&gt; page, select the &lt;strong&gt;+&lt;/strong&gt; button next to your newly-added integration.&lt;/li&gt;
&lt;li&gt;Select &lt;strong&gt;New model&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Enter the &lt;strong&gt;Model name&lt;/strong&gt; and &lt;strong&gt;Description&lt;/strong&gt;. Ensure that the &lt;strong&gt;Active&lt;/strong&gt; toggle is switched on.&lt;/li&gt;
&lt;li&gt;Select &lt;strong&gt;Save&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Done! With the right prompt, you can now start using the LLMs in your workflows. In the next few steps, you will add the models to the prompt template and the workflow before running it.&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 3: Create a prompt template using Orkes’ AI prompt builder
&lt;/h4&gt;

&lt;p&gt;A prompt is necessary to get the model to summarize an article. Since we are building an AI article summarizer and not a general-purpose chatbot, the prompts can be templatized and automatically fire with the necessary context. Orkes’ AI prompt builder allows you to do exactly that: create and test prompt templates with multiple models.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frvbgugjdqw49lr4k70ty.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frvbgugjdqw49lr4k70ty.jpg" alt="Screenshot of Orkes' Conductor Ai Prompt Builder screen." width="800" height="418"&gt;&lt;/a&gt;Create and test prompt templates with any LLM in Orkes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;To create a prompt template:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In Orkes Playground, go to &lt;strong&gt;Definitions&lt;/strong&gt; &amp;gt; &lt;strong&gt;AI Prompts&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Select &lt;strong&gt;Add AI prompt&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Enter a unique &lt;strong&gt;Prompt Name&lt;/strong&gt;, such as “summarizeText_yourNameHere”.&lt;/li&gt;
&lt;li&gt;In &lt;strong&gt;Model(s)&lt;/strong&gt;, select the models which the prompt can be used with.&lt;/li&gt;
&lt;li&gt;Enter a &lt;strong&gt;Description&lt;/strong&gt; of what the prompt does. For example, “Takes an article content and summarizes it.”&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Enter your &lt;strong&gt;Prompt Template&lt;/strong&gt;, which can be as simple as the following:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Summarize ${text}.
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;Here, &lt;code&gt;${text}&lt;/code&gt; is a variable input. At runtime, this variable will be replaced with the article content — for example, “Summarize NASA's Europa Clipper spacecraft lifted off Monday from Kennedy Space Center in Florida aboard a SpaceX Falcon Heavy rocket, [...]”.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Once done, select &lt;strong&gt;Save&lt;/strong&gt; &amp;gt; &lt;strong&gt;Confirm save&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
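&lt;p&gt;The &lt;code&gt;${text}&lt;/code&gt; placeholder happens to match Python’s &lt;code&gt;string.Template&lt;/code&gt; syntax, so the substitution Conductor performs at runtime can be sketched locally. The template and article snippet below are illustrative:&lt;/p&gt;

```python
from string import Template

prompt_template = Template("Summarize ${text}.")

article_text = ("NASA's Europa Clipper spacecraft lifted off Monday from "
                "Kennedy Space Center in Florida aboard a SpaceX Falcon Heavy rocket")

# At runtime, the ${text} variable is replaced with the article content.
prompt = prompt_template.substitute(text=article_text)
```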

&lt;p&gt;Now, you can start testing your prompt. To do that, pick a specific model to test and tune the LLM parameters, like temperature, stop words, and topP. Then paste in the variable substitute for &lt;code&gt;${text}&lt;/code&gt; and run the prompt to get the LLM response.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgf71w2bbqq6wjmnbg7lw.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgf71w2bbqq6wjmnbg7lw.jpg" alt="Screenshot of Orkes' Conductor AI Prompt Builder, with the steps for testing prompts highlighted" width="800" height="418"&gt;&lt;/a&gt;Test your prompts with your chosen model, text variables, and prompt variables.&lt;/p&gt;

&lt;p&gt;We’ll explore more methods for engineering better responses in an upcoming blog post. For now, let’s put together the finishing touches for your article summarizer flow.  &lt;/p&gt;

&lt;h4&gt;
  
  
  Step 4: Put it all together
&lt;/h4&gt;

&lt;p&gt;Recall the JSON code that you copied to create your workflow? Now that you have added your LLM models and created your prompt, it’s time to put these resources into the JSON code (i.e., your workflow definition).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;To put it all together:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In Orkes Playground, go to &lt;strong&gt;Definitions&lt;/strong&gt; &amp;gt; &lt;strong&gt;Workflow&lt;/strong&gt; and select the workflow you have created previously.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In the &lt;strong&gt;summarize_article&lt;/strong&gt; task, replace the following values:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Replace &lt;code&gt;providerNameHere&lt;/code&gt; with your chosen LLM provider.&lt;/li&gt;
&lt;li&gt;Replace &lt;code&gt;modelNameHere&lt;/code&gt; with your chosen model.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Replace &lt;code&gt;promptNameHere&lt;/code&gt; with your prompt template name. Make sure to add back the following variable for the prompt:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${get_article_ref.output.result}"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Select &lt;strong&gt;Save&lt;/strong&gt; &amp;gt; &lt;strong&gt;Confirm&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
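&lt;p&gt;For reference, after these replacements, the &lt;strong&gt;summarize_article&lt;/strong&gt; task definition might look like the following. The provider, model, prompt, and task reference names here are illustrative placeholders, and the field names follow Orkes’ LLM text-complete system task; check both against the definition you copied earlier:&lt;/p&gt;

```json
{
  "name": "summarize_article",
  "taskReferenceName": "summarize_article_ref",
  "type": "LLM_TEXT_COMPLETE",
  "inputParameters": {
    "llmProvider": "yourProviderName",
    "model": "yourModelName",
    "promptName": "yourPromptTemplateName",
    "promptVariables": {
      "text": "${get_article_ref.output.result}"
    },
    "temperature": "${workflow.input.temperature}",
    "topP": "${workflow.input.topP}"
  }
}
```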

&lt;p&gt;Done! &lt;strong&gt;Give your workflow a test run:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;From the visual workflow editor, select the &lt;strong&gt;Run&lt;/strong&gt; tab.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Enter the &lt;strong&gt;Input params&lt;/strong&gt; and select &lt;strong&gt;Run workflow&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;example&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;input&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;params&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://arstechnica.com/space/2024/10/nasa-launches-mission-to-explore-the-frozen-frontier-of-jupiters-moon-europa/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"temperature"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0.1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"topP"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Upon running the workflow, you will be directed to the workflow execution page, where you can track the progress of your application flow. If you select the &lt;strong&gt;Workflow Input/Output&lt;/strong&gt; tab, you should see the summary of the article you requested.&lt;/p&gt;
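&lt;p&gt;You can also track the execution programmatically. Here is a minimal sketch, assuming the &lt;code&gt;@io-orkes/conductor-javascript&lt;/code&gt; SDK and its &lt;code&gt;WorkflowExecutor.getWorkflow&lt;/code&gt; method; the exact output fields depend on your workflow definition:&lt;/p&gt;

```javascript
// Statuses after which a Conductor workflow will not make further progress.
function isTerminal(status) {
  return ["COMPLETED", "FAILED", "TERMINATED", "TIMED_OUT"].indexOf(status) !== -1;
}

// Polls the execution once per second until it reaches a terminal state,
// then returns the workflow output (which holds the generated summary).
async function waitForSummary(executor, executionId) {
  let execution = await executor.getWorkflow(executionId, true);
  while (!isTerminal(execution.status)) {
    await new Promise(function (resolve) { setTimeout(resolve, 1000); });
    execution = await executor.getWorkflow(executionId, true);
  }
  return execution.output;
}
```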

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbo2qm7i7uof2y3nprk9i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbo2qm7i7uof2y3nprk9i.png" alt="Screenshot of the Workflow Input/Output tab in the workflow execution screen in Conductor." width="800" height="373"&gt;&lt;/a&gt;The article summarizer returns the requested summary.&lt;/p&gt;

&lt;h3&gt;
  
  
  Write the application frontend and backend
&lt;/h3&gt;

&lt;p&gt;With the application flow created, the next step is to build the application itself. Use any framework (React, Next.js, Angular, and so on) to build the frontend and backend. For the backend, you will also use &lt;a href="https://orkes.io/content/category/sdks" rel="noopener noreferrer"&gt;Conductor’s SDKs&lt;/a&gt; to execute and track workflows. Here is an example snippet from a Next.js-based backend that uses the JavaScript SDK to execute the &lt;code&gt;studyPartner&lt;/code&gt; summarizer workflow created earlier.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;useState&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;useEffect&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;useRef&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;react&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;orkesConductorClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;WorkflowExecutor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;TaskType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@io-orkes/conductor-javascript&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;getConfig&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;next/config&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;publicRuntimeConfig&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getConfig&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;getSummary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;articleUrl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;temp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;topP&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;clientPromise&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;executor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;WorkflowExecutor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Create the executor instance&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;executionId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;executor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startWorkflow&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="c1"&gt;// Start the workflow&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;publicRuntimeConfig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;workflows&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;studyPartner&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;articleUrl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;temp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;topP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;correlationId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user123&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="nf"&gt;setExecutionId&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;executionId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Persist executionId in state&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Connecting your application with Conductor
&lt;/h4&gt;

&lt;p&gt;While writing your backend, make sure to get authorized access to Conductor so that your backend can fire the workflow without any issues.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Get access tokens for your application&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In the Conductor UI, go to &lt;strong&gt;Applications&lt;/strong&gt; in the left navigation menu to &lt;a href="https://orkes.io/content/access-control-and-security/applications" rel="noopener noreferrer"&gt;create your application abstraction layer&lt;/a&gt; and &lt;a href="https://www.youtube.com/watch?v=f1b5vZRKn2Q" rel="noopener noreferrer"&gt;generate the access tokens&lt;/a&gt;. To do so:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In &lt;strong&gt;Applications&lt;/strong&gt;, select &lt;strong&gt;(+) Create application&lt;/strong&gt; and enter a name for your application.&lt;/li&gt;
&lt;li&gt;In the Access Keys section, select &lt;strong&gt;(+) Create access key&lt;/strong&gt; to generate a unique Key Id and Key Secret, and note them down.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Important: The Key Secret is shown only once, so make sure to copy and store your credentials securely for future reference.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Configure access&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Set the Key Id and Secret in your project environment and point to the appropriate Conductor server. If you are using Orkes Playground, the server should be &lt;a href="https://play.orkes.io/api" rel="noopener noreferrer"&gt;https://play.orkes.io/api&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;CONDUCTOR_SERVER_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;SERVER_URL&amp;gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;CONDUCTOR_AUTH_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;KEY_ID&amp;gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;CONDUCTOR_AUTH_SECRET&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;KEY_SECRET&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
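&lt;p&gt;In your backend, these environment variables can then be read when constructing the Conductor client. A minimal sketch (the &lt;code&gt;conductorConfig&lt;/code&gt; helper is ours for illustration; &lt;code&gt;orkesConductorClient&lt;/code&gt; comes from the SDK):&lt;/p&gt;

```javascript
// Collects the Conductor connection settings from the environment.
// The variable names match the export example above.
function conductorConfig(env) {
  return {
    serverUrl: env.CONDUCTOR_SERVER_URL,
    keyId: env.CONDUCTOR_AUTH_KEY,
    keySecret: env.CONDUCTOR_AUTH_SECRET,
  };
}

// Usage (assumed SDK call, resolving to an authenticated client):
//   const clientPromise = orkesConductorClient(conductorConfig(process.env));
```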



&lt;p&gt;&lt;strong&gt;Step 3: Configure permissions&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Finally, configure the permissions for the application layer you have previously created so that your application project can access the necessary resources. To set the required permissions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In &lt;strong&gt;Applications&lt;/strong&gt; in the Conductor UI, select your application.&lt;/li&gt;
&lt;li&gt;In the Permissions section, select &lt;strong&gt;+ Add Permission&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Add Execute and Read permissions to the following resources:

&lt;ul&gt;
&lt;li&gt;Your article summarizer workflow&lt;/li&gt;
&lt;li&gt;The LLM models used in your workflow&lt;/li&gt;
&lt;li&gt;The prompts used in your workflow&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Congratulations! You have successfully created an AI article summarizer. Using Orkes’ AI prompt builder, you can optimize the LLM responses to fit your needs by implementing &lt;a href="https://dev.to/orkes/guide-to-prompt-engineering-pof"&gt;prompt engineering techniques&lt;/a&gt; and testing your prompts. &lt;/p&gt;

&lt;h2&gt;
  
  
  Going beyond
&lt;/h2&gt;

&lt;p&gt;With Orkes Conductor, you have created an AI application in no time at all. Now that you have the basics down, you can try your hand at creating more complex workflows, like a &lt;a href="https://orkes.io/content/templates/document-classifier" rel="noopener noreferrer"&gt;document classifier&lt;/a&gt; or &lt;a href="https://orkes.io/content/templates/automatic-subtitle-generator" rel="noopener noreferrer"&gt;automatic subtitle generator&lt;/a&gt;, or leveling up your summarizer workflow for more advanced uses, like summarizing video or audio content. Custom task workers can be easily built for advanced AI tasks using &lt;a href="https://orkes.io/content/category/sdks" rel="noopener noreferrer"&gt;Conductor’s SDKs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Using Orkes Conductor to build applications spells faster time-to-market, enterprise-grade durability and security, and full governance in a distributed program. From pre-built tasks to fully custom business logic, developers get the best of both worlds: speed and flexibility. Simply brainstorm the high-level flow, pinpoint which tasks are needed, and start building. &lt;/p&gt;

&lt;h2&gt;
  
  
  Wrap up
&lt;/h2&gt;

&lt;p&gt;As an open-source orchestration platform, Conductor can be used in diverse cases beyond AI orchestration, such as infrastructure automation, data transformation pipelines, digital user journeys, microservice coordination, and more.&lt;/p&gt;

&lt;p&gt;Want detailed examples? Check out other tutorials and use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://dev.to/orkes/guide-to-backend-for-frontend-bff-pattern-an-approach-using-orchestration-k5j"&gt;Guide to Backend for Frontend (BFF) Pattern&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/orkes/building-checkout-app-using-nextjs-and-conductor-4fdj"&gt;Building Checkout App using Next.js and Conductor&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/orkes/automating-slack-greetings-to-community-with-orkes-conductor-5gap"&gt;Automating Slack Greetings to Community with Orkes Conductor&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/orkes/upgrade-eks-clusters-across-multiple-versions-in-less-than-a-day-using-automated-workflows-2nj0"&gt;Upgrade EKS Clusters across Multiple Versions in Less Than A Day - using Automated Workflows&lt;/a&gt; &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Stay tuned for more AI-based tutorials coming soon.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://orkes.io/cloud" rel="noopener noreferrer"&gt;Orkes Cloud&lt;/a&gt; is a fully managed and hosted Conductor service that can scale seamlessly to meet your needs. When you use Conductor via Orkes Cloud, your engineers don’t need to worry about setting up, tuning, patching, and managing high-performance Conductor clusters. Try it out with our free &lt;a href="https://developer.orkescloud.com/?utm_campaign=build-ai-app-blog&amp;amp;utm_source=devto-blog&amp;amp;utm_medium=web" rel="noopener noreferrer"&gt;Developer Playground&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>development</category>
      <category>ai</category>
      <category>orchestration</category>
      <category>aiops</category>
    </item>
    <item>
      <title>Monolith to Microservices: Should I Migrate and How?</title>
      <dc:creator>livw</dc:creator>
      <pubDate>Thu, 03 Oct 2024 09:50:35 +0000</pubDate>
      <link>https://dev.to/orkes/monolith-to-microservices-should-i-migrate-and-how-gkj</link>
      <guid>https://dev.to/orkes/monolith-to-microservices-should-i-migrate-and-how-gkj</guid>
      <description>&lt;p&gt;Microservices have dominated software development in the past decade as the most popular system design. Over 70% of organizations surveyed by &lt;a href="https://static.pulse.qa/omirs/pulse/microservices-architecture-have-engineering-organizations-found-success-u6b.pdf" rel="noopener noreferrer"&gt;Gartner in 2023&lt;/a&gt; use microservices, with 22% having migrated over within the past 12 months. While there has been buzz about microservices being over-hyped, the trend towards microservices remains a steady force, with more than 20% of surveyed organizations planning to make the shift to microservices.&lt;/p&gt;

&lt;p&gt;Just like any system design, microservices is not a one-size-fits-all approach. What are the benefits and challenges of using microservices? Should my organization switch to a microservice-based design? How can I migrate from a monolith to microservices? Let’s explore these considerations below.&lt;/p&gt;

&lt;h2&gt;
  
  
  Monoliths vs Microservices
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;In a monolith design, the entire system is built and deployed as a single unit.&lt;/strong&gt; In other words, the user interface, application methods and services, and database are stored in a single codebase, and these modules are tightly coupled, typically with a shared context.&lt;/p&gt;

&lt;p&gt;In contrast, &lt;strong&gt;a microservices-based design consists of separate services that can be individually deployed&lt;/strong&gt;. Each service has its own database and bounded context, and the services are loosely coupled. Communication between services occurs over a network protocol, typically through APIs, gRPC, message brokers, or an orchestration layer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F52d19avwu5glxwaswb2j.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F52d19avwu5glxwaswb2j.jpg" alt="Diagram of monolithic architecture versus microservices architecture." width="800" height="400"&gt;&lt;/a&gt;Architecture of a monolithic application vs a microservices-based application.&lt;/p&gt;

&lt;h3&gt;
  
  
  Benefits of monolithic architecture
&lt;/h3&gt;

&lt;p&gt;Because a monolith is designed as a single consolidated system, it is typically the preferred approach in the early stages of development. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Simplicity&lt;/strong&gt;: Monolith systems are more straightforward to build, test, deploy, and debug—developers don’t have to worry about coordinating and tracing across distributed components.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lower infrastructure overhead:&lt;/strong&gt; With only one application to monitor and maintain and fewer moving parts, the costs and infrastructure requirements are easier to manage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lower latency:&lt;/strong&gt; Different services in a monolith are tightly coupled, with a shared memory and no need to communicate over a network. This set-up provides better performance and faster response times at a given load.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Limitations of monolithic architecture
&lt;/h3&gt;

&lt;p&gt;However, as an application grows with new features, the limitations of a monolith architecture become apparent: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Slower development cycles&lt;/strong&gt;: Multiple teams working on the same codebase can lead to conflicts or breaking changes, requiring careful coordination and resulting in slower releases. Small changes or bug fixes become disproportionately effortful given long build times.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Limited scalability:&lt;/strong&gt; Horizontal scaling across multiple machines calls for the entire application to be scaled, even if only certain modules require more resources. Not only is it harder to optimize specific components in a monolith; resource utilization is also less efficient.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Risky to innovate:&lt;/strong&gt; A tightly-coupled system means that new features or changes to the tech stack can result in unintended cascading consequences elsewhere, making it harder and harder to implement changes over time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At some point, the monolith system becomes too complex to maintain at a performant level. Without proper code discipline and organization, you run the risk of creating a &lt;a href="https://deviq.com/antipatterns/big-ball-of-mud" rel="noopener noreferrer"&gt;big ball of mud&lt;/a&gt; or &lt;a href="https://deviq.com/antipatterns/spaghetti-code" rel="noopener noreferrer"&gt;spaghetti code&lt;/a&gt;. This is when the shift towards microservices becomes relevant.&lt;/p&gt;

&lt;h3&gt;
  
  
  Benefits of microservices
&lt;/h3&gt;

&lt;p&gt;When implemented carefully, microservices offer numerous benefits that overcome the limitations of monoliths.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reliability:&lt;/strong&gt; Unlike a monolith where one unresponsive component brings down the entire system, a microservice-based architecture reduces the blast radius. Orchestration patterns like retries and compensation workflows also help to handle partial failures gracefully.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Speed:&lt;/strong&gt; Likewise, with independently deployed services, teams can iterate on features and fix bugs more quickly, using automated CI/CD pipelines to test and deploy releases.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability:&lt;/strong&gt; Since each service has its own dedicated resources, they can be scaled independently based on demand, maintaining service performance while minimizing resource wastage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flexibility:&lt;/strong&gt; Each service can use the tools, technologies, and language best suited to its functionality, and critical services can be reused across multiple domains.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Challenges of microservices
&lt;/h3&gt;

&lt;p&gt;The benefits of loosely-coupled, independently-deployed services with a bounded context come with some trade-offs. Distributed systems tend to be more complex, which introduces additional challenges for inter-service communication, resource management, data persistence, and testing and debugging.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Failure&lt;/strong&gt;: The more components a system has, the more points of failure get introduced. Distributed systems, like microservices, face a greater risk of &lt;a href="https://robertovitillo.com/how-distributed-systems-fail/" rel="noopener noreferrer"&gt;single points of failure, cascading failures, resource leaks, and network failures&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure&lt;/strong&gt;: Microservices require more infrastructure overhead, like a dedicated DevOps team, tools for service discovery, load balancing, networking, message queues, orchestration, and so on.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data&lt;/strong&gt;: Maintaining data consistency across services, especially for distributed transactions, also poses more challenges compared to a single, unified database, and requires more careful planning.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Testing&lt;/strong&gt;: It is much more difficult to design end-to-end tests with production environments for distributed systems, and likewise, debugging poses challenges with &lt;a href="https://dev.to/orkes/debugging-distributed-systems-52ln"&gt;global observability and reproducibility&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Microservices: To migrate or not to migrate
&lt;/h2&gt;

&lt;p&gt;As organizations and systems scale, the question of migrating from a monolith to microservices inevitably emerges. Microservices have provided businesses like Netflix, Uber, Amazon, and SoundCloud a way to move and innovate rapidly, even in large teams.&lt;/p&gt;

&lt;p&gt;A successful migration involves many technical hurdles but is a worthy investment when there is a solid business case to create a clear separation between services. You should migrate to microservices when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multiple teams working on the same codebase create blockers in development speed.&lt;/li&gt;
&lt;li&gt;A single service needs to be reused or shared across multiple programs or domains.&lt;/li&gt;
&lt;li&gt;Significantly more computing resources are required for a particular feature but not others.&lt;/li&gt;
&lt;li&gt;A specific functionality becomes a unique business capability that requires more innovation and a dedicated team.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How to migrate to microservices
&lt;/h2&gt;

&lt;p&gt;If microservices prove to be the path forward, next comes the dreaded project of overhauling, refactoring, and migrating your code. Here are some best practices you can follow when migrating from a monolith to microservices:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Plan the migration using domain-driven design
&lt;/h3&gt;

&lt;p&gt;One of the biggest challenges in migrating a monolith to a microservice-based system is ensuring proper separation of services. If the split is not carefully implemented, it is easy to end up with a distributed monolith, where the services communicate over a network protocol but are still tightly coupled and dependent on each other. A distributed monolith should be avoided, as it introduces all the challenges of a microservice-based architecture without any of its advantages.&lt;/p&gt;

&lt;p&gt;Before migrating, take inventory of the monolithic codebase and identify areas where services can be decoupled from each other. Despite what the name suggests, microservices do not need to be &lt;em&gt;micro&lt;/em&gt;: as long as there is a bounded context, a data model, and independent deployment, the service can be as encompassing as your system demands.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.domainlanguage.com/ddd/" rel="noopener noreferrer"&gt;Domain-driven design&lt;/a&gt; is a handy set of principles for deciding how to split up your monolith. The core idea is to model your services based on the business domain—an insurance app entails concepts like claims, plan tiers, or renewals, while a shopping app should model concepts like customers, products, or discounts. Using these standard terminology, you map out the services required in your system—a shopping app would have a product inventory, checkout, and order tracking. These are the bounded contexts that have emerged from the monolith, allowing you to design the scope and function of your microservices grounded in business sense.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fll9flp0cncp1ttwp7vz3.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fll9flp0cncp1ttwp7vz3.jpg" alt="Diagram of the bounded contexts in a core order workflow: product inventory, cart, and payment." width="800" height="418"&gt;&lt;/a&gt;Example of the bounded contexts in an order application and how they are linked to one another.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Gradually decouple your monolith services using a strangler pattern
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjtnkhmfg8phyxwgwo7l6.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjtnkhmfg8phyxwgwo7l6.jpg" alt="Diagram illustrating how the strangler pattern works, by pulling out services from the monolith over time." width="800" height="466"&gt;&lt;/a&gt;Strangler pattern. &lt;a href="https://microservices.io/refactoring/" rel="noopener noreferrer"&gt;Reference&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once you have identified the candidates for microservice refactoring, gradually pull them out of the monolith by setting up a separate microservice. During this process, any new feature or service should be implemented as a separate microservice instead of being added to the monolith. These microservices, extracted or new, should have their own database. Integration code, or glue code, is added to bridge between the microservice and the monolith in the interim.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl89gvwj1bj75srcss6sj.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl89gvwj1bj75srcss6sj.jpg" alt="Diagram illustrating how the microservice integrates with the monolith." width="800" height="418"&gt;&lt;/a&gt;Glue code is used to bridge between the extracted services and the remaining monolith during the migration process.&lt;/p&gt;

&lt;p&gt;Once the microservice is ready to be used as a standalone service, use blue/green or canary deployments to gradually transition your user traffic from the monolith and roll back if necessary.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fui27yi3icajhqy8z090z.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fui27yi3icajhqy8z090z.jpg" alt="Diagram of traffic being routed to the microservice instead of the monolith." width="800" height="418"&gt;&lt;/a&gt;Use blue/green or canary deployments to gradually transition your user traffic from the monolith to the newly-extracted microservice.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Set up robust testing for your microservices
&lt;/h3&gt;

&lt;p&gt;Finally, when deploying your microservices, testing is crucial to ensuring that there are no code regressions. Since microservices can be tested and deployed individually, the approach should fundamentally differ from testing monoliths.&lt;/p&gt;

&lt;p&gt;While it is easier to test your microservices independently, it is more challenging to visualize and trace the entire process from end to end. Here are some tips for testing microservices:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When doing unit tests, the microservice’s inputs/outputs should be treated as one of the critical functionalities to test, avoiding overreliance on test doubles or mocks.&lt;/li&gt;
&lt;li&gt;Instead of having developers spin up an end-to-end application instance, use a shared test environment to reduce test coupling and &lt;a href="https://eng.lyft.com/scaling-productivity-on-microservices-at-lyft-part-3-extending-our-envoy-mesh-with-staging-fdaafafca82f" rel="noopener noreferrer"&gt;isolate the test request&lt;/a&gt; to minimize its impact.&lt;/li&gt;
&lt;li&gt;Use context propagation tools like OpenTracing or OpenTelemetry to get a view of your distributed system.&lt;/li&gt;
&lt;li&gt;Implement chaos engineering to test the resilience of your system.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Managing inter-service coordination using orchestration
&lt;/h2&gt;

&lt;p&gt;One key challenge of using microservices: what about the plumbing code that manages the communication between all your microservices? &lt;a href="https://orkes.io/blog/what-is-orchestration" rel="noopener noreferrer"&gt;Orchestration&lt;/a&gt; is a powerful way to coordinate services and components in a distributed system, providing both &lt;a href="https://dev.to/orkes/durable-execution-explained-how-conductor-delivers-resilient-systems-out-of-the-box-3i1p"&gt;state tracking and durable execution&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;Orchestration simplifies the added complexity and infrastructure overhead of microservices by providing a centralized platform to model, manage, and route your application flows and implementation details, abstracted away from the business logic encoded in each microservice.&lt;/p&gt;

&lt;h3&gt;
  
  
  Microservice orchestration with Conductor
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/conductor-oss/conductor" rel="noopener noreferrer"&gt;Conductor&lt;/a&gt; is an open-source orchestration platform used widely in many mission-critical applications for microservice orchestration, LLM chaining, and event handling. Using orchestration with Conductor, teams can build multi-language microservice-based systems that are even more fault-tolerant, highly observable, and performant.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Resilience parameters:&lt;/strong&gt; Conductor allows you to specify resilience parameters like the number of retries, rate limits, compensation flows, and timeouts, decoupled from your microservice logic, and provides in-built tools to recover from failures gracefully.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data flow:&lt;/strong&gt; Be it inputs/outputs, secrets, or environment variables, Conductor securely stores and passes these parameters across microservices.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Centralized monitoring and logging:&lt;/strong&gt; Conductor provides a centralized platform to inspect your workflow execution and logs for troubleshooting, as well as to monitor your workflow performance and cluster health.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Load balancing:&lt;/strong&gt; Conductor runs on a worker-task queue architecture that round-robins tasks across a pool of workers that can be dynamically scaled based on demand.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lifecycle management:&lt;/strong&gt; From development and testing to deployment, Conductor facilitates version control and rolling updates without disrupting your production runs.&lt;/li&gt;
&lt;/ul&gt;
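&lt;p&gt;As an illustration of decoupled resilience parameters, the Python sketch below applies a retry policy declared as plain configuration to an unreliable task. The policy format is illustrative, not Conductor’s actual schema.&lt;/p&gt;

```python
import time

# Illustrative resilience policy, declared separately from the task logic.
# (Not Conductor's actual configuration schema.)
POLICY = {"retry_count": 3, "retry_delay_s": 0.01}

def run_with_policy(task, policy):
    """Run a task, retrying per the policy before giving up."""
    last_err = None
    for _ in range(policy["retry_count"] + 1):
        try:
            return task()
        except Exception as err:
            last_err = err
            time.sleep(policy["retry_delay_s"])
    raise RuntimeError(f"task failed after retries: {last_err}")

calls = {"n": 0}
def flaky_service():
    calls["n"] += 1
    if calls["n"] >= 3:
        return "ok"
    raise ConnectionError("transient failure")

print(run_with_policy(flaky_service, POLICY))  # "ok" on the third attempt
```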

&lt;h3&gt;
  
  
  Migrating to microservices with Conductor
&lt;/h3&gt;

&lt;p&gt;When you migrate from a monolith to microservices with Conductor as your platform, refactoring is as straightforward as injecting Conductor’s SDK into your application and annotating your endpoints with @WorkerTask. This prepares individual tasks for eventual refactoring into a microservice without disrupting the overall application functionality. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6aplfioaea3mesvrzxou.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6aplfioaea3mesvrzxou.jpg" alt="Screenshots of a monolith codebase vs an annotated monolith codebase." width="800" height="950"&gt;&lt;/a&gt;Annotate your functions to turn them into task workers.&lt;/p&gt;

&lt;p&gt;From there, you can gradually remove each task from the monolith and set it up as an individual microservice. Using Conductor’s &lt;a href="https://orkes.io/content/developer-guides/task-to-domain" rel="noopener noreferrer"&gt;task-to-domain&lt;/a&gt; feature, you can differentiate the worker pools and gradually transition traffic from the monolith to microservices. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq2qnfs7lufxjkuqu0woq.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq2qnfs7lufxjkuqu0woq.jpg" alt="Screenshots of a monolith codebase versus a microservice codebase." width="800" height="740"&gt;&lt;/a&gt;Use different domains to split the worker pools.&lt;/p&gt;
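&lt;p&gt;Conceptually, the traffic split works like the Python sketch below: each task is routed by domain, with a rollout percentage deciding which worker pool receives it. The pool names and percentages are hypothetical; in Conductor, this routing is handled by the task-to-domain mechanism itself.&lt;/p&gt;

```python
from collections import deque

# Hypothetical worker pools for a canary rollout: tasks are routed by
# domain, so traffic can shift gradually from monolith workers to the
# newly extracted microservice's workers.
pools = {"monolith": deque(), "microservice": deque()}

def route(task_id: int, rollout_pct: int) -> str:
    """Send rollout_pct% of tasks to the microservice pool."""
    domain = "monolith" if task_id % 100 >= rollout_pct else "microservice"
    pools[domain].append(task_id)
    return domain

for tid in range(100):
    route(tid, rollout_pct=20)  # canary: 20% of traffic to the microservice

print(len(pools["microservice"]), len(pools["monolith"]))  # 20 80
```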

&lt;p&gt;Conductor SDKs are available in &lt;a href="https://github.com/orkes-io/orkes-conductor-client" rel="noopener noreferrer"&gt;Java&lt;/a&gt;, &lt;a href="https://github.com/conductor-sdk/conductor-python" rel="noopener noreferrer"&gt;Python&lt;/a&gt;, &lt;a href="https://github.com/conductor-sdk/conductor-javascript" rel="noopener noreferrer"&gt;JavaScript&lt;/a&gt;, &lt;a href="https://github.com/conductor-sdk/conductor-go" rel="noopener noreferrer"&gt;Go&lt;/a&gt;, &lt;a href="https://github.com/conductor-sdk/conductor-csharp" rel="noopener noreferrer"&gt;CSharp&lt;/a&gt;, and &lt;a href="https://github.com/conductor-sdk/conductor-clojure" rel="noopener noreferrer"&gt;Clojure&lt;/a&gt;, allowing you to write services in your preferred language. &lt;/p&gt;

&lt;p&gt;With a platform like Orkes Conductor, you can streamline microservices development and rise above any challenges associated with it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Simplify migration:&lt;/strong&gt; Gradually transition monolithic applications to microservices without disrupting operations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Safeguard from failures:&lt;/strong&gt; Ensure robust execution of distributed processes with built-in fault tolerance, failure flows, and retry mechanisms.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gain full visibility:&lt;/strong&gt; Debug in minutes rather than days, with comprehensive monitoring and tracing capabilities. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By choosing Conductor, you're investing in a battle-tested solution that tackles the complexities of microservice-based applications.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://orkes.io/cloud" rel="noopener noreferrer"&gt;Orkes Cloud&lt;/a&gt; is a fully managed and hosted Conductor service that can scale seamlessly to meet your needs. When you use Conductor via Orkes Cloud, your engineers don’t need to worry about setting up, tuning, patching, and managing high-performance Conductor clusters. Try it out with our free &lt;a href="https://developer.orkescloud.com/?utm_campaign=monolith-to-microservices-blog&amp;amp;utm_source=devto-blog&amp;amp;utm_medium=web" rel="noopener noreferrer"&gt;Developer Playground&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>microservices</category>
      <category>distributedsystems</category>
      <category>development</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Debugging Distributed Systems</title>
      <dc:creator>livw</dc:creator>
      <pubDate>Mon, 09 Sep 2024 10:43:17 +0000</pubDate>
      <link>https://dev.to/orkes/debugging-distributed-systems-52ln</link>
      <guid>https://dev.to/orkes/debugging-distributed-systems-52ln</guid>
      <description>&lt;p&gt;Distributed systems have many advantages: horizontal scalability, increased fault tolerance, and modular design, to name a few. On the flip side, distributed systems are also much harder to debug compared to centralized systems. In this article, let’s explore the challenges of debugging distributed systems and some strategies to make it easier.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three tiers of complexity in debugging
&lt;/h2&gt;

&lt;p&gt;Bugs are errors in the program that cause unexpected behavior. This can stem from unit-level errors in syntax, logic, or calculation, or system-level issues like integration errors or compatibility issues. In general, there are &lt;a href="https://maximilianmichels.com/2020/debugging-distributed-systems/" rel="noopener noreferrer"&gt;three tiers of complexity in debugging&lt;/a&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Debugging non-concurrent programs.&lt;/li&gt;
&lt;li&gt;Debugging concurrent programs.&lt;/li&gt;
&lt;li&gt;Debugging distributed programs.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4rk6z3kdn5l0p9udzdds.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4rk6z3kdn5l0p9udzdds.jpg" alt="Diagram of a non-concurrent program, which is a single-threaded process, versus a concurrent program, which is a multi-threaded process." width="800" height="418"&gt;&lt;/a&gt;Concurrent programs are more complex to debug than non-concurrent ones, as there are multiple threads of execution to account for.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Non-concurrent programs&lt;/strong&gt; run a single thread of execution, which makes debugging relatively straightforward. At a basic level, most bugs can be caught by attaching a debugger to the program and enlisting fundamental debugging methods like breakpoints and stepping through code, print statements and inspecting the variables during program execution, and examining the call stack and memory at runtime.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Concurrent programs&lt;/strong&gt; run multiple threads of execution using shared resources. In a concurrent program, for example, one thread could be used to display content, another to load animations, yet another to copy content, and so on. These threads have no guaranteed order of execution, yet some threads may depend on another thread’s output before they can execute. &lt;/p&gt;

&lt;p&gt;Because the behavior of one thread can affect other threads, concurrency introduces an additional layer of complexity. Bugs can arise due to race conditions, deadlocks, or other synchronization issues. Such issues are more difficult to detect and reproduce as they are non-deterministic and occur based on specific real-time conditions like execution order and timing.&lt;/p&gt;
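&lt;p&gt;The Python sketch below shows the classic shape of such a bug: two threads perform a read-modify-write on a shared counter. Without the lock, increments can interleave and be lost non-deterministically; with the lock, the result is always correct.&lt;/p&gt;

```python
import threading

counter = 0
lock = threading.Lock()

def increment_many(n: int) -> None:
    global counter
    for _ in range(n):
        with lock:          # remove this lock and updates can be lost
            counter += 1

threads = [threading.Thread(target=increment_many, args=(100_000,))
           for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # with the lock, reliably 200000
```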

&lt;p&gt;To debug a concurrent program, execution logs, thread dumps, and stack traces play a vital role in figuring out the behavior and interactions between threads. You can leverage a debugger to identify and reproduce bugs by recording and replaying the program execution. However, with concurrent programs, strategies like using breakpoints and logging should be implemented carefully to avoid interfering with the timing and synchronization of multiple threads.&lt;/p&gt;

&lt;h3&gt;
  
  
  Challenges of debugging distributed programs
&lt;/h3&gt;

&lt;p&gt;Distributed programs consist of multiple connected nodes that communicate with each other over a network to complete a goal, like file storage, streaming, user management, or payment processing. Each node runs its own thread or threads of execution, and each node has its own memory, resources, and execution context. As such, for distributed programs, even if every node is non-concurrent, the entire system is ultimately concurrent.&lt;/p&gt;

&lt;p&gt;At this level of complexity, debugging becomes much trickier. More than just debugging code, developers must also understand and account for interactions, timing, and emergent behaviors in order to identify the root cause. &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Limited observability&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One key constraint for debugging distributed systems is limited observability at a global scale. Many traditional debugging strategies are typically confined to one node or machine at the execution level: using a debugger, inspecting the logs, or profiling the execution. Given that the entire system’s state is distributed across multiple nodes, these strategies often only provide a partial view of what is happening. Without a global view of the entire system, it is difficult to piece together the interactions and circumstances that have led to the error and to test your hypotheses.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Reproducibility&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Concurrency bugs or bugs arising from the interactions between nodes are also much harder to reproduce and test due to their non-deterministic nature. The same input can lead to different behavior due to timing, network conditions, or load distribution. Such bugs are often timing-dependent or related to race conditions and may not replicate if the circumstances are altered. And when a system’s behavior also depends on environmental factors like the network set-up, the underlying infrastructure, or the scale of data load, bugs encountered in a distributed system become much harder to replicate in a local environment.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;The observer effect&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Finally, if not used prudently, many debugging strategies or tools can alter the behavior of the distributed system, masking the bug that you’re hunting down or creating new errors in the process. Debuggers also often slow down your program, which may impact time-sensitive or asynchronous interactions. This observer effect may make it harder to identify and isolate the root causes of the problem.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Strategies for debugging distributed programs
&lt;/h2&gt;

&lt;p&gt;Given its tricky nature, what strategies can we use to debug a distributed program? It boils down to three key aspects of debugging: understanding what your code is trying to do, the execution, and how the code resulted in the execution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Understanding the code:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For distributed programs in particular, rather than just understanding the code, it is vital to also &lt;strong&gt;develop an understanding of the overall flow of your distributed system and its interlocking dependencies&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;What is the execution path in a success scenario? And what are the areas where errors can arise? For example, in a payment flow, how should the system handle a duplicate transaction, or what happens if the payment is authorized seconds before the card gets frozen by the bank? Often, good design and thinking through the program flow are among the first safeguards against bugs, both pre-production and in production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Understanding the execution:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Since a distributed program’s logs are spread across many nodes, pinpointing the relevant logs for debugging takes far more time. To cut that time down, it helps to know the program’s execution flow and which nodes are involved in the error ahead of time. &lt;/p&gt;

&lt;p&gt;This is where &lt;strong&gt;distributed tracing comes in handy for distributed debugging&lt;/strong&gt;. These tracers follow a request as it moves across different nodes and services, capturing context about the interactions, behaviors, and errors encountered along the way, including where the errors originated and how they propagate through the system. Tools such as OpenTracing or OpenTelemetry go a long way for context propagation.&lt;/p&gt;
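&lt;p&gt;At its simplest, context propagation means minting a trace ID at the system’s edge and carrying it across every hop. The Python sketch below uses hypothetical in-process functions in place of real networked services; in practice, tools like OpenTelemetry handle this plumbing for you.&lt;/p&gt;

```python
import uuid

LOGS = []  # stand-in for each service's log output

def service_a(headers: dict) -> str:
    LOGS.append(f"service_a trace={headers['X-Trace-Id']}")
    return service_b(headers)  # propagate the context downstream

def service_b(headers: dict) -> str:
    LOGS.append(f"service_b trace={headers['X-Trace-Id']}")
    return headers["X-Trace-Id"]

trace_id = str(uuid.uuid4())       # minted once, at the edge
service_a({"X-Trace-Id": trace_id})
# Every log line now shares the same trace ID, so the request's path
# can be reconstructed across services.
print(all(trace_id in line for line in LOGS))  # True
```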

&lt;p&gt;&lt;strong&gt;Understanding how the code resulted in the execution:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When using debuggers for distributed systems, you should take care to use them on the appropriate node to avoid masking the bug or creating other unwanted errors and timeouts. Remote debuggers can be used for remote nodes, and time-travel debugging can be used to reproduce hard-to-find bugs.&lt;/p&gt;

&lt;p&gt;In cases where a debugger creates more problems than it solves, &lt;strong&gt;bisecting is a powerful alternative for locating the source of the error&lt;/strong&gt;. Bisecting is a technique that uses binary search to quickly pinpoint the commit that introduced a bug to your code repository. At each point of the binary search, you will test the build for the bug before marking the commit as good or bad. While testing at high volumes may be time-consuming, with some automation, bisecting becomes a productive way to locate the source of the bug.&lt;/p&gt;
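&lt;p&gt;The core of bisecting is a binary search over commit history, which is what &lt;code&gt;git bisect&lt;/code&gt; automates. In the Python sketch below, the commit list and the &lt;code&gt;is_bad&lt;/code&gt; predicate stand in for your repository and test suite.&lt;/p&gt;

```python
def bisect_commits(history, is_bad):
    """Binary-search for the first bad commit.

    Assumes history[0] is good, history[-1] is bad, and the bug,
    once introduced, stays present in every later commit.
    """
    lo, hi = 0, len(history) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(history[mid]):
            hi = mid          # bug introduced at or before mid
        else:
            lo = mid          # bug introduced after mid
    return history[hi]        # the first bad commit

commits = list(range(20))     # pretend commit 13 introduced the bug
print(bisect_commits(commits, is_bad=lambda c: c >= 13))  # 13
```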

&lt;h2&gt;
  
  
  Putting it in practice: How to debug a distributed system
&lt;/h2&gt;

&lt;p&gt;Let’s hear from our Orkes developers, the team behind the open-source orchestration engine Conductor (also a distributed system), on how we debug errors in a distributed system.&lt;/p&gt;

&lt;p&gt;Even though distributed systems are highly complex with tons of moving parts involved, debugging such systems does not have to be too complicated. The key lies in narrowing the debugging scope: the vast majority of the bugs we deal with usually can be reduced to a single component, with a limited amount of context needed for the bug to occur.&lt;/p&gt;

&lt;p&gt;Context propagation lays the groundwork for reducing the debugging scope. This can be as simple as carrying a unique ID across logs during executions, or using tools like OpenTracing or OpenTelemetry. With the information, we can find the exact section where the problem occurred and then isolate it.&lt;/p&gt;

&lt;p&gt;Whenever Conductor users reach out about a bug, the first thing we do is to get the exact setup they have. This step is critical for confirming where exactly the bug is coming from, given that different version builds may have different code, and the code containing the bug may have already been changed.&lt;/p&gt;

&lt;p&gt;To try to reproduce the bug, we get more details such as the logs, the workflow execution JSON, or any SDK code that the user ran. If successfully reproduced, we bisect across Conductor releases to find the exact change that caused the bug. From there, it’s an iteration of a discussion, a fix, and integration tests to prevent future regressions. &lt;/p&gt;

&lt;p&gt;If there are difficulties reproducing the bug even with all the available context, we go back to the basics: analyzing the source code areas which could negatively impact execution. Sometimes, the bug is related to concurrency and multithreading, which requires careful analysis of the code and double-checking your own assumptions. As one of our developers says: "If you have a seemingly impossible bug that you cannot reproduce consistently, then it is almost always a race condition."&lt;/p&gt;

&lt;p&gt;There is no silver bullet for debugging. When it comes to a thorny bug that relates to concurrency, we run through the same process of reproducing the bug, but at larger volumes, while carefully thinking through the code. Ultimately, the best debugging tools are those that facilitate the process of thinking through the code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Debugging with Orkes Conductor
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/conductor-oss/conductor" rel="noopener noreferrer"&gt;Conductor&lt;/a&gt; is an open-source orchestration platform used widely in many mission-critical applications for &lt;a href="https://orkes.io/use-cases/ai-and-llm-chaining" rel="noopener noreferrer"&gt;LLM chaining&lt;/a&gt;, &lt;a href="https://orkes.io/use-cases/event-driven-architecture" rel="noopener noreferrer"&gt;event handling&lt;/a&gt;, and &lt;a href="https://orkes.io/use-cases/microservices-and-api-orchestration" rel="noopener noreferrer"&gt;microservice orchestration&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Orchestration is a powerful way to coordinate services and components in a distributed system. Beyond simplifying distributed development, orchestration tools like Conductor provide system observability for troubleshooting and debugging. In other words, it cuts out all the time spent on tracking down the source of the error so that developers get straight to thinking through the code and flow, slashing debugging time from hours or days to mere minutes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trace with ease&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;With Orkes Conductor’s workflow visualization and ID tracking, developers can look up the state of their workflows and tasks, instantly identify the source of failure, and inspect the execution details for execution time, input/output values, or worker logs. No need to spend time tracking down where the error occurred, manually printing variables, or hunting down logs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgheco2ac6cjozq64mwyo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgheco2ac6cjozq64mwyo.png" alt="Diagram of Orkes Platform screenshots showing the workflow visualization, task summary, task input/output, and logs." width="800" height="509"&gt;&lt;/a&gt;Orkes Conductor is an orchestration platform with in-built capabilities for debugging.&lt;/p&gt;

&lt;p&gt;Let’s take a look at an example distributed program flow of a data indexer. This program loops through a series of tasks, namely: a task that retrieves data from a third-party API, and a task that indexes the data into a vector database using an embedding model. &lt;/p&gt;

&lt;p&gt;When a workflow fails, developers can use Conductor to retrieve or visualize the exact task where the error occurred, which instantly reduces the debugging scope. Conductor tracks the task execution details, which means you can then look up the source of the failure. In the case of the data indexer, the workflow failed during the data retrieval task due to a 429 (Too Many Requests) error. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fajifcrx4i3mezbmvfcto.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fajifcrx4i3mezbmvfcto.png" alt="Screenshot of the workflow execution screen in Orkes Platform." width="800" height="472"&gt;&lt;/a&gt;Introspecting the execution details of a failed data indexer workflow.&lt;/p&gt;

&lt;p&gt;With the bug easily identified, you can spend your time coming up with the solution instead—in this case, retrying the task later or upgrading the third-party API limits. Since the data indexer workflow loops through its tasks iteratively to retrieve as much data as possible in a single run, you can also retrieve the execution details for each iteration. Here, you can see that the flow terminated on the 17th iteration. This information helps with making an informed decision on how best to resolve the problem.&lt;/p&gt;
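&lt;p&gt;One way to handle such rate-limit failures is to retry the retrieval task with exponential backoff, as in the Python sketch below. The fake fetcher stands in for the third-party API call; in Conductor itself, this would be configured through the task’s retry settings rather than hand-written.&lt;/p&gt;

```python
import time

def fetch_with_backoff(fetch, max_attempts=5, base_delay_s=0.01):
    """Retry on HTTP 429, doubling the delay between attempts."""
    for attempt in range(max_attempts):
        status, data = fetch()
        if status != 429:
            return data
        time.sleep(base_delay_s * (2 ** attempt))  # 1x, 2x, 4x, ...
    raise RuntimeError("still rate-limited after all retries")

state = {"calls": 0}
def fake_fetch():
    """Hypothetical API: returns 429 twice, then succeeds."""
    state["calls"] += 1
    if state["calls"] >= 3:
        return (200, {"rows": 42})
    return (429, None)

print(fetch_with_backoff(fake_fetch))  # {'rows': 42} after two 429s
```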

&lt;p&gt;Once resolved, Conductor enables developers to quickly recover from failure with programmatic calls to rerun, retry, or upgrade workflows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Monitor at scale&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Orkes Conductor also provides a rich metrics dashboard to &lt;a href="https://dev.to/orkes/monitoring-microservices-using-prometheus-grafana-42cj"&gt;monitor your executions&lt;/a&gt;. Developers can get aggregated analytics for workflow execution volume and rate, system health, or latencies; configure alerts for critical thresholds; or export the data to a central metrics repository like Datadog or Prometheus. These metrics can help in making informed decisions on scaling workers, optimizing workflow speeds, and so on.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft4ezv03w8k8man5kmr3k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft4ezv03w8k8man5kmr3k.png" alt="Screenshot of the metrics dashboard in Orkes Platform." width="800" height="507"&gt;&lt;/a&gt;Orkes Conductor metrics dashboard.&lt;/p&gt;


&lt;p&gt;&lt;a href="https://orkes.io/cloud" rel="noopener noreferrer"&gt;Orkes Cloud&lt;/a&gt; is a fully managed and hosted Conductor service that can scale seamlessly to meet your needs. When you use Conductor via Orkes Cloud, your engineers don’t need to worry about setting up, tuning, patching, and managing high-performance Conductor clusters. Try it out with our free &lt;a href="https://developer.orkescloud.com/?utm_campaign=debug-distrib-systems-blog&amp;amp;utm_source=devto-blog&amp;amp;utm_medium=web" rel="noopener noreferrer"&gt;Developer Playground&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>debugging</category>
      <category>distributedsystems</category>
      <category>development</category>
      <category>microservices</category>
    </item>
    <item>
      <title>Evolution of Software Architecture: From Mainframes and Monoliths to Distributed Computing</title>
      <dc:creator>livw</dc:creator>
      <pubDate>Tue, 06 Aug 2024 10:36:13 +0000</pubDate>
      <link>https://dev.to/orkes/evolution-of-software-architecture-from-mainframes-and-monoliths-to-distributed-computing-3g7a</link>
      <guid>https://dev.to/orkes/evolution-of-software-architecture-from-mainframes-and-monoliths-to-distributed-computing-3g7a</guid>
<description>&lt;p&gt;&lt;strong&gt;Software architecture—the blueprint of our digital world—has evolved tremendously since the dawn of the computer age in the mid-20th century.&lt;/strong&gt; The early 1960s and 70s were dominated by mainframes and monolithic software. Today, the digital landscape looks entirely different, running on a distributed web of cloud computing, API connectivity, AI algorithms, microservices, and orchestration platforms.&lt;/p&gt;

&lt;p&gt;How has software architecture evolved over the years? As we revisit the technological progress through the decades, we will see how changes in business needs, market trends, and engineering practices have impacted software architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mainframes and monoliths: ~1940s
&lt;/h2&gt;

&lt;p&gt;The first computers were mainframe computers—large, powerful hardware machines that took up an entire room. &lt;strong&gt;Mainframes originated as standalone machines that could run complex computing tasks&lt;/strong&gt;. Prior to the 1970s, instructions to mainframe computers were sent via punchcards or magnetic tape, and the output was received via printers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpwvy15odlty8ujdyg2mv.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpwvy15odlty8ujdyg2mv.jpg" alt="Annotated illustration of data centers in the 1950s, which contained the central processor, magnetic tape units, tape controller, input/output controller, control console, card punch, card reader, disk storage, and high-speed printers." width="612" height="329"&gt;&lt;/a&gt;Prior to the 1970s, data centers held mainframes that accepted instructions through punchcards or magnetic tape. Credit: unknown.&lt;/p&gt;

&lt;p&gt;The first mainframes, the Harvard Mark I and ENIAC, were developed for military and research purposes in the 1940s. In 1951, the first commercial mainframe was introduced to the world: the UNIVAC I. The following decades saw widespread adoption across the banking, financial, and airline sectors for the mainframe’s outstanding ability to batch-process transactional data. Many of these systems remain in operation today.&lt;/p&gt;

&lt;p&gt;Mainframe applications were programmed in COBOL (Common Business-Oriented Language), which remains popular amongst mainframers even today. &lt;strong&gt;The software architecture for these applications was monolithic&lt;/strong&gt;, which meant a single, unified codebase that contained the data schema, application methods, database connections, presentation logic, and so on without modularization. To update any of these components, developers would have to access the entire codebase and redeploy it in a single package.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fydh98vseqh4mykhtqgus.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fydh98vseqh4mykhtqgus.jpg" alt="Diagram of a monolithic architecture, where the user interface, application logic, and database are store on a single codebase and deployed together." width="800" height="418"&gt;&lt;/a&gt;Monolithic architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  Networks and client-server: ~1950s
&lt;/h2&gt;

&lt;p&gt;Networks connect and facilitate communication between computers—mainframe to terminal, mainframe to mainframe, and later client to server. &lt;strong&gt;The development of network technology from 1958 onwards enabled mainframes to be connected electronically&lt;/strong&gt;, transforming them into multi-user computers that were connected to multiple terminals. Instead of punchcards and printers, people could use a monitor, keyboard, and a command-line interface (CLI) to send and receive data.&lt;/p&gt;

&lt;p&gt;Technological limitations restricted the first few connected computer systems. Multiplexed mainframes, for example, could only be used locally, as cable-length limits meant that the terminals had to be positioned very close to the mainframe. These early data centers contained not just computers, but dozens of humans sending jobs to the mainframe.&lt;/p&gt;

&lt;p&gt;ARPANET, the first public wide-area computer network, went live in 1969. It communicated data using packet switching, which went on to serve as the foundation for the modern Internet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Network technology popularized the client-server structure in the 1980s&lt;/strong&gt;, where an application is divided into a server software and a client software that communicates over a computer network. This structure is familiar to us today: a client, typically a desktop computer, remotely makes a request to a server, which returns a response. With the distribution of computing resources, the server handled data processing and retrieval while the client dealt with the presentation of the data. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fndsot069tcwar9le9tx6.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fndsot069tcwar9le9tx6.jpg" alt="Diagram of a client-server architecture, where the client side, comprising of the user interface, sends a request to the server side, where the application logic and database resides." width="800" height="418"&gt;&lt;/a&gt;Client-server architecture.&lt;/p&gt;

&lt;p&gt;The first client-server applications were mail services, web servers, and other desktop applications with online capabilities. Today, client-server has become the standard paradigm for most applications, and more broadly encompasses a general model of a service requester and a service provider.&lt;/p&gt;

&lt;p&gt;Despite the two-tier separation, many such applications were still built in a monolithic fashion. All application features resided in a single codebase, tightly coupled, and shared access to a single database.&lt;/p&gt;

&lt;h2&gt;
  
  
  WWW, websites, and webapps: ~1980s–90s
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1983 marked the year of the Internet.&lt;/strong&gt; The Internet was a global system of computer networks that used the TCP/IP protocol to facilitate communication between devices and applications. This was the backbone for FTP programs, SSH systems, and of course, the World Wide Web (WWW).&lt;/p&gt;

&lt;p&gt;Although the Internet and WWW are used interchangeably today, the WWW was invented almost a decade later, in 1990. The WWW is an information system—a web of HTML content connected by links—shared and organized over the Internet using the HTTP protocol. It was a revolutionary way of storing information such that it could be accessed globally, paving the way for the era of websites and web programming.&lt;/p&gt;

&lt;p&gt;In the early days, websites were static pages that displayed data from the web server. The introduction of the Common Gateway Interface (CGI) in 1993 brought web interactivity to the fore, kickstarting the prospects of web applications. &lt;/p&gt;

&lt;p&gt;Fledgling web interactivity took off with the invention of JavaScript in 1995, which brought scripting logic onto the client side. JavaScript quickly became the new standard for web programming, and web servers could more easily deliver dynamic, interactive content. These were the early forums, bulletin boards, and web forms.&lt;/p&gt;

&lt;p&gt;The invention of the web and its latent possibilities soon kicked off the next wave of application development. Instead of building a dedicated client for your application, you could simply build a website to be hosted on the web.&lt;/p&gt;

&lt;h2&gt;
  
  
  Service-oriented architecture and web services: ~1990s
&lt;/h2&gt;

&lt;p&gt;As application development grew, a monolithic codebase became more unwieldy to manage, and it became clear that capabilities or data housed in one system could be reused in another.&lt;/p&gt;

&lt;p&gt;To address these challenges, modularization became a topic of discussion. In the 1990s, the server side was further split into two tiers: the application server and the database. The application server stored all the application and business logic, while the database server stored the data records, which reduced latency at high processing volumes.&lt;/p&gt;

&lt;p&gt;Around the same time, &lt;strong&gt;service-oriented architecture (SOA) emerged as an architectural pattern where software capabilities are designed as individual services that can be used with any system as long as the system follows its usage specification&lt;/strong&gt;. SOA encouraged a move towards developing enterprise applications as loosely coupled services that interact through a communication protocol over a network, a pattern that has remained dominant today.&lt;/p&gt;

&lt;p&gt;Under SOA, a shopping app would contain multiple services: one for inventory tracking, another for order processing, and yet another for user authentication. Unlike a microservice-based application, services in an SOA pattern still share a single database, accessed through the application layer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffpao8r8numv7fvf5rx1w.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffpao8r8numv7fvf5rx1w.jpg" alt="Diagram of service-oriented architecture (SOA), where the application logic is split into individual services, although the services share a single database." width="800" height="418"&gt;&lt;/a&gt;Service-oriented architecture (SOA).&lt;/p&gt;

&lt;p&gt;With SOA came the need to define standards and protocols for how these services interacted with all sorts of clients. DCOM and CORBA were early non-web-based standards, soon overshadowed by &lt;strong&gt;web-based ones like SOAP and REST APIs&lt;/strong&gt;. SOA offered a way for services from different providers to be integrated into one application or for the same services to be utilized on different clients, like a web portal or a dedicated desktop interface.&lt;/p&gt;

&lt;h2&gt;
  
  
  Virtual machines and cloud computing: ~2000s
&lt;/h2&gt;

&lt;p&gt;SOA set the stage for the move from traditional desktop applications to a new mode of software applications—SaaS—but it was the invention of virtual machines and cloud computing that further spurred the explosion of SaaS products in the coming decades.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Virtual machines, enabled by the hypervisor, are computer systems that exist on the software layer instead of as a physical machine.&lt;/strong&gt; Using virtual machines, it became much easier to create, update, and destroy multiple machines that run different operating systems on a single computer, maximizing resource allocation and utilization for application development.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd1smegauj3xpkcfdjz4g.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd1smegauj3xpkcfdjz4g.jpg" alt="Diagram of virtual machines, which are abstract machines residing on the same physical machine due to the hypervisor." width="800" height="418"&gt;&lt;/a&gt;Virtual machine infrastructure.&lt;/p&gt;

&lt;p&gt;Although machine virtualization has existed since the 1960s, it only came into mainstream use in the 2000s with the rapid succession of releases by Linux, Microsoft, and VMware. This was the period when companies like Amazon identified the lucrative opportunity that virtualization offered: managed cloud computing services. Physical bare metal machines are expensive and difficult to manage, a limiting factor when companies want to scale. With cloud computing services like Amazon EC2, companies could rent virtual machines for processing power and scale as required.&lt;/p&gt;

&lt;p&gt;Growing companies like Facebook and Netflix could truly focus on building out their software capabilities without needing to maintain the underlying hardware of bare metal machines and data centers. The technical overhead to get started became much lower, accelerating the next wave of startups and digital-native businesses in the coming decades. In turn, this unlocked the next step in distributed computing and software architecture: microservices.&lt;/p&gt;

&lt;h2&gt;
  
  
  APIs, containers, and the rise of microservices: ~2010s
&lt;/h2&gt;

&lt;p&gt;The 2010s were the culmination of multiple trends towards distributed computing. Fueled by the need for third-party access to their services, the first few commercial APIs were launched in 2000 by Salesforce and eBay, enabling their partners or customers to integrate features onto their own sites or applications. From Twitter and Google Maps to Stripe, Twilio, and now OpenAI, &lt;a href="https://dev.to/orkes/emerging-tech-trends-2024-the-latest-developments-in-ai-api-and-automation-1omj"&gt;the API economy has ballooned since&lt;/a&gt;, powering integrated features across the web.&lt;/p&gt;

&lt;p&gt;In the same vein, microservices took off when scaling companies like Amazon and Netflix needed to speed up and streamline the development cycle, which was slowed down by a monolithic codebase. By splitting up an application into individual microservices, each with its own database, teams could independently update and deploy them, leading to faster releases and improvements.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Futkx50feq7e58li4raz3.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Futkx50feq7e58li4raz3.jpg" alt="Diagram of microservice-based architecture, which are individual services connected to individual databases and can be deployed separately." width="800" height="418"&gt;&lt;/a&gt;Microservice-based infrastructure.&lt;/p&gt;

&lt;p&gt;While there are many ways to package and deploy microservices—on a physical or virtual machine—the growth in microservice-based architecture was supported by the emergence of containers. Like virtual machines, containers were an abstraction layer conceptualized in the 1970s, but they only rose to enterprise prominence after Docker was open-sourced in 2013.&lt;/p&gt;

&lt;p&gt;Compared to virtual machines, containers provide a greater level of compartmentalization, so multiple instances and versions of the same application can run on the same operating system. All the components needed to run an application—code, runtime, libraries, dependencies, and system tools—are stored within the container, offering greater portability and scalability for deploying applications or microservices.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgo0nxzeqoics8x3zk5d1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgo0nxzeqoics8x3zk5d1.jpg" alt="Diagram of containers, which are abstract layers that enable compartmentalization of applications or microservices." width="800" height="418"&gt;&lt;/a&gt;Containers.&lt;/p&gt;

&lt;p&gt;With a patchwork of native or third-party services, databases, and so on, modern application development now requires a robust way to architect and integrate these different components. This brings us to the software architecture of today: orchestration and event systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Orchestration, eventing systems, and solving for distributed interdependency: today
&lt;/h2&gt;

&lt;p&gt;With a distributed model of computing—microservices, APIs, and SOA to a degree—comes a pertinent problem in software architecture: &lt;strong&gt;how will these separate services, databases, and components communicate and interact with each other to flow cohesively?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;There are two main approaches to resolving the issue of interdependency between distributed services: event-driven architecture and orchestration.&lt;/p&gt;

&lt;h3&gt;
  
  
  Event-driven architecture
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;In an event-driven architecture, services push data into a service bus or event pipeline for any other connected service to read and execute if necessary.&lt;/strong&gt; The overall system responds to events or state changes without keeping track of the impact of individual events on other events, thus reducing interdependency between each service.&lt;/p&gt;

&lt;p&gt;While the concept of a service bus has been around since the emergence of SOA, the trend towards microservices has brought it even further to the fore, with the likes of Kafka and Amazon SQS. An event-driven system enables real-time updates and improved system responsiveness while unlocking increased throughput in parallel processing. This has powered systems with fast-changing updates, such as ride-hailing or airline transactions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa3g4xt133pflfqzgeftx.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa3g4xt133pflfqzgeftx.jpg" alt="Diagram of event-driven architecture, where services (producers) push data (known as events) into an event stream for other services (consumers) to subscribe to and receive." width="800" height="418"&gt;&lt;/a&gt;Event-driven architecture.&lt;/p&gt;

&lt;p&gt;However, event streams do not provide insight into the overall state of the system and the progress of a process across distributed services. The lack of state tracking and visibility poses significant challenges when it comes to debugging and troubleshooting, as well as implementing error handling and resilience mechanisms. Designing an event stream that can properly handle sequential processes when there is no in-built chronology may end up overly complicated, requiring careful consideration of the event flow, routing, and handling.&lt;/p&gt;
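&lt;p&gt;As a rough sketch, the push model boils down to a few lines of code. The minimal in-memory event bus below stands in for a real broker like Kafka or Amazon SQS; the topic name and handlers are purely illustrative.&lt;/p&gt;

```python
from collections import defaultdict

class EventBus:
    """Minimal in-memory event bus: producers publish events to a topic,
    and every subscribed consumer reacts independently."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # The producer neither knows nor tracks who consumes the event.
        for handler in self._subscribers[topic]:
            handler(event)

bus = EventBus()
audit_log = []

# Two independent consumers react to the same event.
bus.subscribe("ride.requested", lambda e: audit_log.append(f"match driver for {e['rider']}"))
bus.subscribe("ride.requested", lambda e: audit_log.append(f"notify {e['rider']}"))

bus.publish("ride.requested", {"rider": "alice"})  # both handlers fire
```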

&lt;h3&gt;
  
  
  Orchestration
&lt;/h3&gt;

&lt;p&gt;Orchestration offers another viable solution to resolving the problem of microservice interdependency and even the issues encountered in event-driven architecture. &lt;strong&gt;In orchestration, a central orchestrator schedules each task or microservice based on a predefined flow, only proceeding to the next task in sequence when the previous one has been completed successfully.&lt;/strong&gt; Unlike event streams, the orchestrator tracks the overall progress across each service, empowering developers to easily trace and debug errors and implement failure compensation.&lt;/p&gt;
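&lt;p&gt;As a toy sketch (not Conductor’s actual engine), the orchestrator’s control loop might look like the following: tasks run in a predefined sequence, failures are retried, and every task’s state is recorded centrally. The task names and retry count are illustrative.&lt;/p&gt;

```python
def run_workflow(tasks, max_retries=2):
    """Run tasks in a predefined sequence, recording per-task state.
    A task only starts after its predecessor completes successfully."""
    state = {}
    for name, task in tasks:
        for _ in range(1 + max_retries):
            try:
                state[name] = {"status": "COMPLETED", "output": task()}
                break
            except Exception as err:
                state[name] = {"status": "FAILED", "error": str(err)}
        else:
            return state  # halt the flow; state shows exactly where it failed
    return state

# Illustrative two-step flow: each task is just a callable.
flow = [
    ("verify_documents", lambda: "ok"),
    ("check_policy", lambda: "covered"),
]
result = run_workflow(flow)
```

&lt;p&gt;Because the orchestrator holds the state, a failed run shows exactly which task stopped the flow, which is what makes tracing and failure compensation tractable.&lt;/p&gt;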

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Focl51ixgi6dolay58wjg.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Focl51ixgi6dolay58wjg.jpg" alt="Diagram of orchestration, where services, databases, event streams, and so on connect to a central orchestrator that coordinate all the different components into a directed workflow." width="800" height="418"&gt;&lt;/a&gt;Orchestration.&lt;/p&gt;

&lt;p&gt;The orchestration layer forms an integral level of abstraction that coordinates separate services, databases, event streams, LLMs, and other components into a concerted process. From ease of integration to ease of tracking and troubleshooting, orchestration empowers developers to architect applications across the entire development lifecycle, gaining ground across the world of software development with the likes of Orkes Conductor and Airflow. The durability of orchestration has streamlined many complex workflows, such as automating infrastructure upgrades or processing shipment orders over long periods of time.&lt;/p&gt;

&lt;p&gt;We leave the history of software architecture at this juncture: orchestration as an architectural layer that unlocks the next step in distributed computing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Throughout the past century, technology has advanced in leaps and bounds from mainframes and networks to virtual machines, containers, and genAI capabilities today.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9lw2h68susnsudceq31g.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9lw2h68susnsudceq31g.jpg" alt="Diagram of the technology stack over the years." width="800" height="418"&gt;&lt;/a&gt;The tech stack over the years.&lt;/p&gt;

&lt;p&gt;Looking ahead, software architecture will continue to evolve, shaped by advances in technology and the changing needs of businesses. For software architects and developers, it is more important than ever to adopt and adapt to better paradigms without losing sight of the core principles of good design. Ultimately, the best software architecture is one that best suits your business and product requirements.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using Orkes Conductor
&lt;/h2&gt;

&lt;p&gt;Leveraging advanced workflow orchestration platforms like Orkes Conductor unlocks developer productivity in the world of distributed computing. Used widely in many mission-critical applications for &lt;a href="https://orkes.io/use-cases/microservices-and-api-orchestration" rel="noopener noreferrer"&gt;microservice orchestration&lt;/a&gt;, &lt;a href="https://orkes.io/use-cases/event-driven-architecture" rel="noopener noreferrer"&gt;event handling&lt;/a&gt;, and &lt;a href="https://orkes.io/use-cases/ai-and-llm-chaining" rel="noopener noreferrer"&gt;LLM chaining&lt;/a&gt;, Orkes Conductor empowers teams to build resilient, scalable systems with ease:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Visual workflow editor&lt;/strong&gt;—Build and edit workflows visually, using dozens of integrations, custom tasks, and in-built system tasks and operators for APIs, webhooks, databases, and LLMs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Elastic capacity&lt;/strong&gt;—Run millions of concurrent workflows with minimal latency atop Conductor’s resilient infrastructure, created for durability, speed, and redundancy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Failure handling&lt;/strong&gt;—Get native support for rate limits, retry policies, timeouts, and more.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Versioning&lt;/strong&gt;—Safely version your workflows with no disruption to production runs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Introspection and metrics&lt;/strong&gt;—Inspect workflow performance and logs for testing and debugging, and get aggregated insights into throughput and other metrics.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise-grade security&lt;/strong&gt;—Secure access with SSO, RBAC, and secret variables.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;a href="https://orkes.io/cloud" rel="noopener noreferrer"&gt;Orkes Cloud&lt;/a&gt; is a fully managed and hosted Conductor service that can scale seamlessly to meet your needs. When you use Conductor via Orkes Cloud, your engineers don’t need to worry about setting up, tuning, patching, and managing high-performance Conductor clusters. Try it out with our free &lt;a href="https://developer.orkescloud.com/?utm_campaign=software-archi-evol-blog&amp;amp;utm_source=devto-blog&amp;amp;utm_medium=web" rel="noopener noreferrer"&gt;Developer Playground&lt;/a&gt; for Orkes Cloud.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>microservices</category>
      <category>orchestration</category>
      <category>eventdriven</category>
    </item>
    <item>
      <title>RAG Explained | Using Retrieval-Augmented Generation to Build Semantic Search</title>
      <dc:creator>livw</dc:creator>
      <pubDate>Thu, 13 Jun 2024 08:59:48 +0000</pubDate>
      <link>https://dev.to/orkes/rag-explained-using-retrieval-augmented-generation-to-build-semantic-search-1i32</link>
      <guid>https://dev.to/orkes/rag-explained-using-retrieval-augmented-generation-to-build-semantic-search-1i32</guid>
      <description>&lt;p&gt;Large language models (LLMs) have captured the public sphere of imagination in the past few years since OpenAI first launched ChatGPT to the world in late 2022. After the initial fascination amongst the public, businesses followed suit to find use cases where they could potentially deploy LLMs.&lt;/p&gt;

&lt;p&gt;With more and more LLMs released as open source and deployable as on-premise private models, it became possible for organizations to train, fine-tune, or supplement models with private data. &lt;strong&gt;RAG (retrieval-augmented generation)&lt;/strong&gt; is one such technique for customizing an LLM, serving as a viable approach for businesses to use LLMs without the high costs and specialized skills involved in building a custom model from scratch.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is retrieval-augmented generation?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;RAG (retrieval-augmented generation) is a technique that improves the accuracy of an LLM (large language model) output with pre-fetched data from external sources.&lt;/strong&gt; With RAG, the model references a separate database from its training data in real-time before generating a response.&lt;/p&gt;

&lt;p&gt;RAG extends the general capabilities of LLMs into a specific domain without the need to train a custom model from scratch. This approach enables general-purpose LLMs to provide more useful, relevant, and accurate answers in highly specialized or private contexts, such as an organization’s internal knowledge base. For most use cases, RAG provides a similar result as training custom models but at a fraction of the required cost and resources.&lt;/p&gt;

&lt;h2&gt;
  
  
  How does retrieval-augmented generation work?
&lt;/h2&gt;

&lt;p&gt;RAG involves using general-purpose LLMs as-is without special training or fine-tuning to serve answers based on domain-specific knowledge. This is achieved using a two-part process. &lt;/p&gt;

&lt;p&gt;First, the data is chunked and transformed into embeddings, which are vector representations of the data. These embeddings are then indexed into a vector database with the help of an AI model known as an embedding model.&lt;/p&gt;

&lt;p&gt;Once the data is populated in the index, natural language queries can be performed on the index using the same embedding model to yield relevant chunks of information. These chunks then get passed to the LLM as context, along with guardrails and prompts on how to respond given the context.&lt;/p&gt;
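&lt;p&gt;Stripped of any real embedding model or vector database, the two-part flow can be sketched in a few lines of Python. Here a toy bag-of-words vector stands in for a real embedding model, and a plain list stands in for the vector index; the sample chunks and query are illustrative.&lt;/p&gt;

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy 'embedding': a bag-of-words count vector. A real system would
    call an embedding model here instead."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Part 1: indexing. Chunk the data and store (embedding, chunk) pairs.
chunks = [
    "Acme Corp stock rose 5% after strong quarterly earnings.",
    "Central bank holds interest rates steady amid inflation fears.",
]
index = [(embed(chunk), chunk) for chunk in chunks]

# Part 2: searching. Embed the query, rank the chunks, and pass the best
# match to the LLM as context inside the prompt.
query = "What happened to interest rates?"
query_vec = embed(query)
context = max(index, key=lambda pair: cosine(query_vec, pair[0]))[1]
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```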

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq9v92zteyvwwno7wjvb8.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq9v92zteyvwwno7wjvb8.jpg" alt="Diagram of how retrieval-augmented generation works. Part 1 is indexing, which involves accessing the data from a source, transforming the data into embeddings, and storing the embeddings into a vector database index. Part 2 is searching, which involves searching the indexing to yield context and calling the LLM with a prompt containing the context." width="800" height="501"&gt;&lt;/a&gt;&lt;/p&gt;
RAG (retrieval-augmented generation) is a two-part AI technique that involves indexing data into a vector database and searching the database to retrieve relevant information. 



&lt;h2&gt;
  
  
  Why use retrieval-augmented generation?
&lt;/h2&gt;

&lt;p&gt;RAG offers several strategic advantages when implementing generative AI capabilities:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Minimize inaccuracies&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Using a RAG-based LLM can help reduce hallucinations (plausible yet completely false information) or inaccuracies in the model’s answers. By providing access to additional information, RAG enables relevant context to be added to the LLM prompt, thus leveraging the power of in-context learning (ICL) to improve the reliability of the model’s answers. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Access to latest information&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;With access to a continuously updated external database, the LLM can provide the latest information in news, social media, research, and other sources. RAG ensures that the LLM responses are up-to-date, relevant, and credible, even if the model’s training data does not contain the latest information.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost-effective, scalable, and flexible&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;RAG requires much less time and specialized skills, tooling, or infrastructure to obtain a production-ready LLM. Furthermore, by changing the data source or updating the database, the LLM can be efficiently modified without any retraining, making RAG an ideal approach at scale.&lt;/p&gt;

&lt;p&gt;Since RAG makes use of general-purpose LLMs, the model is decoupled from the domain, enabling developers to switch up the model at will. Compared to a custom pre-trained model, RAG provides instant, low-cost upgrades from one LLM to another.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Highly inspectable architecture&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;RAG offers a highly inspectable architecture, so developers can examine the user input, the retrieved context, and the LLM response to identify any discrepancies. With this ease of visibility, RAG-powered LLMs can also be instructed to provide sources in their responses, establishing more credibility and transparency with users.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to use retrieval-augmented generation?
&lt;/h2&gt;

&lt;p&gt;RAG can be used for various knowledge-intensive tasks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Question-answering systems&lt;/li&gt;
&lt;li&gt;Knowledge base search engine&lt;/li&gt;
&lt;li&gt;Document retrieval for research&lt;/li&gt;
&lt;li&gt;Recommendation systems&lt;/li&gt;
&lt;li&gt;Chatbots with real-time data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzfiocs9tgznzyrrx7w7v.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzfiocs9tgznzyrrx7w7v.jpg" alt="Diagram of 5 RAG use cases: question-answering systems, knowledge base search engine, document retrieval for research, recommendation systems, chatbots with real-time data." width="800" height="465"&gt;&lt;/a&gt;&lt;/p&gt;
RAG (retrieval-augmented generation) is useful for many knowledge-retrieval processes.



&lt;h2&gt;
  
  
  Building a retrieval-augmented generation system
&lt;/h2&gt;

&lt;p&gt;While the barriers to entry for RAG are much lower, it still requires an understanding of LLM concepts, as well as trained developers and engineers who can build data pipelines and integrate the query toolchain into the required services for consumption.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Using workflow orchestration as a means to build RAG-based applications levels the playing field to that of anyone who can string together API calls to form a business process.&lt;/strong&gt; The two-part process described above can be built as two workflows to create a RAG-based application. Let's build a financial news analysis platform in this example.&lt;/p&gt;

&lt;h3&gt;
  
  
  Orchestrating RAG using Orkes Conductor
&lt;/h3&gt;

&lt;p&gt;Orkes Conductor streamlines the process of building LLM-powered applications by orchestrating the interaction between distributed components so that you don’t have to write the plumbing or infrastructure code for it. In this case, a RAG system requires orchestration between four key components:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Aggregating data from a &lt;strong&gt;data source&lt;/strong&gt;;&lt;/li&gt;
&lt;li&gt;Indexing and retrieving the data in a &lt;strong&gt;vector database&lt;/strong&gt;;&lt;/li&gt;
&lt;li&gt;Using an &lt;strong&gt;embedding model&lt;/strong&gt;; and&lt;/li&gt;
&lt;li&gt;Integrating and calling the &lt;strong&gt;LLM&lt;/strong&gt; to respond to a search query.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let's build out the workflows to orchestrate the interaction between these components.&lt;/p&gt;

&lt;h4&gt;
  
  
  Part 1: Indexing the data
&lt;/h4&gt;

&lt;p&gt;The first part of creating a RAG system is to load, clean, and index the data. This process can be accomplished with a Conductor workflow. Let’s build a &lt;code&gt;data-indexer&lt;/code&gt; workflow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Get the data&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Choose a data source for your RAG system. The data can come from anywhere — a document repository, database, or API — and Conductor offers a variety of tasks that can pull data from any source.&lt;/p&gt;

&lt;p&gt;For our financial news analysis platform, the &lt;a href="https://site.financialmodelingprep.com/developer/docs/fmp-articles-api" rel="noopener noreferrer"&gt;FMP Articles API&lt;/a&gt; will serve as the data source. To call the API, get the API access keys and create an HTTP task in your Conductor workflow. Configure the endpoint method, URL, and other settings, and the task will retrieve data through the API whenever the workflow is executed.&lt;/p&gt;
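&lt;p&gt;As a rough sketch, the JSON definition of such an HTTP task could look like the following. The task and reference names, the endpoint path, and the omitted API key parameter are illustrative; consult the Conductor documentation for the exact schema.&lt;/p&gt;

```json
{
  "name": "get_fmp_articles",
  "taskReferenceName": "get_fmp_articles_ref",
  "type": "HTTP",
  "inputParameters": {
    "http_request": {
      "uri": "https://financialmodelingprep.com/api/v3/fmp/articles?page=0",
      "method": "GET"
    }
  }
}
```

&lt;p&gt;When the workflow runs, the task’s output carries the API response, which downstream tasks can reference.&lt;/p&gt;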

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fidglh2y0pyt8dab24m02.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fidglh2y0pyt8dab24m02.jpg" alt="Screenshot of the HTTP task in Orkes Conductor." width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;
Use the HTTP task in Orkes Conductor to call an API.



&lt;p&gt;&lt;strong&gt;Step 2: Transform the data&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before the data gets indexed to the vector database, the API payload should be transformed, cleaned, and chunked so that the embedding model can ingest it.&lt;/p&gt;

&lt;p&gt;Developers can write Conductor workers to transform the data into chunks. Conductor workers are functions that can be written in any language and can leverage well-known libraries such as NumPy and pandas for advanced data transformation and cleaning.&lt;/p&gt;

&lt;p&gt;In our example, we will use a JSON JQ Transform Task as a simple demonstration of how to transform the data. We only need the article title and content from the FMP Articles API for our financial news analysis platform. Each article must be stored as a separate chunk in the payload format required for indexing.&lt;/p&gt;
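&lt;p&gt;As a rough sketch, a JSON JQ Transform task that keeps only each article’s title and content could be configured like this. The reference names and field paths are illustrative and assume the upstream HTTP task’s output shape; consult the Conductor documentation for the exact schema.&lt;/p&gt;

```json
{
  "name": "transform_articles",
  "taskReferenceName": "transform_articles_ref",
  "type": "JSON_JQ_TRANSFORM",
  "inputParameters": {
    "articles": "${http_task_ref.output.response.body.content}",
    "queryExpression": ".articles | map({title: .title, text: .content})"
  }
}
```

&lt;p&gt;The jq expression maps the raw article list down to one object per article, matching the chunk format needed in the next step.&lt;/p&gt;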

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3erqysqpp6x1y8li016s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3erqysqpp6x1y8li016s.png" alt="Diagram of the original data payload versus the transformed data." width="800" height="350"&gt;&lt;/a&gt;&lt;/p&gt;
API payload vs the transformed data. Only the relevant data are retained.



&lt;p&gt;&lt;strong&gt;Step 3: Index the data into a vector database&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The cleaned data is now ready to be indexed into a vector database, such as Pinecone, Weaviate, MongoDB, and more. Use the LLM Index Text Task in your Conductor workflow to add one data chunk into the vector space. A dynamic fork can be used to execute multiple LLM Index Text Tasks in parallel so that multiple chunks can be added at once.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvwn6gk3mfitrbwkzdj81.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvwn6gk3mfitrbwkzdj81.jpg" alt="Screenshot of the LLM Index Text task in Orkes Conductor." width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;
Use the LLM Index Text task in Orkes Conductor to store data into a vector database.
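&lt;p&gt;To give a feel for the fan-out, here is a Python sketch that generates one indexing task per chunk, the kind of task list a dynamic fork executes in parallel. The task type and parameter names here are illustrative, not the exact Orkes Conductor schema.&lt;/p&gt;

```python
# Sketch: build the inputs for a dynamic fork, one indexing task per chunk.
# "llm_index_text" / "LLM_INDEX_TEXT" and the input keys are illustrative names.

def build_fork_inputs(chunks):
    tasks, task_inputs = [], {}
    for i, chunk in enumerate(chunks):
        ref = f"index_chunk_{i}"                 # unique task reference per chunk
        tasks.append({
            "name": "llm_index_text",
            "taskReferenceName": ref,
            "type": "LLM_INDEX_TEXT",            # illustrative task type name
        })
        task_inputs[ref] = {"text": chunk["text"]}
    return tasks, task_inputs

tasks, task_inputs = build_fork_inputs([{"text": "chunk A"}, {"text": "chunk B"}])
```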



&lt;p&gt;The LLM Index Text Task is one of the many LLM tasks provided in Orkes Conductor to simplify building LLM-powered applications.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repeat&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To build out the vector database, iterate through the three steps — extract, transform, load — until the desired dataset size is reached. The iterative loop can be built using a Do While operator task in Conductor.&lt;/p&gt;
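&lt;p&gt;Conceptually, the Do While loop repeats extract, transform, and load until enough records are indexed. A minimal Python sketch, with stub functions standing in for the HTTP, JQ transform, and indexing tasks:&lt;/p&gt;

```python
# Sketch of the Do While loop: repeat extract -> transform -> load until
# the target dataset size is reached. The three callables are stand-ins
# for the HTTP task, the JQ transform, and the LLM Index Text tasks.

def run_indexer(extract, transform, load, target_size):
    indexed, page = 0, 0
    while indexed < target_size:
        raw = extract(page)            # e.g. fetch one page of articles
        if not raw:
            break                      # no more data to fetch
        chunks = transform(raw)
        load(chunks)                   # index the chunks into the vector DB
        indexed += len(chunks)
        page += 1
    return indexed

store = []                             # stand-in for the vector database
total = run_indexer(
    extract=lambda page: [f"article-{page}"] if page < 5 else [],
    transform=lambda raw: [{"text": text} for text in raw],
    load=store.extend,
    target_size=3,
)
```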

&lt;p&gt;Here is the full &lt;code&gt;data-indexer&lt;/code&gt; workflow.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6v2luenpq3q5js3fewxh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6v2luenpq3q5js3fewxh.png" alt="Diagram of the data-indexer workflow." width="800" height="900"&gt;&lt;/a&gt;&lt;/p&gt;
data-indexer workflow in Orkes Conductor.



&lt;h4&gt;
  
  
  Part 2: Retrieving data for semantic search
&lt;/h4&gt;

&lt;p&gt;Once the vector database is ready, it can be deployed for production — in this case, for financial news analysis. This is where data is retrieved from the vector database to serve as context for the LLM, so that it can formulate a more accurate response. For this second part, let’s build a &lt;code&gt;semantic-search&lt;/code&gt; workflow that can be used in an application.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Retrieve relevant data from vector database&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In a new workflow, add the LLM Search Index Task, another of the LLM tasks provided in Orkes Conductor. This task takes in a user query and returns the context chunks that most closely match it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe6tz306x8f1y32x33f8a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe6tz306x8f1y32x33f8a.png" alt="Diagram of the input, containing the user query versus the output, containing the relevant data from the vector database." width="800" height="350"&gt;&lt;/a&gt;&lt;/p&gt;
The LLM Search Index Task takes in a user query and returns the relevant context chunks from the vector database.
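&lt;p&gt;Under the hood, this kind of search amounts to embedding the query and ranking stored chunks by vector similarity. A toy cosine-similarity sketch, where the hand-written vectors stand in for real embeddings:&lt;/p&gt;

```python
import math

# Toy retrieval: rank stored chunks by cosine similarity to the query vector.
# In practice the vectors come from an embedding model and live in a vector DB.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(query_vec, index, top_k=2):
    ranked = sorted(index, key=lambda item: cosine(query_vec, item["vec"]),
                    reverse=True)
    return [item["text"] for item in ranked[:top_k]]

index = [
    {"text": "Fed holds rates steady", "vec": [1.0, 0.1]},
    {"text": "Tech stocks rally",      "vec": [0.1, 1.0]},
    {"text": "Bond yields edge up",    "vec": [0.9, 0.2]},
]
results = search([1.0, 0.0], index)   # query vector "about rates"
```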



&lt;p&gt;&lt;strong&gt;Step 2: Formulate an answer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;With the retrieved context, call an LLM of your choice to generate the response to the user query. Use the LLM Text Complete Task in Orkes Conductor to accomplish this step. The LLM will ingest the user query along with the context.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7rtqzxaek52s3qbar4rx.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7rtqzxaek52s3qbar4rx.jpg" alt="Screenshot of the LLM Text Complete task in Orkes Conductor." width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;
Use the LLM Text Complete task in Orkes Conductor to prompt a selected LLM for a response.



&lt;p&gt;Guardrails can be set up in Orkes Conductor to optimize and constrain the LLM response, such as by adjusting the &lt;code&gt;temperature&lt;/code&gt;, &lt;code&gt;topP&lt;/code&gt;, or &lt;code&gt;maxTokens&lt;/code&gt; parameters. &lt;/p&gt;

&lt;p&gt;Use Orkes Conductor’s AI Prompt studio to create a prompt template for the LLM to follow in the LLM Text Complete Task.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Example prompt template&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Answer the question based on the context provided.

Context: "${context}"

Question: "${question}"

Provide just the answer without repeating the question or mentioning the context.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
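&lt;p&gt;At runtime, the &lt;code&gt;${context}&lt;/code&gt; and &lt;code&gt;${question}&lt;/code&gt; placeholders are filled in with the retrieved chunks and the user query. In Python terms, the substitution looks like this (&lt;code&gt;string.Template&lt;/code&gt; happens to use the same &lt;code&gt;${var}&lt;/code&gt; syntax):&lt;/p&gt;

```python
from string import Template

# Fill the prompt template's ${context} and ${question} variables, mirroring
# what happens before the LLM Text Complete task calls the model.

template = Template(
    "Answer the question based on the context provided.\n\n"
    'Context: "${context}"\n\n'
    'Question: "${question}"\n\n'
    "Provide just the answer without repeating the question or mentioning the context."
)

prompt = template.substitute(
    context="Fed holds rates steady at 5.25%.",   # retrieved chunk(s)
    question="What did the Fed do with rates?",   # user query
)
```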



&lt;p&gt;Here is the full &lt;code&gt;semantic-search&lt;/code&gt; workflow.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhzwcclw82coilx6y9eae.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhzwcclw82coilx6y9eae.jpg" alt="Diagram of the semantic-search workflow." width="800" height="583"&gt;&lt;/a&gt;&lt;/p&gt;
semantic-search workflow in Orkes Conductor.



&lt;p&gt;&lt;strong&gt;Use the workflow in your application&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once the &lt;code&gt;semantic-search&lt;/code&gt; workflow is ready, you can use it in your application project to build a semantic search engine or chatbot. Leverage the Conductor SDKs, available in popular languages like Python, Java, and Go, and call the Conductor APIs to trigger the workflow from your application.&lt;/p&gt;
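&lt;p&gt;As a sketch, starting a workflow boils down to a single POST to the Conductor server. The host below is a placeholder, and a managed cluster would also need authentication headers:&lt;/p&gt;

```python
import json
from urllib import request

# Sketch: start the semantic-search workflow over Conductor's REST API.
# The host is a placeholder for your own deployment; a managed cluster
# will also require authentication headers.

CONDUCTOR_HOST = "http://localhost:8080"  # placeholder

def build_start_request(name, workflow_input):
    return request.Request(
        f"{CONDUCTOR_HOST}/api/workflow/{name}",
        data=json.dumps(workflow_input).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending the request returns the new workflow instance ID as plain text:
# with request.urlopen(build_start_request("semantic-search", {"query": q})) as r:
#     workflow_id = r.read().decode()
req = build_start_request("semantic-search", {"query": "How did markets react?"})
```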

&lt;p&gt;The RAG-based financial news analysis platform looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkym09h6ovotcan1b2bnu.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkym09h6ovotcan1b2bnu.jpg" alt="Screenshot of the example RAG-based financial news analysis platform." width="800" height="416"&gt;&lt;/a&gt;&lt;/p&gt;
The RAG-based financial news analysis platform in action, with a Conductor workflow powering it.



&lt;p&gt;Whenever a user enters a query, a &lt;code&gt;semantic-search&lt;/code&gt; workflow runs in the background to provide the answer. &lt;/p&gt;

&lt;p&gt;If the vector database needs to be updated on the backend, the &lt;code&gt;data-indexer&lt;/code&gt; workflow can be triggered, or even scheduled at regular intervals for automatic updates.&lt;/p&gt;

&lt;p&gt;While the financial news analysis platform is a simple variant of a RAG system, developers can use Orkes Conductor to quickly develop and debug their own RAG systems of varying complexities.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Building semantic search using RAG can be much more achievable than most people think. By applying orchestration with a platform like Orkes Conductor, the development and operational effort need not involve complicated tooling, infrastructure, skill sets, and other resources. This translates to a highly efficient go-to-market process that can be rapidly iterated on over time to optimize the results and value derived from such AI capabilities in any modern business.&lt;/p&gt;

&lt;p&gt;—&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/conductor-oss/conductor" rel="noopener noreferrer"&gt;Conductor&lt;/a&gt; is an open-source orchestration platform used widely in many mission-critical applications. Orkes Cloud is a fully managed and hosted Conductor service that can scale seamlessly according to your needs. Try it out with our free &lt;a href="https://developer.orkescloud.com/?utm_campaign=rag-explained-blog&amp;amp;utm_source=devto-blog&amp;amp;utm_medium=web" rel="noopener noreferrer"&gt;Developer Playground&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>rag</category>
      <category>ai</category>
      <category>orchestration</category>
      <category>llm</category>
    </item>
    <item>
      <title>Emerging Tech Trends 2024: The Latest Developments in AI, API, and Automation</title>
      <dc:creator>livw</dc:creator>
      <pubDate>Fri, 17 May 2024 11:05:28 +0000</pubDate>
      <link>https://dev.to/orkes/emerging-tech-trends-2024-the-latest-developments-in-ai-api-and-automation-1omj</link>
      <guid>https://dev.to/orkes/emerging-tech-trends-2024-the-latest-developments-in-ai-api-and-automation-1omj</guid>
<description>&lt;p&gt;Technology has always been the linchpin of progress — from the first ships to the Industrial Revolution to the computer age. The time between each successive technological wave has been shortening, and in the past few years, we have seen explosive growth in Artificial Intelligence (AI) and its capabilities.&lt;/p&gt;

&lt;p&gt;While AI has been the epicenter of most tech buzz, it is just one tailwind in the wider trajectory toward greater automation in computer-based industries. In this article, we will explore the latest tech trends in AI, APIs, and automation that can impact your business. Find out how AI serves as the frontrunner and API as the technical bedrock that accelerates ongoing tech trends, with automation as the overarching motivation for the latest developments in the tech and business landscape.&lt;/p&gt;

&lt;h2&gt;
  
  
  GenAI has unlocked new ways to work
&lt;/h2&gt;

&lt;p&gt;The viral release of OpenAI’s ChatGPT (powered by GPT-3.5) in 2022 sparked a huge wave of excitement, as millions of people experimented with what it could do and what it meant for the future of work. Chatbots like ChatGPT are a form of AI known as Generative AI (GenAI), which is capable of producing naturalistic images, text, or even videos based on a given prompt.&lt;/p&gt;

&lt;p&gt;At the core of GenAI are AI programs known as large language models (LLMs), which consume large amounts of training data in order to produce convincingly human-like output. Now, in 2024, ChatGPT is just one of thousands of LLMs available for use. From enterprise offerings like Google Gemini, Amazon Bedrock, and Anthropic Claude to the 650K open-source models hosted on Hugging Face (as of May 2024), there is no shortage of choice when it comes to using GenAI as a copilot for work.&lt;/p&gt;

&lt;h3&gt;
  
  
  AI-augmented code
&lt;/h3&gt;

&lt;p&gt;Unsurprisingly, the tech sector has been one of the first few to adopt GenAI in work. Many companies, big and small, have been using LLM chatbots like ChatGPT to produce, rewrite, and debug code, accelerating developer productivity. A survey by GitHub found that &lt;a href="https://github.blog/2023-06-13-survey-reveals-ais-impact-on-the-developer-experience/" rel="noopener noreferrer"&gt;92% of US-based developers are using AI coding tools for work, and 70% feel the benefits of using such tools&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Early entrants in code completion tooling include Tabnine and, more recently, GitHub Copilot and Google’s CodeGemma.&lt;/p&gt;

&lt;p&gt;But beyond using GenAI for writing code, there are many more possibilities for AI-powered software development. At the keynote for the 2024 Apidays Singapore conference, Manjunath Bhat (VP Analyst - Gartner) remarked that AI could also play a role in other high-impact avenues, like suggesting what should be built or explaining legacy or indecipherable code.&lt;/p&gt;

&lt;p&gt;Thus far, there is a fast-expanding array of AI tools for automating software tests, generating documentation, and answering developer questions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgxsay5t4b7urkqen1ait.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgxsay5t4b7urkqen1ait.jpg" alt="Infographic of 10 ways to use AI in software development: code generation, code completion, code conversion, code review and refactoring, code debugging, explaining code in codebase, AI documentation generation, AI-powered search in codebase or documentation, AI testing, and developer-focused chatbots." width="800" height="800"&gt;&lt;/a&gt;A wide variety of use cases for AI in software development.&lt;/p&gt;

&lt;h3&gt;
  
  
  Specialized models, plug-ins, and more
&lt;/h3&gt;

&lt;p&gt;Today, the average person may ask ChatGPT to summarize an essay, provide travel itineraries, or solve logic puzzles. Such chatbots are based on general-purpose models that can handle a gamut of tasks across domains.&lt;/p&gt;

&lt;p&gt;While generic models have the advantage of being able to handle — with varying success — any task right away with no training required, they often fall short when the user requires a more accurate or complex response. Issues like hallucination, bias, or inaccuracies are some of the biggest challenges to implementing and scaling AI in a highly specialized business context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Domain-specific models&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A variety of domain-specific and multilingual models have entered the scene to address the limitations of generic models. These specialized models are trained with domain-specific datasets, allowing the model to learn and perform far better on specialized tasks.&lt;/p&gt;

&lt;p&gt;Google’s Med-PaLM 2 model was trained with curated medical databases, becoming the &lt;a href="https://cloud.google.com/blog/topics/healthcare-life-sciences/sharing-google-med-palm-2-medical-large-language-model" rel="noopener noreferrer"&gt;first LLM to perform at 85% accuracy on medical examinations&lt;/a&gt;. Such domain-specific LLMs are the first crest in the next wave of AI development, where LLMs can be safely deployed in real-world contexts like assisting with tuberculosis diagnoses or analyzing past legal cases.&lt;/p&gt;

&lt;p&gt;Already, there are LLMs like BloombergGPT and FinGPT for finance, ChatLAW for law, and ClimateBERT for climate and environment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Plug-ins, API connectivity&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Besides specialized models, another way to power up general models is through plug-ins or connectivity with APIs.&lt;/p&gt;

&lt;p&gt;LLM plug-ins extend the model’s capabilities. Through API calls, these plug-ins can provide access to external databases or execute third-party tasks like making a hotel reservation. That’s how plug-ins enable LLMs to handle more complex requests, like acting as a personal shopper or creating a financial report based on the latest data.&lt;/p&gt;

&lt;p&gt;While OpenAI has rolled back ChatGPT’s plug-ins after a year of service, the game is still on to get LLMs hooked up with APIs. Ongoing projects like Gorilla (UC Berkeley) and RestGPT (Peking University) are working to build LLM-powered systems that can carry out real-world actions based on a command, like “create a Spotify playlist” or “book a flight to Paris”.&lt;/p&gt;

&lt;p&gt;In these frameworks, LLMs act as a switchboard operator who knows the right APIs to call when prompted, then formulates the API request and parses the response for the user. API connectivity promises a new breakthrough for LLMs: beyond just a conversational partner, LLMs can act and carry out tasks in the real world, bringing us one step closer to natural language interfaces.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F28qrqhghcpzd2oxsvdfy.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F28qrqhghcpzd2oxsvdfy.jpg" alt="Diagram of user interacting with the world through an LLM as the intermediary interface." width="800" height="418"&gt;&lt;/a&gt;API connectivity can allow LLMs to act on the world, unlocking new avenues in natural language interfaces.&lt;/p&gt;

&lt;h2&gt;
  
  
  API as a Product, API as tech accelerant
&lt;/h2&gt;

&lt;p&gt;If the 2010s were the decade for the rise of the software-as-a-service (SaaS) model, then the 2020s are all about API as a Product (AaaP). Under a SaaS model, businesses offer their unique services and strengths through web-based applications, like Gmail for mail services or Zoom for video conferencing.&lt;/p&gt;

&lt;p&gt;However, in recent years, applications have gradually become more modular, built on microservice-based infrastructure. This means that application functionalities like payment, notifications, or even login credentials are added using APIs. Because APIs expose the capabilities of a service in a programmatic manner, it is much easier for developers to build products without having to code everything from scratch.&lt;/p&gt;

&lt;p&gt;This change in software development heralds an API-based economy, where products and services are offered and accessed through APIs.&lt;/p&gt;

&lt;h3&gt;
  
  
  API as a Product
&lt;/h3&gt;

&lt;p&gt;Stripe, SendGrid, and Twilio—these are companies that have grown tremendously by selling third-party API functionalities to other enterprises. More and more API offerings are expected to crop up in the coming decade, especially industry-specific APIs for payroll, open banking, government services, and so on.&lt;/p&gt;

&lt;p&gt;As the API market grows, Kong's 2023 API Impact Report estimates that &lt;a href="https://konghq.com/blog/enterprise/the-economic-impact-of-apis" rel="noopener noreferrer"&gt;APIs will contribute $14.2 trillion to the global economy by 2027, up from $10.9 trillion in 2023&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fye5tu3zya3kdib6gxh94.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fye5tu3zya3kdib6gxh94.jpg" alt="Graph of API’s global economic impact in 2023 and 2027, provided by Kong 2023 API Impact Report." width="800" height="800"&gt;&lt;/a&gt;Economic impact of APIs is expected to jump $3.3 trillion dollars by 2027.&lt;/p&gt;

&lt;p&gt;Along with the economic boom, there has been an explosion of tooling, standards, and platforms to support the growth of API as a Product. With so many APIs entering the market, it has become more difficult to discover and integrate each and every API into a given application. Two key trends stand out:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;API portals and marketplaces&lt;/strong&gt;—Provide a single point of collaboration where consumers can easily try, observe, use, and subscribe to API products.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unified APIs&lt;/strong&gt;—Provide a single point of access to multiple APIs, so users only need to integrate one API to obtain the capabilities of multiple APIs.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  API as tech accelerant
&lt;/h3&gt;

&lt;p&gt;An API-first economy espouses an interface-first approach: exposing capabilities while hiding complexities. This hallmark of APIs has made them the foundation for the rapid growth of many tech trends, the most prominent of which is AI.&lt;/p&gt;

&lt;p&gt;Since most LLMs have only been commercially available in recent years, many have been built with an API-first approach. Many LLMs can be programmatically accessed via APIs, making it super convenient for developers to integrate AI capabilities into their applications. In other words, API access has precipitated the rapid availability of AI-powered tools and features on the market within just the past two years.&lt;/p&gt;

&lt;p&gt;APIs have also served as the backbone for the recent proliferation of no- and low-code application builders like Bubble and Xano. With APIs, users can create applications without having to know a programming language. These tools have also led to the rise of citizen developers, enabling businesses to quickly build applications and dashboards without a large developer team.&lt;/p&gt;

&lt;p&gt;All these emerging trends in AI and API point to the same undercurrent of opportunity: lower barriers to entry and greater automation of computer-based work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Automation in computer-based industries
&lt;/h2&gt;

&lt;p&gt;Much as the steam engine automated blue-collar work in energy, manufacturing, and logistics, we are now entering a whole new era of automation across sectors, especially in computer-based industries.&lt;/p&gt;

&lt;p&gt;Needless to say, AI is one of the key factors towards greater automation. Research by Goldman Sachs suggests that &lt;a href="https://www.gspublishing.com/content/research/en/reports/2023/03/27/d64e052b-0f6e-45d7-967b-d7be35fabd16.html" rel="noopener noreferrer"&gt;18% of global work could be automated by AI&lt;/a&gt;, with the biggest impact felt in white-collar jobs like administration and business operations.&lt;/p&gt;

&lt;p&gt;It is important to note that AI is unlikely to make entire job functions redundant. Rather, repetitive or low-impact tasks can be delegated to AI automation, allowing more time for high-impact or more complex work. For example, in customer support, GenAI-enabled chatbots can interact autonomously with customers to provide basic support and handle common inquiries, while human agents can handle more demanding cases.&lt;/p&gt;

&lt;p&gt;Another key catalyst is the growing availability of automation tools, such as RPA (robotic process automation) software and orchestration platforms.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;RPA (robotic process automation)&lt;/strong&gt;—Automates high-volume, time-consuming, repetitive tasks using a bot. The bot replicates human actions on computer applications, like entering data from PDFs into a spreadsheet and emailing it, which eliminates such high-effort yet low-stakes tasks found in many business processes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Orchestration&lt;/strong&gt;—Automates the coordination of multiple systems, applications, and services through a &lt;a href="https://dev.to/orkes/durable-execution-explained-how-conductor-delivers-resilient-systems-out-of-the-box-3i1p"&gt;centralized coordinating service&lt;/a&gt;. Based on a predefined workflow, the orchestrator runs through a series of tasks in order to achieve a complex or &lt;a href="https://dev.to/orkes/long-running-workflows-5aii"&gt;long-running business process&lt;/a&gt;, such as managing shipment orders, upgrading cloud infrastructure, or even deploying LLM-powered applications.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From AI to API to automation tools, we have yet again entered an age where computer-based technology will revolutionize how businesses are run. These trends promise greater productivity and connectivity across systems, allowing people to focus on high-impact work. In 2024, it is more important than ever for businesses to leverage new technology to drive value.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scale with Conductor
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/conductor-oss/conductor" rel="noopener noreferrer"&gt;Conductor&lt;/a&gt; is an open-source orchestration platform that automates complex or long-running processes, such as AI integration flows, microservice application flows, DevOps processes, transactional flows, and more. With Conductor, developers can build and update durable workflows without the complexities of managing system failures, dependencies, or scalability.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://orkes.io/" rel="noopener noreferrer"&gt;Orkes Cloud&lt;/a&gt; is a fully managed and hosted Conductor service that can scale seamlessly according to your needs. When you use Conductor via Orkes Cloud, your engineers don’t need to worry about set-up, tuning, patching, and managing high-performance Conductor clusters. Try it out with our free &lt;a href="https://developer.orkescloud.com/?utm_campaign=tech-trends-2024&amp;amp;utm_source=devto-blog&amp;amp;utm_medium=web" rel="noopener noreferrer"&gt;Developer Playground&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>automation</category>
      <category>devops</category>
    </item>
    <item>
      <title>Durable Execution Explained — How Conductor Delivers Resilient Systems Out Of The Box</title>
      <dc:creator>livw</dc:creator>
      <pubDate>Thu, 09 May 2024 16:00:00 +0000</pubDate>
      <link>https://dev.to/orkes/durable-execution-explained-how-conductor-delivers-resilient-systems-out-of-the-box-3i1p</link>
      <guid>https://dev.to/orkes/durable-execution-explained-how-conductor-delivers-resilient-systems-out-of-the-box-3i1p</guid>
<description>&lt;p&gt;&lt;em&gt;This is part 2 of a two-part series on durable execution: what it is, why it is important, and how to pull it off. Part 2 shows how Conductor, a workflow orchestration engine, seamlessly integrates durable execution into applications. Check out &lt;a href="https://dev.to/orkes/why-your-applications-need-durable-execution-and-best-practices-to-achieve-it-5ham"&gt;part 1 for more about what durable execution is&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;In the ever-evolving landscape of application architecture, durable execution and platform engineering have been gaining traction in recent years, driven by the need for resilient, scalable, and efficient systems. &lt;/p&gt;

&lt;p&gt;Durable execution refers to a system’s ability to persist execution even in the face of interruption or failure. This characteristic is especially important in distributed and/or long-running systems, where the chances of disruption or failure increase drastically. While there are &lt;a href="https://dev.to/orkes/why-your-applications-need-durable-execution-and-best-practices-to-achieve-it-5ham"&gt;several best practices for building durable applications and systems&lt;/a&gt;, one of the most effective ways is to leverage stateful platforms like Orkes Conductor.&lt;/p&gt;

&lt;p&gt;Conductor is a &lt;strong&gt;workflow orchestration platform that abstracts away the complexities of underlying infrastructure&lt;/strong&gt;, enabling developers to focus on building applications. True to its name, Conductor directs and orchestrates the performance of distributed services into a dynamic application flow. Each individual player – or task – does not need to care what the other players are doing, because Conductor keeps track of what is supposed to happen at every juncture.&lt;/p&gt;

&lt;p&gt;Its in-built state management allows for reliable recovery in case of failure or interruption. Just like a musical conductor, it &lt;strong&gt;empowers applications to adapt to ever-changing conditions without going offline&lt;/strong&gt; — whether it is automatically retrying tasks, scaling up to meet traffic spikes, or integrating new services.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzytqem0ovwsfyywkt8h1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzytqem0ovwsfyywkt8h1.png" alt="Infographic of 6 key features in Conductor: a resilient engine, elastic capacity, failure handling, introspection, versioning, and metrics." width="800" height="689"&gt;&lt;/a&gt;6 key Conductor features for durable execution.&lt;/p&gt;

&lt;p&gt;How does Conductor enable you to build resilient, efficient, and scalable systems? Let’s take a look at what happens in the backend when you build your applications with Conductor as the main orchestration engine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conductor guarantees durable execution under the hood
&lt;/h2&gt;

&lt;p&gt;Conductor’s secret sauce for fortifying systems with durable execution is &lt;strong&gt;decoupled infrastructure and redundancy&lt;/strong&gt;. Let’s set the scene for an example workflow.&lt;/p&gt;

&lt;p&gt;Say you have an online shop that makes and ships custom violins worldwide. The order process can take months to fulfill, from pre-ordering the violin to customizing and shipping it. &lt;/p&gt;

&lt;p&gt;Enter the four key actors in our order workflow.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Order App&lt;/strong&gt;—the interface where customers can pre-order violins and make payment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Conductor Server&lt;/strong&gt;—the central orchestration engine that directs and tracks the workflow. Conductor’s workflow execution service (WES) runs on this server and manages the task queues.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Task Workers&lt;/strong&gt;—code units that run and complete queued tasks. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Conductor Stores&lt;/strong&gt;—storage units that contain all workflow information, including metadata, task queues, and history.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvvoxknm1h5snv62uyxft.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvvoxknm1h5snv62uyxft.png" alt="Diagram stack of the Order App, Conductor Server, Task Workers, and Conductor Stores." width="800" height="646"&gt;&lt;/a&gt;The tech stack for how Conductor powers applications.&lt;/p&gt;

&lt;h3&gt;
  
  
  Typical success scenario
&lt;/h3&gt;

&lt;p&gt;In the Order App, when the user clicks the Order button during the checkout procedure, a Conductor workflow for &lt;code&gt;order_processing&lt;/code&gt; is triggered. The Order App passes the workflow input parameters, such as the order details, shipping address, and user email to the Conductor Server. In return, the Server passes back the workflow instance ID, which can be used to track the workflow progress and manage its execution.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3d3zw682t7a71h3dewcb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3d3zw682t7a71h3dewcb.png" alt="Diagram of workflow getting triggered by a signal." width="800" height="418"&gt;&lt;/a&gt;Workflow begins upon a signal.&lt;/p&gt;

&lt;p&gt;Based on predefined signals and parameters, the &lt;code&gt;order_processing&lt;/code&gt; workflow will run through a series of tasks, such as an HTTP call to a payment processor or a piece of custom functionality for invoice calculation.&lt;/p&gt;

&lt;p&gt;In Conductor, workflows are executed on a worker-task queue architecture, where each task type – HTTP call, webhook, and so on – has its own task queue. When the workflow execution for &lt;code&gt;order_processing&lt;/code&gt; begins, the workflow execution service (WES) adds the workflow’s tasks to the relevant queues. An HTTP task that calls a third-party payment processor, &lt;code&gt;capture_payment&lt;/code&gt;, is added to the HTTP task queue. Meanwhile, &lt;code&gt;calculate_invoice&lt;/code&gt;, a custom function, is added to a custom task queue, and &lt;code&gt;notify_invoice&lt;/code&gt;, which calls a third-party email service, is added to the HTTP task queue.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1etx04cwdttp6hhkprse.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1etx04cwdttp6hhkprse.png" alt="Diagram of tasks getting sorted into queues based on task type." width="800" height="418"&gt;&lt;/a&gt;Based on the predefined workflow, Conductor Server adds tasks to the appropriate task queues.&lt;/p&gt;
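&lt;p&gt;The routing above can be sketched in a few lines. This is an illustrative in-memory model of per-type task queues, not Conductor’s actual implementation; only the task names and types come from the example.&lt;/p&gt;

```python
from collections import defaultdict, deque

# Each task type (HTTP, custom, ...) gets its own queue.
task_queues = defaultdict(deque)

def schedule(task_name, task_type):
    # The workflow execution service routes each task to the
    # queue matching its type.
    task_queues[task_type].append(task_name)

# The order_processing tasks from the example:
schedule("capture_payment", "HTTP")      # calls the payment processor
schedule("calculate_invoice", "CUSTOM")  # custom invoice logic
schedule("notify_invoice", "HTTP")       # calls the email service

print(list(task_queues["HTTP"]))    # ['capture_payment', 'notify_invoice']
print(list(task_queues["CUSTOM"]))  # ['calculate_invoice']
```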

&lt;p&gt;While Conductor’s WES is directing and scheduling tasks to the right queue, the available Task Workers are busy polling for tasks to do. Although there are three tasks queued, the first task, &lt;code&gt;capture_payment&lt;/code&gt;, has to be completed first, before the next task can begin. So, when Worker A polls for a task, the Conductor Server sends &lt;code&gt;capture_payment&lt;/code&gt; to Worker A for completion. Once Worker A has completed the task, it updates the Server about the task completion status.&lt;/p&gt;

&lt;p&gt;The Server registers and keeps track of each task’s status. So when it receives the update from Worker A that &lt;code&gt;capture_payment&lt;/code&gt; has been completed, it will send the next scheduled task to the next available worker.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn8aq37bvjrrhv8k9lm0u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn8aq37bvjrrhv8k9lm0u.png" alt="Diagram of workers polling for tasks, completing tasks, and sending the task status back to the server." width="800" height="418"&gt;&lt;/a&gt;Worker-task queue architecture, where workers poll the server for tasks to do, and the server assigns tasks based on the defined workflow schedule.&lt;/p&gt;

&lt;p&gt;This set-up is how Conductor keeps track of the workflow state as tasks are completed one after another based on the predefined workflow schedule. And voilà: with Conductor handling state, developers need not spend time building complicated state management infrastructure. Remember the workflow instance ID that was sent when the workflow was initiated? The Order App can simply use the ID to query the Server about the workflow status at any time.&lt;/p&gt;
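&lt;p&gt;The poll-complete-update loop described above can be sketched as follows. This is a simplified single-worker model for illustration: the server only releases the next task after the previous one is reported done.&lt;/p&gt;

```python
from collections import deque

# Tasks are released strictly in workflow order.
pending = deque(["capture_payment", "calculate_invoice", "notify_invoice"])
completed = []

def poll():
    # A worker asks the server for work.
    if pending:
        return pending.popleft()
    return None

def update_status(task, status):
    # The worker reports back so the server can schedule the next task.
    if status == "COMPLETED":
        completed.append(task)

while True:
    task = poll()
    if task is None:
        break
    update_status(task, "COMPLETED")  # worker runs the task, then reports back

print(completed)  # ['capture_payment', 'calculate_invoice', 'notify_invoice']
```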

&lt;h3&gt;
  
  
  State persistence and durability
&lt;/h3&gt;

&lt;p&gt;Crucially, Conductor goes beyond just state visibility. It’s built to withstand and recover from failures no matter how long the workflow runs. Cue the Conductor Stores. At every point, data gets stored on distributed, high-availability clusters so that the workflow can always pick up and resume from where it last stopped – whether from a restart in a failed run or from an idle state in a long-running flow. &lt;/p&gt;

&lt;p&gt;For example, after &lt;code&gt;capture_payment&lt;/code&gt;, the WES reads the next task, &lt;code&gt;wait_customization&lt;/code&gt;, and pauses the workflow to wait for the luthier to finish crafting the instrument. The process may take several months, but with the workflow execution history, pending task queues, and predefined flow of tasks, the system can easily recover from this state of idling. Once Conductor receives a signal – perhaps the luthier clicked a confirmation button on the Order App – that the violin has been made, it will send the next scheduled task in the queue to the next available worker.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7stvjmx5t0xhoepm8nbb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7stvjmx5t0xhoepm8nbb.png" alt="Diagram of Conductor Server receiving a signal to continue and assigning the next task to a worker." width="800" height="418"&gt;&lt;/a&gt;Conductor resumes the workflow upon receiving a signal to proceed to the next task.&lt;/p&gt;
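&lt;p&gt;In miniature, the wait-and-resume behavior looks like this. The sketch assumes a single persisted state record and a hypothetical &lt;code&gt;customization_done&lt;/code&gt; signal name; it illustrates the idea, not Conductor’s internals.&lt;/p&gt;

```python
# The workflow is paused at wait_customization; because the state is
# persisted, it can resume even after months of idling.
state = {"current_task": "wait_customization", "status": "PAUSED"}

def send_signal(signal_name):
    # On receiving the signal (e.g. the luthier clicked a confirmation
    # button), the server advances to the next scheduled task.
    if signal_name == "customization_done":
        state["current_task"] = "calculate_invoice"
        state["status"] = "RUNNING"

send_signal("customization_done")
print(state)  # {'current_task': 'calculate_invoice', 'status': 'RUNNING'}
```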

&lt;h3&gt;
  
  
  Handling failures in all shapes and sizes
&lt;/h3&gt;

&lt;p&gt;Whether it’s transient failures, like services going offline briefly, or execution failures, like buggy code, or even deliberate termination, like a customer canceling an order, Conductor is equipped to handle it all.&lt;/p&gt;

&lt;p&gt;We’ve seen a glimpse of how Conductor’s decoupled infrastructure and redundancy enable applications to run smoothly with guaranteed state visibility and persistence. But failure scenarios are where these durability characteristics really shine through.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;App server goes offline&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let’s continue with the custom violin order processing workflow. With the violin ready, the workflow proceeds to &lt;code&gt;calculate_invoice&lt;/code&gt;, a custom functionality on the Order App. Perhaps at this moment, a blackout causes the Order App’s server to go down temporarily, which takes all the Task Workers for the &lt;code&gt;calculate_invoice&lt;/code&gt; task offline as well. When the Conductor Server dispatches this task to be completed, there are no Workers available to complete it.&lt;/p&gt;

&lt;p&gt;Based on the task’s retry and timeout policies, Conductor will automatically reschedule the task until the Order App’s server comes back online or until the task times out.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjxe6odtvkz0xvk8qyc19.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjxe6odtvkz0xvk8qyc19.png" alt="Diagram of Conductor Server attempting to reach task workers on a different server." width="800" height="418"&gt;&lt;/a&gt;Conductor Server will automatically handle transient failures based on predefined parameters for retries, timeout, and so on.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Service hits rate limit&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once the &lt;code&gt;calculate_invoice&lt;/code&gt; task has been completed, the next task – an HTTP call for &lt;code&gt;notify_invoice&lt;/code&gt; – is invoked. At this point, we hit another roadblock: the HTTP service for this task has reached its rate limit. As before, Conductor automatically retries the task with exponential backoff, so that the task can complete successfully once the service recovers.&lt;/p&gt;
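&lt;p&gt;Exponential backoff itself is a simple pattern: the delay doubles after each failed attempt, up to a cap. Here is a generic sketch, with a flaky &lt;code&gt;notify_invoice&lt;/code&gt; stub that succeeds on the third call – the parameter names and defaults are illustrative, not Conductor’s configuration keys.&lt;/p&gt;

```python
import time

# Retry with exponential backoff: delay doubles per attempt, capped.
def retry_with_backoff(task, max_attempts=5, base_delay=1.0, max_delay=30.0):
    for attempt in range(max_attempts):
        try:
            return task()
        except Exception:
            delay = min(base_delay * 2 ** attempt, max_delay)
            time.sleep(delay)
    raise RuntimeError("task failed after all retries")

# A flaky service that succeeds on the third call, e.g. once the
# rate limit window has reset.
calls = {"count": 0}

def notify_invoice():
    calls["count"] += 1
    if calls["count"] > 2:
        return "sent"
    raise ConnectionError("rate limited")

print(retry_with_backoff(notify_invoice, base_delay=0.01))  # sent
```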

&lt;p&gt;&lt;strong&gt;Conductor goes offline&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Conductor can be deployed in high-availability clusters to guarantee maximum uptime. Even so, in the off chance that its workflow execution service (WES) goes down, Conductor’s decoupled infrastructure ensures that task runs are not affected. Since the task queues reside on high-availability clusters, separate from the WES, workers can continue running the tasks until completion and update the Conductor Server once the WES comes back online.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Introspecting workflows for debugging&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once in a while, workflows may still fail despite these automated safeguards and policies for guaranteed execution. However, Conductor makes it easy to remedy these situations. With Conductor Stores that preserve all execution history, developers can inspect what happened under the hood to troubleshoot and rectify errors before restarting the failed workflows.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyql48ogydrc2rpfdhs57.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyql48ogydrc2rpfdhs57.png" alt="Diagram of workflow failure and Conductor’s introspection feature where developers can inspect the execution from various screens." width="800" height="418"&gt;&lt;/a&gt;Conductor enables you to look under the hood and find out where exactly your workflow failed and why it happened.&lt;/p&gt;

&lt;p&gt;For example, say the number of custom violin orders has increased over time, and a number of &lt;code&gt;order_processing&lt;/code&gt; workflow executions are taking too long or have timed out. With the ability to introspect, we can quickly pinpoint the problem. Perhaps the HTTP URL is outdated, or there are insufficient workers servicing a task. Armed with these logs, application developers can quickly troubleshoot and resolve these issues so that the workflows can restart without any roadblocks.&lt;/p&gt;

&lt;p&gt;Importantly, because Conductor keeps state management and infrastructure separate from the Order App’s business logic, the developers can easily scale or upgrade the underlying infrastructure without any downtime.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Analyzing metrics to optimize performance&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Over time, sufficient data will be collected to analyze the workflow performance in aggregate. Conductor comes equipped with a &lt;a href="https://dev.to/orkes/monitoring-microservices-using-prometheus-grafana-42cj"&gt;metrics dashboard&lt;/a&gt; that showcases key insights about latency, completion rate, failure rate, number of concurrent workflows, and so on. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ipbzup3rc96fgps4emw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ipbzup3rc96fgps4emw.png" alt="Diagram of a metrics dashboard." width="800" height="418"&gt;&lt;/a&gt;Conductor provides a metrics dashboard for aggregate workflow performance.&lt;/p&gt;

&lt;p&gt;These metrics can further inform decisions to optimize Conductor workflows for better performance, such as refactoring code or scaling up the infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Refactoring workflows with no downtime&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;With in-built support for workflow versioning, application developers can refactor the workflow code anytime without impacting existing workflows. Once the workflow definition has been updated, new executions will run based on the latest definition, while existing workflows can be restarted to run the latest definition. All of this is made possible by Conductor’s decoupled architecture.&lt;/p&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flv5a5bprilcjw6oiv7ve.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flv5a5bprilcjw6oiv7ve.jpg" alt="Diagram of different versions for the same workflow." width="800" height="400"&gt;&lt;/a&gt;Conductor has in-built versioning capabilities to allow for workflow changes without impacting existing runs.&lt;/p&gt;
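&lt;p&gt;The versioning idea can be modeled as definitions keyed by name and version: new executions pick up the latest version, while in-flight runs keep the version they started with. The sketch below is an illustrative model, not Conductor’s registry.&lt;/p&gt;

```python
# Workflow definitions keyed by (name, version).
definitions = {}

def register(name, version, tasks):
    definitions[(name, version)] = tasks

def latest_version(name):
    return max(v for (n, v) in definitions if n == name)

def start(name):
    # New executions always run the latest registered version.
    version = latest_version(name)
    return {"name": name, "version": version, "tasks": definitions[(name, version)]}

register("order_processing", 1, ["capture_payment", "notify_invoice"])
run_v1 = start("order_processing")

# Refactor: version 2 adds the customization wait, with no downtime
# for runs already started on version 1.
register("order_processing", 2,
         ["capture_payment", "wait_customization", "calculate_invoice", "notify_invoice"])
run_v2 = start("order_processing")

print(run_v1["version"], run_v2["version"])  # 1 2
```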

&lt;h2&gt;
  
  
  Key features for durable execution
&lt;/h2&gt;

&lt;p&gt;In summary, Conductor bolsters your application durability with these key features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Resilient Engine&lt;/strong&gt;—built atop decoupled infrastructure, created for durability, speed, and redundancy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Elastic Capacity&lt;/strong&gt;—run &lt;a href="https://dev.to/orkes/running-a-billion-workflows-a-month-with-netflix-conductor-45fb"&gt;millions of concurrent workflows&lt;/a&gt; with minimal latency&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Failure Handling&lt;/strong&gt;—get native support for rate limits, retry policies, timeouts, and more&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Introspection&lt;/strong&gt;—inspect your workflow performance for troubleshooting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Versioning&lt;/strong&gt;—safely and cleanly version your workflows with no disruption to production runs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metrics&lt;/strong&gt;—get aggregated insights into workflow performance&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Durability for any use case
&lt;/h3&gt;

&lt;p&gt;As a general-purpose orchestration engine, Conductor is versatile enough for any possible use case — compliance checks in banking and finance, media encoding in entertainment, or shipping automation in logistics. Check out our &lt;a href="https://orkes.io/customers" rel="noopener noreferrer"&gt;case studies&lt;/a&gt; to discover how organizations across industries use Conductor, or explore the following tutorials:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/orkes/how-to-create-standup-bot-using-conductor-slack-integration-5287"&gt;How to Create Standup Bot Using Conductor-Slack Integration&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/orkes/automating-wildcard-certificate-management-across-multiple-kubernetes-clusters-with-orkes-conductor-21mh"&gt;Automating Wildcard Certificate Management Across Multiple Kubernetes Clusters with Orkes Conductor&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/orkes/upgrade-eks-clusters-across-multiple-versions-in-less-than-a-day-using-automated-workflows-2nj0"&gt;Upgrade EKS Clusters across Multiple Versions in Less Than A Day - using Automated Workflows&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Orkes Cloud is a fully managed and hosted Conductor service that can scale seamlessly according to your needs. When you use Conductor via Orkes Cloud, your engineers don’t need to worry about set-up, tuning, patching, and managing high-performance Conductor clusters. Try it out with our free &lt;a href="https://developer.orkescloud.com/?utm_campaign=durable-execution-blog-2&amp;amp;utm_source=devto-blog&amp;amp;utm_medium=web" rel="noopener noreferrer"&gt;Developer Playground&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>durableexecution</category>
      <category>statemanagement</category>
      <category>distributedsystems</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
