Kai Walter

Posted on Aug 17 • Edited on Aug 21

Dipping into Dapr Agentic Workflows

#ai #agents #workflow #dapr

Motivation

Since I first laid eyes and hands on Dapr when it was created in 2019, I have been fascinated by the potential of building distributed applications with it. The Dapr team has been working hard ever since to make it easier to build microservices and distributed systems. Workflows became stable with release 1.15, and AI agent support started showing up in spring 2025. When I recently learned about Dapr workflows and agents in this episode, I wanted to explore how Dapr can be used to create agentic workflows.

I had been dabbling with some frameworks over the past few months, to one degree or another:

Nothing really stuck with me. The challenge I wanted to solve:

I am out, mostly in the mornings for a walk or a run, and I want to drop a thought or a task immediately, sometimes even complete sections of an upcoming presentation. The same goes for rushing between meetings: just drop a voice recording and have it turned into a task, or simply delivered as a note to my email inbox.

Eventually, with n8n, I pushed myself to a working flow in a highly curated environment. I dropped the idea of developing the flow on my own for the moment, as the value of having such a flow outweighed the learning experience.

How it works:

on my Android phone I use the paid version of Easy Voice Recorder Pro, which can automatically upload recordings into a predefined OneDrive folder (/Apps/Easy Voice Recorder Pro)
the n8n flow is triggered when a file is created in that folder and downloads the recording
before downloading, to be safe and to avoid crashing the transcription unnecessarily, the flow filters on the audio/x-wav and audio/mpeg MIME types
additionally, the flow downloads a prompt text file from OneDrive that contains the instructions for classifying the intent in the transcription; I wanted it on OneDrive so I can modify it easily without having to touch the flow
the recording is then transcribed using the OpenAI Whisper API
the transcription and the prompt are run through a model like GPT-4.1-MINI
that classification step also has access to a simple tool, referenced in the prompt: a list of relevant person and other entity names to make the transcription more precise
depending on the resolved intent, the flow either creates a task (using a webhook, as I did not want to mess around in our corporate environment) or just sends an email with the plain transcription to my corporate self
as part of housekeeping, the file is copied to an archive folder and the original is deleted
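The two decision points of this flow can be sketched in a few lines of Python. This is not the n8n flow itself. It is only a minimal illustration of the MIME type filter and the intent-based routing. The function names and the `create_task` intent label are hypothetical, and the Whisper and model calls are left out.

```python
from dataclasses import dataclass

# MIME types the flow lets through to the transcription step
ALLOWED_MIME_TYPES = {"audio/x-wav", "audio/mpeg"}


@dataclass
class Action:
    kind: str      # "webhook" (create a task) or "email" (send the transcript)
    payload: str


def is_supported_recording(mime_type: str) -> bool:
    """Filter step: skip anything that is not a supported audio file."""
    return mime_type in ALLOWED_MIME_TYPES


def route_intent(intent: str, transcript: str) -> Action:
    """Routing step: a task intent goes to the webhook, everything else is emailed.

    `intent` stands for the label returned by the classification model;
    "create_task" is a hypothetical label, not the one used in the real prompt.
    """
    if intent == "create_task":
        return Action(kind="webhook", payload=transcript)
    return Action(kind="email", payload=transcript)
```

For example, `route_intent("create_task", text).kind` is `"webhook"`. Any other label falls through to `"email"`, which mirrors the "if there is no intent, just send an email" default.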

That worked pretty well. I especially liked the capability of n8n to copy runtime data of a particular execution into the editor, which makes mapping and debugging so much easier. I moved the cloud-based flow so I could run it basically for free (download it, import it from file, rewire cloud credentials).

Enough of n8n. A nice environment to get started quickly - without a doubt.

Value proposition of Dapr Agents and Workflows for me

This is what made me spend three to four times as much time on a Dapr-based flow:

I like the code-first approach with workflows and agents; for the use cases we face in our company, I additionally needed to understand what building and operating such a flow in a sustainable and scalable fashion entails
with Dapr I get resource abstraction: I can switch easily between state and pub/sub resource providers, e.g. from Redis to Azure Cosmos DB or Azure Service Bus, or even locally from Redis to SQLite if required
with Dapr I get observability which I can hook easily into our environment
with Dapr I achieve the desired separation of concerns between the workflow and the agents; I can develop and deploy them independently
with Dapr I can easily mix in "classic" enterprise processing, and I can mix languages among Dapr applications, e.g. Python for the agents and C# for the workflow
and in the end I get scalability with Dapr: it is intrinsic to Dapr that the activities of both deterministic and non-deterministic workflows can easily be distributed across multiple compute nodes, something that other frameworks do not necessarily provide out of the box
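To illustrate the resource abstraction point: application code only refers to a state store by its component name. Switching the backing store, e.g. from Redis to SQLite, therefore changes only the component definition (its `type`, such as `state.redis` or `state.sqlite`), not the code. Below is a minimal sketch, assuming the Dapr Python SDK's `DaprClient.save_state`/`get_state` and a hypothetical component named `tokenstatestore`. The client is passed in so the functions can be exercised without a sidecar.

```python
import json
from typing import Any, Protocol


class StateClient(Protocol):
    """The subset of dapr.clients.DaprClient used here."""

    def save_state(self, store_name: str, key: str, value: str) -> Any: ...
    def get_state(self, store_name: str, key: str) -> Any: ...


# Hypothetical component name; whether Redis, SQLite or Cosmos DB sits behind it
# is decided only in the component YAML, never in this code.
STORE_NAME = "tokenstatestore"


def save_json(client: StateClient, key: str, data: dict) -> None:
    client.save_state(store_name=STORE_NAME, key=key, value=json.dumps(data))


def load_json(client: StateClient, key: str) -> dict:
    response = client.get_state(store_name=STORE_NAME, key=key)
    # get_state returns a response whose .data holds the raw bytes (empty if missing)
    return json.loads(response.data) if response.data else {}

# Usage (needs the `dapr` package and a running Dapr sidecar):
#   from dapr.clients import DaprClient
#   with DaprClient() as client:
#       save_json(client, "demo", {"hello": "world"})
#       print(load_json(client, "demo"))
```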

What I wanted to do differently

As seen above, I implemented a rather deterministic flow with n8n. I wanted to explore how I can use Dapr agents and workflows to create a more agentic workflow, one that is more flexible and can adapt to the situation at hand, making it easier to scale up and bring in new components. In essence this means:

polling OneDrive, downloading, and transcribing the voice recording run in a deterministic workflow
the transcript is then handed to LLM-orchestrated agents, which are instructed to figure out what to do with the transcription
instead of funneling all information into the flow, I want agents to make use of tools (probably MCP servers in the future) to interact with the outside world; here again I think Dapr can shine, as I can easily wire up tools with other Dapr applications, either using pub/sub or service invocation
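A tool in this sense can be as small as a plain function with a docstring that an agent may call. Below is a minimal sketch of a time zone lookup, similar to the time zone information the utility agent supplies to the flow. The office time zone value is hypothetical. How the function is registered as a tool is framework-specific and not shown. The optional `tz` parameter exists only to make the function testable.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Optional
from zoneinfo import ZoneInfo

# Hypothetical setting; replace with the actual office time zone.
OFFICE_TIMEZONE = "Europe/Berlin"


def get_office_timezone(now: Optional[datetime] = None,
                        tz: Optional[tzinfo] = None) -> dict:
    """Tool: report the office time zone abbreviation, UTC offset and local time.

    `now` should be timezone-aware; a naive value is interpreted as local time.
    """
    tz = tz or ZoneInfo(OFFICE_TIMEZONE)  # needs the system tz database or `tzdata`
    moment = (now or datetime.now(timezone.utc)).astimezone(tz)
    offset = moment.utcoffset() or timedelta(0)
    return {
        "timezone": moment.tzname(),
        "utc_offset": moment.strftime("%z"),          # e.g. "+0200"
        "offset_hours": offset.total_seconds() / 3600,
        "local_time": moment.isoformat(timespec="minutes"),
    }
```

Returning a small dictionary rather than prose keeps the answer easy for the orchestrator to pass on to the agent that composes the email or task.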

A look into the codebase

The code can be found in my GitHub repository.

BIG FAT DISCLAIMER: This is a work in progress, I am still learning and exploring the capabilities of Dapr agents and workflows. The code is not production-ready and should be used for educational purposes only.

DISCLAIMER: Almost 95% of the code has been created with GitHub Copilot. I made this project a "two birds, one stone" exercise, as I had long been keen to create a larger codebase with AI support. I will share learnings from my "me and my coding apprentice" journey here sometime in the future. Let's just say: for me as an occasional coder, not really versed in Python, it would not have been possible to achieve that many function points (a metric we kids used some decades ago to size software projects) without AI support.

Top Level / Tier 1 Structure

The structure leans on the layout of the quickstart samples. Some polishing is still required, but I wanted to get the code out there to get feedback and learnings from the community.

The Dapr Multi-App Run file master.yaml points to the top-level applications and their entry points:

services/ui/authenticator : a small web UI that redirects to a Microsoft Entra ID login and, on callback, serializes the access and refresh tokens into a Dapr state store; from there, the other services pick up the token information to authenticate OneDrive and OpenAI API calls; the basic idea is to log in once and let the workflow processes run in the background without further interaction
services/workflow/worker : runs the main polling loop at a timed interval to kick off the workflow (and the workflows to come) with a pub/sub signal; this achieves some loose coupling between the workflow and the main loop (instead of using child workflows or the like)
services/workflow/worker_voice2action : defines the deterministic steps of the main Voice-2-Action workflow; schedules a new instance when receiving a pub/sub event from the main worker services/workflow/worker
services/intent_orchestrator/app : puts an LLM orchestrator for intent processing on standby, waiting for pub/sub events from the publish intent orchestrator activity in services/workflow/worker_voice2action
services/intent_orchestrator/agent_tasker : participates in the above orchestration as a utility agent that delivers information required by the flow, such as the transcript or time zone information
services/intent_orchestrator/agent_office_automation : participates in the above orchestration to fulfill all tasks that connect the flow to office automation, such as creating tasks or sending emails
services/ui/monitor : a small console app that listens to and prints the LLM orchestration broadcast messages to allow for a better understanding of the flow; this is absolutely required to fine-tune the instructions to the orchestrator and the agents
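The loose coupling between the polling loop and the workflow worker essentially boils down to publishing an event on a timer. Below is a minimal sketch, assuming the Dapr Python SDK's `DaprClient.publish_event`. The pub/sub component name `pubsub`, the topic `voice2action-poll`, and the payload shape are hypothetical. The loop can be bounded with `max_ticks` so it can be tried out.

```python
import json
import time
from typing import Any, Callable, Optional, Protocol


class Publisher(Protocol):
    """The subset of dapr.clients.DaprClient used here."""

    def publish_event(self, pubsub_name: str, topic_name: str, data: str,
                      data_content_type: Optional[str] = None) -> Any: ...


PUBSUB_NAME = "pubsub"             # hypothetical component name
TOPIC_NAME = "voice2action-poll"   # hypothetical topic


def run_poll_loop(client: Publisher, interval_seconds: float,
                  max_ticks: Optional[int] = None,
                  sleep: Callable[[float], None] = time.sleep) -> int:
    """Publish a poll signal every `interval_seconds`; return the number sent.

    The worker only signals; the workflow app subscribed to the topic decides
    whether to schedule a new workflow instance.
    """
    sent = 0
    while max_ticks is None or sent < max_ticks:
        payload = {"signal": "poll", "timestamp": time.time()}
        client.publish_event(pubsub_name=PUBSUB_NAME, topic_name=TOPIC_NAME,
                             data=json.dumps(payload),
                             data_content_type="application/json")
        sent += 1
        sleep(interval_seconds)
    return sent

# Usage (needs the `dapr` package and a running Dapr sidecar):
#   from dapr.clients import DaprClient
#   with DaprClient() as client:
#       run_poll_loop(client, interval_seconds=60)
```

Because the loop knows nothing about the workflow, further workflows can subscribe to the same signal without touching the worker.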

Tier 2 Elements

workflows/voicetoaction / voice2action_poll_orchestrator : orchestrates the activities to list the files on OneDrive, mark new files, and hand off each single file to a child workflow ...
workflows/voicetoaction / voice2action_per_file_orchestrator : ... which orchestrates, in sequential order: downloading the recording, transcribing it, publishing to the intent workflow, and then archiving the file
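Dapr workflows in Python are written as generator functions that `yield` activity and child-workflow calls, which is what allows deterministic replay. Below is a minimal sketch of the two orchestrators, assuming the `dapr-ext-workflow` context methods `call_activity` and `call_child_workflow` and the `instance_id` property. The activity names, inputs, the `id` field on file records, and the return values are hypothetical. Registration with `WorkflowRuntime` is only indicated in a comment.

```python
from typing import Any, Generator, List


# Hypothetical activity stubs; in Dapr each would be registered as an activity
# and receive (ctx: WorkflowActivityContext, input).
def list_new_files(ctx, _input) -> List[dict]: ...
def download_recording(ctx, file: dict) -> str: ...
def transcribe_recording(ctx, audio_path: str) -> str: ...
def publish_intent(ctx, transcript_path: str) -> None: ...
def archive_file(ctx, file: dict) -> None: ...


def voice2action_per_file_orchestrator(ctx, file: dict) -> Generator[Any, Any, str]:
    """Strictly sequential steps for one recording."""
    audio_path = yield ctx.call_activity(download_recording, input=file)
    transcript_path = yield ctx.call_activity(transcribe_recording, input=audio_path)
    yield ctx.call_activity(publish_intent, input=transcript_path)
    yield ctx.call_activity(archive_file, input=file)
    return transcript_path


def voice2action_poll_orchestrator(ctx, _input=None) -> Generator[Any, Any, int]:
    """List new files and hand each one to a child workflow."""
    files = yield ctx.call_activity(list_new_files, input=None)
    for file in files:
        # Child instance IDs are derived deterministically from the parent ID.
        yield ctx.call_child_workflow(voice2action_per_file_orchestrator, input=file,
                                      instance_id=f"{ctx.instance_id}-{file['id']}")
    return len(files)

# Registration sketch (needs dapr-ext-workflow and a sidecar):
#   import dapr.ext.workflow as wf
#   wfr = wf.WorkflowRuntime()
#   wfr.register_workflow(voice2action_poll_orchestrator)
#   wfr.register_workflow(voice2action_per_file_orchestrator)
#   wfr.register_activity(download_recording)  # ...and the other activities
#   wfr.start()
```

Here the child workflows are awaited one after another. Fanning them out with `wf.when_all` would be the alternative if recordings should be processed concurrently.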

Tier 3 Elements

On this level, the activities folder contains the workflow activities, defined in modules that are referenced by the deterministic workflows.
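Below is a minimal sketch of what such an activity module could contain, using transcription as the example. The activity writes the transcript to a JSON file under `./.work/voice/`, mirroring the paths that show up in the sample conversation. The JSON layout and the injected `transcribe` callable are assumptions. With the OpenAI Python SDK, that callable could wrap `client.audio.transcriptions.create(model="whisper-1", file=...)`.

```python
import json
from pathlib import Path
from typing import Callable

WORK_DIR = Path("./.work/voice")  # same folder as in the sample run


def make_transcribe_activity(transcribe: Callable[[Path], str],
                             work_dir: Path = WORK_DIR):
    """Build a workflow activity that transcribes one recording.

    `transcribe` takes the audio path and returns plain text; injecting it keeps
    the activity testable and the OpenAI call out of the workflow code, e.g.:
        def whisper(path):
            with path.open("rb") as f:
                return client.audio.transcriptions.create(model="whisper-1", file=f).text
    """
    def transcribe_recording(ctx, audio_path: str) -> str:
        source = Path(audio_path)
        text = transcribe(source)
        work_dir.mkdir(parents=True, exist_ok=True)
        target = work_dir / f"{source.stem}.json"
        target.write_text(json.dumps({"source": source.name, "text": text}),
                          encoding="utf-8")
        # Return a small, serializable value: workflow inputs/outputs are persisted.
        return str(target)

    return transcribe_recording
```

Returning a file path instead of the full transcript keeps the persisted workflow history small. It also gives the orchestrator a path it can pass on to the agents.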

Tier 4 Elements

The services folder directly contains helper services that are used by workflow activities or agents.
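One such helper could fetch a usable access token from the token state store that the authenticator fills. Below is a minimal sketch with several assumptions: the token is stored as JSON with `access_token` and an absolute `expires_at` epoch timestamp (both field names hypothetical), the store and key names are hypothetical, and the store is read through the Dapr SDK's `get_state`. The actual refresh call, and saving the refreshed token back, is left to a caller-supplied function.

```python
import json
import time
from typing import Any, Callable, Optional, Protocol


class StateReader(Protocol):
    """The subset of dapr.clients.DaprClient used here."""

    def get_state(self, store_name: str, key: str) -> Any: ...


TOKEN_STORE = "tokenstatestore"   # hypothetical component name
TOKEN_KEY = "graph_token"         # hypothetical key


def get_access_token(client: StateReader,
                     refresh: Callable[[dict], dict],
                     skew_seconds: int = 300,
                     now: Optional[float] = None) -> str:
    """Return a valid access token, refreshing it shortly before it expires."""
    raw = client.get_state(store_name=TOKEN_STORE, key=TOKEN_KEY).data
    if not raw:
        raise RuntimeError("no token in state store; log in via the authenticator UI first")
    token = json.loads(raw)
    current = time.time() if now is None else now
    if token["expires_at"] - skew_seconds <= current:
        # e.g. an Entra ID refresh-token request that also saves the new token
        token = refresh(token)
    return token["access_token"]
```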

Other Elements

The components folder holds all Dapr resource components used by all applications. Note that state stores are segregated by purpose: workflow state, agent state, and token state. This is required because these state types need different configurations for state key prefixing and for the ability to hold actors.

The models folder contains common model definitions used by the workflow elements and agents.

PRD / requirements

As stated above, I drove GitHub Copilot for the majority of the work. Most of the time, when not falling back into old habits, I used the voice2action-requirements.md PRD file to invoke feature implementation, so most of the intentions I had for the flow are also documented there.

start.sh

This script helps me start the process with a clean state, which makes debugging various issues, especially around agent instructions, much easier.

Sample Conversation

This is what the conversation between the orchestrator and the agents looks like. In this run I put audio_samples/sample-recording-3-send-email.mp3 through the system:

== APP - monitor == 2025-08-18 08:50:44 INFO [monitor] IntentOrchestrator : ## Mission Briefing
== APP - monitor ==
== APP - monitor == We have received the following task:
== APP - monitor ==
== APP - monitor == Process voice transcription from [./.work/voice/sample-recording-3-send-email.json]. From first 2 sentences extract the user's intent and plan the next steps. Treat the remaining transcription text just as a note with no further intent to consider. Explicit user intent can be: create a task. If there is no intent, just send an email with the whole transcript.
== APP - monitor ==
== APP - monitor == ### Team of Agents
== APP - monitor == - TaskPlanner: Planner (Goal: Handle and provide all kind of input information e.g. voice recording transcript and provide additional reference information which are helpful to the process.)
== APP - monitor == - OfficeAutomation: Office Assistant (Goal: Handle all jobs that require interaction with personal productivity like sending emails or creating to-do tasks.)
== APP - monitor ==
== APP - monitor == ### Execution Plan
== APP - monitor == Here is the structured approach the team will follow to accomplish the task:
== APP - monitor ==
== APP - monitor == [{'step': 1, 'description': 'Read the content from the given voice transcription file: [./.work/voice/sample-recording-3-send-email.json] and load the data for further processing.', 'status': 'not_started', 'substeps': None}, {'step': 2, 'description': "Extract the first two sentences from the transcription to analyze the user's intent.", 'status': 'not_started', 'substeps': None}, {'step': 3, 'description': 'Determine if the extracted sentences indicate an explicit user intent, such as creating a task.', 'status': 'not_started', 'substeps': None}, {'step': 4, 'description': 'Plan the execution based on the identified intent.', 'status': 'not_started', 'substeps': [{'substep': 4.1, 'description': 'If there is a task creation intent, prepare to create a task using the specified task details.', 'status': 'not_started'}, {'substep': 4.2, 'description': 'If no explicit intent is identified, prepare the entire transcription to be sent as an email.', 'status': 'not_started'}]}, {'step': 5, 'description': 'Initiate the task creation process (if applicable), ensuring details are accurately recorded in the designated task management system.', 'status': 'not_started', 'substeps': None}, {'step': 6, 'description': "Prepare the transcription text for email composition if the user's intent is to send an email.", 'status': 'not_started', 'substeps': None}, {'step': 7, 'description': 'Send the email containing the full transcription after preparing the email draft.', 'status': 'not_started', 'substeps': None}, {'step': 8, 'description': 'Verify if all intended actions (task creation or email sending) have been completed successfully.', 'status': 'not_started', 'substeps': None}]
== APP - monitor ==
== APP - monitor == 2025-08-18 08:50:45 INFO [monitor] IntentOrchestrator : Read the content from the given voice transcription file and load the data for further processing.
== APP - monitor == 2025-08-18 08:50:52 INFO [monitor] TaskPlanner : The transcription text from the file reads: "Send an email to myself to remind me to find out what 42 is about."
== APP - monitor ==
== APP - monitor == I will now proceed with extracting the user's intent from the first two sentences.
== APP - monitor == ### Step Analysis
== APP - monitor ==
== APP - monitor == - **Step 2**: Extract the first two sentences from the transcription to analyze the user's intent.
== APP - monitor ==
== APP - monitor == Since the transcription consists of a single sentence, the user's intent is already clear:
== APP - monitor ==
== APP - monitor == ### Intent Identification
== APP - monitor ==
== APP - monitor == - **Identified Intent**: The user intends to send an email to themselves as a reminder to find out "what 42 is about."
== APP - monitor ==
== APP - monitor == ### Next Steps
== APP - monitor ==
== APP - monitor == Since there is no explicit task creation request, the next steps will involve preparing the content for email composition:
== APP - monitor ==
== APP - monitor == - Prepare the transcription text for composing an email to remind the user.
== APP - monitor == - Proceed with the creation of the email draft, ensuring the accuracy of the content.
== APP - monitor ==
== APP - monitor == In order to provide accurate reference, I will gather information about the office timezone and timezone offset before the email composition. Let me proceed with this.
== APP - monitor == 2025-08-18 08:50:57 INFO [monitor] IntentOrchestrator : Prepare the transcription text for composing an email to remind the user about 'what 42 is about.' Ensure the email draft is accurately written and ready to be sent.
== APP - monitor == 2025-08-18 08:51:04 INFO [monitor] OfficeAutomation : The email reminder has been successfully composed and sent with the subject "Reminder: Find Out What 42 Is About." The body of the email includes the reminder as intended. If there's anything else you need, feel free to let me know!

Learnings

Some other points I'd like to convey:

compared to the n8n flow, where one prompt yielded a structured intent and classification, it took some calibration on my end to balance the instructions handed to the orchestrator and the agents
when refactoring agents, especially renaming or deleting them, be sure to flush or clean up the agent state store; otherwise, the orchestrator will still try to involve orphaned agents
closely and repeatedly observe the conversation flow to see where instructions need to be more precise or where an agent needs to be more capable
when passing file paths in a task message, wrap them in something like square brackets; merely separating a path from the regular instructions with a blank sometimes caused the path not to be resolved correctly
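The bracket convention can be captured in a small helper. Below is a minimal sketch that reproduces the task text from the mission briefing in the sample conversation above. The helper name and its parameters are hypothetical, and how the repository actually assembles this message is not shown here.

```python
def build_intent_task(transcript_path: str, intent_sentences: int = 2) -> str:
    """Compose the orchestrator task; brackets mark where the path begins and ends."""
    return (
        f"Process voice transcription from [{transcript_path}]. "
        f"From first {intent_sentences} sentences extract the user's intent and plan the next steps. "
        "Treat the remaining transcription text just as a note with no further intent to consider. "
        "Explicit user intent can be: create a task. "
        "If there is no intent, just send an email with the whole transcript."
    )
```

Keeping the path in exactly one delimited spot also makes it easy to spot in the monitor output when an agent fails to resolve it.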

Open Questions

How do I handle validation errors when I send information from the conversation over to a tool and the tool expects a structure with validation on top of it?
Is it really the case that the LLM orchestrator can only process one instance at a time? Can I make it multi-instance with some unique ID, or do I need to put some singleton pattern, like an actor, in front of it?

Conclusion

For me, the versatility of Dapr for such scenarios is tangible: one can combine deterministic with non-deterministic workflows, and I think this in particular gives Dapr an edge.

I need to operate it for a while, add observability, and surely more resilience. Adding more intents, like "analyze this topic for me and send me a report", will also show whether my assumptions regarding scalability and flexibility hold up.

Top comments (2)

Kai Walter • Aug 22

Definitely check out later versions of the repository if you're interested.
It almost took me the same amount of time to write this solution as it took me to get it really operational with Docker Compose!

Kai Walter • Aug 18

Added a sample conversation flow between orchestrator and agents.

DEV Community