
Mathieu Ledru

⚙️ Message-oriented vs. Data-oriented orchestration - from data to knowledge

For intellectual property reasons, the subject used to illustrate this article is not the original one, although it is closely related. For further information, please contact Omer, who will be happy to answer. Apologies for any inconvenience.

In this article, we explore two fundamental approaches to software orchestration:

  • Message-Oriented Orchestration: via Symfony Messenger, in synchronous or asynchronous mode
  • Data-Oriented Orchestration: via Navi (synchronous) and Flow (asynchronous)

The case study is based on a classic but foundational problem: text mining applied to a set of Git repositories.

For the practical demonstration, I reuse an EIT tutorial on classification from 2007/2008, which I worked on at the time with Matthieu Beyou during computer science lab sessions.

For the data, we use the repositories of Omer (a former colleague), available on his site https://git.arkalo.ovh (via the API).

The goal is not to produce the best machine learning model, but to understand how the form of orchestration influences the complexity, readability, and scalability of the system.

The problem: transforming repositories into usable knowledge

The dataset consists of a list of Git repositories defined in a repos.json file, built from the repositories listed on https://git.arkalo.ovh/explore/repos.

You can extract this information using the Composio connector for Gitea: https://composio.dev/toolkits/gitea. Refer to my previous article for the implementation: https://blog.darkwood.com/fr/article/relacher-les-connecteurs-des-outils-au-langage.

Each repository becomes a canonical document constructed from:

  • repository name
  • description
  • README
  • metadata (owner, topics…)
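As a minimal sketch of this step, the following PHP builds one text document per repository. The `CanonicalDocument` class, the `buildCanonicalDocument` function, and the array keys (`name`, `description`, `readme`, `owner`, `topics`) are assumptions made for illustration; the actual layout of repos.json may differ.

```php
<?php
declare(strict_types=1);

// Hypothetical value object: one repository flattened into a single text document.
final class CanonicalDocument
{
    public function __construct(
        public readonly string $id,
        public readonly string $text,
    ) {}
}

/**
 * Builds a canonical document from one decoded entry of repos.json.
 * The array keys used here are assumptions about the file layout.
 */
function buildCanonicalDocument(array $repo): CanonicalDocument
{
    $parts = [
        $repo['name'] ?? '',
        $repo['description'] ?? '',
        $repo['readme'] ?? '',
        $repo['owner'] ?? '',
        implode(' ', $repo['topics'] ?? []),
    ];

    // Drop empty fields and join the rest into one block of text.
    $text = implode("\n", array_filter($parts, fn (string $p): bool => trim($p) !== ''));

    return new CanonicalDocument($repo['name'] ?? '', $text);
}

$doc = buildCanonicalDocument([
    'name' => 'flow',
    'description' => 'Data-oriented orchestration',
    'owner' => 'darkwood',
    'topics' => ['php', 'workflow'],
]);
echo $doc->text, PHP_EOL;
```

Missing fields (here the README) are simply skipped, so every later stage can work on one plain string per repository.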

This document is then transformed via a classic text mining pipeline:

  1. Preprocessing (cleaning, tokenization)
  2. Feature Extraction
  3. TF-IDF Weighting
  4. Similarity between documents
  5. Classification / clustering

This pipeline is directly inspired by historical approaches:

  • TF-IDF: weight = tf * log(N / df)
  • Cosine similarity between documents
  • Supervised Naive Bayes Classification
  • Unsupervised k-means clustering
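The first two items can be sketched in a few pure PHP functions. This is an illustrative implementation, not the project's code: the function names `tokenize`, `tfIdf`, and `cosine` are hypothetical, the tokenizer is deliberately naive, and tf is taken as the raw term count.

```php
<?php
declare(strict_types=1);

/** Lowercases the text and splits it into alphanumeric tokens. */
function tokenize(string $text): array
{
    preg_match_all('/[a-z0-9]+/', strtolower($text), $matches);
    return $matches[0];
}

/**
 * Computes one TF-IDF vector per document with weight = tf * log(N / df).
 * @param array<string, string> $documents document id => raw text
 * @return array<string, array<string, float>> document id => term => weight
 */
function tfIdf(array $documents): array
{
    $termCounts = [];
    $df = [];
    foreach ($documents as $id => $text) {
        $termCounts[$id] = array_count_values(tokenize($text));
        foreach (array_keys($termCounts[$id]) as $term) {
            $df[$term] = ($df[$term] ?? 0) + 1; // documents containing the term
        }
    }

    $n = count($documents);
    $vectors = [];
    foreach ($termCounts as $id => $counts) {
        $vectors[$id] = [];
        foreach ($counts as $term => $tf) {
            $vectors[$id][$term] = $tf * log($n / $df[$term]);
        }
    }
    return $vectors;
}

/** Cosine similarity between two sparse term => weight vectors. */
function cosine(array $a, array $b): float
{
    $dot = 0.0;
    foreach ($a as $term => $weight) {
        $dot += $weight * ($b[$term] ?? 0.0);
    }
    $norm = sqrt(array_sum(array_map(fn ($w) => $w * $w, $a)))
          * sqrt(array_sum(array_map(fn ($w) => $w * $w, $b)));

    return $norm > 0.0 ? $dot / $norm : 0.0;
}

$vectors = tfIdf([
    'a' => 'symfony messenger bus',
    'b' => 'symfony flow pipeline',
    'c' => 'rust compiler',
]);
printf(
    "a~b = %.3f, a~c = %.3f\n",
    cosine($vectors['a'], $vectors['b']),
    cosine($vectors['a'], $vectors['c']),
);
```

Every function takes data and returns data, with no I/O and no shared state, which is the property the rest of the article relies on.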

What interests us here is not the algorithm, but the way to orchestrate it.

If you want to dig deeper, the Resources section at the bottom of the article lists a number of topics on data mining applied to computer science.

Business pipeline (independent of orchestration)

First and foremost, the core business needs to be isolated.

Repository → Document → Tokens → Features → TF-IDF → Similarity → Results

This pipeline represents a data transformation.

Each step:

  • takes a piece of data
  • produces new data
  • without strong dependence on an external context

This is precisely where the two approaches diverge.

Approach 1 Message-Oriented - Orchestration via Symfony Messenger

In the Message-Oriented implementation, the pipeline is not expressed as a continuous data transformation.

It is encapsulated in a message, then executed via the Symfony bus.

Execution Model

Command → Message Bus → Handler → PipelineService → Stages

In concrete terms:

  • a CLI command triggers the execution
  • a message is sent
  • a handler takes care of the execution
  • the core business remains centralized in a shared service
RunMessengerPipelineMessage
→ RunMessengerPipelineHandler
→ PipelineService

Separation of responsibilities

This implementation adheres to a key project constraint:

The core business is strictly shared between the two approaches

So:

  • Messenger contains no business logic
  • it only orchestrates the execution

Actual Pipeline Executed

The handler triggers a deterministic pipeline:

1. ingest
2. preprocess
3. feature build
4. classification
5. clustering

Each step is executed in a common application service (PipelineService).

Concepts introduced by Messenger

The orchestration explicitly introduces:

  • a message class
  • a dedicated handler
  • a dependency on the bus
  • a dispatch layer
Command → Message → Handler → Service

These elements are specific to Messenger and do not exist in the data-oriented model.
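To make this indirection concrete without pulling in the framework, here is a deliberately tiny in-memory bus. It is a toy stand-in and not the Symfony Messenger API; `InMemoryBus`, `RunPipelineMessage`, and `PipelineServiceStub` are hypothetical names introduced for this sketch.

```php
<?php
declare(strict_types=1);

// Toy stand-in for a message bus; NOT the Symfony Messenger API.
final class InMemoryBus
{
    /** @var array<class-string, callable> */
    private array $handlers = [];

    public function register(string $messageClass, callable $handler): void
    {
        $this->handlers[$messageClass] = $handler;
    }

    public function dispatch(object $message): mixed
    {
        // Routing is done on the message class, as in the Command → Message → Handler chain.
        $handler = $this->handlers[$message::class]
            ?? throw new RuntimeException('No handler for ' . $message::class);

        return $handler($message);
    }
}

// Hypothetical message: an envelope that only says "run the pipeline on this file".
final class RunPipelineMessage
{
    public function __construct(public readonly string $reposFile) {}
}

// Hypothetical stub for the shared service that holds the business stages.
final class PipelineServiceStub
{
    public function run(string $reposFile): array
    {
        return ['ingest', 'preprocess', 'feature build', 'classification', 'clustering'];
    }
}

$service = new PipelineServiceStub();
$bus = new InMemoryBus();

// The handler contains no business logic; it only forwards to the service.
$bus->register(
    RunPipelineMessage::class,
    fn (RunPipelineMessage $m): array => $service->run($m->reposFile),
);

$stages = $bus->dispatch(new RunPipelineMessage('repos.json'));
echo implode(' -> ', $stages), PHP_EOL;
```

Even in this reduced form, three artifacts (message, handler, bus) sit between the command and the service, none of which carry business meaning.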

Observability and debugging

Messenger offers a natural debugging model:

  • message inspection
  • middleware
  • bus logging
  • extensibility towards async / queue
Debug = message level + middleware

Nature of the overhead

In this MVP, the overhead is conceptually measurable:

  • introduction of an artificial message
  • indirection via the handler
  • the need to structure the execution around the bus

But this overhead lives in the orchestration adapter, not in the business logic.

Summary

This approach transforms the pipeline into:

a distributed work unit

It favors:

  • Symfony standardization
  • extensibility towards async
  • integration with the ecosystem

At the cost of an additional layer of indirection.

Conceptual Example

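use Symfony\Component\Messenger\Attribute\AsMessageHandler;
use Symfony\Component\Messenger\MessageBusInterface;

// The message carries only an identifier, not the data itself.
final class ComputeTfIdfMessage
{
    public function __construct(public DocumentId $id) {}
}

#[AsMessageHandler]
final class ComputeTfIdfHandler
{
    // DocumentRepository and TfIdfCalculator are illustrative type names.
    public function __construct(
        private DocumentRepository $repository,
        private TfIdfCalculator $tfidf,
        private MessageBusInterface $bus,
    ) {}

    public function __invoke(ComputeTfIdfMessage $message): void
    {
        // Reload the document from its ID, then compute its TF-IDF vector.
        $document = $this->repository->get($message->id);
        $vector = $this->tfidf->compute($document);

        // The next stage is chained by dispatching another message.
        $this->bus->dispatch(new ComputeSimilarityMessage($vector));
    }
}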

Benefits

  • strong decoupling
  • resilience (retry, queue)
  • native parallelization
  • Symfony standard

Structural Limitations

The problem quickly becomes apparent:

➡️ the message becomes an artificial envelope

We manipulate:

  • IDs
  • persistent states
  • indirect transitions

The problem is simply:

data → transformation → data

This introduces:

  • boilerplate
  • implicit dependencies
  • a loss of overall readability

Approach 2 Data-Oriented - Orchestration via Navi (synchronous) and Flow (asynchronous)

In the Data-Oriented implementation, the pipeline is expressed as an ordered sequence of actions applied to a context.

There is no message.

There is no dispatch.

There is only:

  • a piece of data
  • a context
  • a sequential transformation

Execution Model

Command → WorkflowRunner → Actions → PipelineService → Data

In concrete terms:

  • a command triggers a workflow
  • the WorkflowRunner executes a list of actions
  • each action transforms a Context
  • the business services are identical to those used with Messenger
WorkflowRunner
→ PipelineStageAction[]
→ Context
→ PipelineService

Pipeline Structure

The pipeline is explicitly defined as a sequence:

[IngestAction,
 PreprocessAction,
 FeatureBuildAction,
 ClassificationAction,
 ClusteringAction]

Each action:

  • takes a Context
  • applies a transformation
  • returns a new Context

Nature of the Context

The Context becomes the central object:

  • it contains the pipeline state
  • it evolves at each stage
  • it is inspectable
Context₀ → Context₁ → Context₂ → ... → Contextₙ

Concepts introduced by Flow

This approach introduces:

  • explicit actions
  • a runner
  • an evolving context
Data → Action → Data

Unlike Messenger:

  • no message
  • no handler
  • no bus

Observability and debugging

The debugging process changes completely in nature:

Debug = sequence of actions + context snapshots

Benefits:

  • visible execution order
  • inspectable intermediate state
  • deterministic pipeline
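One possible way to obtain such snapshots is to have the runner record the context after every action. The sketch below uses plain arrays as contexts to stay short; `runWithSnapshots` and the action names are assumptions made for illustration, not part of Navi or Flow.

```php
<?php
declare(strict_types=1);

/**
 * Runs named actions in order and records the context after each one.
 * Contexts are plain arrays here to keep the sketch short.
 *
 * @param array<string, callable> $actions action name => function(array): array
 * @return array{0: array, 1: array<string, array>} final context and snapshots by action name
 */
function runWithSnapshots(array $actions, array $context): array
{
    $snapshots = [];
    foreach ($actions as $name => $action) {
        $context = $action($context);
        $snapshots[$name] = $context; // PHP arrays are copied by value
    }
    return [$context, $snapshots];
}

[$final, $snapshots] = runWithSnapshots([
    'ingest' => fn (array $c): array => $c + ['documents' => ['Hello Flow']],
    'preprocess' => fn (array $c): array => $c + ['tokens' => [['hello', 'flow']]],
], []);

// Print which keys exist in the context after each action.
foreach ($snapshots as $name => $snapshot) {
    echo $name, ': ', implode(', ', array_keys($snapshot)), PHP_EOL;
}
```

The snapshot list doubles as an execution trace: its keys give the order of the actions, and each value shows the state that action produced.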

Nature of readability

The pipeline can be directly read as a stream:

ingest → preprocess → features → classification → clustering

Without structural transformation.

Structural Overhead

The cost introduced is different:

  • need for a Context
  • abstraction via actions

But:

  • no envelope
  • no bus detours
  • no break in the data flow

Summary

This approach transforms the pipeline into:

a series of data transformations

It favors:

  • immediate readability
  • direct transformation of data
  • absence of envelope
  • deterministic pipeline
  • ease of testing

Limitations

  • less suitable for complex distributed systems
  • requires strict discipline regarding the purity of the transformations
  • tooling less standard than Messenger

Direct Comparison

| Criteria | Message-Oriented | Data-Oriented |
| --- | --- | --- |
| Mental model | Events / Messages | Data streams |
| Readability | fragmented | linear |
| Overhead | high (messages, handlers) | low |
| Scalability | excellent | depends on the design |
| Debug | indirect | direct |
| Business coupling | weak but diffuse | strong but explicit |

| Aspect | Messenger | Navi / Flow |
| --- | --- | --- |
| Central unit | Message | Context |
| Orchestration | Bus + Handler | Runner + Actions |
| Flow | indirect | direct |
| Debug | message-centric | data-centric |
| Overhead | message + handler | action + context |
| Pipeline | encapsulated | explicit |

Key point: the illusion of complexity

In the case of text mining, each step is:

  • pure
  • deterministic
  • functional

Examples:

  • TF-IDF → simple mathematical formula
  • Cosine similarity → normalized dot product
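Written out, both examples fit on two lines. The notation is the usual one and is an assumption here, since the article only gives the TF-IDF formula in plain text: $\mathrm{tf}_{t,d}$ is the frequency of term $t$ in document $d$, $N$ the number of documents, $\mathrm{df}_t$ the number of documents containing $t$, and $\mathbf{a}$, $\mathbf{b}$ two TF-IDF vectors.

```latex
w_{t,d} = \mathrm{tf}_{t,d} \cdot \log\frac{N}{\mathrm{df}_t}
\qquad
\cos(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
```

Neither expression depends on anything but its inputs, which is why each stage can be treated as a pure function.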

There is no natural need for messages.

The introduction of Messenger is therefore an architectural decision, not a business necessity.

Main Insight

Message-oriented transforms data into events.

Data-oriented transforms data into data.

In a system like this:

  • Message-Oriented adds a layer
  • Data-Oriented reveals the model

Implications for Symfony

Symfony is evolving towards:

  • async
  • workers
  • sidekicks (FrankenPHP)
  • distributed orchestration

But this raises a fundamental question:

👉 Does everything have to be orchestrated via messages?

The answer depends on the problem.

When to use each approach

Message-Oriented

  • distributed workflows
  • long tasks
  • resilient systems
  • business events

Data-Oriented

  • analytical pipelines
  • data transformations
  • deterministic systems
  • intensive calculations

Source code

The project's source code is free and can be viewed here: https://github.com/matyo91/omer-quotes

Conclusion

This project demonstrates one simple thing:

👉 Orchestration is not neutral

Two functionally identical implementations can produce:

  • radically different systems
  • opposing cognitive costs
  • divergent evolutionary capacities

In the case of text mining:

  • Message-Oriented makes things more complex
  • Data-Oriented clarifies them

Symfony Messenger orchestrates a pipeline as a unit of work.

Darkwood Flow orchestrates a pipeline as a data transformation.

Next

The next step in the project is to:

  • extend the pipeline (clustering, classification)
  • integrate more advanced models
  • expose an API
  • compare actual performance

But most importantly:

👉 continue to question the form of the orchestration.

Resources

Thank you for writing the article

Examples from friends on EIT topics and mathematical models applied to AI

The Future of AI as Seen by Yann LeCun
