DSPy-ReAct-Machina is an alternative ReAct implementation for DSPy that maintains full conversation history in a unified context. While DSPy already provides `dspy.ReAct` for building reasoning-and-acting agents, I found some challenges when working with it in multi-turn conversations and when trying to inspect the agent's decision-making process.
Important: This implementation doesn't add new functionality that `dspy.ReAct` doesn't already support. Both implementations can solve the same problems. Instead, DSPy-ReAct-Machina focuses on addressing specific issues I encountered with conversation history representation, trajectory transparency, and caching efficiency.
Why DSPy-ReAct-Machina Was Created
While using `dspy.ReAct`, I identified three main issues that motivated me to create an alternative implementation:
Issue 1: History Representation is Hard to Inspect
`dspy.ReAct` embeds conversation history as a JSON object inside the user message. This works functionally, but it makes it difficult to inspect and reason about what's happening in multi-turn conversations.
Here's an example from a second turn in a conversation using `dspy.ReAct`:

```
[[ ## history ## ]]
{"messages": [{"question": "Which cities can I check the weather for?", "trajectory": {"thought_0": "I need to find out which cities are available...", "tool_name_0": "list_weather_cities", "tool_args_0": {}, "observation_0": "Available cities for weather: Paris, London, Tokyo, New York, current location", ...}, "reasoning": "...", "answer": "..."}]}
```
This JSON blob contains the entire previous interaction, making it hard to:
- Understand the conversation flow at a glance
- Debug issues when they occur
- Leverage chat-based interfaces that work with message lists
With ReActMachina, the same history is maintained as proper conversation messages:
User message:

```
[[ ## machine_state ## ]]
user_query

[[ ## question ## ]]
Which cities can I check the weather for?
```

Assistant message:

```
[[ ## reasoning ## ]]
To answer the user's question about which cities they can check the weather for, I need to list all available cities...

[[ ## tool_name ## ]]
list_weather_cities

...
```

Each interaction becomes a clear user/assistant message pair in the conversation history.
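To make the contrast concrete, here is a small sketch (illustrative only, not the library's internal representation) of the two history layouts: one turn serialized as a JSON blob inside a single field versus the same turn as plain chat messages.

```python
import json

# dspy.ReAct style: the whole previous turn serialized as JSON
# inside a single field of the user message.
react_history_field = json.dumps({
    "messages": [{
        "question": "Which cities can I check the weather for?",
        "trajectory": {"thought_0": "...", "tool_name_0": "list_weather_cities"},
        "answer": "...",
    }]
})

# ReActMachina style: the same turn as ordinary chat messages.
machina_history = [
    {"role": "user", "content": "[[ ## machine_state ## ]]\nuser_query\n\n"
                                "[[ ## question ## ]]\n"
                                "Which cities can I check the weather for?"},
    {"role": "assistant", "content": "[[ ## tool_name ## ]]\nlist_weather_cities"},
]

# The message list can be handed to any chat-style API as-is;
# the JSON blob has to be unpacked first to see the flow.
roles = [m["role"] for m in machina_history]
print(roles)
```

The message-list form is what chat-based tooling (log viewers, tracing UIs, provider dashboards) already knows how to display.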
Issue 2: Trajectory Mutates the User Message
In multi-turn conversations, the trajectory keeps mutating the user message with each tool call. The user message grows to include:
- The original question
- The JSON-serialized history
- The entire current trajectory (all thoughts, tool calls, and observations)
This mutation happens within a single predictor call, making it harder to track exactly what the agent is seeing at each step.
ReActMachina addresses this by:
- Breaking each step into separate messages
- Keeping the conversation history immutable: new interactions append new messages rather than modifying existing ones
- Maintaining consistent field structure across all turns
Issue 3: Caching Inefficiency
`dspy.ReAct` operates with 3 messages:
- System prompt
- User message (containing question + history + trajectory)
- Assistant response
The user message constantly mutates with each tool call and each conversation turn. This breaks caching mechanisms that many LLM providers offer, as they cache based on prefix matching of the conversation history.
ReActMachina keeps the history immutable. When a new query comes in or a tool returns a result:
- A new message is created
- The system prompt stays constant
- Previous messages never change
- Only new messages are appended
This makes each interaction more cache-friendly, potentially reducing costs and latency.
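The caching difference comes down to prefix matching. A minimal sketch (hypothetical prompt snapshots, not real provider internals) shows why an append-only history keeps a longer shared prefix between consecutive LLM calls:

```python
def shared_prefix_len(prev, curr):
    """Count the leading messages that are identical between two prompt snapshots."""
    n = 0
    for a, b in zip(prev, curr):
        if a != b:
            break
        n += 1
    return n

# Mutating style: the single user message is rewritten on every step,
# so the cacheable prefix ends right after the system prompt.
mutating_prev = ["system", "user: question + history + trajectory(step 1)"]
mutating_curr = ["system", "user: question + history + trajectory(steps 1-2)"]

# Append-only style: old messages never change; new ones are added at the end.
append_prev = ["system", "user: question", "assistant: tool call"]
append_curr = ["system", "user: question", "assistant: tool call", "user: tool result"]

print(shared_prefix_len(mutating_prev, mutating_curr))  # 1 (system prompt only)
print(shared_prefix_len(append_prev, append_curr))      # 3 (the entire previous turn)
```

Providers that cache on exact prefix matches can reuse everything up to the first changed message, so the append-only layout turns most of each call into a cache hit.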
Visual Comparison: Same Prompts, Different Approaches
Let's look at how both implementations handle the same conversation. I ran two prompts against both modules:
Prompt 1: "Which cities can I check the weather for?"
Prompt 2: "Check London and Tokyo"
dspy.ReAct Structure
First turn: The user message contains:
- `question`: "Which cities can I check the weather for?"
- `history`: `{"messages": []}`
- `trajectory`: Inline field with all tool calls (`thought_0`, `tool_name_0`, `tool_args_0`, `observation_0`, `thought_1`, `tool_name_1`, ...)

Second turn: The user message contains:
- `question`: "Check London and Tokyo"
- `history`: `{"messages": [<entire first interaction serialized as JSON>]}`
- `trajectory`: New inline field with all tool calls for this turn

The user message keeps growing, embedding more and more context.
ReActMachina Structure
First turn: Multiple messages in the conversation:
- User message: `machine_state=user_query`, `question=...`
- Assistant message: `reasoning=...`, `tool_name=list_weather_cities`, `tool_args={}`, `response=...`
- User message: `machine_state=tool_result`, `tool_result=...`
- Assistant message: `reasoning=...`, `tool_name=finish`, ...
- User message: `machine_state=finish`, `tool_result=...`
- Assistant message: `reasoning=...`, `answer=...`

Second turn: The conversation continues naturally:
- User message: `machine_state=user_query`, `question="Check London and Tokyo"`
- Assistant message: `reasoning=...`, `tool_name=get_weather`, `tool_args={"city": "London"}`, ...
- User message: `machine_state=tool_result`, `tool_result=...`
- ... (continues with more state transitions)
The full conversation history is visible as a sequence of messages, with no JSON serialization and no mutation.
How ReActMachina Works Internally
ReActMachina uses DSPy's standard `dspy.History` object to manage the conversation history. Each interaction - whether a user query, tool call, or response - is appended as a new message to this history object, which is passed between turns to maintain context.
The State Machine
ReActMachina uses a simple finite state machine with 4 states:
- USER_QUERY: Initial state when processing user input
  - Inputs: `machine_state` and the original signature inputs
  - Outputs: `reasoning` (optional), `tool_name`, `tool_args`, `response`
  - Transitions to: TOOL_RESULT
- TOOL_RESULT: State after a tool has been executed
  - Inputs: `machine_state`, `tool_result`
  - Outputs: `reasoning` (optional), `tool_name`, `tool_args`, `response`
  - Transitions to: TOOL_RESULT (chained tools), INTERRUPTED, FINISH
- INTERRUPTED: State for forced completion when max_steps is reached
  - Inputs: `machine_state`, `tool_result`, `interruption_instructions`
  - Outputs: `reasoning` (optional) and the original signature outputs
  - Transitions to: Terminal state (no further transitions)
- FINISH: Terminal state for normal completion when the agent calls the finish tool
  - Inputs: `machine_state`, `tool_result`
  - Outputs: `reasoning` (optional) and the original signature outputs
  - Transitions to: Terminal state (no further transitions)
The state machine enforces valid transitions. For example, you can't jump from USER_QUERY directly to INTERRUPTED or FINISH - you must go through TOOL_RESULT first.
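The transition rules above can be sketched as a tiny table-driven state machine. This is an illustrative reimplementation of the described behavior, not ReActMachina's actual code:

```python
from enum import Enum

class State(Enum):
    USER_QUERY = "user_query"
    TOOL_RESULT = "tool_result"
    INTERRUPTED = "interrupted"
    FINISH = "finish"

# Valid transitions as described above; terminal states allow none.
TRANSITIONS = {
    State.USER_QUERY: {State.TOOL_RESULT},
    State.TOOL_RESULT: {State.TOOL_RESULT, State.INTERRUPTED, State.FINISH},
    State.INTERRUPTED: set(),
    State.FINISH: set(),
}

def transition(current, nxt):
    """Move to the next state, rejecting transitions the table doesn't allow."""
    if nxt not in TRANSITIONS[current]:
        raise ValueError(f"invalid transition: {current.value} -> {nxt.value}")
    return nxt

state = State.USER_QUERY
state = transition(state, State.TOOL_RESULT)  # a tool was called
state = transition(state, State.TOOL_RESULT)  # chained tool call
state = transition(state, State.FINISH)       # agent called the finish tool
print(state.value)  # finish
```

Trying `transition(State.USER_QUERY, State.FINISH)` raises a `ValueError`, mirroring the rule that the agent must pass through TOOL_RESULT before finishing.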
Dynamic Field Masking with ReActMachinaAdapter
The magic happens in the custom adapter. Instead of creating different system prompts for each state, the adapter:
- Maintains a unified system prompt that documents all possible states and their structures
- Dynamically masks fields in user messages based on the current state
- Prevents the LLM from generating irrelevant fields by explicitly telling it which fields to produce
Here's how it works:
System Prompt (stays constant across all calls):

```
This agent operates as a state machine. The `machine_state` field determines
which function (inputs → outputs) is active.

These are possible input fields:
1. `machine_state` (Literal['user_query', 'tool_result', 'interrupted', 'finish'])
2. `question` (str)
3. `tool_result` (str)
4. `interruption_instructions` (str)

These are possible output fields:
1. `reasoning` (str)
2. `tool_name` (str)
3. `tool_args` (dict)
4. `response` (str)
5. `answer` (str)

---

For the `user_query` state, messages are structured as:
[Shows only: machine_state, question → reasoning, tool_name, tool_args, response]

For the `tool_result` state, messages are structured as:
[Shows only: machine_state, tool_result → reasoning, tool_name, tool_args, response]

For the `finish` state, messages are structured as:
[Shows only: machine_state, tool_result → reasoning, answer]

...
```

User Message (changes based on state):

```
[[ ## machine_state ## ]]
user_query

[[ ## question ## ]]
Which cities can I check the weather for?

Respond using the exact field format `[[ ## field_name ## ]]`.
Required fields in order: `[[ ## reasoning ## ]]`, then `[[ ## tool_name ## ]]`,
then `[[ ## tool_args ## ]]`, then `[[ ## response ## ]]`, ending with
`[[ ## completed ## ]]`. Do NOT generate the following fields for this state:
`[[ ## answer ## ]]`.
```
The adapter masks out `answer` for the USER_QUERY state because it's not relevant yet. This prevents the LLM from trying to produce a final answer when it should be deciding which tool to use.
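The masking step can be sketched in a few lines. The field names come from the article; the function itself is a hypothetical illustration of how per-state instructions might be assembled, not the adapter's real API:

```python
ALL_OUTPUTS = ["reasoning", "tool_name", "tool_args", "response", "answer"]

# Which output fields each state is allowed to produce.
STATE_OUTPUTS = {
    "user_query":  ["reasoning", "tool_name", "tool_args", "response"],
    "tool_result": ["reasoning", "tool_name", "tool_args", "response"],
    "finish":      ["reasoning", "answer"],
}

def format_instruction(state):
    """Build the required/forbidden field instruction for the given state."""
    active = STATE_OUTPUTS[state]
    masked = [f for f in ALL_OUTPUTS if f not in active]
    required = ", ".join(f"`[[ ## {f} ## ]]`" for f in active)
    forbidden = ", ".join(f"`[[ ## {f} ## ]]`" for f in masked)
    return (f"Required fields in order: {required}, ending with "
            f"`[[ ## completed ## ]]`. Do NOT generate the following fields "
            f"for this state: {forbidden}.")

print(format_instruction("user_query"))  # masks `answer`
print(format_instruction("finish"))      # masks the tool-calling fields
```

The same unified system prompt works for every state because only this trailing instruction changes per message.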
Single Predictor, Multiple Signatures
ReActMachina uses:
- One predictor type: either `dspy.Predict` or `dspy.ChainOfThought`
- Multiple signatures: one for each state
- State-driven execution: the `machine_state` field determines which signature is active
This design keeps the implementation simple while providing flexibility. The same predictor operates on different signatures based on the current state, and the adapter handles the rest.
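The state-to-signature mapping can be pictured as a simple lookup, with plain field lists standing in for real `dspy.Signature` classes (a sketch; `question` and `answer` stand in for whatever inputs and outputs your actual signature defines):

```python
# Hypothetical state -> (input fields, output fields) table.
SIGNATURES = {
    "user_query":  (["machine_state", "question"],
                    ["reasoning", "tool_name", "tool_args", "response"]),
    "tool_result": (["machine_state", "tool_result"],
                    ["reasoning", "tool_name", "tool_args", "response"]),
    "interrupted": (["machine_state", "tool_result", "interruption_instructions"],
                    ["reasoning", "answer"]),
    "finish":      (["machine_state", "tool_result"],
                    ["reasoning", "answer"]),
}

def active_signature(machine_state):
    """Return the (inputs, outputs) pair the single predictor uses in this state."""
    return SIGNATURES[machine_state]

inputs, outputs = active_signature("interrupted")
print(inputs)   # includes interruption_instructions for forced completion
```

One predictor instance consults this table on every call, which is why the system prompt can stay constant while the expected fields vary.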
Examples
The repository includes a weather agent example that demonstrates ReActMachina in action. Here's how to run it:
Interactive Mode (Default)
```shell
uv run examples/react_machina_chat_example.py
```
This starts an interactive chat session where you can:
- Ask multiple questions
- Use `/tools` to list available tools
- Use `/inspect_history` to see the full conversation history
- Use `/trajectory` to see the detailed trajectory of the last response
One-Turn Mode
```shell
uv run examples/react_machina_chat_example.py --query "What's the weather in Paris?"
```
Using Different Predictors
```shell
# Use dspy.Predict instead of dspy.ChainOfThought (default)
uv run examples/react_machina_chat_example.py --predictor predict

# Use dspy.ChainOfThought (explicit)
uv run examples/react_machina_chat_example.py --predictor cot
```
Async Mode
```shell
uv run examples/react_machina_chat_example.py --async
```
Limiting Agent Steps
```shell
# Limit to 5 steps before forced completion
uv run examples/react_machina_chat_example.py --max-steps 5
```
When the max steps are reached, the agent enters the INTERRUPTED state and synthesizes a final answer with the information gathered so far, acknowledging the interruption.
Debug Mode (Demonstrating Error Handling)
```shell
# Force the LLM to produce malformed output at step 2 to see how the agent recovers
uv run examples/react_machina_chat_example.py \
  --debug-fail-mode malformed-markers \
  --debug-fail-step 2
```
Note: This debug mode artificially injects formatting errors to demonstrate how the agent handles and recovers from LLM output issues.
Important Considerations
Optimization and Prompt Engineering
I haven't yet tested DSPy's optimization modules with ReActMachina. DSPy provides powerful optimizers like MIPROv2 and GEPA that can improve agent performance through prompt engineering and few-shot example selection. I plan to explore how these optimizers work with ReActMachina next, as they could potentially enhance the agent's reasoning and tool-calling capabilities.
Custom Adapter Requirement
ReActMachina depends on a custom adapter (`ReActMachinaAdapter`) to function properly. This adapter is essential for maintaining a single unified system prompt while working with multiple state-specific signatures. Without it, each state would generate a different system prompt, breaking the consistency that makes the conversation history work smoothly.
That said, the module is designed to be generic and signature-agnostic, just like `dspy.ReAct`. You can use it with any signature you define - the adapter handles the complexity of mapping your signature's inputs and outputs to the appropriate states automatically.
Conclusion
DSPy-ReAct-Machina doesn't achieve anything new that `dspy.ReAct` can't already do. Both implementations can build reasoning-and-acting agents that solve tasks by using tools iteratively. However, I created this alternative implementation to address specific challenges I encountered:
- History representation: I found it hard to reason about JSON-serialized history. I wanted proper conversation messages that align with how chat interfaces work.
- Trajectory transparency: Inline trajectory fields that keep mutating made debugging difficult. I wanted each step to be its own message.
- Caching efficiency: Constantly mutating user messages break LLM caching. I wanted immutable history that only grows, never changes.
I love how DSPy standardizes prompts and makes them programmable. The library's approach to signatures and modules is powerful. However, I believe that throwing a history object inside the context window - even though it works - isn't ideal. We should benefit from the chat interfaces that allow us to provide history in a more structured way.
The history is also the trajectory the agent took to reach the final answer. It's important to keep it clear and easy to reason about, both for debugging and for understanding how the agent makes decisions.
If you're interested in trying ReActMachina or contributing to it, check out the repository and give it a try. I'd love to hear your feedback and learn about your experience using it.
Note on naming: The "Machina" name comes from the fact that this implementation uses a state machine internally. It's a bit of wordplay - ReAct-Machina: a machine for ReAct, built on a state machine.
Repository: github.com/armoucar/dspy-react-machina