DSPy-ReAct-Machina is an alternative ReAct implementation for DSPy that maintains full conversation history in a unified context. While DSPy already provides `dspy.ReAct` for building reasoning-and-acting agents, I found some challenges when working with it in multi-turn conversations and when trying to inspect the agent's decision-making process.
Important: This implementation doesn't add new functionality that `dspy.ReAct` doesn't already support. Both implementations can solve the same problems. Instead, DSPy-ReAct-Machina focuses on addressing specific issues I encountered with conversation history representation, trajectory transparency, and caching efficiency.
Why DSPy-ReAct-Machina Was Created
While using `dspy.ReAct`, I identified three main issues that motivated me to create an alternative implementation:
Issue 1: History Representation is Hard to Inspect
`dspy.ReAct` embeds conversation history as a JSON object inside the user message. This works functionally, but it makes it difficult to inspect and reason about what's happening in multi-turn conversations.
Here's an example from a second turn in a conversation using `dspy.ReAct`:

```
[[ ## history ## ]]
{"messages": [{"question": "Which cities can I check the weather for?", "trajectory": {"thought_0": "I need to find out which cities are available...", "tool_name_0": "list_weather_cities", "tool_args_0": {}, "observation_0": "Available cities for weather: Paris, London, Tokyo, New York, current location", ...}, "reasoning": "...", "answer": "..."}]}
```
This JSON blob contains the entire previous interaction, making it hard to:
- Understand the conversation flow at a glance
- Debug issues when they occur
- Leverage chat-based interfaces that work with message lists
With ReActMachina, the same history is maintained as proper conversation messages:
User message:

```
[[ ## machine_state ## ]]
user_query

[[ ## question ## ]]
Which cities can I check the weather for?
```

Assistant message:

```
[[ ## reasoning ## ]]
To answer the user's question about which cities they can check the weather for, I need to list all available cities...

[[ ## tool_name ## ]]
list_weather_cities

...
```

Each interaction becomes a clear user/assistant message pair in the conversation history.
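To make the contrast concrete, here is a small sketch (illustrative only, not the library's internal representation) of the two history layouts: one turn serialized as a JSON blob inside a single field versus the same turn as plain chat messages.

```python
import json

# dspy.ReAct style: the whole previous turn serialized as JSON
# inside a single field of the user message.
react_history_field = json.dumps({
    "messages": [{
        "question": "Which cities can I check the weather for?",
        "trajectory": {"thought_0": "...", "tool_name_0": "list_weather_cities"},
        "answer": "...",
    }]
})

# ReActMachina style: the same turn as ordinary chat messages.
machina_history = [
    {"role": "user", "content": "[[ ## machine_state ## ]]\nuser_query\n\n"
                                "[[ ## question ## ]]\n"
                                "Which cities can I check the weather for?"},
    {"role": "assistant", "content": "[[ ## tool_name ## ]]\nlist_weather_cities"},
]

# The message list can be handed to any chat-style API as-is;
# the JSON blob has to be unpacked first to see the flow.
roles = [m["role"] for m in machina_history]
print(roles)
```

The message-list form is what chat-based tooling (log viewers, tracing UIs, provider dashboards) already knows how to display.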
Issue 2: Trajectory Mutates the User Message
In multi-turn conversations, the trajectory keeps mutating the user message with each tool call. The user message grows to include:
- The original question
- The JSON-serialized history
- The entire current trajectory (all thoughts, tool calls, and observations)
This mutation happens within a single predictor call, making it harder to track exactly what the agent is seeing at each step.
ReActMachina addresses this by:
- Breaking each step into separate messages
- Keeping the conversation history immutable: new interactions append new messages rather than modifying existing ones
- Maintaining consistent field structure across all turns
Issue 3: Caching Inefficiency
`dspy.ReAct` operates with 3 messages:
- System prompt
- User message (containing question + history + trajectory)
- Assistant response
The user message constantly mutates with each tool call and each conversation turn. This breaks caching mechanisms that many LLM providers offer, as they cache based on prefix matching of the conversation history.
ReActMachina keeps the history immutable. When a new query comes in or a tool returns a result:
- A new message is created
- The system prompt stays constant
- Previous messages never change
- Only new messages are appended
This makes each interaction more cache-friendly, potentially reducing costs and latency.
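The caching difference comes down to prefix matching. A minimal sketch (hypothetical prompt snapshots, not real provider internals) shows why an append-only history keeps a longer shared prefix between consecutive LLM calls:

```python
def shared_prefix_len(prev, curr):
    """Count the leading messages that are identical between two prompt snapshots."""
    n = 0
    for a, b in zip(prev, curr):
        if a != b:
            break
        n += 1
    return n

# Mutating style: the single user message is rewritten on every step,
# so the cacheable prefix ends right after the system prompt.
mutating_prev = ["system", "user: question + history + trajectory(step 1)"]
mutating_curr = ["system", "user: question + history + trajectory(steps 1-2)"]

# Append-only style: old messages never change; new ones are added at the end.
append_prev = ["system", "user: question", "assistant: tool call"]
append_curr = ["system", "user: question", "assistant: tool call", "user: tool result"]

print(shared_prefix_len(mutating_prev, mutating_curr))  # 1 (system prompt only)
print(shared_prefix_len(append_prev, append_curr))      # 3 (the entire previous turn)
```

Providers that cache on exact prefix matches can reuse everything up to the first changed message, so the append-only layout turns most of each call into a cache hit.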
Visual Comparison: Same Prompts, Different Approaches
Let's look at how both implementations handle the same conversation. I ran two prompts against both modules:
Prompt 1: "Which cities can I check the weather for?"
Prompt 2: "Check London and Tokyo"
dspy.ReAct Structure
First turn: The user message contains:
- `question`: "Which cities can I check the weather for?"
- `history`: `{"messages": []}`
- `trajectory`: Inline field with all tool calls (`thought_0`, `tool_name_0`, `tool_args_0`, `observation_0`, `thought_1`, `tool_name_1`, ...)

Second turn: The user message contains:
- `question`: "Check London and Tokyo"
- `history`: `{"messages": [<entire first interaction serialized as JSON>]}`
- `trajectory`: New inline field with all tool calls for this turn

The user message keeps growing, embedding more and more context.
ReActMachina Structure
First turn: Multiple messages in the conversation:
- User message: `machine_state=user_query`, `question=...`
- Assistant message: `reasoning=...`, `tool_name=list_weather_cities`, `tool_args={}`, `response=...`
- User message: `machine_state=tool_result`, `tool_result=...`
- Assistant message: `reasoning=...`, `tool_name=finish`, ...
- User message: `machine_state=finish`, `tool_result=...`
- Assistant message: `reasoning=...`, `answer=...`

Second turn: The conversation continues naturally:
- User message: `machine_state=user_query`, `question="Check London and Tokyo"`
- Assistant message: `reasoning=...`, `tool_name=get_weather`, `tool_args={"city": "London"}`, ...
- User message: `machine_state=tool_result`, `tool_result=...`
- ... (continues with more state transitions)
The full conversation history is visible as a sequence of messages, with no JSON serialization and no mutation.
How ReActMachina Works Internally
ReActMachina uses DSPy's standard `dspy.History` object to manage the conversation history. Each interaction - whether a user query, tool call, or response - is appended as a new message to this history object, which is passed between turns to maintain context.
The State Machine
ReActMachina uses a simple finite state machine with 4 states:
- USER_QUERY: Initial state when processing user input
  - Inputs: `machine_state` and the original signature inputs
  - Outputs: `reasoning` (optional), `tool_name`, `tool_args`, `response`
  - Transitions to: TOOL_RESULT
- TOOL_RESULT: State after a tool has been executed
  - Inputs: `machine_state`, `tool_result`
  - Outputs: `reasoning` (optional), `tool_name`, `tool_args`, `response`
  - Transitions to: TOOL_RESULT (chained tools), INTERRUPTED, FINISH
- INTERRUPTED: State for forced completion when max_steps is reached
  - Inputs: `machine_state`, `tool_result`, `interruption_instructions`
  - Outputs: `reasoning` (optional) and the original signature outputs
  - Transitions to: Terminal state (no further transitions)
- FINISH: Terminal state for normal completion when the agent calls the finish tool
  - Inputs: `machine_state`, `tool_result`
  - Outputs: `reasoning` (optional) and the original signature outputs
  - Transitions to: Terminal state (no further transitions)
The state machine enforces valid transitions. For example, you can't jump from USER_QUERY directly to INTERRUPTED or FINISH - you must go through TOOL_RESULT first.
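The transition rules above can be sketched as a tiny table-driven state machine. This is an illustrative reimplementation of the described behavior, not ReActMachina's actual code:

```python
from enum import Enum

class State(Enum):
    USER_QUERY = "user_query"
    TOOL_RESULT = "tool_result"
    INTERRUPTED = "interrupted"
    FINISH = "finish"

# Valid transitions as described above; terminal states allow none.
TRANSITIONS = {
    State.USER_QUERY: {State.TOOL_RESULT},
    State.TOOL_RESULT: {State.TOOL_RESULT, State.INTERRUPTED, State.FINISH},
    State.INTERRUPTED: set(),
    State.FINISH: set(),
}

def transition(current, nxt):
    """Move to the next state, rejecting transitions the table doesn't allow."""
    if nxt not in TRANSITIONS[current]:
        raise ValueError(f"invalid transition: {current.value} -> {nxt.value}")
    return nxt

state = State.USER_QUERY
state = transition(state, State.TOOL_RESULT)  # a tool was called
state = transition(state, State.TOOL_RESULT)  # chained tool call
state = transition(state, State.FINISH)       # agent called the finish tool
print(state.value)  # finish
```

Trying `transition(State.USER_QUERY, State.FINISH)` raises a `ValueError`, mirroring the rule that the agent must pass through TOOL_RESULT before finishing.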
Dynamic Field Masking with ReActMachinaAdapter
The magic happens in the custom adapter. Instead of creating different system prompts for each state, the adapter:
- Maintains a unified system prompt that documents all possible states and their structures
- Dynamically masks fields in user messages based on the current state
- Prevents the LLM from generating irrelevant fields by explicitly telling it which fields to produce
Here's how it works:
System Prompt (stays constant across all calls):

```
This agent operates as a state machine. The `machine_state` field determines
which function (inputs → outputs) is active.

These are possible input fields:
1. `machine_state` (Literal['user_query', 'tool_result', 'interrupted', 'finish'])
2. `question` (str)
3. `tool_result` (str)
4. `interruption_instructions` (str)

These are possible output fields:
1. `reasoning` (str)
2. `tool_name` (str)
3. `tool_args` (dict)
4. `response` (str)
5. `answer` (str)

---

For the `user_query` state, messages are structured as:
[Shows only: machine_state, question → reasoning, tool_name, tool_args, response]

For the `tool_result` state, messages are structured as:
[Shows only: machine_state, tool_result → reasoning, tool_name, tool_args, response]

For the `finish` state, messages are structured as:
[Shows only: machine_state, tool_result → reasoning, answer]

...
```

User Message (changes based on state):

```
[[ ## machine_state ## ]]
user_query

[[ ## question ## ]]
Which cities can I check the weather for?

Respond using the exact field format `[[ ## field_name ## ]]`.
Required fields in order: `[[ ## reasoning ## ]]`, then `[[ ## tool_name ## ]]`,
then `[[ ## tool_args ## ]]`, then `[[ ## response ## ]]`, ending with
`[[ ## completed ## ]]`. Do NOT generate the following fields for this state:
`[[ ## answer ## ]]`.
```
The adapter masks out `answer` for the USER_QUERY state because it's not relevant yet. This prevents the LLM from trying to produce a final answer when it should be deciding which tool to use.
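The masking step can be sketched in a few lines. The field names come from the article; the function itself is a hypothetical illustration of how per-state instructions might be assembled, not the adapter's real API:

```python
ALL_OUTPUTS = ["reasoning", "tool_name", "tool_args", "response", "answer"]

# Which output fields each state is allowed to produce.
STATE_OUTPUTS = {
    "user_query":  ["reasoning", "tool_name", "tool_args", "response"],
    "tool_result": ["reasoning", "tool_name", "tool_args", "response"],
    "finish":      ["reasoning", "answer"],
}

def format_instruction(state):
    """Build the required/forbidden field instruction for the given state."""
    active = STATE_OUTPUTS[state]
    masked = [f for f in ALL_OUTPUTS if f not in active]
    required = ", ".join(f"`[[ ## {f} ## ]]`" for f in active)
    forbidden = ", ".join(f"`[[ ## {f} ## ]]`" for f in masked)
    return (f"Required fields in order: {required}, ending with "
            f"`[[ ## completed ## ]]`. Do NOT generate the following fields "
            f"for this state: {forbidden}.")

print(format_instruction("user_query"))  # masks `answer`
print(format_instruction("finish"))      # masks the tool-calling fields
```

The same unified system prompt works for every state because only this trailing instruction changes per message.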
Single Predictor, Multiple Signatures
ReActMachina uses:
- One predictor type: either `dspy.Predict` or `dspy.ChainOfThought`
- Multiple signatures: one for each state
- State-driven execution: the `machine_state` field determines which signature is active
This design keeps the implementation simple while providing flexibility. The same predictor operates on different signatures based on the current state, and the adapter handles the rest.
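The state-to-signature mapping can be pictured as a simple lookup, with plain field lists standing in for real `dspy.Signature` classes (a sketch; `question` and `answer` stand in for whatever inputs and outputs your actual signature defines):

```python
# Hypothetical state -> (input fields, output fields) table.
SIGNATURES = {
    "user_query":  (["machine_state", "question"],
                    ["reasoning", "tool_name", "tool_args", "response"]),
    "tool_result": (["machine_state", "tool_result"],
                    ["reasoning", "tool_name", "tool_args", "response"]),
    "interrupted": (["machine_state", "tool_result", "interruption_instructions"],
                    ["reasoning", "answer"]),
    "finish":      (["machine_state", "tool_result"],
                    ["reasoning", "answer"]),
}

def active_signature(machine_state):
    """Return the (inputs, outputs) pair the single predictor uses in this state."""
    return SIGNATURES[machine_state]

inputs, outputs = active_signature("interrupted")
print(inputs)   # includes interruption_instructions for forced completion
```

One predictor instance consults this table on every call, which is why the system prompt can stay constant while the expected fields vary.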
Examples
The repository includes a weather agent example that demonstrates ReActMachina in action. Here's how to run it:
Interactive Mode (Default)
```shell
uv run examples/react_machina_chat_example.py
```
This starts an interactive chat session where you can:
- Ask multiple questions
- Use `/tools` to list available tools
- Use `/inspect_history` to see the full conversation history
- Use `/trajectory` to see the detailed trajectory of the last response
One-Turn Mode
```shell
uv run examples/react_machina_chat_example.py --query "What's the weather in Paris?"
```
Using Different Predictors
```shell
# Use dspy.Predict instead of dspy.ChainOfThought (default)
uv run examples/react_machina_chat_example.py --predictor predict

# Use dspy.ChainOfThought (explicit)
uv run examples/react_machina_chat_example.py --predictor cot
```
Async Mode
```shell
uv run examples/react_machina_chat_example.py --async
```
Limiting Agent Steps
```shell
# Limit to 5 steps before forced completion
uv run examples/react_machina_chat_example.py --max-steps 5
```
When the max steps are reached, the agent enters the INTERRUPTED state and synthesizes a final answer with the information gathered so far, acknowledging the interruption.
Debug Mode (Demonstrating Error Handling)
```shell
# Force the LLM to produce malformed output at step 2 to see how the agent recovers
uv run examples/react_machina_chat_example.py \
  --debug-fail-mode malformed-markers \
  --debug-fail-step 2
```
Note: This debug mode artificially injects formatting errors to demonstrate how the agent handles and recovers from LLM output issues.
Important Considerations
Optimization and Prompt Engineering
I haven't yet tested DSPy's optimization modules with ReActMachina. DSPy provides powerful optimizers like MIPROv2 and GEPA that can improve agent performance through prompt engineering and few-shot example selection. I plan to explore how these optimizers work with ReActMachina next, as they could potentially enhance the agent's reasoning and tool-calling capabilities.
Custom Adapter Requirement
ReActMachina depends on a custom adapter (`ReActMachinaAdapter`) to function properly. This adapter is essential for maintaining a single unified system prompt while working with multiple state-specific signatures. Without it, each state would generate a different system prompt, breaking the consistency that makes the conversation history work smoothly.
That said, the module is designed to be generic and signature-agnostic, just like `dspy.ReAct`. You can use it with any signature you define - the adapter handles the complexity of mapping your signature's inputs and outputs to the appropriate states automatically.
Conclusion
DSPy-ReAct-Machina doesn't achieve anything new that `dspy.ReAct` can't already do. Both implementations can build reasoning-and-acting agents that solve tasks by using tools iteratively. However, I created this alternative implementation to address specific challenges I encountered:
- History representation: I found it hard to reason about JSON-serialized history. I wanted proper conversation messages that align with how chat interfaces work.
- Trajectory transparency: Inline trajectory fields that keep mutating made debugging difficult. I wanted each step to be its own message.
- Caching efficiency: Constantly mutating user messages break LLM caching. I wanted immutable history that only grows, never changes.
I love how DSPy standardizes prompts and makes them programmable. The library's approach to signatures and modules is powerful. However, I believe that throwing a history object inside the context window - even though it works - isn't ideal. We should benefit from the chat interfaces that allow us to provide history in a more structured way.
The history is also the trajectory the agent took to reach the final answer. It's important to keep it clear and easy to reason about, both for debugging and for understanding how the agent makes decisions.
If you're interested in trying ReActMachina or contributing to it, check out the repository and give it a try. I'd love to hear your feedback and learn about your experience using it.
Note on naming: The "Machina" name comes from the fact that this implementation uses a state machine internally. It's a bit of wordplay - ReAct-Machina: a machine for ReAct, built on a state machine.
Repository: github.com/armoucar/dspy-react-machina