🚀 SoloEngine v0.3.0 Release — Checkpoint Mechanism & Message Queue

#ai #webdev #programming #productivity

[v0.3.0] - 2026-06-29

🚀 Added

Checkpoint Mechanism — ReActCore introduces three checkpoints during streaming: content_ended (after text content), before_tool_calls (before tool calls), and after_tool_calls (after tool calls), enabling precise interception and state synchronization of the execution flow.
Message Queue System — Added a new MessageQueue class in run.py, supporting async enqueue, drain, and remove operations. Users can now queue messages while the LLM is running; queued messages are sent automatically after the current task completes. The frontend introduces a QueueBar component to display queued messages, with CSS spinning animation, single-line ellipsis, and hover-to-delete functionality.
Queue Message Merging — MessageQueue.drain_all() now merges consecutive messages with the same name into a single message, preventing fragmented user input when multiple queue entries share the same sender.
Queue WebSocket Events — The execution event protocol introduces three new event types: message_queued, queue_drained, and queue_returned (useRunWebSocket.ts). The frontend processes queue state updates in real time.
Stop & Queue Integration — When the user clicks Stop, pending queued messages are returned to the input box via queue_returned. Checkpoint stops cleanly clear the queue and automatically start the next message.
System Notification Messages — Introduced the SystemMessage type (with notification role) to separate error messages from assistant content. Errors are now rendered as independent notification bubbles, no longer embedded within assistant message cards.
tiktoken Real-Time Token Estimation — ReActCore initializes a tiktoken encoder on startup for real-time token counting during streaming. Unknown models fall back to o200k_base.
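
The queue behavior described above (async enqueue, remove, and a drain that merges consecutive messages with the same name) can be sketched roughly as follows. This is an illustrative stand-in, not the class shipped in run.py: the `QueuedMessage` dataclass, the integer ids, and the newline used to join merged content are all assumptions made for the example.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from itertools import count

_next_id = count(1)  # hypothetical id source, for illustration only


@dataclass
class QueuedMessage:
    name: str      # sender name; consecutive equal names are merged on drain
    content: str
    id: int = field(default_factory=lambda: next(_next_id))


class MessageQueue:
    """Minimal sketch of a queue with enqueue, remove and drain_all."""

    def __init__(self) -> None:
        self._items: list[QueuedMessage] = []

    # The methods contain no awaits, so each one runs without interleaving
    # on a single event loop; they are async only to match the described API.
    async def enqueue(self, name: str, content: str) -> QueuedMessage:
        message = QueuedMessage(name=name, content=content)
        self._items.append(message)
        return message

    async def remove(self, message_id: int) -> bool:
        before = len(self._items)
        self._items = [m for m in self._items if m.id != message_id]
        return len(self._items) < before

    async def drain_all(self) -> list[QueuedMessage]:
        pending, self._items = self._items, []
        merged: list[QueuedMessage] = []
        for message in pending:
            if merged and merged[-1].name == message.name:
                # Assumption: merged content is joined with a newline.
                last = merged[-1]
                merged[-1] = QueuedMessage(
                    name=last.name,
                    content=last.content + "\n" + message.content,
                    id=last.id,
                )
            else:
                merged.append(message)
        return merged


async def demo() -> list[tuple[str, str]]:
    queue = MessageQueue()
    await queue.enqueue("user", "first")
    await queue.enqueue("user", "second")
    await queue.enqueue("system", "note")
    return [(m.name, m.content) for m in await queue.drain_all()]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the two consecutive "user" entries collapse into one message while the "system" entry stays separate, which is the fragmentation the merge step is meant to avoid.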

🔧 Improved

Custom Model Name Auto-Complete — The model name field in ModelManager has been upgraded from Select to AutoComplete, allowing users to type custom model names not in the predefined list.
Message Block Type Unification — Added a new ToolCallsBlock type (message_block.py), unifying the internal tool-call format to an OpenAI-style tool_calls list, replacing the legacy ToolUseBlock.
ReActCore Token Accumulation — Token usage is now estimated chunk-by-chunk via tiktoken and overwritten with the API's precise usage when received. Introduced _get_valid_token_value to uniformly handle None/0/empty values from the API, preventing placeholder values from overwriting accumulated values.
ReActCore Streaming Architecture Refactor — The streaming output structure in react_core.py was refactored: each ChatResponse now contains only a single block type (SoloTextBlock or SoloThinkingBlock), replacing the previous pattern of accumulating multiple blocks before yielding. This lets the frontend handle each type of incremental content precisely.
Message Type Separation — The frontend message type has been unified from LLMMessage[] to Message[] (LLMMessage | SystemMessage). convertToLLMMessages now splits system messages such as error into independent SystemMessage entries rather than reusing LLMMessage. Updated MessageList and RunPanel to render SystemMessage entries.
Unified Error Save Path — Normal and error paths in save_assistant_message now follow the same code path. error is saved as an independent field and no longer written into the data block.
Execution Completion Reliability — on_execution_done now detects empty-collector scenarios (where stream_callback was never triggered) and automatically sets the status to error with detailed LLM failure output. This prevents false "completed" status during silent failures.
Session Creation Flow — createNewSession in runPanelStore now immediately inserts the new session into the sessions array, ensuring the UI reflects the new session without requiring a refresh.
Message Input UX — During LLM execution, the send button stays active to queue new messages instead of showing a blocking warning. The input area supports queuing via WebSocket when isRunning or isWaitingReply is true. The ENTER key now triggers send (queue) as long as the input has content, no longer blocked by isRunning state.
Assistant Message ID Uniqueness — Each execution_start now generates a new msg_asst_${Date.now()} ID, preventing React key conflicts when the queue drain triggers a new assistant message that would otherwise duplicate the previous message ID.
WebSocket Architecture Refactor — websocket_handler.py has been significantly simplified, removing complex grace period and takeover logic. RunContext now owns its own event loop, and the WebSocket only acts as a transport layer with injected callbacks.
Streaming Cancellation Handling — When the user clicks the stop button, closing the stream with aclose() in ReActCore no longer raises CancelledError. The loop exits and the closure is treated as a normal end, so an expected stream closure is not misread as a cancellation error.
Tool Call Event Management — ToolCallEventManager now uniformly sends tool-call events to the frontend through tool_calls blocks, removing the legacy tool_use block handling logic.
Formatter Simplification — The token_counter and max_tokens parameters were removed from TruncatedFormatterBase, along with dead code such as _truncate and _count; its constructor is now a no-argument empty implementation. The OpenAIMultiAgentFormatter subclass of OpenAIChatFormatter was also removed.
Anthropic message_start/message_delta Usage Capture — anthropic_model.py captures input_tokens in the message_start event and accumulated output_tokens in the message_delta event, resolving inaccurate token counts in Anthropic streaming responses.
Anthropic OpenAI Format Conversion — Added a new _convert_openai_to_anthropic_messages static method that converts OpenAI-format messages (tool_calls, reasoning_content) to Anthropic format (tool_use, thinking).
Streaming Tool Call Split — anthropic_model.py and qwen_model.py now yield a separate ChatResponse for each tool_call, enabling precise handling on the frontend.
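
The token accumulation rule described above (estimate chunk by chunk, then let the API's usage override the estimate only when it is a real value) can be sketched as follows. The names `get_valid_token_value` and `TokenAccumulator` are hypothetical stand-ins for the internal `_get_valid_token_value` helper and ReActCore's bookkeeping, and the whitespace word counter in the usage note stands in for a tiktoken encoder so that the example has no external dependencies.

```python
from __future__ import annotations

from typing import Callable, Optional


def get_valid_token_value(value: object) -> Optional[int]:
    """Return a positive int, or None for None / 0 / empty / non-numeric input."""
    if value is None or isinstance(value, bool):
        return None
    try:
        number = int(value)  # accepts ints, floats and numeric strings
    except (TypeError, ValueError):
        return None  # e.g. "" or an unexpected object
    return number if number > 0 else None


class TokenAccumulator:
    """Keeps a running estimate and prefers the API figure once it is valid."""

    def __init__(self, count_tokens: Callable[[str], int]) -> None:
        self._count_tokens = count_tokens
        self.estimated = 0
        self.reported: Optional[int] = None

    def add_chunk(self, text: str) -> None:
        # Called for every streamed chunk; updates the local estimate.
        self.estimated += self._count_tokens(text)

    def apply_api_usage(self, value: object) -> None:
        # Placeholder values (None, 0, "") must not overwrite anything.
        valid = get_valid_token_value(value)
        if valid is not None:
            self.reported = valid

    @property
    def total(self) -> int:
        return self.reported if self.reported is not None else self.estimated
```

A usage sketch: `acc = TokenAccumulator(lambda text: len(text.split()))`, then `acc.add_chunk(...)` per chunk and `acc.apply_api_usage(...)` when a usage payload arrives. With tiktoken installed, the counter could instead be built from `tiktoken.get_encoding("o200k_base")` and its `encode` method. If the stream is stopped before any usage arrives, `acc.total` still returns the estimate.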

🗑️ Removed

Removed the backend/SoloAgent/token_counter/ directory, including __init__.py, openai_token_counter.py, and token_base.py. The original functionality has been merged into ReActCore inline accumulation.
backend/SoloAgent/formatter/truncated_formatter_base.py — removed from the exports in __init__.py; the file is preserved but is no longer public API.
Removed dead code OpenAIMultiAgentFormatter from openai_formatter.py.
ToolUseBlock — the message block type was unified from tool_use to tool_calls; the legacy tool_use type is no longer used.
_convert_anthropic_message_to_solo_format function — removed a 99-line Anthropic→Solo format conversion function in anthropic_model.py, replaced by the new OpenAI→Anthropic conversion method.
reasoning_content injection code — Removed the logic in OpenAIChatFormatter that extracted reasoning_content from content, along with the compatibility code (and its DEBUG logs) that forcibly added an empty reasoning_content to messages carrying tool_calls.
tool_use support in message grouping — TruncatedFormatterBase._group_messages now checks only the tool_calls and tool_result types; the tool_use compatibility code is removed.
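
The OpenAI-to-Anthropic conversion that replaces the removed function can be sketched for a single assistant message as below. This is a minimal illustration based on the publicly documented message shapes, not the body of `_convert_openai_to_anthropic_messages`; the function name is hypothetical, and a real request may need further fields (for example a signature on thinking blocks) that are omitted here.

```python
from __future__ import annotations

import json
from typing import Any


def convert_openai_to_anthropic_message(message: dict[str, Any]) -> dict[str, Any]:
    """Map one OpenAI-style message to an Anthropic-style content-block message."""
    blocks: list[dict[str, Any]] = []

    # reasoning_content becomes a thinking block.
    reasoning = message.get("reasoning_content")
    if reasoning:
        blocks.append({"type": "thinking", "thinking": reasoning})

    # Plain string content becomes a text block.
    content = message.get("content")
    if isinstance(content, str) and content:
        blocks.append({"type": "text", "text": content})

    # Each tool_calls entry becomes a tool_use block.
    for call in message.get("tool_calls") or []:
        function = call.get("function") or {}
        raw_arguments = function.get("arguments") or "{}"
        if isinstance(raw_arguments, str):
            # OpenAI carries arguments as a JSON string; Anthropic expects an object.
            try:
                parsed = json.loads(raw_arguments)
            except json.JSONDecodeError:
                parsed = {}  # assumption: malformed arguments degrade to empty input
        else:
            parsed = raw_arguments
        blocks.append(
            {
                "type": "tool_use",
                "id": call.get("id"),
                "name": function.get("name"),
                "input": parsed if isinstance(parsed, dict) else {},
            }
        )

    return {"role": message["role"], "content": blocks}
```

Keeping the internal format OpenAI-shaped and converting only at the Anthropic boundary means a single direction of conversion has to be maintained.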

🐛 Fixed

Empty Collector False Report — Fixed a bug where agent.reply caught an exception and returned an error string, yet _execute_agent still reported the run as completed even though the collector was empty. on_execution_done now checks whether collector.get_chunk_count() > 0 and, when it is not, forces status="error" with the actual LLM error output.
LLM Tokens Missing on Manual Pause — Token counts were missing when a run was paused manually. tiktoken real-time estimation now serves as a fallback when API usage is unavailable.
Message Error Display — Error messages are no longer hidden inside assistant content blocks; they are rendered as independent notification-role messages with proper styling and status indicators.
Streaming tool_use Handling — Fixed the parsing and event-sending logic of legacy tool_use blocks in streaming responses, unifying to the tool_calls format.
Filter Thinking-Only Messages in Cache — OpenAIChatFormatter adds filtering logic: when an assistant message has empty content and no tool_calls, it is skipped. This fixes the LLM API error "Input is a zero-length, empty document".
ChunkCollector Type Mapping Correction — ChunkCollector._extract_raw_type in run.py removed recognition of the 'tool_use' type string, unifying the mapping to 'tool_calls'.
aclose No Longer Raises CancelledError — Closing the stream with aclose() in react_core.py no longer raises CancelledError. The loop exits and the closure is treated as a normal end rather than a cancellation error.
model._was_cancelled Exception Conversion — When the model closes the stream because of aclose(), react_core.py no longer converts this into asyncio.CancelledError. As above, the loop exits and the closure is treated as a normal end.
flow_compiler Early Cancellation Detection — flow_compiler.py adds new logic: immediately after creating the flow, check whether cancel_event is already set, and if so, cancel the flow right away, avoiding wasted HTTP connections.
CompiledFlow Interruption Detection — flow_compiler.py fixed a bug where an interrupted Agent incorrectly returned "completed"; it now returns "stop" status. The output field is no longer filled with error, preventing error pollution of the output.
Windows Clipboard Compatibility — MessageList.tsx now uses the W3C Clipboard API throughout, so copied content is captured correctly by Windows clipboard history (Win+V). Previously, clipboard history could not be triggered on Windows.
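
The thinking-only message filter from the Fixed list can be sketched as below. This is a simplified illustration over plain dict messages, not the code in OpenAIChatFormatter; the function name is hypothetical, and treating None, an empty string, and an empty list all as "empty content" is an assumption.

```python
from __future__ import annotations

from typing import Any


def drop_empty_assistant_messages(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Skip assistant messages that carry neither content nor tool_calls."""
    kept: list[dict[str, Any]] = []
    for message in messages:
        is_assistant = message.get("role") == "assistant"
        has_content = bool(message.get("content"))      # None, "" and [] count as empty
        has_tool_calls = bool(message.get("tool_calls"))
        if is_assistant and not has_content and not has_tool_calls:
            # A thinking-only turn would reach the API as an empty document.
            continue
        kept.append(message)
    return kept
```

Only assistant messages are filtered in this sketch; user and tool messages pass through unchanged, and an assistant message with empty content is kept as long as it carries tool_calls.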

🤝 Join Us

We're looking for like-minded contributors who share our passion for SoloEngine and Agentic AI. Every contribution — from a typo fix to a full feature — makes SoloEngine better.

📝 Contributing Guide · 💬 Discussions · 📧 Contact Us