CAMEL AI

Posted on Jan 27

Eigent：Open-source Cowork Meets MiniMax M2.1

#opensource #eigent

Abstract

In real enterprise environments, many internal tools, dashboards, and legacy systems operate entirely in the browser, forming the backbone of daily business operations.To automate these complex systems, we introduce Eigent, an open-source multi-agent workforce application that runs locally and can be fully set up from source, with a strong focus on browser automation.

In this post, we’ll explore how Eigent, the opensource Cowork leverages CAMEL’s Workforce architecture and browser automation to handle complex, multi-step enterprise tasks. We’ll also take a closer look at Minimax M2.1, analyzing its performance on a real-world enterprise tasks and examining the architectural features that enable it to perform effectively in long-horizon, agentic browser automation scenarios.

Background: What is Eigent and How it supports Minimax M2.1

Eigent is an Open Source Cowork Desktop to Unlock Your Exceptional Productivity. It is built with a multi-agent workforce architecture, supported by general abilities such as browser automation, terminal automation and MCPs. This design enables agents in Eigent to perform tasks much like human workers — operating in real desktop environments, without the need for deep API integrations or constant workflow reconfiguration.

As foundation models continue to advance, integrating them with Eigent’s open-source multi-agent system serves as an open-source cowork for enterprises, enabling developers and enterprise users to apply LLM capabilities directly to real-world use cases quickly and effectively. You can navigate to the Model Settings page in Eigent, locate the OpenAI Compatible section, and input your API key and url. Once the model name is set to MiniMax-M2.1, you are ready to begin. need help? Check out our guide on [configuring your Minimax API key].

Github Repository & how to setup Eigent

GitHub Repository: https://github.com/eigent-ai/eigent

Quick Start: Setting Up the Environment

You have two ways to run Eigent: using the pre-compiled desktop app for immediate usage, or setting up the development environment to inspect the code and customize the agents.

Option A: The "Zero-Config" Desktop App

For users who want to start automating tasks immediately without touching code:

Download the client from the Official Website.
Install the .dmg (macOS) or .exe (Windows).
Launch the app—the local backend starts automatically.

Option B: Developer Setup

To access the source code and run the system locally for development, follow these steps:

1. Prerequisites Ensure you have Node.js (v18-22) and Python installed.

2. Clone and Install

# Clone the repository
git clone https://github.com/eigent-ai/eigent.git
cd eigent

# Install frontend dependencies
npm install

3. Run the Application

# Return to root and run dev mode
npm run dev

Once running, you can configure your LLM providers (Minimax M2.1, etc.) directly in the settings. For more detailed information on configuration, advanced features, and troubleshooting, please refer to our Official Documentation.

Under the Hood: Eigent full stack and CAMEL Workforce Architecture

Eigent System Overview

Eigent constitutes a local-first desktop application with multi-agent orchestration, powered by the CAMEL Workforce as its core engine. The system implements a decoupled, full-stack architecture that operates entirely on the user's local infrastructure. This design strictly ensures data sovereignty, eliminating the privacy risks associated with cloud-resident agent execution.

1. The Frontend

The user interface serves as the control plane for agent configuration and workflow monitoring. Built on React and TypeScript within an Electron framework.

Key technical components include:

State Management: Zustand is employed for handling transient application state, ensuring efficient reactivity.
Visual Orchestration: React Flow is integrated to visualize agent workspace to track real-time agent execution.
Communication: The frontend communicates with the backend via secure local HTTP requests.

2. The Backend

The core logic resides in a local Python server utilizing FastAPI and Uvicorn, which acts as the host environment for the CAMEL multi-agent framework.

Runtime Environment: The backend runs on Python 3.10+, managed by uv for high-performance dependency resolution and environment isolation.
Persistence Layer: PostgreSQL, interfaced via SQLModel/SQLAlchemy ORM, provides robust structured data storage for audit logs, workflow history, and agent states.
Multi-agentAgent Systemramework: The CAMEL framework handles agent orchestration logic (e.g., workforce), interfacing with Large Language Models (LLMs) whether remote (e.g., Minimax) or local (e.g.,via vLLM) for agent running. The CAMEL framework also offers a rich set of toolkits such as browser toolkit, terminal toolkit, document generation toolkit.

CAMEL Workforce: A Multi-Agent System Inspired by Organizational Structures

At the heart of Eigent lies CAMEL Workforce, a multi-agent system architected to resolve complex, real-world tasks through decentralized cooperation. The system utilizes a strict Producer-Consumer pattern, mediated by an asynchronous message channel to manage dependency graphs efficiently.

1. Agent Roles

Coordinator Agent: Functions as the primary dispatcher. It maintains the global state and allocates subtasks to specific workers based on availability and capability.
Task Agent: Taking responsibility for the semantic decomposition of high-level objectives into executable, atomic units.
Worker Agent: Serves as the specialized execution unit. Worker agents consume atomic subtasks and execute them using domain-specific tools.

2. Asynchronous Communication: The TaskChannel

Decoupling between the coordination layer and the execution layer is achieved via the TaskChannel. This asynchronous message queue manages task distribution without blocking the main execution thread.

Execution Flow:

Workforce initiates a task.
Worker nodes poll for assignments.
Upon completion, results are pushed back.

3. Dynamic DAG Construction

Enterprise workflows are rarely linear. CAMEL Workforce implements a dynamic Directed Acyclic Graph (DAG) construction mechanism. When a high-level prompt is received (e.g., "Create Travel Plan"), the Task Agent decomposes this objective into discrete nodes.

The system explicitly maps dependencies, allowing the scheduler to:

Execute independent nodes in parallel (e.g., Search Flight Ticket and Search Hotel run concurrently).
Block dependent nodes until their predecessors reach a DONE state.

4. Fault-tolerant Mechanism

Given the non-deterministic nature of LLMs, Eigent treats failures as expected state transitions rather than fatal exceptions. The architecture implements a robust recovery mechanism utilizing the following strategies:

RETRY: Re-executes the sub-task on the same worker to handle transient errors.
REPLAN: The Task Agent modifies the original sub-task based on the failure log before re-queueing the sub-task.
REASSIGN: The sub-task is migrated from the current worker to a different agent with a compatible skill set.
DECOMPOSE: If a task fails due to excessive complexity, it is recursively broken down into smaller subtasks.

Browser Automation Architecture in Eigent

Yet, a multi-agent workforce architecture can only unlock real enterprise automation when paired with the growing strength of general-purpose capabilities such as browser automation. This is why we emphasize building agents that can operate directly within real business environments rather than relying solely on rigid API integrations.

Eigent adopts a two-layer architecture that separates browser control from agent orchestration:

The TypeScript layer is responsible for all browser interactions. It leverages native Playwright APIs to perform DOM operations, capture structured snapshots, generate SoM screenshots, detect occlusions, and handle advanced browser logic directly within the JavaScript runtime. As Playwright is natively built in TypeScript, this layer gains access to cutting-edge features like _snapshotForAI() and ensures better performance, reliability, and developer ergonomics.
The Python layer handles AI orchestration. It manages LLM calls, agent decision-making, and task planning. This separation allows Python to focus on agent logic, where the Python ecosystem excels in AI and workflow orchestration.
The two layers communicate asynchronously via WebSocket, enabling non-blocking operations. Python sends browser operation requests, TypeScript executes them and returns results. The interaction is transparent to the end user and supports concurrent task execution.

This architecture improves performance, enhances the precision of element interactions, and enables advanced capabilities like dynamic DOM filtering, viewport-aware snapshots, and in-browser SoM rendering. It avoids the limitations of Python-only implementations, such as high latency, limited access to browser internals, and complex image processing logic. By delegating browser tasks to the native execution context, Eigent ensures a robust foundation for agent-based enterprise automation.

During multi-agent execution in enterprise automation scenarios, browser-based automation offers a natural advantage in process visibility. Every step is transparent, inspectable, and easy to debug, making it far more practical for complex and evolving workflows.

Test Minimax M2.1 in Real-World Enterprise Tasks with Eigent browser automation

We have tested Eigent with Minimax M2.1 to automate sales processes using Eigent browser automation capabilities. The tasks for agents are to automate various stages of the real-world sales cycle, including Lead Capture & Creation, Qualification & Pipeline Management, Quotation, Negotiation, Closing, and Product Management.

Across experimental runs, Minimax M2.1 consistently shows three key strengths:

Handles complex page structures well, including iframes and nested elements: It can reliably find the right content and buttons, even in complex layouts.
Checks its own actions to stay accurate and short steps: It uses a feedback loop to correct mistakes and make sure the task is really done right.
Uses tools efficiently and flexibly: It avoids unnecessary steps and knows how to combine tools smartly when needed.

Task:

"We have a new contact at Global Media - Jennifer Martinez (jennifer.m@globalmedia.com) is their new Senior Marketing Manager. Add her to our Salesforce and make sure she’s connected to the right company."

In this task, Minimax M2.1 was required to operate within a highly complex Salesforce interface to complete a realistic business workflow: adding a new contact, Jennifer Martinez (Senior Marketing Manager), to Global Media, and ensuring she was correctly associated with the appropriate company account. This involved navigating multiple UI layers, identifying the correct entry points, creating the contact, populating key fields, and validating the account linkage.

The results show that Minimax M2.1 executed every step accurately and without error, with no mis-clicks or workflow breakdowns. This demonstrates the model’s strong capability in understanding complex enterprise UIs, planning multi-step actions, and reliably executing end-to-end tasks—highlighting its robustness in real-world, browser-based enterprise automation scenarios.

How Minimax M2.1 Improves Task Performance

Minimax M2.1 emerges as a strong choice for autonomous enterprise agents. Built to excel in real-world complex workflows, M2.1 consistently handles long-horizon, multi-step tasks with reliability. It delivers a compelling combination of performance, efficiency, and versatility, making it a practical option for scaling agent-based automation in enterprise environments.

Enhanced Reasoning and Workflow Continuity

One of the key strengths of M2.1 lies in its systematic improvements for real-world complex tasks. Compared to its predecessor, M2.1 produces more concise and efficient reasoning chains, improved responsiveness, and reduced token consumption—resulting in smoother execution of continuous workflows such as agentic task automation.

Rather than relying on simple conversational history, Minimax M2.1 is designed for better context management across multiple steps. This enhanced structured reasoning helps maintain logical continuity during multi-step function calls and reduces the chance of errors later in the workflow, especially in browser-driven task sequences.

Agent and Tool Generalization Capabilities

M2.1 exhibits strong performance across a variety of agent scaffolding frameworks and tooling environments. It generalizes reliably with different tools and supports integrated workflows, enhancing its utility in real office and enterprise automation tasks.

Robustness in Long-Horizon Planning

Enterprise automation often involves uncertainty—handling dynamic UI states, load delays, and unexpected interactions. Through its improved reasoning and execution efficiency, Minimax M2.1 demonstrates resilience in longer task sequences, making it well suited for agentic automation systems that require stability over many steps.

While the gap between top-tier models can be small for standard queries, in scenarios where state retention, complex instruction following, and error recovery are crucial, Minimax M2.1’s enhancements provide a practical foundation for platforms like Eigent. Its ability to produce concise, efficient reasoning and maintain coherent task-level logic makes it an effective choice for complex, multi-step enterprise workflows.

Eigent is fully open-source, and we invite developers, researchers, and enterprise teams to explore, extend, and contribute:

👉 GitHub: https://github.com/eigent-ai/eigent

👉 Huggingface: https://huggingface.co/MiniMaxAI/MiniMax-M2.1

👉 Join our Discord community: https://discord.camel-ai.org