Gao Dalie (高達烈)

Posted on Mar 12

Manus AI + Ollama: Build & Scrape ANYTHING (First-Ever General AI Agent) = OpenManus

#programming #datascience #ai #machinelearning

Artificial intelligence technology has developed rapidly in recent years, and major AI companies are competing to launch more powerful AI agents.

Recently, a general AI agent called Manus has attracted widespread attention. Its performance is said to surpass many existing AI systems, including another well-known Chinese AI — DeepSeek & ChatGPT

Manus comes from the Latin Mens et Manus, which means mind and hand. It refers to a general artificial intelligence entity/AI Agent that connects thinking and action, which can not only think and deliver answers but also deliver results. Manus excels at handling various tasks in life and work, allowing you to let it do everything even when you are taking a break.

Unlike most AI assistants (such as ChatGPT) that only give you suggestions or answers, Manus can think, plan, and execute tasks independently and directly deliver the results to you. Use it to handle complex tasks, and it can be even more efficient than you think.

So, let me give you a quick demo of a live chatbot to show you what I mean.

I went to an alternative Manus called openManus, First, the prompt gives the travel requirements: the specific travel date, duration and departure location, travel budget and expected travel content; then the expected feedback form is given: a travel manual made in HTML is needed, covering all the above mentioned contents and the necessary suggestions for the trip.

Then Manus creates a Markdown file to record all the ToDo contents, starts to search for relevant contents step by step, and clicks on different web pages to view relevant contents.

While browsing, it will even actively complete operations such as scrolling down the page and clicking on page elements. During the reading process, it will also try to summarize what it has seen and improve the ToDo table step by step until all the contents are completed, and finally, it produces an HTML version of the travel brochure as requested

Manus Demo Case 2: Analyzing Stocks
If you take a look at the screen, you will see. Manus demonstrated his strong stock price analysis capabilities. The task is to analyze the correlation between the stock prices of Nvidia, Marvell Technology and TSMC over the past three years.

Although there is generally a strong correlation between these three stocks, it may not be easy for novice users to sort out the cause and effect relationship quickly.

Manus operates like a real stockbroker. It first accesses information platforms such as Yahoo Finance through API to obtain historical stock data and cross-verifies the accuracy of the data to avoid possible misleading information from a single data source and ensure the reliability of the final analysis results.

In this case, Manus used Python programming skills to perform data analysis and visualization and combined it with professional tools in the financial field.

Ultimately, he provided users with clear feedback on the causal relationship between these stocks through data charts and detailed analysis reports, which is like the tasks that “interns” in the financial field do in their daily work.

So, by the end of this Story, you will understand why Manus is Unique, how it works, why it really more powerful than DeepSeek & ChatGPT and how to install it locally. Let’s dig a little deeper.

Before we start! 🦸🏻‍♀️

If you like this topic and you want to support me:
Clap my article 50 times; that will really help me out.👏
Follow me on Medium and subscribe to get my latest article for Free🫶
Join the family — Subscribe to the YouTube channel

Why Manus is Unique?

The advantages of Manus are mainly reflected in the following aspects:

Autonomy: It goes beyond providing suggestions; it allows users to truly think, plan, and execute tasks independently, allowing them to achieve their goals easily.

Powerful tool calling capability: Manus can call various external tools, greatly enhancing its practicality.

Efficient execution: Whether it is a complex task or a simple daily task, Manus can complete it quickly and efficiently and deliver results.
Self-learning and optimization: Manus continuously evolves through feedback to better adapt to the needs of each user.

Performance on GAIA benchmark: Manus achieved a SOTA (State-of-the-Art) score on the GAIA benchmark, demonstrating its technical strength.
Compared with traditional AI assistants, Manus’s biggest advantage is that it doesn’t just give you ideas but actually puts them into practice and solves real problems. Not only can it do it faster, but it can also provide you with a complete solution, reducing your subsequent modification work.

How It Works (Manus)

Although specific details have not been made public regarding the technical implementation of Manus, online discussions and technical analysis suggest that its core structure may include several key components.
**
Virtual machine environment**: Manus appears to run in a virtual machine on a Linux system, where basic tools like the Chrome browser and Python are installed to support its operation.

Task Planner: Manus’s task planning capabilities may rely on a powerful task decomposition system to convert user needs into specific To-Do Lists. In speculation, some people believe that it may adopt the Claude model because Claude is good at reasoning and complex task planning.

Model Context Protocol (MCP): As a key component of the system, Manus may implement the Model Context Protocol, which allows different components to efficiently pass and manage context information, which may be the basis for seamless collaboration between the components of the Manus system.

Task Execution Scheduler: The task scheduler is responsible for scheduling the execution of tasks based on the content of the ToDo List. It may use some open-source models and interact with other components through MCP to help Manus make real-time adjustments and decisions when executing tasks.

Different types of execution agents: Manus has multiple built-in “agents” or “agents” that specialize in handling different types of tasks. These agents may share context information through MCP to achieve collaborative work. For example, it may have a dedicated web browsing agent or an agent that obtains data based on a specific API. Each agent is responsible for completing a task and feeding the results back to the system.

Task summary generator: Ultimately, Manus will have a task summary system that integrates the results of all task executions and generates a final output report. It is speculated that this summary process may also use the Claude model to integrate all input data and generate a final result that meets user needs.

How it works (OpenManus)

The core of OpenManus is a revolutionary modular Agent system that consists of a highly intelligent professional team forming a collaborative network. Manus main agent:

Project manager: who can understand user needs and coordinates the work of various professional teams.

PlanningAgent: A strategic expert who breaks down complex tasks into clear and executable steps.

ToolCallAgent: A technical expert who knows how to use a variety of powerful tools.

As a result, developers can freely combine different functional modules according to their own needs to create their own unique AI assistants.

OpenManus seamlessly integrates multiple top models, including Claude 3.5 and Qwen VL Plus, allowing developers to fully utilize the advantages of each model.

Why was it possible for the team to break Manus’s high-wall monopoly in just 3 hours?

The reason is that OpenManus actually originates from the open-source accumulation of MetaGPT.

We simply grafted the browser tool chain onto the previous code and combined it with the accumulated Agent toolkit; the core system was completed in 1 hour. Another powerful feature of OpenManus is its instant feedback mechanism.

The process of the LLM thinking chain will be presented visually. Whether it is real-time updates on task execution progress, thought process logs, instant notifications of archives, etc., they are all visible at any time.

In addition, OpenManus is equipped with a powerful tool chain that can handle various complex tasks.

Python code executor: generate and execute code on the fly, Internet search tools: automatically obtain and analyze Internet information, Browser Automation: Simulate human operations to interact with web pages,

File processing system: automatically generate and manage various files. Among them, these tools are not simple independent modules but carefully designed collaborative systems that can work together seamlessly and complete tasks efficiently.

2. OpenManus local configuration

After briefly understanding the main design of OpenManus, we will deploy it locally.

First, install the Conda tool to manage Python virtual environments.

Then, create a Python virtual environment (environment name: OpenManus), and activate the environment:

conda create -n OpenManus python=3.12
python -m venv myenv
myenv\Scripts\activate

Clone the repository

git clone https://github.com/mannaandpoem/OpenManus.git
cd OpenManus

Finally, install the OpenManus dependencies:

pip install -r requirements.txt

Next, we configure the big model API. We will use QwQ-32B as the underlying big model for OpenManus.

First, copy a configuration file:config/config.toml

cp config/config.example.toml config/config.toml

Because gpt-4o costs money, I use the local ollama service. In this video, I will be using the Qwen model, but feel free to use any model you want. Please pay attention when you download the model and make sure it supports the function call

Go to your terminal and run

ollama run qwq

Once you download the model, please go to Edit config/config.tomlAdd the API key and custom settings:

# Global LLM configuration
[llm]
model = "qwq"
base_url = "http://ollamahost:11434/v1"
api_key = "sk-..."
max_tokens = 4096
temperature = 0.0

# Optional configuration for specific LLM models
[llm.vision]
model = "minicpm-v"
base_url = "http://ollamahost:11434/vi"
api_key = "sk-..."

Run OpenManus with one line of command:

python main.py

Manus AI Vs DeepSeek Vs ChatGPT

While Manus AI, Deepseek, and ChatGPT are all based on large language models, they each have distinct strengths and applications. Manus AI stands out as a general-purpose AI agent capable of autonomous task execution, making it highly versatile across various domains.

Deepseek, with its MoE architecture, excels in technical applications like code generation and reasoning, prioritizing efficiency and speed. ChatGPT, built on a Transformer architecture, is optimized for natural language interactions and excels in conversation and text generation.

Manus AI’s multi-agent architecture and unsupervised reinforcement learning system provide greater flexibility and adaptability, making it a powerful solution for real-world task execution.

Ultimately, each model serves different needs, with Manus AI focusing on action-oriented assistance and Deepseek enhancing computational efficiency.

Meanwhile, ChatGPT relies on NLP techniques and is extensively trained on a mix of biased and unbiased text from books, articles, and websites to refine its language-processing capabilities.

Conclusion :

Judging from the final results, Manus‘s results are richer, more like a report, including an introduction, table of contents, and city description of each listed city named. The content and format are also richer and easier to read, while OpenManus ‘s results are much simpler, and each company includes 3 core main information: city name, Tips and location.

It is hard to say who is better at the moment, but one thing is certain: general intelligent agents are getting closer and closer to us!

🧙‍♂️ I am an AI Generative expert! If you want to collaborate on a project, drop an inquiry here or book a 1-on-1 Consulting Call With Me.

I would highly appreciate it if you

❣ join to my Patreon: https://www.patreon.com/GaoDalie_AI

Book an Appointment with me:https://topmate.io/gaodalie_ai
Support the Content (every Dollar goes back into the video):https://buymeacoffee.com/gaodalie98d
Subscribe to the Newsletter for free:https://substack.com/@gaodalie

Join the KendoReact Free Components Challenge: $5,000 in Prizes!

From data grids to toolbars to form components and more, KendoReact offers a comprehensive suite of UI components that every React developer should experience building with. With 50+ free components available, you'll have everything you need to build an impressive application.

Get started

DEV Community