[TIL] A Three-Hour Interview with Ji Yichao, Chief Scientist at Manus (Acquired by Meta)

#programming #tutorial #devops

January 5th, 2026

(Full Video [https://www.youtube.com/watch?v=UqMtkgQe-kI])

Foreword

This three-hour interview with Ji Yichao, Chief Scientist at Manus (later acquired by Meta), is truly a must-watch for many in the AI industry.

I'll gradually add my key takeaways in the comments. For now, here's a quick rundown of what I found particularly noteworthy and worth sharing.

As a serial AI entrepreneur, Ji Yichao shares insights from a decade of building AI startups. He discusses the evolution from tokenization to LSTM to Transformer applications, and his two experiences building AI browsers.

He then delves into why Manus succeeded, highlighting how it solved problems that cloud providers and model providers couldn't: specifically, a robust tool library that allows LLMs to plan tasks autonomously.

He also likens AI Agents to manufacturing due to the extensive optimization required. Regarding Manus's product planning and direction, he believes the right approach involves "deciding what not to do."

Many technical concepts are touched upon very briefly but are incredibly deep. I'll detail them in the comments.

Regarding MCP Usage

Quick link: https://youtu.be/UqMtkgQe-kI?t=10342

Manus takes a rather conservative approach to MCP usage. MCP's dynamic tool discovery can pollute the action space: when tool definitions change mid-task, the cached prompt prefix no longer matches, so the cache hit rate drops. A lower cache hit rate can drastically increase costs. The proposed improvement is:

Invoke MCP tools from outside the native action space, so the native tool definitions stay stable.
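
To make the cache point concrete, here is a minimal sketch of my own (not Manus's implementation). It assumes the context is serialized as system prompt, then tool definitions, then history, and it uses character-level prefix matching as a rough stand-in for a token-level prefix cache. The tool names (`browser`, `shell`, `mcp_search_tickets`) are hypothetical.

```python
import json

def build_prompt(system: str, tools: list, history: list) -> str:
    """Serialize context in a fixed order: system prompt, tool definitions, history."""
    tool_block = json.dumps(tools, sort_keys=True)
    return "\n".join([system, tool_block, *history])

def shared_prefix_len(a: str, b: str) -> int:
    """Length of the common prefix: a rough proxy for what a prefix cache can reuse."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

SYSTEM = "You are an agent."
NATIVE_TOOLS = [{"name": "browser"}, {"name": "shell"}]  # hypothetical native tools
HISTORY = ["user: summarize the report", "assistant: reading file..."]

before = build_prompt(SYSTEM, NATIVE_TOOLS, HISTORY)

# Case A: a dynamically discovered MCP tool is added to the tool definitions.
polluted = build_prompt(SYSTEM, NATIVE_TOOLS + [{"name": "mcp_search_tickets"}], HISTORY)

# Case B: tool definitions stay fixed; the MCP call goes through an existing
# tool and only shows up as a new entry appended to the history.
stable = build_prompt(
    SYSTEM, NATIVE_TOOLS, HISTORY + ["assistant: shell(call MCP server ...)"]
)

reuse_polluted = shared_prefix_len(before, polluted)
reuse_stable = shared_prefix_len(before, stable)
print(f"reusable prefix, tool list changed: {reuse_polluted} of {len(before)} chars")
print(f"reusable prefix, tool list stable:  {reuse_stable} of {len(before)} chars")
```

In case A the prompts diverge inside the tool block, so everything after it (the whole history) has to be recomputed. In case B the old prompt is a strict prefix of the new one, so all of it can be reused.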

He also mentions that Anthropic covered this in a blog post, "Code execution with MCP: Building more efficient agents." The post explains how a code execution environment can make agents more efficient when they connect to external systems via MCP (Model Context Protocol), an open standard for connecting AI agents to tools and data. It points out that as MCP adoption grows, direct tool invocation runs into two problems: tool definitions overload the context window, and intermediate results consume excessive tokens. By presenting MCP servers as code APIs, agents can manage context more effectively, reduce token usage, and improve efficiency.

Key Points:

MCP Challenges:
- Tool definitions overload the context window.
- Intermediate tool results consume extra tokens.
Advantages of Code Execution:
- Agents can load tools on demand and process data within the execution environment.
- Reduces token usage, lowering costs and latency.
- Offers benefits in privacy protection and state management.
Implementation of Code Execution:
- Uses TypeScript to generate a file tree of available tools.
- Agents explore tools via the file system, loading only necessary definitions.
Other Benefits of Code Execution:
- Performs data filtering and transformation.
- Uses familiar code patterns for control flow.
- Enables privacy-preserving operations.
- Allows for state persistence and skill saving.
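
The "file tree of tools" idea from the key points can be sketched roughly like this. The post describes a TypeScript file tree; I'm using Python here only to keep the example short. The catalog, server names, and tool names are made up for illustration, and the helper functions are my own, not part of any MCP SDK.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical tool catalog; server and tool names are illustrative only.
CATALOG = {
    "google_drive": {
        "get_document": "Fetch a document by id.",
        "list_files": "List files in a folder.",
    },
    "salesforce": {"update_record": "Update a CRM record."},
}

def write_tool_tree(root: Path, catalog: dict) -> None:
    """Write one small definition file per tool: <root>/<server>/<tool>.json."""
    for server, tools in catalog.items():
        server_dir = root / server
        server_dir.mkdir(parents=True, exist_ok=True)
        for name, description in tools.items():
            definition = {"name": name, "description": description}
            (server_dir / f"{name}.json").write_text(json.dumps(definition))

def list_tools(root: Path) -> list:
    """Cheap discovery step: only file paths enter the context, not definitions."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.json"))

def load_tool(root: Path, rel_path: str) -> dict:
    """Load a single definition once the agent decides it needs that tool."""
    return json.loads((root / rel_path).read_text())

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    write_tool_tree(root, CATALOG)
    available = list_tools(root)
    definition = load_tool(root, "google_drive/get_document.json")

print(available)
print(definition)
```

The agent sees three short paths up front and pays for exactly one full definition, instead of having every definition in the context from the start.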

Solution

Code Execution Environment:
- Treats the MCP server as a code API.
- Agents run code within the execution environment, reducing the burden on the context window.
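
Here is a small sketch of why running code in the environment relieves the context window. `fetch_rows` is a hypothetical stand-in for an MCP tool call and its data is synthetic; the point is only that the large intermediate result stays inside the environment and a compact summary is what returns to the model.

```python
def fetch_rows() -> list:
    """Stand-in for an MCP tool call returning a large result set (synthetic data)."""
    return [
        {"id": i, "status": "pending" if i % 100 == 0 else "done"}
        for i in range(10_000)
    ]

def run_in_environment() -> dict:
    """Code an agent might write: filter inside the environment, return a summary."""
    rows = fetch_rows()
    pending = [r for r in rows if r["status"] == "pending"]
    return {
        "total": len(rows),
        "pending_count": len(pending),
        "sample": pending[:3],
    }

summary = run_in_environment()

# Compare what would enter the context in each approach (character counts
# are only a rough proxy for tokens).
full_size = len(repr(fetch_rows()))
summary_size = len(repr(summary))
print(summary)
print(f"full result: {full_size} chars, summary: {summary_size} chars")
```

With direct tool invocation, all 10,000 rows would pass through the model's context; here only the summary does.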

Considerations

Security and Infrastructure Requirements:
- Requires a secure execution environment with appropriate sandboxing, resource limits, and monitoring.
- These infrastructure needs add operational overhead and security considerations.
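
To give a feel for the operational side, here is a deliberately minimal sketch of a resource limit: running agent-written code in a separate interpreter with a wall-clock timeout and a cap on captured output. This is my own illustration and is not a security sandbox. A real deployment would need proper isolation (containers, VMs, restricted filesystem and network access) on top of limits like these. The `run_untrusted` helper and its parameters are hypothetical.

```python
import subprocess
import sys

def run_untrusted(code: str, timeout_s: float = 5.0, max_output: int = 1000) -> dict:
    """Run code in a separate interpreter with a wall-clock limit.

    This only bounds run time and output size; it is NOT a security boundary.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores user env
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return {"ok": False, "reason": "timeout", "stdout": ""}
    return {
        "ok": proc.returncode == 0,
        "reason": "exit",
        "stdout": proc.stdout[:max_output],  # cap what flows back to the agent
    }

fast = run_untrusted("print(sum(range(10)))")
slow = run_untrusted("while True: pass", timeout_s=0.5)
print(fast)
print(slow)
```

Even this toy version shows where the overhead comes from: every limit is something you have to choose, enforce, and monitor.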

Mention of OpenAI's 5-Level Agent Hierarchy

https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf

Level 1: Conversational AI/Chatbots
Level 2: Human-Level Problem Solving/Reasoners
Level 3: Agents
Level 4: Innovators
Level 5: Organizations

from https://youtu.be/UqMtkgQe-kI?t=9650

Papers Influencing AI Progress

When asked about papers that have influenced the progress of AI:

FLAN-T5 (Scaling Instruction-Finetuned Language Models [https://arxiv.org/abs/2210.11416]):
- Achieved impressive results with an 11B FLAN-T5 through instruction fine-tuning.
Word2Vec (Efficient Estimation of Word Representations in Vector Space [https://arxiv.org/abs/1301.3781]):
- Word2Vec's computationally efficient architecture let computers turn words into vectors in which semantically related words sit close together, helping usher in the golden age of deep learning in Natural Language Processing.
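
As a tiny illustration of what "semantically related vectors" means in practice: related words should have a higher cosine similarity than unrelated ones. The three-dimensional vectors below are hand-made toy values chosen for illustration, not output from a trained Word2Vec model (real embeddings typically have hundreds of dimensions).

```python
import math

def cosine(u: list, v: list) -> float:
    """Cosine similarity: 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors, made up for this example (NOT real Word2Vec embeddings).
TOY_VECTORS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.75, 0.2],
    "banana": [0.1, 0.0, 0.9],
}

sim_related = cosine(TOY_VECTORS["king"], TOY_VECTORS["queen"])
sim_unrelated = cosine(TOY_VECTORS["king"], TOY_VECTORS["banana"])
print(f"king vs queen:  {sim_related:.3f}")
print(f"king vs banana: {sim_unrelated:.3f}")
```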

Discussion Related to Entrepreneurship

Additionally, there's a discussion highly relevant to entrepreneurship: https://youtu.be/UqMtkgQe-kI?t=3783

When a group of not-so-dumb people have nothing to do, great ideas emerge.
“For every complex problem there is an answer that is clear, simple, and wrong.” - H. L. Mencken