Xunxing Mao

Posted on Apr 27

AI Agent Beginner's Guide: From Concepts to Protocols

#ai #agents #architecture #productivity

This article was originally published on maoxunxing.com. Follow me there for more deep dives on AI agents and engineering practices.

Agent vs Copilot

This is the key distinction for understanding the current forms of AI tools:

	Agent (Driver)	Copilot (Co-driver)
Role	Autonomous decision-making and execution	Assists, follows instructions
Behavior	Perceive environment -> Plan -> Execute -> Achieve goal	You give instructions -> It helps complete the task
User Requirements	Define the goal and boundaries	Needs prompting skills, domain knowledge, ongoing exploration
Process	Non-deterministic, dynamic decisions	Relatively fixed, human-driven

Why We Need Agents

Models cannot answer every question. For instance, events after the training knowledge cutoff are simply unknown to the model.

The evolution path:

Pure LLM
  |
Combining APIs for answers (Compound AI System / Agentic System) -> Fixed Pipeline
  |
Agents -> LLM makes dynamic decisions based on task goals and environment; the process is non-deterministic

The key distinction: a Pipeline is a fixed workflow; an Agent is a dynamic workflow. An Agent decides what to do next in real time based on the current environment and task goals.

Agent Architecture Patterns

Single Agent

A single LLM instance that autonomously perceives the environment, plans, executes, provides feedback, and completes end-to-end tasks.

Multi-Agent

Master-Worker pattern: One master Agent handles planning and task dispatch, while multiple worker Agents execute specific subtasks
Peer collaboration pattern: All Agents can make decisions, determining who acts based on capability and context

Computer Use / Web Agent

The difference from traditional automated testing: traditional test instructions are written by humans, while Web Agent instructions are dynamically generated by the Agent. An Agent can operate browsers and desktop applications like a human, autonomously deciding the operation path based on objectives.

Protocol Layer: MCP and SGP

For AI Agents to work in practice, they need to interface with external resources and services. There are two key protocol directions:

MCP (Model Context Protocol)

The problem it solves: How to integrate local computer resources?

RPA (Robotic Process Automation)
Local documents (Local Doc)
Local software (Local Software)

MCP enables Agents to call local capabilities without routing everything through the cloud.

SGP (Standard Gateway Protocol)

The problem it solves: How to connect remote services?

API services
Remote models

SGP unifies the interaction protocol between Agents and remote services.

Implementation Strategy and Business Opportunities

Architecture-Level Opportunities Belong to Small Businesses

Data processing architecture + systems engineering + application paradigms are uncertain -- this is where small entrepreneurs have opportunities, filling gaps that models alone cannot handle. The bottom layer consists of numerous models. But model and algorithm paradigms are certain -- that is what big companies do.

A notable finding: large models distilled into small models perform worse than small models trained from scratch.

A New Paradigm for Personalized Recommendations

Implementing personalized recommendations through Agents rather than traditional search/recommendation algorithms:

Models handle knowledge and general awareness
On-device processing for privacy without cloud dependency (1B parameters, 300-500M memory)
25ms search response time; models suit asynchronous IM scenarios
Hyper-personalization: every user sees a different UI

Pick Vertical Scenarios, Don't Spray and Pray

Don't try to do everything at once. Choose vertical scenarios and go deep.

Pipeline Thinking

Don't be constrained by paradigms; instead consider what it can do and experience it firsthand:

For example: use a model to identify intent -> use a cutout tool for image segmentation -> use other tools for subsequent processing. This is a Pipeline where data + engineering + algorithms can work as a full stack. Capability doesn't need to be 100%; 50% is OK.

Text as a Service

FaaS is a service, Serverless is a service, and text can also be a service:

Push me five AI news items every day, English only and meeting certain criteria, then execute a task to publish to a public account -- this can become a service. Implement the code yourself and it becomes a service. Code is productivity.

Business Scenario Examples

Public sentiment and comments: Extracting key signals buried in massive amounts of information
Infrastructure tools: All usable; the key is the solution and its future-readiness

Practical References

fastrtc

A real-time communication framework suitable for building voice/video Agents: fastrtc.org/cookbook

References

Building Effective Agents — Anthropic — Anthropic's guide to building effective AI agents with practical patterns
Model Context Protocol (MCP) Specification — Official specification for MCP, the open protocol for AI-tool integration
LangChain Agent Documentation — Comprehensive documentation on building AI agents with LangChain framework

DEV Community