DEV Community

Xunxing Mao
Xunxing Mao

Posted on

AI Agent Beginner's Guide: From Concepts to Protocols

This article was originally published on maoxunxing.com. Follow me there for more deep dives on AI agents and engineering practices.


Agent vs Copilot

This is the key distinction for understanding the current forms of AI tools:

Agent (Driver) Copilot (Co-driver)
Role Autonomous decision-making and execution Assists, follows instructions
Behavior Perceive environment -> Plan -> Execute -> Achieve goal You give instructions -> It helps complete the task
User Requirements Define the goal and boundaries Needs prompting skills, domain knowledge, ongoing exploration
Process Non-deterministic, dynamic decisions Relatively fixed, human-driven

Why We Need Agents

Models cannot answer every question. For instance, events after the training knowledge cutoff are simply unknown to the model.

The evolution path:

Pure LLM
  |
Combining APIs for answers (Compound AI System / Agentic System) -> Fixed Pipeline
  |
Agents -> LLM makes dynamic decisions based on task goals and environment; the process is non-deterministic
Enter fullscreen mode Exit fullscreen mode

The key distinction: a Pipeline is a fixed workflow; an Agent is a dynamic workflow. An Agent decides what to do next in real time based on the current environment and task goals.


Agent Architecture Patterns

Single Agent

A single LLM instance that autonomously perceives the environment, plans, executes, provides feedback, and completes end-to-end tasks.

Multi-Agent

  • Master-Worker pattern: One master Agent handles planning and task dispatch, while multiple worker Agents execute specific subtasks
  • Peer collaboration pattern: All Agents can make decisions, determining who acts based on capability and context

Computer Use / Web Agent

The difference from traditional automated testing: traditional test instructions are written by humans, while Web Agent instructions are dynamically generated by the Agent. An Agent can operate browsers and desktop applications like a human, autonomously deciding the operation path based on objectives.


Protocol Layer: MCP and SGP

For AI Agents to work in practice, they need to interface with external resources and services. There are two key protocol directions:

MCP (Model Context Protocol)

The problem it solves: How to integrate local computer resources?

  • RPA (Robotic Process Automation)
  • Local documents (Local Doc)
  • Local software (Local Software)

MCP enables Agents to call local capabilities without routing everything through the cloud.

SGP (Standard Gateway Protocol)

The problem it solves: How to connect remote services?

  • API services
  • Remote models

SGP unifies the interaction protocol between Agents and remote services.


Implementation Strategy and Business Opportunities

Architecture-Level Opportunities Belong to Small Businesses

Data processing architecture + systems engineering + application paradigms are uncertain -- this is where small entrepreneurs have opportunities, filling gaps that models alone cannot handle. The bottom layer consists of numerous models. But model and algorithm paradigms are certain -- that is what big companies do.

A notable finding: large models distilled into small models perform worse than small models trained from scratch.

A New Paradigm for Personalized Recommendations

Implementing personalized recommendations through Agents rather than traditional search/recommendation algorithms:

  • Models handle knowledge and general awareness
  • On-device processing for privacy without cloud dependency (1B parameters, 300-500M memory)
  • 25ms search response time; models suit asynchronous IM scenarios
  • Hyper-personalization: every user sees a different UI

Pick Vertical Scenarios, Don't Spray and Pray

Don't try to do everything at once. Choose vertical scenarios and go deep.

Pipeline Thinking

Don't be constrained by paradigms; instead consider what it can do and experience it firsthand:

For example: use a model to identify intent -> use a cutout tool for image segmentation -> use other tools for subsequent processing. This is a Pipeline where data + engineering + algorithms can work as a full stack. Capability doesn't need to be 100%; 50% is OK.

Text as a Service

FaaS is a service, Serverless is a service, and text can also be a service:

Push me five AI news items every day, English only and meeting certain criteria, then execute a task to publish to a public account -- this can become a service. Implement the code yourself and it becomes a service. Code is productivity.

Business Scenario Examples

  • Public sentiment and comments: Extracting key signals buried in massive amounts of information
  • Infrastructure tools: All usable; the key is the solution and its future-readiness

Practical References

fastrtc

A real-time communication framework suitable for building voice/video Agents: fastrtc.org/cookbook


References


Related Reading

If you want a practical AI coding workflow, check out my
AI Coding Playbook: Tool Selection, Workflows, and Prompt Templates.

If you're exploring team-level impact after personal productivity gains, read
AI Is Rewriting the Playbook.


Felix Mao (毛毛星) | maoxunxing.com | @maoxunxing | GitHub

Top comments (0)