Hi there, my name is Darshan Ponikar. I’m a software engineer transitioning into a product engineer. Recently, I started exploring AI agent use cases, and in this blog, I’m sharing my learnings. These are just my rough thoughts around AI and AI agents.
This is a beginner-friendly blog, but you’re expected to have some understanding of LLMs and a bit of prompt engineering.
The blog aims to draw a general picture of what LLMs with actions look like. Feel free to share your thoughts in the comments and don’t hesitate to disagree with mine.
Rise of AI agents
This is becoming clearer as AI agents arise in every possible industry. Any task that is important but boring, and takes real effort to complete, can easily be automated by an AI agent.
I am curious to understand how these AI agents work, and whether they are something different or new compared to LLMs. It's been about 5 years since LLMs arrived, and if you haven't been active in the space for a while, you have probably missed their obvious limitation: an LLM cannot take an action. It still cannot send an email; it can write code but cannot run the software; it can write a blog post but cannot improve its SEO. The list goes on.
What LLMs are good at is purely text-based. That said, developers have been coaxing LLMs into generating proper JSON matching a schema for quite a while now, and the results have improved a lot in recent years. But generating JSON alone won't help you much when a task requires more than text.
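To make this concrete: even with perfect JSON output, the model only hands structured text back to your code. A minimal Python sketch, where the model reply is a hard-coded string and the field names are made up for illustration:

```python
import json

# A hypothetical reply from an LLM that was prompted to answer in JSON only
# (hard-coded here; a real app would get this from a model API).
llm_reply = '{"action": "send_email", "to": "team@example.com", "subject": "Weekly report"}'

# Your code can parse the structured output...
request = json.loads(llm_reply)
print(request["action"])  # send_email

# ...but the LLM's job ends here: actually sending the email
# is still entirely up to your own code.
```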
Beyond prompt engineering
LLMs are generally good at following instructions, as you have probably noticed. But what if they had some way to call third-party APIs to complete a task? For example, an API to store data in a database?
This is known as tool calling, and it's evolving quickly.
The idea is that you instruct the LLM and give it a set of tools, each with a unique description. The LLM then decides which tool to use, and when, in order to fulfil the user's query.
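That loop can be sketched in a few lines of Python. Here the "model" is a hard-coded stand-in, and the tool names and descriptions are invented; a real setup would send the descriptions to an actual LLM and get the tool choice back as JSON:

```python
# Toy tools; each stub stands in for a real API.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

def send_email(to: str, body: str) -> str:
    return f"Email sent to {to}"

# The catalog the LLM sees: a name and a unique description per tool.
TOOLS = {
    "get_weather": {"fn": get_weather, "description": "Look up the weather for a city"},
    "send_email": {"fn": send_email, "description": "Send an email to an address"},
}

def fake_llm(query: str) -> dict:
    # A real LLM would read the descriptions and the query, then
    # reply with a tool name and arguments. This mock just keyword-matches.
    if "weather" in query:
        return {"tool": "get_weather", "args": {"city": "Pune"}}
    return {"tool": "send_email", "args": {"to": "me@example.com", "body": query}}

def run(query: str) -> str:
    decision = fake_llm(query)
    return TOOLS[decision["tool"]]["fn"](**decision["args"])

print(run("What's the weather in Pune?"))  # Sunny in Pune
```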
This is essentially an orchestration of tools, where the AI decides in real time which tool to use, and that basically changes everything. Instead of handing you some JSON, the AI can literally make an API call or pull data from a database. You just have to give it clear instructions about what parameters each tool expects.
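Here is a rough sketch of such a tool, storing data in a database as in the earlier example. The tool name, the parameter schema, and the arguments the model "produced" are all invented for illustration:

```python
import sqlite3

# The schema you would hand to the LLM so it knows how to call the tool.
SAVE_NOTE_SCHEMA = {
    "name": "save_note",
    "description": "Store a short note in the database",
    "parameters": {"text": "string"},
}

conn = sqlite3.connect(":memory:")  # throwaway in-memory DB for the demo
conn.execute("CREATE TABLE notes (text TEXT)")

def save_note(text: str) -> str:
    conn.execute("INSERT INTO notes VALUES (?)", (text,))
    conn.commit()
    return "saved"

# Pretend the LLM produced these arguments after reading the schema:
llm_args = {"text": "Ship the blog post on Friday"}
print(save_note(**llm_args))  # saved
```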
Tool calling at scale
I think at scale, that's what LLMs are going to do. They are trained on very large, high-quality datasets, and they have some sort of reasoning ability (let's assume they do) for making decisions.
But even though they are good at reasoning and decision-making, they don't have a way to perform actions themselves, and I think we are still far from a future where LLMs can perform any task on their own.
What they are good at is giving instructions. They can tell an AI agent what to do and how to do it. That's how LLMs are going to take actions: through AI agents.
One day LLMs will operate robots; today, they are already operating third-party software through AI agents.
Next-gen wrapper
Think of AI agents as another wrapper around AI, but this time you provide tools along with instructions so the LLM can actually perform certain actions for a given query.
The complexity of a tool varies with the task it has to perform.
It can be as simple as making an API call, or as complex as ordering food online, where it has to handle payment, restaurant availability, the distance from source to destination, and so on.
Obviously, you should start with something simple before making it more complex.
In general, tools can make different API calls, call other LLMs to reason better, and so on.
To make this more standardized, Anthropic came up with the Model Context Protocol (MCP), a standard way to give LLMs access to third-party APIs.
Think of how your client app talks to a server via REST APIs; metaphorically speaking, an MCP server provides a REST API for LLMs.
MCP servers bring scale, letting you make your AI agent as generic as your requirements demand.
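To get a feel for the shape of this, here is a mock in Python. This is not the real MCP protocol or SDK, just the idea: a server exposes a catalog of tools, and the LLM-side client first lists them and then calls one by name, much like discovering and hitting REST endpoints:

```python
class MockMCPServer:
    """A stand-in for an MCP-style server (everything here is invented)."""

    def __init__(self):
        self._tools = {}

    def tool(self, name, description):
        # Decorator that registers a function as a named tool.
        def register(fn):
            self._tools[name] = {"description": description, "fn": fn}
            return fn
        return register

    def list_tools(self):
        # What the client shows the LLM so it can pick a tool.
        return {name: t["description"] for name, t in self._tools.items()}

    def call_tool(self, name, **kwargs):
        return self._tools[name]["fn"](**kwargs)

server = MockMCPServer()

@server.tool("instagram_stats", "Get the follower count for a handle")
def instagram_stats(handle: str) -> dict:
    return {"handle": handle, "followers": 1234}  # canned data, no real API

print(server.list_tools())
print(server.call_tool("instagram_stats", handle="darshan"))
```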
Since this topic is pretty new, there aren't many MCP servers available yet, but it's still early days.
If you just want to experiment, I think it's far easier to use plain tool calling than to set up your own MCP server.
But in the future, we will probably see companies providing MCP servers for communicating with their products, and there might even be startups that solely provide MCP servers for context.
Think MCP servers for legal documents, MCP servers for restaurant chains, MCP servers to check your Instagram stats, and so on.
But LLMs need to be smart and efficient enough to decide when a tool call makes sense; otherwise none of this works.
Hence the title: AI as orchestration. We are still early, and it's fascinating to explore the endless opportunities AI can fulfil.
We have MCP servers now. Maybe in the future we will have some other tool that makes this even more efficient.
Thanks for reading, share your thoughts in the comments section.