Migsar Navarro

Building a Language Companion AI Agent

This is a submission for the Google AI Agents Writing Challenge: Learning Reflections

For the last few weeks, I have spent considerable time learning about Google's ADK (Agent Development Kit) and putting together a capstone project for the AI agents course. I decided it was the right time to dive deep into AI and really commit to this project. It was time well spent, not only because I finally got to work on some ideas I have had for a long time, but also because I feel I was at last able to catch up with a technology that has been evolving at ludicrous speed.

Coming from a front-end background (I do a lot of back-end these days, but that's another story), my first thought was to take the time to explore the user interface and user experience (UI & UX) of AI agents. In particular, I wanted to experiment with ways to use different, more specialized interfaces to accomplish specific tasks, without the feeling of having to switch context.

I didn't have a clear idea of what exactly I was going to build, but I am really proud of what I accomplished: an agent built around another topic I am currently very passionate about, language education.

A language companion

Before this capstone project, I had used AI frequently, but purely from the user perspective, with just a few brief forays into development. I was not happy with the results in either area; they were the usual mediocre but tolerable kind, enough to keep going, but not enough to catch your attention.

By now, I had a clear idea about the project I wanted to work on, a problem I've been trying to solve with AI for a few months now: language learning. I started learning Mandarin a few years ago, and I've found AI to be a great companion. One of the drawbacks is that I lose a huge amount of time moving information around, from the conversation to the editor with constant jumps to the browser to look up videos or search for more information. I thought it would be great to have everything in the same place.

I had a hard time getting started. Kaggle didn't help me this time; as a software engineer, it didn't feel right to code in a notebook. I have been using Kaggle for learning, and it is great for that, but this time was different because I needed to write real code in it. After a while, it started slowing me down and making it hard to think about the code and all the moving parts.

With Kaggle's notebooks, things usually work out of the box, which is very cool, but they don't scale the way a properly structured project does. The Python in the learning material didn't look much like the Python in the ADK examples; it might seem like just a matter of presentation, but there are some important runtime differences. Also, there have been quite a few changes in Python itself, and not being a full-time Python developer, I was not sure which tools to use. I learnt about uvicorn and typed Python for this project. The examples didn't just work on my machine. Ironically, I started by creating a Dockerfile for the project in order to have a clear view of the dev environment and no pollution from outside, which also meant keeping my computer clean.

After spending the first few days struggling with the setup, trying to get it perfect, and not even touching the core of my project, I realized I had to change strategy if I wanted to finish in time. So I moved from getting the development environment right to getting the prompts right. My first attempt was to use a single agent to get some kind of MVP working, but the quality was not good enough. The reasoning was fine, but the agent seemed to have some sort of internal conflict over the separation of concerns. So I decided to create sub-agents:

  • One for identifying the goal of the user's prompt
  • One for explaining the meaning
  • One for extracting the vocabulary
  • Another one for finding common language patterns
  • And, finally, one for summarizing the work of the previous agents
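
In ADK terms, the structure ended up looking roughly like the sketch below. The names, instructions, and model string are simplified stand-ins rather than my real prompts; the point is the shape: a root agent that delegates to specialist sub-agents.

```python
from google.adk.agents import LlmAgent

MODEL = "gemini-2.0-flash"  # example model name, not necessarily the one I used

def specialist(name: str, instruction: str) -> LlmAgent:
    # Helper so each specialist below stays a one-liner.
    return LlmAgent(name=name, model=MODEL, description=instruction, instruction=instruction)

goal_agent = specialist("goal_inspector", "Identify the goal of the user's prompt.")
meaning_agent = specialist("meaning_agent", "Explain the meaning of the phrase.")
vocabulary_agent = specialist("vocabulary_agent", "Extract key vocabulary with short glosses.")
patterns_agent = specialist("patterns_agent", "Find common language patterns in the phrase.")
summary_agent = specialist("summary_agent", "Summarize the other agents' findings only.")

root_agent = LlmAgent(
    name="language_companion",
    model=MODEL,
    instruction=(
        "Coordinate the specialist sub-agents and always delegate the final "
        "wrap-up to summary_agent; never write the summary yourself."
    ),
    sub_agents=[goal_agent, meaning_agent, vocabulary_agent, patterns_agent, summary_agent],
)
```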

AI agents are inherently curious: a small forgotten remark in a function description or a comment in a definition can make the agent behave completely differently. For example, at some point I had something like "English language", not even in the description of the agent but in the docstring of the agent file, and the agent systematically refused to accept sentences that were not in English, because it was intended for the English language, or so it said.
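
To give a sense of how easily that kind of text leaks into the model, here is a hypothetical sketch: in ADK, the docstring of a function tool is sent to the model as the tool's description, so a single stray qualifier is enough to change behavior.

```python
from google.adk.tools import FunctionTool

def explain_phrase(phrase: str) -> dict:
    """Explain an English language phrase for the learner.

    The words "English language" above are exactly the kind of stray detail that
    can make the agent refuse non-English input: this docstring is what the model
    reads as the tool's description.
    """
    # Hypothetical placeholder; the real explanation work happens in the agents.
    return {"phrase": phrase, "status": "queued_for_explanation"}

# ADK also accepts the bare function in an agent's tools list; wrapping it here
# just makes the source of the description explicit.
explain_phrase_tool = FunctionTool(func=explain_phrase)
```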

It required a lot of effort to get this last sub-agent right. At the beginning, it didn't properly use the information from the other agents. I solved that problem, and then the root agent tried to write the summary itself instead of using the one the sub-agent had just generated. Other times, it didn't call the sub-agent at all because it considered itself perfectly able to summarize the text. At some point, I wrote a conditional sentence that made the agent summarize the summary and totally ignore all the work done by the previous sub-agents.

There are two main flows for my agent:

  1. Asking the agent to explain a phrase.
  2. Asking the agent to explain a topic.

Those may sound similar, but consider this: when you start with a topic, it is easy to come up with example phrases, but when you start with a phrase, it is not so easy to get back to the topic that originated it; there are so many topics that could have been the source of a given sentence.

Now I had the information I wanted, but I still hadn't touched anything related to the interface. For a moment, I thought it would be great to have session, state, and long-term memory, and in the course notebooks it looked so easy that I had to give it a try. It turned out to be hard to figure out how to configure things like the runner or the session when using the ADK directly, and I didn't want to get sidetracked by that topic, so I had to skip memory and context, at least for the prototype.
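
For reference, this is roughly the wiring I was trying to get right: a minimal sketch using ADK's Runner with an in-memory session service. The module path, app name, and user id are made up, and exact signatures vary a bit between ADK versions (for example, whether create_session needs to be awaited).

```python
import asyncio

from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types

from language_companion.agent import root_agent  # hypothetical module path

APP_NAME = "language_companion"

async def main() -> None:
    # Sessions keep per-conversation state; the in-memory service is enough for a prototype.
    session_service = InMemorySessionService()
    runner = Runner(agent=root_agent, app_name=APP_NAME, session_service=session_service)

    session = await session_service.create_session(app_name=APP_NAME, user_id="learner-1")

    message = types.Content(role="user", parts=[types.Part(text="Explain the phrase 好久不见")])
    async for event in runner.run_async(
        user_id="learner-1", session_id=session.id, new_message=message
    ):
        if event.is_final_response() and event.content:
            print(event.content.parts[0].text)

asyncio.run(main())
```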

I spent a good amount of time exploring the possibilities of the human-in-the-loop pattern and its examples. That was the only mention of such a thing in the course, but it was not at all like the interface change I had in mind. I wondered whether it would be possible to build an interface around this concept, which is formally named elicitation, and I thought the safest way was to use an MCP server to deal with external calls in a coordinated and grouped manner. So I started creating an MCP server.

FastMCP made the task of creating the MCP server incredibly easy. The only pain point was not being able to deploy it to their official cloud service. For now, I am running it locally over the streamable HTTP transport.
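
To show how little code that takes, here is a minimal sketch in the spirit of my server; the create_quiz tool, its URL, and the port are hypothetical placeholders rather than my actual implementation.

```python
from fastmcp import FastMCP

mcp = FastMCP("quiz-server")

@mcp.tool
def create_quiz(topic: str, questions: list[str]) -> str:
    """Create a quiz page for the given topic and return a link the learner can open."""
    # Placeholder: the real server would persist the quiz and return a shareable URL.
    slug = topic.lower().replace(" ", "-")
    return f"http://localhost:8080/quiz/{slug}?items={len(questions)}"

if __name__ == "__main__":
    # Serve over streamable HTTP so the agent can reach it as a remote MCP server.
    mcp.run(transport="http", host="127.0.0.1", port=8000)
```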

It worked like a charm! ADK picked up the context perfectly and generated the MCP request from the conversation. A page for the quiz was created, and a link was provided so the user could jump straight into it. Once the user finishes the quiz, they can ask the agent to evaluate it. The answers are saved in an external database for persistence and so they can be shared. The quiz is then graded mostly by the AI agent, which makes it possible to assess open-ended questions, something that was almost impossible before LLMs.

I am really proud of the result. There is so much more work to do, but I loved the way ADK and FastMCP helped me create most of the boilerplate. I enjoyed coding the agent, and I will continue working on it to make sure it is production-grade sometime soon.

There you have it: an AI-powered language tutor that can create quizzes that live outside of the main conversation and are truly interactive. The main idea is to have two-way communication from the conversational AI agent to the quiz platform and back; that last part will remain as an exercise for the reader... ahem, sorry, as a goal for another time.

The course

The course was structured in a really intuitive way, but I still struggled with one particular aspect: the workflow for using the ADK to build the agents. Kaggle notebooks are an incredible research setup, but they are very different from the environment in which the agent will run in production. Here are some of the questions I ran into along the way:

  • How do you structure an ADK project?
  • How do you run ADK in Python, or in Go?
  • What exactly is the app runner? Can I have it in my main project file?
  • How do you configure things in a repository (e.g., session, state, and memory)?
  • How do you test things efficiently? Not to evaluate agent performance, but to work on one section of a multi-agent architecture.

At the beginning, I was trying to build my project as I went along with the course, but that turned out to be impractical: too much information, too little time, and too many unexpected problems. The way I understand agents changed a lot over the course. Initially, I had no clear idea of the building blocks and the possibilities, but I also didn't realize how much I already knew thanks to my particular skills and experience. I thought it would be easier, not in a magical sense, but in the sense that ADK would take care of more things. That is not to say ADK isn't useful; on the contrary, it helps a lot, and having it take care of more would mean a trade-off in flexibility.

After the course and the capstone project, my priority list would be as follows:

  • Understanding agents (flows, agent vs tool, inter-agent communication)
  • How to code so you don't box yourself in later. With a good perspective, you can start small and grow incrementally without having to restructure the whole project.
    • How to think: bottom-up or top-down? How to isolate agents for development?
    • MCP, A2A.
  • Interfaces. In my opinion, this is the most important yet most understated aspect.
  • Memory and sessions. They are foundational, but you cannot get them right if you don't solve the other problems first.
  • Testing. How to keep a prompt library and implement tests so you don't waste time, plus evaluation so you can compare previous and current performance.
  • Deployment. It depends a bit on the structure of the project and the intended interface, but it is frustrating to spend a lot of time building something only to discover it cannot be deployed as intended.

AI is much more than LLMs

I was completely unaware that such a vibrant and diverse community existed around artificial intelligence. I had started creating some AI-powered software before, and then abandoned the project after a while, feeling a bit lost and disappointed.

This AI agents course was great because it gave me a new perspective on how all the pieces of this ecosystem fit together. It is worth mentioning that there are many different startups doing many different but complementary things. From the outside, you only hear about the big names, but for every detail of the workflow, at least a few different companies are building great products. As my idea evolved, while I was searching for tutorials and ways to make things fit together, I started learning about some incredibly specialized corners of the AI ecosystem.

AI is amazing; you really do have the knowledge of an expert in any field at your fingertips, but at the same time it has the naivete of an 8-year-old kid doing her best to prove that she is old enough to do a real job. So, with AI, you need to be more patient and iterate a lot, way more than with usual engineering work, even when you already think engineering requires plenty of iteration.

I think there is a real shift here, and something I've not seen mentioned that often is that AI was built by engineers but designed by managers; it is not for the excellent engineer who needs to be more productive, it is mostly for managers who can achieve higher throughput by coordinating several human or digital agents. That's not to start a discussion on whether this is good or bad, just to say that we really need to understand what AI can do for us in practical terms, and that often comes as an improvement through management.

I am convinced it is an exciting time in an emerging area, and there are so many new things to come.

Afterthoughts

In the end, I realized that creating agents doesn't simplify building tools at all; it just makes it a different kind of job, for a totally different profile. It may be more enjoyable, because it transforms tasks from a form we struggle with into a form we are more comfortable using. It makes complex tasks feel simpler; we just have to ask for them.

It is often said that anyone can build with AI. That is true to some extent, since the interface is much simpler than having to learn artificial languages to express even simple thoughts. At the same time, it is also false, because most people have no management skills at all, and the marketing makes them believe it is just talking, when it is not: complex projects remain complex even when expressed as a conversation.

Creating an agent is about thinking like the leader of a team and coordinating the rest of the team so they do their jobs purposefully and efficiently. If you don't know what you want to achieve, it is easier than ever to get lost, because now not only can other humans take the lead away from you, but machines can too. It is way too easy to get sidetracked.

Here are my two main takeaways from the course:

First, management is more important than ever, not less. We often focus on the technical side of state-of-the-art technologies and end up neglecting the essential part, which is management: not just managing people, but managing time and resources. AI is often portrayed as a magical technology that can help us do everything we want, but it is not; it is only as good as the prompts we can create for it and the uses we decide to give it. We can create a prompt-creator agent and a task-manager agent, but to create a good prompt creator or task manager, we need to go back to step one.

Second, for complex problems, AI agents won't save you time. But don't despair; they can help a lot by transforming the job from one that humans are not very good at into one we excel at, which is expressing and sharing our thoughts. Building a good agent takes a lot of time and a systematic engineering approach; the resulting agent will only be as good as the team that constructs it. It is a good idea to use AI to build AI, but it won't save much time; it will just make the task more manageable.

A final remark I would like to make: while it is true that human beings' skills are limited, we do have a say in the way we shape our society. Our thinking is continuously being reshaped, and we should make sure new generations discover the joy of science, engineering, and many other topics currently labeled as difficult or boring. AI agents are incredible indeed, and it is precisely because of that that it is paramount to know how to use and coordinate them with mastery.
