Alessandro Annini

Posted on • Originally published at alessandro-annini.Medium

What’s my AGI again? Implementing an Autonomous Agent in Js

[This article was first published on Medium]

I fired it up, it was a coding night,
I used the key to get the AI right,
I started sending text, And it processed my lines,
But then it called a service, see?

😄

TL;DR: This agent library can use functions and call itself repeatedly until it finds a way to answer the user. It's not a real, complete artificial general intelligence (it's actually extremely far from it), but it's a nice experiment in that direction.

**GitHub repo for micro-agi-js — this article is based on v1.0.2**

Photo by [Lyman Hansel Gerona](https://unsplash.com/@lhgerona?utm_source=medium&utm_medium=referral) on [Unsplash](https://unsplash.com?utm_source=medium&utm_medium=referral)

Definition of AGI in 99 words:
Artificial General Intelligence (AGI) refers to machines that possess the ability to understand, learn, and apply knowledge across a wide range of tasks, akin to human intelligence. Unlike narrow AI, which excels at specific tasks, an AGI can adapt to various challenges without prior programming. It involves complex problem-solving, reasoning, and comprehension, aiming to mimic human cognitive functions. AGI’s development is a monumental task, requiring advancements in machine learning, neuroscience, and computer science. While it holds tremendous potential, including automating complex tasks and enhancing problem-solving, it also raises ethical and safety concerns that necessitate careful consideration and robust safeguards.

It is able to reason

Actually the word “reason” is (still) misleading in the field of AI, but with the AGI technique the LLM can generate a chain of thought. This matters when it cannot achieve its goal in a single iteration: the extra iterations give the AI space to work out what it needs to satisfy the request.

Every time the script calls OpenAI it uses the function calling feature to get back a structured response that can be parsed from JSON into a JavaScript object.

This object contains 4 standard properties:

  • thought: the main idea the LLM comes up with

  • description: an expanded version of the thought where the LLM elaborates more

  • criticism: something that could be improved or some information that is missing

  • response: (optional) the final response that will be presented to the user

The “response” property is optional because the AI may not have an answer after an iteration; in that case the other properties (thought, description and criticism) are passed to the next iteration so the AI can continue reasoning from where it left off.
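
For example, a single iteration parsed into a JavaScript object could look like this (the values are invented for illustration):

```js
// Hypothetical example of one parsed iteration (values invented for illustration).
const iteration = {
  thought: 'I need the current weather in Rome before I can answer.',
  description:
    'The user asked what to wear in Rome today, so the forecast should be fetched first.',
  criticism: 'I still do not know the temperature or the chance of rain.',
  // "response" is omitted: the agent is not ready to answer yet, so this object
  // is appended to the context and handed to the next iteration.
};
```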

It is able to use tools

Another extremely powerful feature is the ability to use the tools that you teach it. More specifically, you can have functions that help the AI during its reasoning: maybe they load data from a database, or maybe they hit an API to get useful data.

Tools are functions.

If you want to make those functions available to the AI you need to describe how to use them, and you can do this with the OpenAI function calling feature.

The aim is to create a clear list that explains how your tools work. For each tool, give its name and describe what it does. If the tool needs extra information (parameters) to work, list those details too: each parameter's name, its type, a short explanation of what it does, and whether it is required. This way you have a straightforward guide that tells the model how to use your tools correctly.

My function definitions. These will be the tools available to the agent.
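
A minimal sketch of what such definitions could look like in the OpenAI function calling format (getWeather and its parameter are invented examples, not necessarily the library's actual tools):

```js
// Hypothetical function definitions in the OpenAI function calling format.
// getWeather is an invented example; the real tools may differ.
const functionDefinitions = [
  {
    name: 'getWeather',
    description: 'Get the current weather for a given city',
    parameters: {
      type: 'object',
      properties: {
        city: {
          type: 'string',
          description: 'Name of the city, e.g. "Rome"',
        },
      },
      required: ['city'],
    },
  },
];
```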

As you can see in the example above, I am trying to teach the LLM how my functions work by giving it the most minimal but complete set of information about them, so that when it needs something it doesn't know, it can look for a function that could help and then tell you what to execute and how, including the parameters, if any.

An important concept here is that the LLM is always only the brain; you are in charge of building the arms and teaching it how to use them.

  1. YOU create the functions you need to get data or whatever

  2. YOU create an object like the example above to describe your functions to the AI

  3. YOU send the functions definitions + your prompt to the AI

  4. AI tells you which function is needed and the parameters to use

  5. YOU execute the corresponding function with the parameters

  6. A result is obtained

This OpenAI feature is really powerful as it is, but it looks like magic when working in an AGI workflow!
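
A compact sketch of steps 3 to 6, assuming the official openai Node SDK and the hypothetical getWeather definition from the previous snippet:

```js
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Hypothetical map from function name to its implementation.
const services = {
  getWeather: async ({ city }) => ({ city, tempC: 21, rain: false }),
};

// Step 3: send the function definitions together with the prompt.
const completion = await openai.chat.completions.create({
  model: 'gpt-4', // illustrative model name
  messages: [{ role: 'user', content: 'What is the weather like in Rome?' }],
  functions: functionDefinitions, // from the previous snippet
  function_call: 'auto',
});

// Step 4: the AI tells us which function it needs and with which parameters.
const call = completion.choices[0].message.function_call;
if (call) {
  // Step 5: we execute the corresponding function with those parameters...
  const result = await services[call.name](JSON.parse(call.arguments));
  // Step 6: ...and a result is obtained, ready to be fed back to the model.
  console.log(result);
}
```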

The mission is to obtain from the LLM a response that always has the thought, description and criticism properties, to keep the chain of thought alive; but to the 4 main AGI properties I added a new, optional one: **command**. How do I define it? Using my function definitions.

This way the LLM can return a response for the user, or a command to execute, or maybe neither of the two because it needs an iteration to think.

Introspection

This is the typical AGI function description used to get some kind of introspection into the AI's thinking:

This version has “command” property removed, for clarity.
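
A minimal sketch of such a definition, with the command part left out as in the original (the function name and descriptions are illustrative, not necessarily those used by the library):

```js
// Sketch of the introspection definition, "command" omitted for clarity.
// The name "introspect" and the descriptions are illustrative assumptions.
const introspection = {
  name: 'introspect',
  description: 'Structure every answer with your reasoning',
  parameters: {
    type: 'object',
    properties: {
      thought: {
        type: 'string',
        description: 'The main idea you came up with',
      },
      description: {
        type: 'string',
        description: 'An expanded version of the thought, with more detail',
      },
      criticism: {
        type: 'string',
        description: 'What could be improved or what information is still missing',
      },
      response: {
        type: 'string',
        description: 'The final response for the user, only when you have one',
      },
    },
    required: ['thought', 'description', 'criticism'],
  },
};
```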

This is the part that makes a kind of “reasoning” possible for our agent; here, for clarity, I removed the command part.

The command property is where the user nests the definitions of their tools/functions inside the previous part of the structure, the one that provides the reasoning. This new part introduces our functions to the agent and explains how to use them, so it can reason about if, when and how to use them.

This version has “command” property only, for clarity.
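
A sketch of the command part alone, reusing the hypothetical functionDefinitions from earlier (the exact shape in the library may differ):

```js
// Sketch of the "command" property only, nesting the user's function definitions.
const command = {
  type: 'object',
  description: 'A function to execute when you need external data',
  properties: {
    name: {
      type: 'string',
      description: 'Name of the function to call',
      enum: functionDefinitions.map((f) => f.name),
    },
    args: {
      type: 'object',
      description: 'Arguments for the function, matching its parameters',
    },
  },
};

// Nested into the introspection definition from the previous sketch:
introspection.parameters.properties.command = command;
```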

In the first snippet we saw how to define my functions so that the agent can use them, by passing the definitions to the OpenAI API through the “function calling” feature. Well, that object is nested right here, inside the command property, so OpenAI can decide whether it needs to ask you to run a function to get some data, and how you should run it in order to get the data it needs.

The Agent

The engine of the agent is fairly simple: it exposes a function able to receive a message from outside. This message is thrown into a recursive function with the following steps:

  1. Uses the OpenAI completion feature, ALWAYS with “function calling”

  2. Unpacks the response from OpenAI and checks if a command is present

  3. If the response contains a **command** property it tries to execute it using the services that the user passed when the agent was created. This command and the response of the function are then appended to the context that the agent sends to OpenAI

  4. If the response contains a response property then the content will be appended to the context and used to respond to the user

  5. If the response contains neither a command nor a response property, then the content is appended to the context and the recursive function calls itself for a new iteration

Agent working flow

Coding the agent

Let’s now see the code and unpack it step by step.

full simplified code
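
A minimal sketch of the overall shape (only processMessage and recurseUntilResponse are names from the article; createAgent, its options and everything else are illustrative):

```js
// A reconstruction of the overall shape, not the actual library code.
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

function createAgent({ services, functions }) {
  const context = []; // chat messages accumulated across the iterations

  async function fetchOpenAIResponse() {
    // calls OpenAI completion with function calling (detailed below)
  }

  async function recurseUntilResponse() {
    // 1. ask OpenAI how to proceed (always via function calling)
    // 2. command present  -> execute it, append the result, recurse
    // 3. response present -> append it and return it to the user
    // 4. neither present  -> append the reasoning and recurse
  }

  async function processMessage(message) {
    // detailed in the next snippet
  }

  return { processMessage };
}
```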

When I create an agent I get an object with a single function exposed: *processMessage*. We will begin from here.

processMessage

This function gets a message from the user as input, adds it to the current context as a user message and calls a recursive function, *recurseUntilResponse*, whose first step is a call to the OpenAI API to understand how to proceed.
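
Sketched out, assuming the context array from the skeleton above:

```js
// Assumes `context` and recurseUntilResponse from the skeleton above.
async function processMessage(message) {
  // add the user's message to the running context...
  context.push({ role: 'user', content: message });
  // ...and start the recursive loop
  return recurseUntilResponse();
}
```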

fetch OpenAI response

fetchOpenAIResponse uses OpenAI completion and function calling with the functions that we built in the first part of the article; then, after having received a response, it checks whether the response contains a command to execute or a response for the user.
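
A sketch of what it could do, assuming the openai client, the context array and the introspection definition from the previous snippets (the model name is illustrative):

```js
// Assumes `openai`, `context` and `introspection` from the sketches above.
// Forcing function_call makes every reply come back as structured arguments.
async function fetchOpenAIResponse() {
  const completion = await openai.chat.completions.create({
    model: 'gpt-4', // illustrative model name
    messages: context,
    functions: [introspection],
    function_call: { name: introspection.name },
  });
  // The structured reply: thought, description, criticism, response?, command?
  return JSON.parse(completion.choices[0].message.function_call.arguments);
}
```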

a command is found

If a command is found it means that I now have the name of a function to call and the necessary arguments (I am talking about the functions of the first snippet of code, nested inside the command property).

After the selected function is executed I add the result to the context and, thanks to the recursion, it will be included in the next iteration and sent with the next call to OpenAI, which (hopefully) will use the result to formulate a response for the user.
If the data is still not what is needed to respond to the user, the LLM could decide to call another function during the next iteration.
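
Inside recurseUntilResponse this branch could look roughly like this, with services mapping each function name to its implementation (again a sketch built on the previous snippets, not the library's exact code):

```js
// Assumes `context`, `services` and fetchOpenAIResponse from the sketches above.
const reply = await fetchOpenAIResponse();

if (reply.command) {
  const { name, args } = reply.command;
  // execute the tool the LLM asked for, with the arguments it provided
  const result = await services[name](args);
  // append both the command and its result so the next iteration can use them
  context.push({ role: 'assistant', content: JSON.stringify(reply) });
  context.push({ role: 'function', name, content: JSON.stringify(result) });
  return recurseUntilResponse();
}
```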

a response is found

If a response is found it means that the LLM has found a way to answer the user, maybe because the context contained enough data (thanks to previous command results included in it).

The recursion stops here: the response itself is added to the context, this time with role assistant, and sent to the user.
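
Continuing the sketch of recurseUntilResponse:

```js
// Continuing the sketch above.
if (reply.response) {
  // the answer is stored with role assistant and returned to the user
  context.push({ role: 'assistant', content: reply.response });
  return reply.response;
}
```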

neither a command nor a response is found

Sometimes you will get neither a command nor a response; this happens when the data is not enough to answer the user and the LLM needs more reasoning, or realizes that it needs more data before it can formulate a response.
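
The last branch of the sketch simply keeps the chain of thought alive:

```js
// Continuing the sketch above: no command and no response yet, so the
// reasoning (thought, description, criticism) is appended to the context
// and the loop runs another iteration.
context.push({ role: 'assistant', content: JSON.stringify(reply) });
return recurseUntilResponse();
```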

Conclusion

Photo by [Rock'n Roll Monkey](https://unsplash.com/@rocknrollmonkey?utm_source=medium&utm_medium=referral) on [Unsplash](https://unsplash.com?utm_source=medium&utm_medium=referral)

This is, in brief, the anatomy of *micro-agi-js*. The library is actually a simple, experimental implementation and many improvements could be added to it. At the same time the results are quite effective, and it is often capable of using many functions at the right time, in the right way. This, in my opinion, enables many new, useful paradigms in a wide range of scenarios, in a way that is absolutely fascinating, technically effective and still a powerful marketing weapon.

I am already working on a new implementation that includes limits on API calls, embeddings and the ability to have a conversation between different agents.

Try the library by looking at the example, and let me know what you think about it and how you would change it.

This is me on LinkedIn, Twitter and GitHub.
