<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Akshat Jain</title>
    <description>The latest articles on DEV Community by Akshat Jain (@akkiprime).</description>
    <link>https://dev.to/akkiprime</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1087852%2Ff8707a45-136e-457e-9b1a-d508cf0ecb64.jpg</url>
      <title>DEV Community: Akshat Jain</title>
      <link>https://dev.to/akkiprime</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/akkiprime"/>
    <language>en</language>
    <item>
      <title>The Era of Business SuperIntelligence</title>
      <dc:creator>Akshat Jain</dc:creator>
      <pubDate>Sat, 28 Sep 2024 06:47:12 +0000</pubDate>
      <link>https://dev.to/akkiprime/the-era-of-business-superintelligence-1dan</link>
      <guid>https://dev.to/akkiprime/the-era-of-business-superintelligence-1dan</guid>
      <description>&lt;h2&gt;
  
  
  The Evolution of Business: From Manual to SuperIntelligence
&lt;/h2&gt;

&lt;p&gt;Historically, business has been defined by its core activities and purpose. Investopedia succinctly describes it as “an organization or enterprising entity engaged in commercial, industrial, or professional activities.” In essence, business is about activities. But activities can also be done for other purposes – for leisure, or charity. So how is business different? Business has a purpose – a common goal to be achieved over time, with progress towards that goal tracked along the way. That’s where data comes in.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpfoag7nsnec4kqdcanxq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpfoag7nsnec4kqdcanxq.png" alt="Till 1970"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For most of the 20th century, businesses relied heavily on human capital to fulfill three primary components: activities, data management, and decision-making. The 1970s saw accountants meticulously maintaining books, while customer relationships were nurtured through physical rolodexes. This era, though seemingly simplistic by today’s standards, laid the groundwork for what was to come.&lt;/p&gt;

&lt;p&gt;The advent of computers and software in the late 20th century ushered in a period of digital transformation. Software became the new “System of Record,” enhancing data persistence and accessibility. The shift to cloud computing further amplified these benefits, making data storage and sharing more efficient than ever before.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc7w9o8tosi5cgzzvgdvz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc7w9o8tosi5cgzzvgdvz.png" alt="1970 to 2010"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Post-2015, we witnessed the rise of machine learning in real-world applications. This period marked the birth of data-driven decision-making, with recommendation systems, computer vision, and natural language processing taking center stage. The term “Business Intelligence” gained prominence, signifying a new era where data began to influence strategic priorities.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr0okxcprngevkgdvntm4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr0okxcprngevkgdvntm4.png" alt="Now"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Fast forward to the present: AI Agents have now become a reality. These agents have the agency to make decisions and perform certain activities. Does this mean humans are completely unnecessary for running a business? Not at all. Let’s understand why. A large portion of human time goes into repetitive activities that require less complex decision-making. AI will handle these tasks, allowing humans to be more strategic and more productive than ever.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdiakt2vvw626nwektgap.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdiakt2vvw626nwektgap.png" alt="Business SuperIntelligence"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Synergies between Four Pillars of Business SuperIntelligence
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Data: So far, businesses have relied mostly on structured data – not because structured data was the most readily available, but because it was easier to record, maintain, and process. With AI, the nature of data itself will shift from structured to unstructured and from single modality (text) to multimodal (text, videos, audio, etc.). As a result, the data sources, the data capturing &amp;amp; processing pipeline, and the decision-making processes will also need to evolve.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Software: It might be difficult to imagine how software will evolve because of AI. Building software will become easier, agreed – but will the end output be different from the software we use today? The software we have built so far was never built for intelligence; it was always made for deterministic inputs and outputs. We never tried to handle the known unknowns with software because that was simply not feasible. While old-generation software was heavily guardrailed, new-generation software will be more flexible and ambitious. The early signs of this can be seen in customer support agents: the old stack, which involved hundreds of manually designed rules, is getting replaced by a new stack that simply consumes a knowledge base and handles complex scenarios effortlessly. The fewer rules (code) you write, the more generic your agent is. Tactically speaking, new-generation software will be built for generic capabilities and will be interactive based on real-time context.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;AI: We are all aware of how LLMs are becoming better at reasoning, multimodality, and handling longer contexts. But we are not discussing technical evolution here; our goal is to understand what additional roles AI will get better at. Until now, AI has been helping us in specific parts of the value chain, like extracting insights from data, generating content, transcribing meetings, etc. Going forward, AI will be able to connect the dots and start covering a larger portion of the value chain. Let’s take a specific case, say sales. AI SDRs are now able to automatically mine leads, conduct deep research, and generate personalized outreach. But upon close observation, you’ll notice that this is still one end of the sales value chain. In futuristic systems, AI SDRs will be intimately coupled with your software stack (CRM, Support App, etc.). This will allow them to handle objections, book demos, conduct demos, and close deals. And the evolution doesn’t stop here: with enough usage, AI SDRs will be able to customize the GTM strategy in real time for every lead. They will predict the right set of actions and assign these actions to either themselves or human AEs. So, the role of AI will gradually expand from execution to strategy.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Humans: We just saw how AI will play a role not just in the execution of mundane tasks but also in strategic decisions. Does that mean the role of humans will become limited in the future? No! As AI takes over some of the backend actions and decisions, humans will be able to focus on higher-touch, higher-value, and more strategic outcomes. We have already seen some hard skills, like calculation and memorization, become less relevant due to the advent of software. AI will make some other skills obsolete. For example, due to AI, we no longer need a strong command of grammar and vocabulary to write a good article. AI takes care of these aspects; we only need good ideas and good taste for distinguishing extraordinary work from ordinary content. Just as great directors are not necessarily great actors, humans need not be good at execution but should be good at envisioning things and providing feedback.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Downstream Implications of SuperIntelligence
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Outcome-based Monetization
&lt;/h3&gt;

&lt;p&gt;Superintelligence is not just about intelligent insights or actions; it has to be tethered to outcomes. If you’re not able to deliver outcomes, then your insights and actions are probably not smart enough. If you have customers who are paying you for outcomes, then your monetization is most likely directly tied to their growth. This direct linkage was missing in SaaS products because subscription-based monetization is always constrained by time. Let’s understand this with an example: if a B2B customer is super successful, it will hire more people in sales, which will increase the number of seats in the CRM – but this growth will have some associated inertia because hiring and onboarding is a slow process. Whereas, if the same B2B customer is using an AI SDR, the business growth is no longer constrained by human capital, and the time to convert capital into labor is reduced by orders of magnitude. This sets the ground for exponential growth.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr2yc0086yw5updc7w40m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr2yc0086yw5updc7w40m.png" alt="Earlier and now"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Quality over Quantity
&lt;/h3&gt;

&lt;p&gt;When Business SuperIntelligence starts owning a larger share of the value chain, it can charge higher ACVs than its SaaS counterparts. AI-powered labor will lead to exponential growth of existing customers, which implies higher Expansion ARR. If the top 10% of your users drive all the monetization, then it’s the quality of users (and their ACV) that matters, not just the scale of users. Business Applications may start looking more like some Consumer Applications, such as free-to-play games, which often have “whale monetization” mechanics where the top users can spend thousands of dollars if they want.&lt;/p&gt;

&lt;h3&gt;
  
  
  Intelligence based Costs
&lt;/h3&gt;

&lt;p&gt;Today, we think of LLM costs in terms of the number of tokens consumed. However, what we often ignore is that intelligence is already baked into the pricing – that’s why GPT-4 is priced higher than GPT-3.5. The counterintuitive insight here is that there are cases where GPT-4 will be cheaper than GPT-3.5, because a more intelligent model can probably solve a given task with fewer tokens. Recently, OpenAI released its new family of models called o1, which seems to “think” before responding, highlighting that not all tokens are created equal. We’re moving towards a paradigm shift: away from traditional token-based billing and towards intelligence consumed.&lt;/p&gt;
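
&lt;p&gt;To make this concrete, here is a rough Python sketch (all prices and token counts are made-up placeholders, not actual model rates) of how a pricier-per-token model can still be cheaper per task:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Hypothetical illustration: per-token price vs. tokens needed per task.

def cost_per_task(price_per_1k_tokens, tokens_needed):
    """Effective cost of solving one task with a given model."""
    return price_per_1k_tokens * tokens_needed / 1000

# A cheaper-per-token model that needs verbose prompting and retries...
weaker = cost_per_task(price_per_1k_tokens=0.5, tokens_needed=12000)   # $6.00
# ...can end up costlier than a smarter model that solves the task directly.
stronger = cost_per_task(price_per_1k_tokens=10.0, tokens_needed=500)  # $5.00

print(f"weaker model:   ${weaker:.2f} per task")
print(f"stronger model: ${stronger:.2f} per task")
&lt;/code&gt;&lt;/pre&gt;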

&lt;h2&gt;
  
  
  The Road Ahead
&lt;/h2&gt;

&lt;p&gt;As we embrace Business Superintelligence, companies must reimagine operations, valuing both human and AI strengths. Successful businesses will thrive by combining human expertise with AI, setting new standards for productivity and innovation.&lt;/p&gt;

&lt;p&gt;Welcome to the age of Business Superintelligence—where the impossible becomes routine, and the future arrives faster than we ever imagined.&lt;/p&gt;

&lt;p&gt;Link to original article: &lt;a href="https://superagi.com/the-era-of-business-superintelligence/" rel="noopener noreferrer"&gt;https://superagi.com/the-era-of-business-superintelligence/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>superagi</category>
      <category>ai</category>
      <category>superintelligence</category>
      <category>agi</category>
    </item>
    <item>
      <title>CRM 3.0: Reimagining Customer Relationships Using AI Agents</title>
      <dc:creator>Akshat Jain</dc:creator>
      <pubDate>Thu, 08 Aug 2024 09:23:01 +0000</pubDate>
      <link>https://dev.to/akkiprime/crm-30-reimagining-customer-relationships-using-ai-agents-23ih</link>
      <guid>https://dev.to/akkiprime/crm-30-reimagining-customer-relationships-using-ai-agents-23ih</guid>
      <description>&lt;p&gt;The Software as a Service (SaaS) model began when Salesforce launched its “End of Software” campaign in 2000 and went public only four years later. At the time, people were still buying software in a box. CRM software like Siebel was incredibly difficult to run and required constant costly upgrades. Salesforce was different. Its product was a website. “No Software” was a value prop even non-technical people understood. CRM 2.0 marked a platform shift that brought new technologies (like the cloud) and new business models (like SaaS).&lt;/p&gt;

&lt;p&gt;Today, we’re witnessing another platform shift, to AI. We are entering an era of cognitive intelligence, which means software will no longer be a passive tool that needs to be operated by a human. The next generation of software will be active, like a living entity, not only working for humans but also guiding them with AI-generated next-best actions.&lt;/p&gt;

&lt;p&gt;While incumbents often adapt to new platform shifts, they rarely completely rethink their architecture. For the last 20 years, companies like Salesforce and HubSpot have survived because they are embedded as “Systems of Record,” meaning replacing them is a herculean task that sales leaders avoid. However, AI is fundamentally reimagining the core system of record and sales workflows to such an extent that the ROI of switching to a new AI-native stack is absolutely justified.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three pillars of the new AI-native Sales Stack
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Multi-modality: The foundation of CRM 2.0 was a structured representation of sales opportunities, using relational databases. With LLMs, the core of CRM 3.0 will be entirely unstructured and multimodal, including text, image, voice, and video. A company’s sales platform could include data about existing and prospective customers from countless sources: recordings and transcripts from any conversation with someone at the company, emails and Slack messages, sales enablement materials, product usage, customer support activity, public news, financial reports…the list is endless.&lt;/li&gt;
&lt;li&gt;Hyper-Personalization: a16z’s General Partner Andrew Chen says, &lt;a href="https://andrewchen.substack.com/p/ai-and-marketing-what-happens-next" rel="noopener noreferrer"&gt;we only have marketing because 1:1 sales for everything is too expensive&lt;/a&gt;. Few things in business are more strategic than personalization because it makes your offerings relevant to customers. Need personalized marketing collateral for a deal? Your AI assistant can produce the assets you need and give you live tips during calls to help you close.&lt;/li&gt;
&lt;li&gt;Always-On Intelligence: With AI, sales teams won’t need to spend endless hours researching new leads or prepping for calls — AI will do it in seconds. Reps won’t need to gauge the readiness of potential customers because AI will automatically compile a ranked list of primed buyers and keep it updated. The next-gen system-of-record will constantly ingest data to create the most up-to-date context. In essence, the way sellers and buyers interact will be fundamentally different.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Sales Workflows Redefined
&lt;/h2&gt;

&lt;p&gt;With AI-native foundations, common sales activities may be redefined or even disappear completely. At the same time, we’ll likely see new sales workflows that aren’t possible today.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsuperagi.com%2Fwp-content%2Fuploads%2F2024%2F08%2FRedefined-Sales-Workflow.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsuperagi.com%2Fwp-content%2Fuploads%2F2024%2F08%2FRedefined-Sales-Workflow.png" alt="Sales Workflow Redefined"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Sales Stack re-imagined for AI-native workflows
&lt;/h2&gt;

&lt;p&gt;Now that we have established that AI offers endless ways in which sales processes can be augmented as well as disrupted, the question arises: what will the new-age Sales Stack look like? Will it be radically different from current solutions or simply incremental features on top of the current stack? In our view, the ideal solution will combine the reasoning and acting capabilities of AI with the visibility and controls of existing software.&lt;/p&gt;

&lt;p&gt;Just like hybrid cars, the best solution will offer the benefits of futuristic tech (high performance and efficiency of an electric powertrain) without the downsides of new technology (limited range of EVs). That is why we are working on a stealth product that provides the familiarity of incumbent CRMs but is built on a modern platform with actions as a core component for seamless collaboration between Sales professionals and AI Agents.&lt;/p&gt;

&lt;p&gt;To visualize the full extent of how AI Agents can be applied to the sales process, we broke it down into its constituent steps.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsuperagi.com%2Fwp-content%2Fuploads%2F2024%2F08%2FSales-Stack-reimagined-for-AI-native-workflows.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsuperagi.com%2Fwp-content%2Fuploads%2F2024%2F08%2FSales-Stack-reimagined-for-AI-native-workflows.png" alt="Sales Stack Reimagined"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are three broad functions where AI agents can leverage their reasoning and action capabilities. Let’s analyze each stage to understand how AI Agents can create magic by interacting with an AI-native CRM!&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Pipeline Generation&lt;br&gt;
LLMs have enough reasoning capability to power AI Agents that get you Sales Qualified Leads (SQLs). The four steps involved here are: understanding your ICP, gathering leads based on the ICP criteria, enriching those leads with external and internal data sources, and tracking the online activities of these leads to predict their Sales Readiness.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Prospecting&lt;br&gt;
AI Agents can conduct deep research for every SQL and create personalized outreach messages with relevant ice-breakers. They can follow up with prospects via multiple channels (email, LinkedIn, etc.) and generate sales collateral on-demand if prospects request something specific. This is the stage where the reasoning capability of AI Agents becomes a real game-changer.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In prospecting conversations, one needs to be extra mindful as one wrong reply can burn the lead. Here, the AI Agent is not only responsible for responding to user queries reactively but also needs to drive the conversation actively by asking smart questions. The agent needs to understand the key pain points of the prospect and their propensity to pay before pitching the right product and asking to schedule a demo.&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;&lt;p&gt;Post-Demo Actions&lt;br&gt;
Note-taking and transcription tools aren’t as novel as they were in 2020. However, extracting the right action items and insights from meetings can significantly affect conversion probability. Another big reason this step is critical is that it completes the sales loop, which helps improve the efficacy of AI agents during the first two stages as well. For example, if AI Agents don’t have access to all the customer interactions, they can make rookie mistakes like adding leads who are already in your pipeline or reaching out to potential customers too early or too late. We all know that sales professionals have as much love for data entry as developers have for documentation. With this in mind, this step is crucial for maintaining the CRM in a state that allows AI agents to seamlessly do their jobs.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The birth of new dynamic organizations
&lt;/h2&gt;

&lt;p&gt;So far, we’ve covered why AI won’t just augment existing workflows but will also replace and eliminate them. The only question is when. The main thing holding up operational transformation is people’s readiness to embrace the changes AI brings. AI doesn’t need a 6-month change management plan. Changes are instant and beneficial.&lt;/p&gt;

&lt;p&gt;Fortunately, money isn’t scared of AI. C-level execs will push for readiness, and first movers will be rewarded with increased market share, just like in the early days of Search Engine Marketing and Social Media. AI adoption will not only boost organizational productivity but will also create new types of organizations. Let’s understand how!&lt;/p&gt;

&lt;h3&gt;
  
  
  Sales, marketing, and customer success will blend together
&lt;/h3&gt;

&lt;p&gt;Today, sales, marketing, and customer success teams often feel siloed, with poor knowledge sharing and rough handoff processes. In the new age, all important customer contexts will be reflected in the same source of truth. As most activities are guided by AI, job functions could start to blend together. Sales, account management, and customer success may simply be seen as different ways of adding a human touch to go-to-market strategies. No more fighting over who gets credit for upsells — you could even imagine a world where quotas are redesigned to be more team-based than individual-based, accurately reflecting the opportunity for fluid collaboration throughout the sales cycle.&lt;/p&gt;

&lt;p&gt;This collaboration will not be limited to employees but will also extend to the software stack. In the AI-native world, your marketing campaigns would likely be handled by Marketing AI agents integrated into your CRM. There may not be a separate tool for marketing – it could all be powered by the same CRM.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fluid GTM Strategies
&lt;/h3&gt;

&lt;p&gt;Today, companies typically decide where to focus resources based on target segments and annual contract value ranges — for example, a top-down sales motion or an inside sales-assist motion. They often hire and build teams around a prescribed strategy. The assumptions around these economics will look very different in an AI-first world. Companies may be able to reorient their resource allocation around what’s best for the customer — to close this account, what’s the best go-to-market approach? Today, many companies choose to deliberately characterize themselves as either enterprise-grade or developer-first; in the future, companies should be able to cater to both of these buyer personas with highly customized sales journeys.&lt;/p&gt;

&lt;h3&gt;
  
  
  Per seat pricing → outcome-based pricing
&lt;/h3&gt;

&lt;p&gt;In this &lt;a href="https://x.com/arampell/status/1804164982495670548" rel="noopener noreferrer"&gt;tweet thread&lt;/a&gt;, Alex Rampell breaks down the customer support cost into two buckets: human and software. The human-only cost per ticket is around $37.50, while the software cost per ticket is only $0.69. AI-based products could unlock instant ROI because they can replace the human component, which accounts for roughly 98% of the total cost per ticket. Under the new AI-first paradigm, companies will be forced to switch from seat-based to outcome-based pricing, and their buyers will welcome the change.&lt;/p&gt;
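
&lt;p&gt;A quick back-of-the-envelope check of those figures (the AI price below is a hypothetical outcome-based rate, purely for illustration):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Back-of-the-envelope math using the per-ticket figures cited above.
human_cost_per_ticket = 37.50
software_cost_per_ticket = 0.69

total = human_cost_per_ticket + software_cost_per_ticket
human_share = human_cost_per_ticket / total
print(f"human share of cost per ticket: {human_share:.1%}")  # about 98.2%

# If an AI agent resolves a ticket end-to-end, an outcome-based price can sit
# anywhere between the software floor and the human ceiling and still save money.
ai_price_per_resolved_ticket = 5.00  # hypothetical
savings = human_cost_per_ticket - ai_price_per_resolved_ticket
print(f"buyer saves ${savings:.2f} per resolved ticket vs. human handling")
&lt;/code&gt;&lt;/pre&gt;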

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In the AI-native era, the triumvirate of Tech Stack, Workflows, and Business Models will be re-imagined all at once. This massive platform shift will uproot every incumbent we see today, especially the large ones, as they are the slowest to adapt. At the same time, the new-age Sales Stack must provide continuity in terms of UX so that businesses can undergo AI transformation smoothly. That’s why, at SuperAGI, we are re-imagining the core software primitives from an AI-native lens. Stay tuned for more updates!&lt;/p&gt;

&lt;p&gt;Check out the original article &lt;a href="https://superagi.com/crm-3-0-customer-relationships-using-ai-agents/" rel="noopener noreferrer"&gt;here&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Multi-Agent System</title>
      <dc:creator>Akshat Jain</dc:creator>
      <pubDate>Mon, 03 Jun 2024 03:28:55 +0000</pubDate>
      <link>https://dev.to/akkiprime/multi-agent-system-4d95</link>
      <guid>https://dev.to/akkiprime/multi-agent-system-4d95</guid>
      <description>&lt;p&gt;All of us have heard about the Mixture-of-Experts (MoE) architecture for LLMs. MoE divides models into separate sub-networks (or “experts”), each specializing in a subset of the input data, to jointly perform a task. A mixture of Expert architectures enables large-scale models, even those comprising many billions of parameters, to greatly reduce computation costs during pre-training and achieve faster performance during inference time. Broadly speaking, it achieves this efficiency through selectively activating only the specific experts needed for a given task, rather than activating the entire neural network for every task.&lt;/p&gt;

&lt;p&gt;What if we adopt the principles of MoE at the agent level? Agents, like LLMs, become hard to scale as we add multiple responsibilities to them. This simple yet ground-breaking insight led us to develop the world’s first Multi-Agent System (MAS) deployed in production. But before we dive deeper, let’s start with the basics of Single-Agent Architecture and how it can be extended to MAS.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agent – The fundamental building block
&lt;/h2&gt;

&lt;p&gt;An agent, in the context of Large Language Models (LLMs), is a system that uses an LLM as its fundamental computational component to construct a plan, with appropriate reasoning, to tackle a challenge using the tools and resources at its disposal. It is similar to a human who, given a problem, will devise a strategy and solve it using the required tools. The LLM acts like the human brain of the agent. Given a task, one rarely solves it in one shot; the ideal way is to break it down into one or more smaller tasks that can be done sequentially or independently. An agent does the same thing: the LLM plans out the way it intends to solve the task, and to accomplish the intermediate steps, the plan usually calls for the use of one or more tools available to the agent. Apart from the LLM and the tools, the agent also has other components required for its proper functioning.&lt;/p&gt;

&lt;p&gt;Broadly, there are three main components of an agent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A prompt&lt;/li&gt;
&lt;li&gt;Memory for the Agent&lt;/li&gt;
&lt;li&gt;The Tools&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The prompt will define the way the system is going to behave and work. It will define the set of goals the agent must achieve, while also having the constraints it must follow to achieve these goals. Think of the prompt as the blueprint for our multi-agent system. It’s like the master plan that outlines what each agent needs to achieve and how they should go about doing it. Without this guidance, agents would lack direction and might wander aimlessly. So, the prompt essentially serves as the compass that keeps our system on course, ensuring that all agents are working towards common objectives within a defined framework. This prompt is also the major bottleneck in increasing the complexity of a single agent. To build complex systems, we divide the responsibilities between multiple agents so that the prompt of every agent remains simple.&lt;/p&gt;

&lt;p&gt;Memory is the backbone of our LLM agents. It acts like their personal archive of knowledge and experiences. Similar to how humans draw from past experiences to make decisions, LLM agents utilize their memory to understand context, learn from past interactions, and make informed choices. Memory can simply be just passing the conversation history back to the LLM, or it can even be passing the extracted semantic information from the conversation and giving it to the LLM.&lt;/p&gt;

&lt;p&gt;Tools are the Swiss Army knives of our agents, providing them with specialized capabilities to tackle various tasks effectively. These tools can be APIs, executable functions, or other services that help agents finish their tasks.&lt;/p&gt;
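
&lt;p&gt;As a minimal sketch (illustrative names only, not a prescribed implementation), the three components can be wired together like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from dataclasses import dataclass, field

@dataclass
class Agent:
    """A bare-bones agent: a prompt, a memory, and a toolbox."""
    prompt: str                                 # goals and constraints (the blueprint)
    tools: dict = field(default_factory=dict)   # tool name mapped to a callable
    memory: list = field(default_factory=list)  # running history of interactions

    def remember(self, role, content):
        self.memory.append({"role": role, "content": content})

# Illustrative tool; a real agent would register APIs or executable services.
def web_search(query):
    return f"(stub) top results for {query!r}"

agent = Agent(
    prompt="You are a research assistant. Achieve the goal using your tools.",
    tools={"search": web_search},
)
agent.remember("user", "Find recent papers on Mixture-of-Experts.")
&lt;/code&gt;&lt;/pre&gt;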

&lt;p&gt;Now that we have understood the basic components of an Agent, let’s see how these components work together in a single-agent system.&lt;/p&gt;

&lt;h2&gt;
  
  
  Single-Agent System
&lt;/h2&gt;

&lt;p&gt;A single-agent system consists of one AI agent equipped with multiple tools to solve any given problem. These systems are designed to handle tasks autonomously, leveraging the combined capabilities of the tools along with the reasoning capability of the LLM. The agent devises a step-by-step plan to achieve the user goal. Once the plan is formulated, the agent uses the required tools to complete each step, and once all steps are completed, the outputs achieved at each stage can be combined to get the final output.&lt;/p&gt;

&lt;p&gt;There are different ways a particular user goal can be achieved. The plan the LLM comes up with depends on the available tools, its overall goal, and the constraints it has to follow. The prompt that controls the agent’s behavior should therefore be crafted so that the agent works the way we want it to and utilizes its resources efficiently to achieve its goals.&lt;/p&gt;
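
&lt;p&gt;Continuing the sketch above, a single-agent run reduces to plan, execute, and combine. The planning call is stubbed out here; a real system would query an LLM API:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def llm_plan(prompt):
    """Stub for the planning call; a real agent would query an LLM here."""
    return ["search: Mixture-of-Experts survey", "search: MoE routing methods"]

def run(agent, goal):
    # 1. The LLM devises a step-by-step plan from the prompt, goal, and memory.
    plan = llm_plan(f"{agent.prompt}\nGoal: {goal}\nHistory: {agent.memory}")
    outputs = []
    # 2. Each step invokes the required tool; results are written back to memory.
    for step in plan:
        tool_name, _, arg = step.partition(": ")
        result = agent.tools[tool_name](arg)
        agent.remember("tool", result)
        outputs.append(result)
    # 3. The per-step outputs are combined into the final answer.
    return "\n".join(outputs)

final_output = run(agent, "Summarize recent Mixture-of-Experts research.")
&lt;/code&gt;&lt;/pre&gt;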

&lt;p&gt;Architecture:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsuperagi.com%2Fwp-content%2Fuploads%2F2024%2F05%2FMulti-Agent-System-Architecture-.png.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsuperagi.com%2Fwp-content%2Fuploads%2F2024%2F05%2FMulti-Agent-System-Architecture-.png.webp" alt="Architecture"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why are Single-Agent Systems still relevant?
&lt;/h2&gt;

&lt;p&gt;There are a few advantages to going with a single-agent architecture. First, simplicity: with just one agent handling all tasks, the system becomes easier to design, implement, and manage, and there is no overhead of organizing communication between multiple agents.&lt;/p&gt;

&lt;p&gt;Single-agent systems often boast greater coherence and consistency in decision-making. With a single agent in control, there’s no possibility of conflicting goals or actions among multiple agents. This can result in more predictable and stable behavior, making it easier to understand and debug the system.&lt;/p&gt;

&lt;p&gt;Single-agent systems are typically more suitable for tasks that don’t require complex coordination. In areas where centralized decision-making is required, a single-agent system is efficient and performs well in achieving the user goal.&lt;/p&gt;

&lt;h2&gt;
  
  
  Limitations of Single-Agent System
&lt;/h2&gt;

&lt;p&gt;Single agents are often designed with a narrow focus, which can limit their ability to handle tasks outside their immediate domain. This limitation poses challenges in environments where tasks are diverse or rapidly changing.&lt;/p&gt;

&lt;p&gt;Scaling a single agent for more extensive or varied tasks often requires substantial redesign. When faced with the need to handle a broader range of tasks or increased complexity, simply adding more capabilities to a single agent may not be sufficient. Furthermore, scaling a single agent may introduce performance bottlenecks or efficiency issues.&lt;/p&gt;

&lt;p&gt;Single-agent systems are also limited by memory constraints and processing capabilities. Since all tasks and responsibilities are concentrated within a single agent, it must contend with the finite resources available to it, including memory and processing power.&lt;/p&gt;

&lt;h2&gt;
  
  
  The shift towards Multi-Agent System Architecture
&lt;/h2&gt;

&lt;p&gt;The exploration of Single-Agent Systems has highlighted significant limitations, particularly in handling complex, dynamic tasks and in scalability. This sets the stage for the introduction of Multi-Agent Systems (MAS), which offer a robust framework capable of overcoming these challenges. In the MAS architecture, multiple independent agents work together to solve complex tasks.&lt;/p&gt;

&lt;p&gt;In MAS, individual agents have their own responsibilities, characterized by their prompts and tools. Unlike single-agent systems, where one agent is responsible for all tasks, MAS allows for specialization and collaboration among several agents. This approach not only enhances efficiency but also improves the system’s ability to handle more complex and varied tasks.&lt;/p&gt;

&lt;p&gt;Adding more agents to the system can extend its capabilities without the need for significant redesign. When faced with increasing demands or expanding task domains, incorporating additional agents offers a scalable solution that can accommodate growth seamlessly. Unlike single-agent systems, where scaling often requires substantial modifications to the existing architecture, multi-agent systems can adapt more readily to changing requirements by simply adding new agents with specialized capabilities. The redundancy inherent in multi-agent systems also provides built-in fault tolerance and resilience: if one or more agents malfunction, the system can still perform the intended work, as the remaining agents can reach a mutual agreement.&lt;/p&gt;

&lt;p&gt;Architecture:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsuperagi.com%2Fwp-content%2Fuploads%2F2024%2F05%2FThe-shift-towards-Multi-Agent-System-Architecture.png.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsuperagi.com%2Fwp-content%2Fuploads%2F2024%2F05%2FThe-shift-towards-Multi-Agent-System-Architecture.png.webp" alt="Architecture"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Concept and Structure of Multi-Agent Systems
&lt;/h2&gt;

&lt;p&gt;Multi-Agent Systems consist of multiple intelligent agents, each capable of performing tasks autonomously but designed to work collaboratively toward a common goal. The structure of MAS allows for distributed problem-solving and decision-making, which significantly enhances the system’s overall efficiency and effectiveness. Each agent in a MAS can specialize in different tasks or aspects of a problem, bringing a diverse set of skills and perspectives to the table. Unlike single-agent systems, control in MAS is distributed among multiple agents, which reduces bottlenecks and single points of failure. Agents in a MAS can communicate and coordinate with each other, sharing information and decisions to optimize outcomes. The system is inherently modular, allowing for the addition, removal, or modification of agents without disrupting the entire system.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agent to Agent Communication Protocol (AACP)
&lt;/h2&gt;

&lt;p&gt;In a Multi-Agent System, the Agent-to-Agent Communication Protocol (AACP) is designed to facilitate structured and efficient communication among agents, which is pivotal for achieving consensus and addressing complex problems collaboratively. This protocol is instrumental in enhancing overall system performance by leveraging the diverse insights and capabilities of individual agents, each characterized by a unique persona responding to system prompts.&lt;/p&gt;

&lt;p&gt;The AACP adopts a dual-faceted communication architecture:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Hierarchical Communication Flow: This structure allows for the dissemination of information across different levels of the system hierarchy, enabling superior agents to coordinate and direct the actions of subordinate agents efficiently.&lt;/li&gt;
&lt;li&gt;Lateral Communication: Agents situated at the same hierarchical level possess the capability to engage in direct communication. This feature is essential for collaborative problem-solving and task execution, facilitating rapid information exchange and coordination among peers.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The reconfigurability of the communication flow, tailored to the specific requirements of the task at hand, underscores the flexibility and adaptiveness of the AACP.&lt;/p&gt;
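
&lt;p&gt;AACP is described here at the protocol level only. As a toy illustration (class names and message format invented for this sketch), the two communication flows might look like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;class AACPAgent:
    """Toy agent in an AACP-style hierarchy (illustrative, not a spec)."""
    def __init__(self, name, level):
        self.name = name
        self.level = level        # position in the hierarchy (0 = top)
        self.subordinates = []
        self.peers = []
        self.inbox = []

    def send(self, recipient, content):
        recipient.inbox.append({"from": self.name, "content": content})

    def delegate(self, content):
        # Hierarchical flow: a superior coordinates and directs subordinates.
        for sub in self.subordinates:
            self.send(sub, content)

    def consult_peers(self, content):
        # Lateral flow: peers at the same level communicate directly.
        for peer in self.peers:
            self.send(peer, content)

lead = AACPAgent("lead", level=0)
researcher = AACPAgent("researcher", level=1)
writer = AACPAgent("writer", level=1)
lead.subordinates = [researcher, writer]
researcher.peers, writer.peers = [writer], [researcher]

lead.delegate("Draft the market report.")             # top-down
researcher.consult_peers("Sharing sources I found.")  # peer-to-peer
&lt;/code&gt;&lt;/pre&gt;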

&lt;h2&gt;
  
  
  Analysis of single and multi-agent systems
&lt;/h2&gt;

&lt;p&gt;When comparing single-agent and multi-agent systems, several key differences emerge:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scalability: MAS are inherently more scalable than single-agent systems due to their distributed nature. They can handle more complex tasks by dividing the workload among multiple agents.&lt;/li&gt;
&lt;li&gt;Robustness and Reliability: Multi-agent systems are generally more robust and reliable. The failure of one agent does not cripple the system, and others can take over or redistribute the tasks.&lt;/li&gt;
&lt;li&gt;Flexibility and Adaptability: MAS can adapt to changes in the environment or task requirements more effectively. They can reconfigure themselves, with agents taking on new roles as needed.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Two Design Choices for MAS
&lt;/h2&gt;

&lt;p&gt;In this section, we will highlight two possible design patterns for MAS. But before we delve into their differences, let’s highlight their commonalities. The core premise of MAS is that the observation from the environment is passed to multiple experts, and different experts recommend different actions. There is then an aggregation layer where these recommendations are analyzed and some of them are approved. The two flavors discussed below differ on just one parameter: do we consult all the experts, or do we selectively invoke only the relevant ones?&lt;/p&gt;

&lt;h2&gt;
  
  
  Routing-Based Multi-Agent System
&lt;/h2&gt;

&lt;p&gt;In a routing-based MAS, the orchestrator acts as a routing layer. Depending on the message sent by the user, it decides which agents to invoke. The invoked agents interact among themselves, decide on the best action to take, and communicate it effectively to the user. The orchestrator is thus solely responsible for identifying the right agents. The main drawback is that the routing layer tends to become a single point of failure: if the router fails to invoke an agent that is required, or invokes an agent that is not needed, there can be discrepancies in the communication to the user.&lt;/p&gt;
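
&lt;p&gt;A minimal sketch of the routing pattern (the router below is a keyword stub; in practice an LLM would classify the message):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def route(message):
    """Stub router: a production system would use an LLM to select agents."""
    if "refund" in message.lower():
        return ["billing_agent"]
    return ["support_agent"]

AGENTS = {
    "billing_agent": lambda msg: "Refund initiated.",
    "support_agent": lambda msg: "Here is a troubleshooting guide.",
}

def handle(message):
    selected = route(message)   # the single point of failure lives here
    replies = [AGENTS[name](message) for name in selected]
    return " ".join(replies)    # a wrong routing decision skews the reply

print(handle("I want a refund for my order"))
&lt;/code&gt;&lt;/pre&gt;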

&lt;h2&gt;
  
  
  Broadcast-Based Multi-Agent System
&lt;/h2&gt;

&lt;p&gt;The Broadcast-Based MAS architecture is a generalized evolution of the Routing-Based MAS that eliminates the Routing Layer, the potential single point of failure. In instances where the Routing Layer mismanages the user query, the system’s integrity may be compromised. To enhance robustness and circumvent this vulnerability, the Routing Layer is omitted, and information is disseminated freely to all agents in unison. Every agent receives the input and decides whether or not to submit its output for aggregation, ensuring there is no communication mishap between any of the agents.&lt;/p&gt;
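
&lt;p&gt;By contrast, a broadcast sketch (again with stubbed logic) sends every message to all agents and lets each one decide whether to contribute before aggregation:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def billing_expert(message):
    # Each agent self-selects: it answers only when the message is relevant.
    if "refund" in message.lower():
        return "Refund initiated."
    return None  # abstain from aggregation

def support_expert(message):
    if "error" in message.lower():
        return "Here is a troubleshooting guide."
    return None

EXPERTS = [billing_expert, support_expert]

def handle(message):
    # Broadcast: every agent sees the input; there is no routing layer to fail.
    candidates = []
    for expert in EXPERTS:
        reply = expert(message)
        if reply is not None:
            candidates.append(reply)
    # Aggregation layer: here we simply join; real systems rank or vote.
    return " ".join(candidates) or "No agent volunteered a response."

print(handle("I hit an error and now I want a refund"))
&lt;/code&gt;&lt;/pre&gt;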

&lt;h2&gt;
  
  
  Why do we believe the Multi-Agent System is a fundamental breakthrough?
&lt;/h2&gt;

&lt;p&gt;The collaborative nature of multi-agent systems brings several benefits, especially in complex and dynamic environments:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enhanced Problem-Solving Capabilities: By leveraging the diverse capabilities of various agents, MAS can tackle complex problems more effectively than single-agent systems.&lt;/li&gt;
&lt;li&gt;Increased Efficiency: Collaboration among agents often leads to more efficient use of resources, as tasks are allocated based on the specialization of each agent.&lt;/li&gt;
&lt;li&gt;Resilience to Uncertainty and Change: Multi-agent systems are better equipped to handle uncertainty and changes in the environment, as they can quickly reorganize and adapt.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Besides the above-mentioned benefits, Multi-Agent Systems strongly resemble systems that have stood the test of time. For example, the hierarchical organizational structure that powers some of the largest organizations in the world looks a lot like a multi-agent system. Even the human body is a composition of multiple organ systems. These resemblances give us confidence that MAS is going to be an enduring concept in the evolutionary journey of agents.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>autonomous</category>
      <category>superagi</category>
    </item>
    <item>
      <title>Autonomous Software Development is here!</title>
      <dc:creator>Akshat Jain</dc:creator>
      <pubDate>Fri, 31 May 2024 05:43:16 +0000</pubDate>
      <link>https://dev.to/akkiprime/autonomous-software-development-is-here-460i</link>
      <guid>https://dev.to/akkiprime/autonomous-software-development-is-here-460i</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;The year is 2030. The latest company to get listed on NASDAQ has just 2 employees. There is a CEO and a CTO, and they are supported by a team of over a thousand agents. That seems quite dystopian, right? Agreed. Whether AI Agents are going to augment human productivity (and hence the ROI on hiring human employees) or replace the human workforce – we don’t know yet! However, one thing is quite clear: the ratio of AI agents to human employees in any organization is going to increase by at least 100x. With that being said, let’s talk about which functions are going to become the beachheads for these AI Agents (or, you could even call them AI Colleagues).&lt;/p&gt;

&lt;p&gt;In 2022, we saw companies like Jasper emerge in the Content Writing space. Then in 2023, we saw some companies breaking out in the Enterprise Search space. In 2024, as the LLMs became more powerful, we are now seeing a bunch of companies scaling into the AI Sales Agent space. But what is the common thread between these three spaces? All of these tasks are “self-contained”. For example, one search query is a one-off task where the task output is generally independent of the broader context in which the user is making that search query. A good trick for checking if a task is “self-contained” or not is to ask yourself – “Can I outsource this to an intern?” This question also makes an indirect claim that AI Agents that are deployed today are not good enough replacements for full-time human employees. But what is the difference between a full-time employee and an intern? It’s mostly the organizational context. Interns don’t have context. &lt;/p&gt;

&lt;p&gt;2025 will be the year when AI Agents expand beyond “self-contained” tasks and can drive large projects end-to-end. However, to keep the discussion more focused, let’s talk about just one function – software development. It is a huge market – much larger than what we’ve historically seen in developer tooling. Tools like GitHub Copilot significantly enhance developer productivity. But why haven’t we seen a tremendous increase in the throughput of software consultancy firms? Because developers are still present in the value chain of building software. If we draw a crude parallel with Ford’s assembly line, adding GitHub Copilot does not reduce the number of stations in the assembly line; it merely makes every station more efficient. The end-to-end automation wave that is going to hit us in the near future will be akin to a huge 3D printer capable of producing cars at a much faster rate than traditional assembly lines.&lt;/p&gt;

&lt;p&gt;Now that we have some imagery to describe the quantum of impact which GenAI is going to bring with end-to-end autonomous development, it’s time to dive deeper. In this blog, we will try to answer three questions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Why is autonomous development still unsolved? What are the major challenges?&lt;/li&gt;
&lt;li&gt;What are the different approaches for autonomous development?&lt;/li&gt;
&lt;li&gt;What are their pros and cons? And, which approach seems more promising?&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Challenges of end-to-end autonomous development
&lt;/h2&gt;

&lt;p&gt;We started this blog with the concept of “self-contained” tasks. Today, SOTA models like GPT-4 are quite adept at solving “self-contained” coding tasks. Let’s understand with some examples:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;GitHub Copilot was launched back in 2021 and the initial offering was mostly around generating pieces of code. Partial code generation requires relatively little context because you do not have to think about how the entire project is organized and how different pieces will interact with each other. You just have to solve a very small isolated problem that the user has asked you to solve.&lt;/li&gt;
&lt;li&gt;Then came a lot of point solutions like writing test cases, generating documentation, etc. While code generation was too upstream, testing and documentation are too downstream! Did you notice the pattern? Gen-AI-based solutions were able to make an early impact in peripheral areas of software development. These peripheral tasks are “self-contained”!&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Challenge 1: Having the right context
&lt;/h3&gt;

&lt;p&gt;That brings us to the first challenge of E2E autonomous development: Understanding the whole project – or, &lt;strong&gt;Having the complete context&lt;/strong&gt;. The word “context” may trigger a quick and dirty solution in your mind – LLMs with large enough context size to ingest the entire codebase in a single call. There are multiple loopholes in this solution:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Larger context windows lead to a drop in performance: In a paper from Stanford titled “&lt;a href="https://arxiv.org/pdf/2307.03172" rel="noopener noreferrer"&gt;Lost in the Middle&lt;/a&gt;,” it is shown that state-of-the-art LLMs often encounter difficulties in extracting valuable information from their context windows, especially when the information is buried inside the middle portion of the context.&lt;/li&gt;
&lt;li&gt;The problem of &lt;a href="https://arxiv.org/pdf/2306.09479" rel="noopener noreferrer"&gt;unwanted imitation&lt;/a&gt;: When using the model to generate code, we typically want the most correct code that the LM is capable of producing, rather than code that reflects the most likely continuation of the previous code, which may include bugs.&lt;/li&gt;
&lt;li&gt;A big part of the context exists outside the codebase – in Jira Issues, PRDs, etc.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Therefore, this problem needs a smarter solution than feeding in the raw codebase in its entirety. One solution which is getting more acceptance is feeding in the repository map which is a distilled representation of the entire repository. A sample repository map looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsuperagi.com%2Fwp-content%2Fuploads%2F2024%2F05%2FHaving-the-right-context.png.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsuperagi.com%2Fwp-content%2Fuploads%2F2024%2F05%2FHaving-the-right-context.png.webp" alt="Having the right context"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As we can see, we capture only the essential details rather than copying the entire codebase. This addresses the first two loopholes mentioned above:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A smaller context window is sufficient as we are not sharing the entire code.&lt;/li&gt;
&lt;li&gt;Since we are not sharing the entire code, the existing bugs are also not getting fed into the LLMs.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Better documentation can solve the third loophole: if we create more descriptive documentation within the code itself, we can feed that documentation along with the repository map, which LLMs can use to understand the product from the user’s perspective.&lt;/p&gt;
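
&lt;p&gt;One way to build such a map, sketched here with Python’s ast module (real tools also capture imports, call graphs, and cross-file references):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import ast
from pathlib import Path

def repo_map(root):
    """Distill a codebase into file paths plus function/class signatures."""
    lines = []
    for path in sorted(Path(root).rglob("*.py")):
        tree = ast.parse(path.read_text(), filename=str(path))
        lines.append(str(path))
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                args = ", ".join(a.arg for a in node.args.args)
                lines.append(f"    def {node.name}({args})")
            elif isinstance(node, ast.ClassDef):
                lines.append(f"    class {node.name}")
    return "\n".join(lines)

# Feed repo_map(".") into the prompt instead of the raw codebase:
# essential structure only, far fewer tokens, and no buggy bodies to imitate.
&lt;/code&gt;&lt;/pre&gt;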

&lt;h3&gt;
  
  
  Challenge 2: Personalized code generation and Inverse Scaling
&lt;/h3&gt;

&lt;p&gt;LLMs are great at suggesting the most acceptable solution because that solution was more frequently observed in the training dataset. But sometimes the most acceptable solution is not the correct solution. For example, in a world where APIs keep getting deprecated regularly, LLMs are at a natural disadvantage because they have been trained more on the deprecated APIs than on the latest ones. See the following screenshot, where GPT-4o was asked for a simple script to determine the row height of a string with newlines in PIL (an image library in Python):&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsuperagi.com%2Fwp-content%2Fuploads%2F2024%2F05%2FPersonalized-code-generation-and-Inverse-Scaling.png.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsuperagi.com%2Fwp-content%2Fuploads%2F2024%2F05%2FPersonalized-code-generation-and-Inverse-Scaling.png.webp" alt="Personalized code generation"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The problem with this output is that draw.textsize() is deprecated in the latest version of PIL and was replaced with draw.textlength(). But one can understand why most LLMs would use the deprecated function.&lt;/p&gt;

&lt;p&gt;Another reason the most acceptable solution may not be the correct one is coding style. Maybe your organization believes in a less popular design paradigm – how can you ensure that the LLM follows it? I’ve heard customers tell me, “I wish a company could fine-tune their model securely on my codebase.” While tuning a model to your codebase might make sense in theory, in reality there is a catch: once you tune the model, it becomes static unless you are doing continuous pre-training (which is costly).&lt;/p&gt;

&lt;p&gt;One feasible solution to overcome this problem is RAG (Retrieval-Augmented Generation), where we retrieve the relevant code snippets from our codebase to influence the generated output. Of course, RAG brings the typical challenges associated with retrieval. One must rank the code snippets based on three parameters: Relevance, Recency, and Importance. ‘Recency’ can solve the code deprecation issue, and ‘Importance’ can give higher weight to frequently observed design patterns in your codebase. But getting the right balance between these three dimensions is going to be tricky. Even if we master the retrieval, there is no guarantee that the LLM will use the retrieved code snippets while drafting the response. One may think this is less of an issue with larger models; in reality, it’s quite the opposite, a phenomenon called Inverse Scaling.&lt;/p&gt;

&lt;p&gt;LLMs are trained on huge datasets and hence they are less likely to learn newer environments that are different from the normal setup. Due to inverse scaling, larger models are more likely to ignore the new information in the context and rely more on the data that they have seen earlier.&lt;/p&gt;
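
&lt;p&gt;To make the ranking trade-off concrete, here is a naive scoring sketch (the weights and decay constants are arbitrary placeholders; tuning them is exactly the tricky balance mentioned above):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import math, time

def snippet_score(relevance, last_modified_ts, usage_count,
                  w_rel=0.5, w_rec=0.3, w_imp=0.2):
    """Blend Relevance, Recency, and Importance into one retrieval score."""
    age_days = (time.time() - last_modified_ts) / 86400
    recency = math.exp(-age_days / 180)    # old snippets (deprecated APIs) fade
    importance = math.log1p(usage_count)   # frequent design patterns weigh more
    return w_rel * relevance + w_rec * recency + w_imp * importance

now = time.time()
candidates = [  # hypothetical retrieval hits for the PIL example above
    {"code": "draw.textlength(...)", "rel": 0.81, "ts": now - 30 * 86400, "uses": 4},
    {"code": "draw.textsize(...)",   "rel": 0.84, "ts": now - 900 * 86400, "uses": 12},
]
candidates.sort(key=lambda c: snippet_score(c["rel"], c["ts"], c["uses"]),
                reverse=True)
# The recent API wins here, but a heavily used deprecated pattern can easily
# claw back through the importance term; the balance is delicate.
&lt;/code&gt;&lt;/pre&gt;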

&lt;h3&gt;
  
  
  Challenge 3: The Last Mile Problem
&lt;/h3&gt;

&lt;p&gt;Around mid-2023, we started seeing some cool demos of agents that can do end-to-end development. SuperCoder from SuperAGI and GPT-Engineer were some of the early players in this area. However, most of these projects were trying to create a new application from scratch; in other words, they were solving for the first mile. Creating a new application looks cooler than making incremental changes, but the latter is the more frequent use case and hence has a larger market. These projects aimed at creating from scratch because it is the easier use case: the LLMs are free to choose their own stack and design paradigms. However, in 2024, we are seeing startups that are more focused on making changes and additions to existing applications. The reason this is happening so late is that it was simply not possible for LLMs to understand a huge codebase and build on top of it. Today, we have smart models with large enough context lengths, and we also have techniques like RAG and repository maps to aid code generation in existing projects. But all of the above still doesn’t guarantee that incremental changes made to the codebase will work on the first try.&lt;/p&gt;

&lt;p&gt;An interesting idea (reiterated by Andrej Karpathy &lt;a href="https://x.com/karpathy/status/1748043513156272416" rel="noopener noreferrer"&gt;here&lt;/a&gt;) is around the concept of flow engineering, which goes past single-prompt or chain-of-thought prompt and focuses on the iterative generation and testing of code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsuperagi.com%2Fwp-content%2Fuploads%2F2024%2F05%2FSoftware-dev-autonomous.png.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsuperagi.com%2Fwp-content%2Fuploads%2F2024%2F05%2FSoftware-dev-autonomous.png.webp" alt="Autonomous Software Development"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Flow engineering needs a feedback loop from code execution. To illustrate this, let’s consider the same deprecation example mentioned in the previous section. When the AI agent tries to execute the code generated by GPT-4o, it may work if the environment has an older version of PIL. Otherwise, the terminal output will say that draw.textsize() is deprecated. The LLM will then come up with a workaround, as shown in the image below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsuperagi.com%2Fwp-content%2Fuploads%2F2024%2F05%2FThe-Last-Mile-Problem.png.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsuperagi.com%2Fwp-content%2Fuploads%2F2024%2F05%2FThe-Last-Mile-Problem.png.webp" alt="Last Mile"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;However, the LLM is still not suggesting the ideal solution, which uses the function draw.textlength(). This is the inverse scaling problem covered in the previous section. But if the agent had access to the web, it could do a Google search, which would lead it to a Stack Overflow page with the right alternative to the deprecated function. This shows how a closed-loop ecosystem of tools like the terminal, browser, and IDE can create a feedback loop. We call this reinforcement learning from agentic feedback. An agent can leverage this feedback loop to create projects from scratch as well as to make incremental changes to existing projects.&lt;/p&gt;
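
&lt;p&gt;For reference, here is roughly what that fix looks like in code, assuming a recent Pillow version where draw.textsize() has been deprecated (and removed in Pillow 10) in favour of draw.textlength() and draw.textbbox():&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from PIL import Image, ImageDraw, ImageFont

img = Image.new("RGB", (400, 100), "white")
draw = ImageDraw.Draw(img)
font = ImageFont.load_default()
text = "Hello from the agent"

# Deprecated API (removed in Pillow 10): w, h = draw.textsize(text, font=font)
# Modern replacements:
width = draw.textlength(text, font=font)                          # width only
left, top, right, bottom = draw.textbbox((0, 0), text, font=font)  # full extents
height = bottom - top
draw.text(((400 - width) / 2, 40), text, fill="black", font=font)  # centered text
&lt;/code&gt;&lt;/pre&gt;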

&lt;p&gt;So far, we’ve highlighted three major challenges and their possible mitigations. Notice that our ideal solution is taking the shape of an agentic system that uses tools like the terminal, IDE, and browser along with core components like RAG and repository maps. Are agents the enduring answer to end-to-end autonomous development, or do we need something more, maybe a code-specific LLM? Let’s explore some alternative approaches in the next section.&lt;/p&gt;

&lt;h2&gt;
  
  
  Are agents the enduring solution to Autonomous Development? Or do we need code-specific models?
&lt;/h2&gt;

&lt;p&gt;Theoretically, using a code-specific model makes sense, especially when optimizing for production-ready solutions that must have low latency. When the domain is limited, smaller models can often match the performance of general models. The reason why very few players are going for this approach is two-fold:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Empirically, we have seen successful apps like Cursor and Devin which are built on top of generic GPT models, not code-specific models.&lt;/li&gt;
&lt;li&gt;Training a new model is a capital-intensive task. The core question is whether a new team can outpace frontier model improvements. The base-model space is moving so fast that if you go deep on a code-specific model, you risk a better base model coming into existence and leapfrogging you before your new model finishes training.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Apart from efficiency and latency, what can code-specific models bring to the table? It has to be quality; otherwise there is no case for code-specific models, as generic models will keep getting more efficient with time. But what if we can ensure quality with better techniques than training a code-specific model? That is exactly what we at SuperAGI are doing with SuperCoder 2.0. We are taking a very opinionated approach to software development, building coding agents that are optimized for opinionated development stacks. For example, today most frontend engineering happens in JavaScript (another popular choice for frontend is Flutter). But within JavaScript, there are multiple libraries that were once popular but are no longer relevant; one such example is AngularJS, which came from Google. Currently, only three stacks (Vue, React, and Next.js) are popular. To match human-level output, we are building deeper integrations with popular stacks. For example, for backend projects we are not supporting all stacks: if you want your backend written in Python, we will build it using FastAPI, the most popular Python stack nowadays.&lt;/p&gt;
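
&lt;p&gt;To make the idea of an opinionated stack concrete, the scaffold for a generated Python backend might start from something as small as this (a minimal FastAPI sketch for illustration, not SuperCoder’s actual output):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from fastapi import FastAPI

app = FastAPI(title="Generated Service")

@app.get("/health")
def health():
    # Opinionated default: every generated service ships a health probe.
    return {"status": "ok"}

# Run with: uvicorn main:app --reload
&lt;/code&gt;&lt;/pre&gt;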

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;A lot of founders believe that to build a long-term moat in code generation, your agentic framework must be powered by your own code-specific models. We agree with this thought process. The only caveat is on the strategy side: is it the right time to invest in building your own model? Probably not, because the generic models are improving at a good enough pace. However, this pace is slowing down. GPT-4-turbo, released this April, is not significantly better than the GPT-4 preview that came out almost a year earlier. So maybe the time to train your own model is coming. The bottom line is that startups must first extract advantages from low-hanging fruit like flow engineering, because these are more capital-efficient ways to deliver value to the end user. Once the low-hanging fruit has been exhausted, it makes sense to train Large Coding Models (LCMs). We will soon be publishing a blog on LCMs, so stay tuned!&lt;/p&gt;

&lt;p&gt;Please refer to the original article &lt;a href="https://superagi.com/autonomous-software-development/" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>autonomous</category>
      <category>softwaredevelopment</category>
      <category>agents</category>
    </item>
    <item>
      <title>Towards AGI: [Part 2] Multiverse of Actions</title>
      <dc:creator>Akshat Jain</dc:creator>
      <pubDate>Mon, 26 Feb 2024 17:09:55 +0000</pubDate>
      <link>https://dev.to/akkiprime/towards-agi-part-2-multiverse-of-actions-kpp</link>
      <guid>https://dev.to/akkiprime/towards-agi-part-2-multiverse-of-actions-kpp</guid>
<description>&lt;p&gt;In &lt;a href="https://superagi.com/towards-agi-part-1/" rel="noopener noreferrer"&gt;part 1 of the Towards AGI series&lt;/a&gt;, we discussed a core component of Agents – Memory. However, early agent architectures didn’t have Memory as a first-class primitive. As we add new primitives to the ideal agent architecture, we must also adapt the architecture to interact with these components. Here is a take on how the Agent Action Space evolves with incremental additions to the Agent Architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  The evolution of Agentic Actions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Phase 1: External Actions
&lt;/h3&gt;

&lt;p&gt;The SayCan paper (Ahn et al., 2022) introduced the fundamental capability of actions. They called it Grounding: the process of connecting natural language and abstract knowledge to an internal representation of the real world. In simpler terms, Grounding is about interacting with the external world.&lt;/p&gt;

&lt;p&gt;It’s easy to assume that all actions involve interacting with the external world. This assumption was the major limitation of SayCan agents. They could perform various actions (551, to be exact, such as “find the apple” or “go to the table”), but all these actions were of the same type: Grounding. However, if we examine human behavior, we’ll notice that many of our actions are internal, including thinking, memorization, recall, and reflection. This insight inspired the next generation of agents, which are equipped with the reasoning capabilities of LLMs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 2: Internal Actions
&lt;/h2&gt;

&lt;p&gt;The next significant advancement was the ReAct paper (Yao et al., 2022b), introducing a new type of action: Reasoning. The action space of ReAct included two kinds of actions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;External actions, such as Grounding&lt;/li&gt;
&lt;li&gt;Internal actions, like Reasoning&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;ReAct, standing for Reasoning + Action, seems to imply that the term ‘Action’ refers solely to Grounding, and Reasoning is not an Action. However, the next generation of agents introduced more internal actions, acknowledging Reasoning as a type of action.&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 3: The Four Fundamental Agentic Actions
&lt;/h2&gt;

&lt;p&gt;The latest generation of agent architectures treats Long-Term Memory (LTM) as a first-class module. To take full advantage of LTM, these architectures introduce two new actions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Retrieval (Reading from Memory)&lt;/li&gt;
&lt;li&gt;Learning (Writing into Memory)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Both these actions are internal, because they do not interact with the external world.&lt;/p&gt;

&lt;p&gt;Here is a simple diagram which captures the evolution of Agentic Action-Space.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsuperagi.com%2Fwp-content%2Fuploads%2F2024%2F02%2FAction-types.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsuperagi.com%2Fwp-content%2Fuploads%2F2024%2F02%2FAction-types.png" alt="Action Types"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 4: Composite Agentic Actions
&lt;/h2&gt;

&lt;p&gt;Now, one might wonder whether there are more action types that will be added to the Action Space.&lt;/p&gt;

&lt;p&gt;The answer is both “Yes” and “No”. “No”, because most systems, whether human or computer, only have these four fundamental action types. However, “Yes”, because these fundamental action types can be combined to create multiple Composite Action Types. For instance, planning is a composite action type that can be implemented by combining two fundamental actions – reasoning and retrieval.&lt;/p&gt;

&lt;h3&gt;
  
  
  Planning as a Composite Action
&lt;/h3&gt;

&lt;p&gt;In the Generative Agents paper, the agents try to imitate human behavior. Every interaction with the external world is logged into the memory stream. When an agent has to plan its day, it retrieves past events from its memory (Retrieval) and then calls an LLM (Reasoning) to create the plan. Thus, Planning is a higher-order action that leverages two fundamental actions – Reasoning and Retrieval. Now, the question arises: what are the implications of identifying Planning as a new Action type?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsuperagi.com%2Fwp-content%2Fuploads%2F2024%2F02%2FAgent-memory.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsuperagi.com%2Fwp-content%2Fuploads%2F2024%2F02%2FAgent-memory.png" alt="Agent Memory"&gt;&lt;/a&gt;&lt;/p&gt;
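
&lt;p&gt;A minimal sketch of planning as a composite action, with hypothetical retrieve and call_llm helpers standing in for the memory stream and the LLM:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def retrieve(memory_stream, topic, n=5):
    # Retrieval (internal action): pull the most recent events about a topic.
    hits = [e for e in memory_stream if topic in e]
    return hits[-n:]

def call_llm(prompt):
    # Reasoning (internal action): stand-in for a real LLM call.
    return f"Plan based on: {prompt}"

def plan_day(memory_stream):
    # Planning = Retrieval + Reasoning, a composite action.
    past_events = retrieve(memory_stream, "yesterday")
    return call_llm("; ".join(past_events))

memory_stream = ["yesterday: met John", "yesterday: finished report", "today: woke up"]
print(plan_day(memory_stream))
&lt;/code&gt;&lt;/pre&gt;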

&lt;h2&gt;
  
  
  Implications of adding new Actions in Action-Space
&lt;/h2&gt;

&lt;p&gt;Every time we add a new type of Action in the Action-Space, the execution flow of the agent needs to be modified. To illustrate this, let’s compare two agent designs – a ReAct Agent and a Planner Agent.&lt;/p&gt;

&lt;p&gt;Here is the logic of a vanilla ReAct agent:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsuperagi.com%2Fwp-content%2Fuploads%2F2024%2F02%2FReact-prompt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsuperagi.com%2Fwp-content%2Fuploads%2F2024%2F02%2FReact-prompt.png" alt="ReAct Prompt"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Compare the above diagram with the following diagram which represents the logic of a Planner based agent:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsuperagi.com%2Fwp-content%2Fuploads%2F2024%2F02%2FPlanner-prompt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsuperagi.com%2Fwp-content%2Fuploads%2F2024%2F02%2FPlanner-prompt.png" alt="Planner Prompt"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Notice how the decision-making process, also known as agent design, changes when we incorporate new components, such as planning. Similarly, adding tools to interact with long-term memory will also alter the agent design. Conversely, the inclusion of another tool like a Google search tool won’t modify the agent design, as it’s just another tool for Grounding. In conclusion, if the addition of any tool or capability results in changes to the execution flow, we are most probably introducing new Fundamental or Composite Action types.&lt;/p&gt;
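
&lt;p&gt;The difference in execution flow can be sketched in a few lines of Python. Both loops below use hypothetical llm and tools interfaces; the point is only that introducing the Planning action restructures the loop itself:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def react_agent(goal, llm, tools, max_steps=10):
    # ReAct flow: one Reasoning step picks one action, the action is
    # executed, and the observation is appended to the scratchpad; repeat.
    history = []
    for _ in range(max_steps):
        action, args = llm.decide(goal, history)   # Reasoning
        if action == "finish":
            return args
        history.append(tools[action](args))        # Grounding / Retrieval / Learning

def planner_agent(goal, llm, tools):
    # Planner flow: the new composite action changes the loop itself.
    # A plan is drafted up front, then each step is executed in order.
    plan = llm.plan(goal)                          # Planning = Reasoning + Retrieval
    results = []
    for step in plan:
        action, args = llm.decide(step, results)   # Reasoning per step
        results.append(tools[action](args))
    return results
&lt;/code&gt;&lt;/pre&gt;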

&lt;h2&gt;
  
  
  Parallel Actions
&lt;/h2&gt;

&lt;p&gt;In the ReAct design, typically only one tool is called at a time. This means that when a new event happens, you cannot execute multiple actions (say, learning and grounding) at once.&lt;/p&gt;

&lt;p&gt;If we draw inspiration from humans, we will realise that humans often take actions in parallel. Whenever we see new information, we use it to decide our next step (reasoning), we remember any similar instance from the past (retrieval), we save this new experience for future reference (learning) and we also interact with the external world (grounding).&lt;/p&gt;

&lt;p&gt;Interestingly, OpenAI’s function calling now supports parallel function calls. MemGPT is one of the few agent frameworks trying to leverage parallel function calling in its agent design. It’s also among the earliest frameworks to support long-term memory. For those interested in delving deeper into memory-related actions (Learning and Retrieval), MemGPT serves as an excellent starting point.&lt;/p&gt;
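
&lt;p&gt;A minimal sketch of what parallel function calling looks like with the OpenAI Python SDK (v1.x); the learn and ground tools here are hypothetical stand-ins for a Learning and a Grounding action:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

tools = [
    {
        "type": "function",
        "function": {
            "name": "learn",
            "description": "Write a fact into long-term memory",
            "parameters": {
                "type": "object",
                "properties": {"fact": {"type": "string"}},
                "required": ["fact"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "ground",
            "description": "Send a reply to the user (an external action)",
            "parameters": {
                "type": "object",
                "properties": {"message": {"type": "string"}},
                "required": ["message"],
            },
        },
    },
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "My name is Akshat. Say hi back."}],
    tools=tools,
)

# With parallel function calling, the model may return several tool
# calls in one turn, e.g. learn(...) and ground(...) together.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
&lt;/code&gt;&lt;/pre&gt;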

&lt;h2&gt;
  
  
  Conclusion &amp;amp; Next Steps
&lt;/h2&gt;

&lt;p&gt;In this post, we examined various types of actions within the Agentic Action Space and the effects of adding new action types on decision-making procedures. In the next installment of the Towards AGI series, we will delve deeper, using actual code to explore innovations in decision-making procedures.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Towards AGI: [Part 1] Agents with Memory</title>
      <dc:creator>Akshat Jain</dc:creator>
      <pubDate>Mon, 26 Feb 2024 17:00:29 +0000</pubDate>
      <link>https://dev.to/akkiprime/towards-agi-part-1-agents-with-memory-4cp5</link>
      <guid>https://dev.to/akkiprime/towards-agi-part-1-agents-with-memory-4cp5</guid>
      <description>&lt;p&gt;Agents are an emerging class of artificial intelligence (AI) systems that use large language models (LLMs) to interact with the world. In the ‘Towards AGI’ series, we aim to explore the future of Agents. However, before we delve into the future, let’s first revisit the past.&lt;/p&gt;

&lt;p&gt;Here is a brief diagram which captures the evolution journey of agents:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsuperagi.com%2Fwp-content%2Fuploads%2F2024%2F02%2FAgents1.0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsuperagi.com%2Fwp-content%2Fuploads%2F2024%2F02%2FAgents1.0.png" alt="Prompt Templates"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Prompt chaining is not a new concept. It’s the underlying technology behind traditional agents or chatbots, which often require handcrafted rules. This made adapting to new environments challenging. However, modern agents can access a suite of tools, depending on the user’s input and demands. These agents utilize the common-sense priors present in LLMs to adapt to novel tasks and answer user queries in environments where pre-determined chains don’t exist.&lt;/p&gt;

&lt;p&gt;Despite the hype around Agents in 2023, they are yet to become a part of our day-to-day lives. But, mark my words, “2024 is going to go down in history as the year of Agents”. Why am I so confident? It’s because Agents have already made the leap from concept to reality. Currently, we are in the phase of scaling from the initial stage to widespread use.&lt;/p&gt;

&lt;p&gt;With the launch of GPT-3.5, LLMs became powerful enough for decision-making (choosing actions from a given action space) which is the core capability of Agents. Now, we are making progress on two fronts:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The core reasoning capabilities are advancing due to improved models&lt;/li&gt;
&lt;li&gt;The agent designs are becoming more refined and ready for production with the introduction of new foundational building blocks beyond just decision-making.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In this article, we will focus on one such building block – Memory (also known as State).&lt;/p&gt;

&lt;h2&gt;
  
  
  Memory is about adding state to stateless systems
&lt;/h2&gt;

&lt;p&gt;LLMs in their current form are stateless, which means they do not retain information about the user’s previous interactions. However, agents store previous interactions in variables and use them in subsequent LLM calls. Therefore, agents are stateful: they have memory. The catch is that most agents follow a simple design pattern where the entire history of previous interactions is passed to the LLM during the next call. This simplistic pattern ensures that there is no information loss, but it also has multiple limitations:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Memory is limited by the context window size of the LLM&lt;/li&gt;
&lt;li&gt;Context pollution deteriorates the output quality of the LLM&lt;/li&gt;
&lt;li&gt;No ability to synthesize deeper insights on top of raw observations&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Thus, agents need to judiciously use the context window for better performance, and at the same time, they need a larger context window to store information in a lossless way. This trade-off can be resolved by dividing the memory into two parts:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Short-Term Memory (STM)&lt;/strong&gt;: The Main Context that is fed to the LLM at runtime. It has a limited size, which depends on the context window length of the LLM.&lt;br&gt;
&lt;strong&gt;2. Long-Term Memory (LTM)&lt;/strong&gt;: The External Context, which is stored on disk. Before every LLM call, the agent retrieves relevant information from LTM and uses it to edit the STM. After the LLM call, the agent writes relevant information from the output into the LTM.&lt;br&gt;
This solution is similar to how any modern computer manages its memory. The analogy is captured in the following table:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agents&lt;/th&gt;
&lt;th&gt;Computers&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Short-Term memory&lt;/td&gt;
&lt;td&gt;Context Window&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Long-Term memory&lt;/td&gt;
&lt;td&gt;VectorDB, GraphDB, RelationalDB, Files and Folders&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
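
&lt;p&gt;A minimal sketch of this two-tier design, with a plain list and substring matching standing in for a real vector DB and embedding-based retrieval:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from collections import deque

class AgentMemory:
    """Two-tier memory: a bounded short-term window plus an unbounded
    long-term store (here just a list, standing in for a vector DB)."""

    def __init__(self, stm_capacity=10):
        self.stm = deque(maxlen=stm_capacity)  # short-term: rolling window
        self.ltm = []                          # long-term: external context

    def remember(self, message):
        # Anything about to fall out of the STM window is archived to LTM.
        if len(self.stm) == self.stm.maxlen:
            self.ltm.append(self.stm[0])
        self.stm.append(message)

    def build_context(self, query):
        # Before each LLM call: retrieve relevant LTM entries and
        # splice them in front of the recent STM window.
        relevant = [m for m in self.ltm if query.lower() in m.lower()]
        return relevant[-3:] + list(self.stm)
&lt;/code&gt;&lt;/pre&gt;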

&lt;h2&gt;
  
  
  Deep dive into various types of Agent Memory
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;STM&lt;/strong&gt;: Working memory (LLM Context): It is a data structure with multiple parts which are usually represented with a prompt template and relevant variables. Before runtime, the STM is synthesized by replacing the relevant variables in the prompt template with information retrieved from the LTM. It includes&lt;br&gt;
Perceptual inputs: Observation (aka Grounding) from previous tool calls&lt;br&gt;
Active knowledge: generated by reasoning or retrieved from long-term memory&lt;br&gt;
Other core information carried over from the previous decision cycle (e.g., agent’s active goals).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;LTM type 1&lt;/strong&gt;: Episodic memory (aka Raw memory): It stores the ground truth of all the actions, their outputs (observations), and the reasoning (thought) behind those actions. During the planning stage of a decision cycle, these episodes may be retrieved into working memory to support reasoning. It can be stored in relational DBs and files. It can consist of:&lt;br&gt;
Input-Output pairs of tools called by the agent during the current run&lt;br&gt;
History event flows (see Memory Stream in Generative Agents paper)&lt;br&gt;
Game trajectories from previous episodes&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;LTM type 2&lt;/strong&gt;: Semantic memory (aka Reflections): It stores an agent’s knowledge about the world and itself. Semantic memory is usually initialized from an external database for knowledge support. But it can also be learned by deriving insights from raw observations (see Reflections in Generative Agents paper). Some examples could be&lt;br&gt;
RAG: Leveraging game manuals and facts as a semantic memory to affect the policy or using internal docs like HR Policy to answer questions.&lt;br&gt;
RAG-based In-context Learning: A vectorDB stores recipes for doing some known tasks which can be used as in-context examples while solving newer tasks (Example: Langchain’s Extending the SQL toolkit).&lt;br&gt;
Self-learning from user inputs: For example, MemGPT uses archival storage to store facts, experiences, preferences of the user, etc.&lt;br&gt;
Self-learning from environment interactions: Reflexion (Shinn et al., 2023) uses an LLM to reflect on failed episodes and stores the results (e.g., “there is no dishwasher in the kitchen”).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;LTM type 3&lt;/strong&gt;: Procedural memory: This memory represents the agent’s procedures for thinking, acting, decision-making, etc. Unlike episodic or semantic memory, which may be initially empty or even absent, procedural memory must be initialized by the designer with proper code to bootstrap the agent. It is of two types:&lt;br&gt;
implicit knowledge stored in the LLM weights&lt;br&gt;
explicit knowledge written in the agent’s code, which can be further divided into two types:&lt;br&gt;
procedures that implement actions (reasoning, retrieval, grounding, and learning)&lt;br&gt;
procedures that implement decision-making itself&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Choosing the right Memory design in Production
&lt;/h2&gt;

&lt;p&gt;Since agents are powered by LLMs, they are inherently probabilistic. Therefore, keeping the Agent design as simple as possible is necessary. For instance, if an application is simple enough, the agent should be able to perform without any Semantic Memory component. In this section, let’s try to see how the memory design depends on some sample end-use cases.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use-case 1: Role-play with the user as a friend or assistant
&lt;/h3&gt;

&lt;p&gt;This is one of the most popular use cases. MemGPT is a great example of this. The main KPI of agents in this setup is remembering facts that the user reveals during normal conversation. So, you’ll need an Episodic memory component to store all the conversations. But you’ll also need a Semantic memory component to store the details and preferences of the user, which will be populated by extracting insights from raw conversations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use-case 2: Interacting with the user for customer support
&lt;/h3&gt;

&lt;p&gt;In this setup, let’s assume that in the backend there are humans trying to resolve customer queries, while the responsibility for communicating with the user is delegated to an AI agent. In a professional setup like this, the agent is responsible for extracting tasks from conversations and passing them to an employee; once the employee completes a task, the agent conveys the output to the user. This use case is quite transactional. Hence, the Semantic memory does not need to focus on user preferences. Instead, it should focus on maintaining a task list and updating the status of each task as human support agents do the job.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use-case 3: Task-execution expert that interacts with tools
&lt;/h3&gt;

&lt;p&gt;Of the three use cases, this is the most transactional one. AutoGPT and SuperAGI are ideal examples of this category. Here the agents are given a goal, and they need to achieve it by calling tools. In this case, the episodic memory will not store chat history with the user; instead, it will store the tool-call history (inputs and outputs). The semantic memory could consist of a VectorDB storing recipes for doing certain basic tasks. Any new task will probably be some combination of basic tasks, so we can use RAG to find the top n basic tasks relevant to the current task and then use their solutions as in-context examples to solve it.&lt;/p&gt;
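
&lt;p&gt;A minimal sketch of this recipe-retrieval idea, with a toy bag-of-words embedding standing in for a real embedding model and a dict standing in for the VectorDB:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import numpy as np

def embed(text):
    # Stand-in embedding: hash words into a small bag-of-words vector.
    # In practice this would call an embedding model (e.g. ada-002).
    vec = np.zeros(64)
    for word in text.lower().split():
        vec[sum(ord(c) for c in word) % 64] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-9)

recipes = {
    "send an email": "1. open mail tool 2. draft 3. send",
    "scrape a webpage": "1. fetch url 2. parse html 3. extract",
    "summarize a file": "1. read file 2. chunk 3. summarize chunks",
}

def top_n_recipes(task, n=2):
    query = embed(task)
    scored = [(float(np.dot(query, embed(k))), k) for k in recipes]
    scored.sort(reverse=True)
    return [f"{k}: {recipes[k]}" for _, k in scored[:n]]

# The retrieved recipes become in-context examples for the new task.
print(top_n_recipes("send a summary email"))
&lt;/code&gt;&lt;/pre&gt;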

&lt;h2&gt;
  
  
  Conclusion &amp;amp; Next Steps
&lt;/h2&gt;

&lt;p&gt;In this blog, we saw that design choices for Memory depend on the end use case. But we haven’t gone deeper into how we interact with memory. There are nuances to reading from the memory (Retrieval) and writing into the memory (Learning). In fact, Retrieval and Learning are just two types of actions in the whole action-space of agents. So stay tuned for the next articles, in which we will go deeper into Action-Space (Retrieval, Learning, Reasoning, and Grounding).&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Introduction to RAGA – Retrieval Augmented Generation and Actions</title>
      <dc:creator>Akshat Jain</dc:creator>
      <pubDate>Mon, 09 Oct 2023 10:39:25 +0000</pubDate>
      <link>https://dev.to/akkiprime/introduction-to-raga-retrieval-augmented-generation-and-actions-4905</link>
      <guid>https://dev.to/akkiprime/introduction-to-raga-retrieval-augmented-generation-and-actions-4905</guid>
      <description>&lt;p&gt;Retrieval-augmented generation (RAG) has made significant strides in enhancing the Language Model’s (LLM) ability to provide contextual and informed responses by leveraging external knowledge bases. The process chiefly involves an indexing stage to prepare a knowledge base, and a querying stage to fetch relevant context to assist the LLM in answering queries.&lt;/p&gt;

&lt;p&gt;The inception of RAGA (Retrieval-Augmented Generation with Actions) augments this existing architecture by incorporating an action-taking step, thereby not just stopping at generating responses but proceeding to execute actions based on the generated information. This is a transformative step towards making AI systems more interactive and autonomous.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding the RAGA Architecture
&lt;/h2&gt;

&lt;p&gt;Expanding on the RAG framework, RAGA adds a critical third stage to the pipeline:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Indexing Stage&lt;/strong&gt;: Similar to RAG, this stage involves preparing a knowledge base using data connectors to ingest data from various sources into a Document representation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Querying Stage&lt;/strong&gt;: This stage fetches the relevant context from the knowledge base to assist the LLM in synthesizing a response to a user query. LlamaIndex facilitates the retrieval and indexing of data, ensuring accurate and expressive retrieval operations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Action Stage&lt;/strong&gt;: This is the new addition in RAGA. After responses are generated, this stage is responsible for taking appropriate actions based on the insights derived from them.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Action Determination:
Based on the generated response, the system determines the action that needs to be taken. This could be defined through predefined rules or learned through reinforcement learning techniques over time.&lt;/li&gt;
&lt;li&gt;Action Execution:
Once the action is determined, RAGA executes it. This could range from sending a notification, adjusting a setting in a system, interacting with other software or hardware components, to even making decisions that affect a broader workflow.&lt;/li&gt;
&lt;li&gt;Feedback Loop:
Post-action, the feedback, if any, is collected to refine the action-determination process. This loop helps in improving the accuracy and relevance of actions over time.&lt;/li&gt;
&lt;/ul&gt;
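
&lt;p&gt;A minimal sketch of the three stages wired together, with toy helpers in place of LlamaIndex, the LLM, and a real action executor:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def index(documents):
    # Indexing stage: build a toy knowledge base (dict keyed by doc id).
    return {i: doc for i, doc in enumerate(documents)}

def query(kb, question):
    # Querying stage: naive keyword retrieval plus a stubbed LLM response.
    words = question.lower().split()
    context = [doc for doc in kb.values() if any(w in doc.lower() for w in words)]
    return f"Answer based on {len(context)} retrieved docs."

def act(response):
    # Action stage: map the generated response to an executable action.
    print(f"ACTION: send_notification({response!r})")

kb = index(["Pricing page updated", "Refund policy changed"])
act(query(kb, "What changed in the refund policy?"))
&lt;/code&gt;&lt;/pre&gt;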

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5Q9gNj35--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://superagi.com/wp-content/uploads/2023/10/Raga-retrieval-augmented-generation-and-actions.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5Q9gNj35--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://superagi.com/wp-content/uploads/2023/10/Raga-retrieval-augmented-generation-and-actions.png" alt="RAGA" width="800" height="307"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Exploring a use-case: Sending highly personalized emails
&lt;/h2&gt;

&lt;p&gt;Let’s illustrate how RAGA can be applied to a use case where an email marketer needs to send out highly personalized emails to a list of people.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Retrieval&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data Collection: RAGA begins by collecting relevant data for personalizing emails. This data can include recipient profiles, historical interactions, preferences, and any other relevant information.&lt;/li&gt;
&lt;li&gt;Knowledge Base Preparation: The collected data is organized into a knowledge base, using data connectors and indexing tools similar to those in the RAG framework.&lt;/li&gt;
&lt;li&gt;User Query: The user specifies the goal, such as “Send personalized emails to this contact list.”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Querying&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Context Retrieval: RAGA retrieves context from the knowledge base to assist the LLM in personalizing the emails. It fetches information like recipient names, past interactions, recent activities, preferences, etc.&lt;/li&gt;
&lt;li&gt;Query Formation: The system generates queries to retrieve the relevant data from the knowledge base, e.g., “Retrieve recent interactions with John Doe” or “Retrieve preferences of Mary Smith.”&lt;/li&gt;
&lt;li&gt;Response Generation: Using the retrieved context, RAGA generates personalized email content for each recipient, incorporating their name, recent interactions, and preferences. It may also craft subject lines and email bodies tailored to each individual.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Action&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Action Identification: The LLM identifies the action to be taken, which is to send out personalized emails to the respective recipients.&lt;/li&gt;
&lt;li&gt;Action Formulation: The LLM converts the generated email content into machine-readable email templates.&lt;/li&gt;
&lt;li&gt;Communication with Email Service: The LLM communicates with an email service or client, filling in the templates with recipient-specific details and sending the emails.&lt;/li&gt;
&lt;li&gt;Feedback Collection: After sending the emails, the system collects feedback, such as delivery notifications or recipient responses, to evaluate the success of the action.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The addition of the action stage in the RAGA architecture opens up a lot of possibilities, from automating personalized email campaigns to providing seamless customer support via AI Agents. It reduces human intervention, streamlines processes, and learns from its actions, making AI systems more efficient and adaptable for real-world applications.&lt;/p&gt;

&lt;p&gt;If you’re a developer building out LLM-powered applications and just discovered RAGA, here’s a quick guide to selecting the best method amongst RAG, RAGA, and Fine-tuning based on your application’s use case and other key metrics.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--NiRc-zWf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://superagi.com/wp-content/uploads/2023/10/Rags-vs-Fine-tuning-vs-RAGAs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--NiRc-zWf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://superagi.com/wp-content/uploads/2023/10/Rags-vs-Fine-tuning-vs-RAGAs.png" alt="The above figure is adapted from RAG vs Fine-tuning by Heiko Hotz from the “Towards Data Science” Blog" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Source: &lt;a href="https://superagi.com/introduction-to-raga-retrieval-augmented-generation-and-actions/"&gt;https://superagi.com/introduction-to-raga-retrieval-augmented-generation-and-actions/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>superagi</category>
      <category>rag</category>
      <category>raga</category>
    </item>
    <item>
      <title>Understanding Knowledge Embeddings in SuperAGI</title>
      <dc:creator>Akshat Jain</dc:creator>
      <pubDate>Fri, 15 Sep 2023 06:09:09 +0000</pubDate>
      <link>https://dev.to/akkiprime/understanding-knowledge-embeddings-in-superagi-no6</link>
      <guid>https://dev.to/akkiprime/understanding-knowledge-embeddings-in-superagi-no6</guid>
<description>&lt;p&gt;The quality of output generated by AI agents is limited by LLM constraints such as the information cutoff &amp;amp; the lack of quality data for niche tasks, which can prevent agents from delivering context-rich, domain-specific outputs. However, by integrating knowledge embeddings, AI agents can enhance their depth and accuracy, ensuring that their responses are not only factually correct but also rich in contextual nuance.&lt;/p&gt;

&lt;p&gt;Knowledge embeddings are vector representations of data from file sources like docs, PDFs, CSVs, etc., stored within a multi-dimensional space. The data is converted into dense vectors that encapsulate its semantic relationships and patterns, making it amenable to processing by machine learning models.&lt;/p&gt;

&lt;p&gt;When integrated with autonomous agents powered by LLMs, knowledge embeddings can significantly enhance the agent’s ability to reason, provide accurate responses, and generate more contextually relevant content by grounding the model’s response and outputs in structured, factual knowledge. This synergy can lead to more informed and reliable autonomous AI agents. Let’s see how SuperAGI users can use knowledge embeddings to improve agent outputs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsuperagi.com%2Fwp-content%2Fuploads%2F2023%2F09%2FEmbedding-model.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsuperagi.com%2Fwp-content%2Fuploads%2F2023%2F09%2FEmbedding-model.png" alt="Embedding_Model"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Knowledge Embeddings with an Example
&lt;/h2&gt;

&lt;p&gt;Consider an AI agent tasked with managing social media campaigns. By integrating an ‘SEO keyword knowledge embedding’, the agent can refine the quality of the campaign content. Instead of only automating posts, the agent can now generate content around important SEO keywords. This results in more relevant, high-quality content that drives better engagement.&lt;/p&gt;

&lt;h2&gt;
  
  
  Integrating Knowledge Embeddings into SuperAGI
&lt;/h2&gt;

&lt;p&gt;Knowledge Embeddings can be used in SuperAGI by plugging it into an agent workflow, providing the agent with contextual knowledge to operate with an understanding of their tasks. Through user-configured knowledge, agents can access this group of information, optimizing their efficiency and output quality.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://youtu.be/b_6UKsEISxQ" rel="noopener noreferrer"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsuperagi.com%2Fwp-content%2Fuploads%2F2023%2F09%2FKnowledge-Embedding.png" alt="Knowledge Embedding and Toolkit"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How can agents access Knowledge Embeddings?
&lt;/h2&gt;

&lt;p&gt;Agents can access Knowledge Embeddings with the help of the “KnowledgeSearch” Tool, which runs a semantic search query rather than a traditional keyword-based search.&lt;/p&gt;

&lt;p&gt;When presented with a query, it sifts through the embeddings, identifying and returning the information with the highest semantic similarity to the input query. This ensures that the retrieved data is not only relevant but also contextually aligned with the agent’s objectives. Currently, SuperAGI is compatible with OpenAI’s text-embedding-ada-002 model.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsuperagi.com%2Fwp-content%2Fuploads%2F2023%2F09%2FKnowledgeSearch.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsuperagi.com%2Fwp-content%2Fuploads%2F2023%2F09%2FKnowledgeSearch.png" alt="Knowledge Search"&gt;&lt;/a&gt;&lt;/p&gt;
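
&lt;p&gt;A minimal sketch of what such a semantic search looks like with text-embedding-ada-002, assuming the OpenAI Python SDK (v1.x) and an OPENAI_API_KEY in the environment:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from openai import OpenAI
import numpy as np

client = OpenAI()

def embed(text):
    # text-embedding-ada-002 returns 1536-dimensional vectors.
    resp = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return np.array(resp.data[0].embedding)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

docs = ["Q3 SEO keyword list", "HR leave policy", "Product pricing tiers"]
doc_vecs = [embed(d) for d in docs]

query_vec = embed("keywords for the new campaign")
best = max(range(len(docs)), key=lambda i: cosine(query_vec, doc_vecs[i]))
print(docs[best])  # highest semantic similarity, not keyword overlap
&lt;/code&gt;&lt;/pre&gt;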

&lt;h2&gt;
  
  
  Integrating Custom Knowledge
&lt;/h2&gt;

&lt;p&gt;Users can integrate their custom knowledge with SuperAGI as follows:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Database Configuration&lt;/strong&gt;: Users start by setting their vector database and index URL within SuperAGI.&lt;br&gt;
&lt;strong&gt;Knowledge Integration&lt;/strong&gt;: Options are available to either procure knowledge embeddings from the marketplace or incorporate external knowledge sources.&lt;br&gt;
&lt;strong&gt;Agent Setup&lt;/strong&gt;: While creating an AI agent, the ‘KnowledgeSearch Tool’ and the relevant ‘Knowledge Embedding’ are selected for optimal performance.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://youtu.be/6V6oyot6ZuU" rel="noopener noreferrer"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsuperagi.com%2Fwp-content%2Fuploads%2F2023%2F09%2FKnowledge-toolkit.png" alt="Knowledge Toolkit"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating &amp;amp; Hosting Custom Knowledge Embeddings
&lt;/h2&gt;

&lt;p&gt;SuperAGI allows users to install pre-existing embeddings from the marketplace or integrate their custom knowledge embeddings hosted on Pinecone, Qdrant, or Weaviate. Here’s how you can host and integrate your custom knowledge embeddings via all three of these vector DBs:&lt;/p&gt;

&lt;h3&gt;
  
  
  Qdrant Vector DB
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Access your Qdrant account.&lt;/li&gt;
&lt;li&gt;Navigate to the dashboard and create a new cluster, or access an existing cluster.&lt;/li&gt;
&lt;li&gt;Get your API key and URL to create a client instance.&lt;/li&gt;
&lt;li&gt;To create your index, run code like the sketch after this list with your Qdrant credentials.&lt;/li&gt;
&lt;li&gt;Once your index is created, go to Vector Settings in SuperAGI by clicking the settings icon in the top right corner.&lt;/li&gt;
&lt;li&gt;In the Vector Database Settings, select Qdrant.&lt;/li&gt;
&lt;li&gt;Add your vector database settings and click Connect. This will connect your Qdrant index.&lt;/li&gt;
&lt;/ol&gt;
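
&lt;p&gt;A sketch of the index-creation step (step 4), assuming the qdrant-client Python package and placeholder credentials:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(url="https://YOUR-CLUSTER-URL.qdrant.io", api_key="YOUR-API-KEY")

# 1536 dimensions to match OpenAI's text-embedding-ada-002.
client.create_collection(
    collection_name="knowledge",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)
&lt;/code&gt;&lt;/pre&gt;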

&lt;p&gt;&lt;a href="https://youtu.be/4BKxUCWYUCA" rel="noopener noreferrer"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsuperagi.com%2Fwp-content%2Fuploads%2F2023%2F09%2FQdrant.png" alt="Knowledge Qdrant"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Pinecone
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Log in to your account on Pinecone.&lt;/li&gt;
&lt;li&gt;Create a new index as in the sketch after this list, or use an existing index.&lt;/li&gt;
&lt;li&gt;Input the index name and set the dimensions to 1536. For knowledge embeddings, we use OpenAI’s text-embedding-ada-002 model, which creates embeddings of 1536 dimensions.&lt;/li&gt;
&lt;li&gt;After the index is created, go to Vector Settings in SuperAGI by clicking the settings icon in the top right corner.&lt;/li&gt;
&lt;li&gt;In the Vector Database Settings, select Pinecone.&lt;/li&gt;
&lt;li&gt;To connect Pinecone, add the API key, environment, and index name.&lt;/li&gt;
&lt;li&gt;Go to the Pinecone dashboard and click Indexes to get the index name.&lt;/li&gt;
&lt;li&gt;Go to the Pinecone dashboard and click API Keys to get the API key and environment.&lt;/li&gt;
&lt;li&gt;Add these in the Vector Database Settings and click Connect. This will connect your Pinecone index.&lt;/li&gt;
&lt;/ol&gt;
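
&lt;p&gt;A sketch of the index-creation step (step 2), assuming the pinecone-client package in the API style current when this post was written, with placeholder credentials:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import pinecone

pinecone.init(api_key="YOUR-API-KEY", environment="YOUR-ENVIRONMENT")

# 1536 dimensions to match text-embedding-ada-002, as noted above.
pinecone.create_index("knowledge", dimension=1536, metric="cosine")
&lt;/code&gt;&lt;/pre&gt;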

&lt;p&gt;&lt;a href="https://youtu.be/ZjooYLnS2mQ" rel="noopener noreferrer"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsuperagi.com%2Fwp-content%2Fuploads%2F2023%2F09%2FPinecone.png" alt="Knowledge Pinecone"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Weaviate
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Access your Weaviate account.&lt;/li&gt;
&lt;li&gt;Navigate to the dashboard and create your Weaviate cluster, or access an existing cluster.&lt;/li&gt;
&lt;li&gt;Get your API key and URL from the cluster details.&lt;/li&gt;
&lt;li&gt;To create your class, run code like the sketch after this list with your Weaviate credentials.&lt;/li&gt;
&lt;li&gt;Once your class is created, go to Vector Settings in SuperAGI by clicking the settings icon in the top right corner.&lt;/li&gt;
&lt;li&gt;From the Vector Database Settings, select Weaviate.&lt;/li&gt;
&lt;li&gt;Add your vector database settings and click Connect. This will connect your Weaviate class.&lt;/li&gt;
&lt;/ol&gt;
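
&lt;p&gt;A sketch of the class-creation step (step 4), assuming the v3 weaviate-client package and placeholder credentials:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import weaviate

client = weaviate.Client(
    url="https://YOUR-CLUSTER-URL.weaviate.network",
    auth_client_secret=weaviate.AuthApiKey(api_key="YOUR-API-KEY"),
)

# "vectorizer": "none" because SuperAGI supplies the ada-002 vectors itself.
client.schema.create_class({"class": "Knowledge", "vectorizer": "none"})
&lt;/code&gt;&lt;/pre&gt;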

&lt;p&gt;&lt;a href="https://youtu.be/GuXOugP3dYo" rel="noopener noreferrer"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsuperagi.com%2Fwp-content%2Fuploads%2F2023%2F09%2FWeaviate.png" alt="Knowledge Weaviate"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>embedding</category>
      <category>superagi</category>
      <category>tutorial</category>
      <category>python</category>
    </item>
    <item>
      <title>Introduction to Agent Summary – Improving Agent Output by Using LTS &amp; STM</title>
      <dc:creator>Akshat Jain</dc:creator>
      <pubDate>Fri, 08 Sep 2023 12:52:12 +0000</pubDate>
      <link>https://dev.to/akkiprime/introduction-to-agent-summary-improving-agent-output-by-using-lts-stm-37ca</link>
      <guid>https://dev.to/akkiprime/introduction-to-agent-summary-improving-agent-output-by-using-lts-stm-37ca</guid>
      <description>&lt;p&gt;The recent introduction of the “Agent Summary” feature in &lt;a href="https://github.com/TransformerOptimus/SuperAGI"&gt;SuperAGI&lt;/a&gt; version 0.0.10 has brought a drastic difference in agent performance – improving the quality of agent output. Agent Summary helps AI agents maintain a larger context about their goals while executing complex tasks that require longer conversations (iterations).&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Reliance on Short-Term Memory
&lt;/h2&gt;

&lt;p&gt;Earlier, agents relied solely on passing short-term memory (STM) to the language model, which essentially acted as a rolling window of the most recent information based on the model’s token limit. Any context outside this window was lost.&lt;/p&gt;

&lt;p&gt;For goals requiring longer runs, this meant agents would often deliver subpar and disjointed responses due to a lack of context about the initial goal and over-reliance on very recent short-term memory.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introducing Long-Term Summaries
&lt;/h2&gt;

&lt;p&gt;To provide agents with more persistent context, we enabled the addition of long-term summaries (LTS) of prior information to supplement short-term memory.&lt;/p&gt;

&lt;p&gt;LTS condenses and summarizes information that has moved outside the STM window.&lt;/p&gt;

&lt;p&gt;Together, the STM and LTS are combined into an “Agent Summary” that gets passed to the language model, providing the agent with both recent and earlier information relevant to the goal.&lt;/p&gt;

&lt;h2&gt;
  
  
  How does Agent Summary work?
&lt;/h2&gt;

&lt;p&gt;The “_build_prompt_for_ltm_summary” function is used to generate a concise summary of the previous agent iterations.&lt;/p&gt;

&lt;p&gt;It encapsulates the key points, highlighting the key issues, decisions made, and any actions assigned.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--n4vheOi4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://superagi.com/wp-content/uploads/2023/09/Build_prompt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--n4vheOi4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://superagi.com/wp-content/uploads/2023/09/Build_prompt.png" alt="BuildID" width="800" height="246"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The function takes a list of past messages and a token limit as input.&lt;/p&gt;

&lt;p&gt;It reads a prompt from a text file, replaces placeholders with the past messages and the character limit (which is four times the token limit), and returns the final prompt.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5zcgL3vO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://superagi.com/wp-content/uploads/2023/09/Summary.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5zcgL3vO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://superagi.com/wp-content/uploads/2023/09/Summary.png" alt="Summary" width="800" height="113"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The “_build_prompt_for_recursive_ltm_summary_using_previous_ltm_summary” function, on the other hand, is used when there is a previous summary of interactions and additional conversations that were not included in the original summary.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zpeKneto--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://superagi.com/wp-content/uploads/2023/09/recursive_ltm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zpeKneto--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://superagi.com/wp-content/uploads/2023/09/recursive_ltm.png" alt="Recursive" width="800" height="264"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This function takes a previous long-term summary, a list of past messages, and a token limit as input. It reads a prompt from a text file, replaces placeholders with the previous summary, the past messages, and the character limit, and returns the final prompt.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Xwmbdkxe--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://superagi.com/wp-content/uploads/2023/09/prompt_answer.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Xwmbdkxe--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://superagi.com/wp-content/uploads/2023/09/prompt_answer.png" alt="PromptAnswer" width="800" height="172"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The “_build_prompt_for_recursive_ltm_summary_using_previous_ltm_summary” function is used instead of the “_build_prompt_for_ltm_summary” function when the combined token count of the LTM prompt, the base token limit for the LTS, and the output token limit exceeds the LLM’s token limit.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Pk9o2f1f--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://superagi.com/wp-content/uploads/2023/09/prompt_answer1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Pk9o2f1f--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://superagi.com/wp-content/uploads/2023/09/prompt_answer1.png" alt="PromptAnswer1" width="800" height="38"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This ensures that the final prompt of the agent summary does not exceed the token limit of the language model, while still encapsulating the key highlights of the new iterations and integrating them into the existing summary.&lt;/p&gt;

&lt;h2&gt;
  
  
  Balancing Short-Term and Long-Term Memory
&lt;/h2&gt;

&lt;p&gt;In the current implementation, STM is weighted at 75% and LTS at 25% in the Agent Summary context. The higher weightage for STM allows agents to focus on recent information within a specified timeframe. This enables them to process immediate data in real-time without being overwhelmed by an excessive amount of historical information.&lt;/p&gt;
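
&lt;p&gt;A minimal sketch of how such a 75/25 split can be applied when assembling the final prompt. This is an illustrative helper, not SuperAGI’s actual implementation; the four-characters-per-token heuristic mirrors the character limit described above:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def build_agent_summary(stm_messages, lts_summary, token_limit, chars_per_token=4):
    # Split the context budget 75/25 between short-term memory and
    # the long-term summary (hypothetical helper for illustration).
    stm_budget = int(token_limit * 0.75) * chars_per_token
    lts_budget = int(token_limit * 0.25) * chars_per_token
    recent = "\n".join(stm_messages)[-stm_budget:]   # keep the newest STM
    summary = lts_summary[:lts_budget]               # truncate LTS if needed
    return f"Earlier context (summarized):\n{summary}\n\nRecent messages:\n{recent}"
&lt;/code&gt;&lt;/pre&gt;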

&lt;p&gt;Early results show Agent Summaries improving goal completion and reducing disjointed responses. We look forward to further testing and optimizations of this dual memory approach as we enhance SuperAGI agents.&lt;/p&gt;

&lt;p&gt;View Agent Summary Benchmarks &lt;a href="https://superagi.com/introduction-to-agent-summary-improving-agent-output-by-using-lts-stm/#benchmarks"&gt;here&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>superagi</category>
      <category>agents</category>
    </item>
    <item>
      <title>Building Autonomous Business Processes using AI Agent Workflows</title>
      <dc:creator>Akshat Jain</dc:creator>
      <pubDate>Wed, 06 Sep 2023 05:41:14 +0000</pubDate>
      <link>https://dev.to/akkiprime/building-autonomous-business-processes-using-ai-agent-workflows-54mk</link>
      <guid>https://dev.to/akkiprime/building-autonomous-business-processes-using-ai-agent-workflows-54mk</guid>
<description>&lt;p&gt;Agent Workflows provide an efficient solution to one of the most pressing challenges faced by businesses today: automating the business processes that usually require knowledge workers to put relentless hours into repetitive tasks. Agent Workflows allow agents to autonomously execute repetitive tasks, reduce errors &amp;amp; scale operations, freeing knowledge workers to focus on more creative tasks and complex problems, which eventually improves the organization’s overall operational efficiency.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Agent Workflows
&lt;/h2&gt;

&lt;p&gt;Agent Workflows are a pre-defined set of ReAct LLM architecture steps that an AI agent can execute in a loop, autonomously iterating on each step using an LLM. Existing business operation playbooks can be translated into Agent Workflows, allowing AI agents to execute these tasks and run on auto-pilot.&lt;/p&gt;

&lt;p&gt;One such example of an Agent Workflow is the Sales Engagement Workflow. This workflow allows AI agents to act as an SDR/BDR, mining prospect data and then drafting personalized cold outreach messages to engage with prospects, following the steps below:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Filter potential prospects using &lt;a href="https://www.apollo.io"&gt;apollo.io&lt;/a&gt; and save them as a CSV containing the prospect name, LinkedIn profile, email &amp;amp; company name&lt;/li&gt;
&lt;li&gt;Read the prospect’s data from the CSV generated or manually uploaded via the resource manager&lt;/li&gt;
&lt;li&gt;Research each prospect’s company&lt;/li&gt;
&lt;li&gt;Draft a highly personalized email using the outputs from the research&lt;/li&gt;
&lt;li&gt;Send the email to the respective email ID&lt;/li&gt;
&lt;li&gt;Repeat the process for each row in the CSV in a loop&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://youtu.be/WGqBC4ENVWE?si=imdQqmKtav6dwdNe"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--G4q0ipPB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://superagi.com/wp-content/uploads/2023/09/Sales_workflow-1.png" alt="Sales Workflow Demo" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9-QJgPnv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://superagi.com/wp-content/uploads/2023/09/Sales_workflow.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9-QJgPnv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://superagi.com/wp-content/uploads/2023/09/Sales_workflow.png" alt="Sales Workflow" width="800" height="1107"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Another example of an agent workflow is the HR and Talent Acquisition Workflow. This workflow can autonomously analyse CVs (pulled from a source or manually uploaded via the resource manager), compare them with the job description, shortlist candidates, and draft a confirmation message to the candidates.&lt;/p&gt;

&lt;p&gt;Similarly, more such agent workflows can be built to autonomously execute various business processes, including Marketing Operations, Customer Support, Data Analysis, and many more.&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating Your Custom Agent Workflow
&lt;/h2&gt;

&lt;p&gt;One can easily create an agent workflow by making changes to the &lt;code&gt;workflow_seed.py&lt;/code&gt; and &lt;code&gt;main.py&lt;/code&gt; files in the SuperAGI repository. There are four key aspects to creating an agent workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;input_instruction&lt;/code&gt;: Defines what each step is intended for.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;output_instruction&lt;/code&gt;: Defines the outcome of each step.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;TASK_QUEUE&lt;/code&gt;: Enables looping in the workflows.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;WAIT_FOR_PERMISSION&lt;/code&gt;: Asks for approval and feedback from the user.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For every step in the workflow, the previous step’s output acts as the input for the current step.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step-by-step Guide
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Clone the SuperAGI GitHub repository and navigate to the superagi/agent/workflow_seed.py file.
&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zZDE92UU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://superagi.com/wp-content/uploads/2023/09/workflowseed.png" alt="Workflow Seed" width="732" height="89"&gt;
&lt;/li&gt;
&lt;li&gt;Set up your workflow’s method name and define the workflow name in the parameters in the method.&lt;/li&gt;
&lt;li&gt;Assign step numbers at both places as shown; these are the identifiers of the steps.&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;step_type="TRIGGER"&lt;/code&gt; to identify the first workflow step. Make sure to include only one trigger in a workflow.&lt;/li&gt;
&lt;li&gt;Define the tool that you want to use in the workflow steps. The tool name should match what’s defined in the BaseTool class.
&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--pykqQqe0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://superagi.com/wp-content/uploads/2023/09/Basetoolclass.png" alt="BaseToolClass" width="680" height="166"&gt;
&lt;/li&gt;
&lt;li&gt;To use the &lt;code&gt;TASK_QUEUE&lt;/code&gt;, you don’t need to make any changes to the tool name and input instructions. The looping works when the previous step gives an array of items and you want to run a particular flow on each item.&lt;/li&gt;
&lt;li&gt;For &lt;code&gt;WAIT_FOR_PERMISSION&lt;/code&gt;, set the input instruction to the permission question you want to see in the console.&lt;/li&gt;
&lt;li&gt;Define all the required workflow steps via the code.&lt;/li&gt;
&lt;li&gt;Connect the steps using the code shown below. Define what step comes after what step, and which step to go to based on the permission.
&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--LcjuRu-r--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://superagi.com/wp-content/uploads/2023/09/Step.png" alt="Agent Workflow Step" width="576" height="194"&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After making the changes in workflow_seed.py, navigate to main.py in the SuperAGI folder and add the created workflow’s method name.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--q-PU7tW9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://superagi.com/wp-content/uploads/2023/09/Iteration.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--q-PU7tW9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://superagi.com/wp-content/uploads/2023/09/Iteration.png" alt="Iteration Workflow Seed" width="680" height="346"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With this, one should be able to see their agent workflow in the dropdown at the time of agent provisioning in SuperAGI.&lt;/p&gt;

&lt;p&gt;Currently, Custom Workflows can only be created through code, but SuperAGI is exploring options to include a workflow builder in the GUI soon.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>workflows</category>
      <category>superagi</category>
      <category>autonomous</category>
    </item>
    <item>
      <title>Processing Structured &amp; Unstructured Data with SuperAGI and LlamaIndex</title>
      <dc:creator>Akshat Jain</dc:creator>
      <pubDate>Fri, 21 Jul 2023 07:09:03 +0000</pubDate>
      <link>https://dev.to/akkiprime/processing-structured-unstructured-data-with-superagi-and-llamaindex-28bc</link>
      <guid>https://dev.to/akkiprime/processing-structured-unstructured-data-with-superagi-and-llamaindex-28bc</guid>
<description>&lt;p&gt;SuperAGI's latest integration with LlamaIndex extends the agent’s ability to understand and work with a wide range of data types and sources.&lt;/p&gt;

&lt;p&gt;With LlamaIndex, AI agents in SuperAGI can now ingest data from:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Unstructured Data sources&lt;/strong&gt; such as,&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Documents and Raw Text Files&lt;/strong&gt;: Like word processing documents or simple text notes (.docx, .txt)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PDFs&lt;/strong&gt;: Digital documents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Videos and Images&lt;/strong&gt;: Visual media formats (.jpg, .png, .mp4 etc)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;as well as &lt;strong&gt;Structured Data sources&lt;/strong&gt; like&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Excel and CSV&lt;/strong&gt;: Tabulated data where information is presented in rows and columns.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SQL databases&lt;/strong&gt;: Relational stores where data is kept in tables with rows and columns.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;or even &lt;strong&gt;Semi-structured Data sources&lt;/strong&gt; such as Slack &amp;amp; Notion.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsuperagi.com%2Fwp-content%2Fuploads%2F2023%2F07%2FLlamaIndex.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsuperagi.com%2Fwp-content%2Fuploads%2F2023%2F07%2FLlamaIndex.png" alt="SuperAGI with LlamaIndex"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Data Processing in SuperAGI
&lt;/h2&gt;

&lt;p&gt;Several steps are involved in fetching the data, processing it through LlamaIndex into vector node objects, and sending it to the vector database.&lt;/p&gt;

&lt;h3&gt;
  
  
  🔁 &lt;strong&gt;Resource Management &amp;amp; Data Conversion&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Files and documents are uploaded to the SuperAGI Resource Manager, where they are parsed by LlamaIndex and converted into vector node objects, which are then stored in a vector DB such as Redis, Chroma, Pinecone, or Qdrant.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsuperagi.com%2Fwp-content%2Fuploads%2F2023%2F07%2FLlamaIndex-Integration.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsuperagi.com%2Fwp-content%2Fuploads%2F2023%2F07%2FLlamaIndex-Integration.png" alt="Resource Management and Data Conversion"&gt;&lt;/a&gt;&lt;/p&gt;
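&lt;p&gt;Outside of SuperAGI, the same parse-and-index flow can be reproduced with LlamaIndex directly. A minimal sketch using the 2023-era &lt;code&gt;llama_index&lt;/code&gt; API, with the directory path as a placeholder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Minimal sketch of parsing files into nodes and indexing them.
# Assumes the 2023-era llama_index API; the path is a placeholder.
from llama_index import SimpleDirectoryReader, VectorStoreIndex

# Parse files (.txt, .docx, .pdf, images, ...) into Document objects.
documents = SimpleDirectoryReader("resources/").load_data()

# Chunk the documents into nodes, embed them, and build a vector index
# (in-memory here; a Redis/Chroma/Pinecone/Qdrant store can be plugged in).
index = VectorStoreIndex.from_documents(documents)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
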

&lt;h3&gt;
  
  
  📝 Conversion to Vector Node Objects
&lt;/h3&gt;

&lt;p&gt;The SuperAGI Resource Manager stores the data as vectorized node objects, allowing fast and easy access. Alongside the node objects, SuperAGI also stores a summary of each file. A master summary of all files within the Resource Manager is created, which the agent can draw on depending on its goal and instructions.&lt;/p&gt;

&lt;h3&gt;
  
  
  🆎 &lt;strong&gt;Metadata Filtering and Database Support&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Metadata filtering is primarily used to filter specific resources required for an agent run. Each agent run is associated with a unique identifier, or 'agent id', which is used as a key to filter the resources. This means that the system can identify and select only those resources that are relevant to a particular agent run, improving the accuracy of the data retrieval process. The integration supports databases that inherently support metadata filtering (Redis, Chroma, Pinecone, or Qdrant).&lt;/p&gt;
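&lt;p&gt;In LlamaIndex terms, that per-run filtering looks roughly like the snippet below, reusing the &lt;code&gt;index&lt;/code&gt; from the earlier sketch. The &lt;code&gt;agent_id&lt;/code&gt; key and value are illustrative; SuperAGI's internal wiring may differ:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Sketch: restricting a query to one agent run via metadata filtering.
from llama_index.vector_stores.types import ExactMatchFilter, MetadataFilters

# "agent_id" as a metadata key is illustrative, not SuperAGI's exact schema.
filters = MetadataFilters(filters=[ExactMatchFilter(key="agent_id", value="42")])
query_engine = index.as_query_engine(filters=filters)
print(query_engine.query("Summarize the uploaded financial report"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
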

&lt;h3&gt;
  
  
  🔍 Running Query using QueryResourceTool
&lt;/h3&gt;

&lt;p&gt;Once an agent run is initiated, SuperAGI agents can query these node objects using the QueryResourceTool. It lets agents work with a large set of data resources and supplies the required information at any iteration, helping them accomplish their goals.&lt;/p&gt;

&lt;h2&gt;
  
  
  ✅ Use Cases
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Financial Report Analysis:&lt;/strong&gt; Upload a CSV of financial data and ask the agent to generate an analysis report from it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Book Chapter Summarization:&lt;/strong&gt; Upload an EPUB/PDF of an entire book and instruct the agent to summarize a chapter; the summary is generated and stored in the output folder.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://youtu.be/DiOZQH_fRCA" rel="noopener noreferrer"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsuperagi.com%2Fwp-content%2Fuploads%2F2023%2F07%2FLlamaIndex_cover.jpg" alt="LlamaIndex Demo"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Sources:&lt;br&gt;
&lt;a href="https://twitter.com/ishaanbhola/status/1675826595985231872" rel="noopener noreferrer"&gt;https://twitter.com/ishaanbhola/status/1675826595985231872&lt;/a&gt;&lt;br&gt;
&lt;a href="https://twitter.com/geeky_baller/status/1676916836368257024" rel="noopener noreferrer"&gt;https://twitter.com/geeky_baller/status/1676916836368257024&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Building Your Own Custom Tool in SuperAGI: A Step-by-Step Guide</title>
      <dc:creator>Akshat Jain</dc:creator>
      <pubDate>Mon, 17 Jul 2023 09:52:38 +0000</pubDate>
      <link>https://dev.to/akkiprime/building-your-own-custom-tool-in-superagi-a-step-by-step-guide-211i</link>
      <guid>https://dev.to/akkiprime/building-your-own-custom-tool-in-superagi-a-step-by-step-guide-211i</guid>
<description>&lt;p&gt;This article will guide you step by step through building a custom tool and &lt;a href="https://superagi.com/adding-your-own-custom-tool-toolkit-to-superagi/"&gt;adding your tool&lt;/a&gt; to SuperAGI, using an example provided in a &lt;a href="https://github.com/luciferlinx101/GreetingTool"&gt;sample repository&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Installing SuperAGI Dependencies
&lt;/h2&gt;

&lt;p&gt;Begin by installing all the dependencies, and make sure you have Python installed on your system. After you've done this, run &lt;code&gt;pip install superagi-tools&lt;/code&gt; to get the SuperAGI &lt;code&gt;BaseToolkit&lt;/code&gt; and &lt;code&gt;BaseTool&lt;/code&gt; classes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Creating a New Python File
&lt;/h2&gt;

&lt;p&gt;Create a new Python file (for example, &lt;code&gt;my_tool.py&lt;/code&gt;), which will define your tool class. Import the dependencies required to build your own custom tool using the following commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from superagi.tools.base_tool import BaseTool
from pydantic import BaseModel, Field
from typing import Type
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 3: Defining the Input Model
&lt;/h2&gt;

&lt;p&gt;Next, create a Pydantic &lt;code&gt;BaseModel&lt;/code&gt; class. This class will define the input schema for your tool. You'll need to specify the fields and their types, as shown below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class MyToolInput(BaseModel):
    message: str = Field(..., description="Message to be processed by the tool")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 4: Defining Your Tool Class
&lt;/h2&gt;

&lt;p&gt;Now, you can create a class that inherits from &lt;code&gt;BaseTool&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;You'll need to set required attributes such as &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;args_schema&lt;/code&gt;, and &lt;code&gt;description&lt;/code&gt;. You also need to implement the &lt;code&gt;_execute&lt;/code&gt; method, which contains the logic for your tool; it takes the input parameters as arguments and returns the tool's output.&lt;/p&gt;

&lt;p&gt;Here's an example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class MyTool(BaseTool):
    name: str = "My Tool"
    args_schema: Type[BaseModel] = MyToolInput
    description: str = "Description of my tool"

    def _execute(self, message: str = None):
        # Tool logic goes here; return the tool's output as a string.
        return f"Processed: {message}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 5: Creating a Toolkit File
&lt;/h2&gt;

&lt;p&gt;Next, create another Python file (for instance, &lt;code&gt;my_toolkit.py&lt;/code&gt;) to define your toolkit class. Import the necessary dependencies as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from superagi.tools.base_tool import BaseToolkit, BaseTool
from typing import Type, List
from my_tool import MyTool  # the tool class defined in Step 4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Step 6: Defining Your Toolkit Class
&lt;/h2&gt;

&lt;p&gt;This class should inherit from &lt;code&gt;BaseToolkit&lt;/code&gt; and optionally from &lt;code&gt;ABC&lt;/code&gt; if you want to make it abstract.&lt;/p&gt;

&lt;p&gt;Set required attributes such as &lt;code&gt;name&lt;/code&gt; and &lt;code&gt;description&lt;/code&gt;. You should also implement the &lt;code&gt;get_tools&lt;/code&gt; method, which returns a list of instances of your tool classes, and the &lt;code&gt;get_env_keys&lt;/code&gt; method, which returns a list of environment variable keys required by your toolkit.&lt;/p&gt;

&lt;p&gt;Below is an example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class MyToolkit(BaseToolkit):
    name: str = "My Toolkit"
    description: str = "Description of my toolkit"

    def get_tools(self) -&amp;gt; List[BaseTool]:
        return [MyTool()]

    def get_env_keys(self) -&amp;gt; List[str]:
        return []
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 7: Configuring the Environment
&lt;/h2&gt;

&lt;p&gt;At this stage, create a configuration file (like &lt;code&gt;config.yaml&lt;/code&gt;) to define any environment variables your toolkit or tool might require.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MY_ENV_VAR: 'YOUR_VALUE'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 8: Building Your Tool
&lt;/h2&gt;

&lt;p&gt;Now implement the logic within your tool's &lt;code&gt;_execute&lt;/code&gt; method based on your requirements.&lt;/p&gt;

&lt;p&gt;You can access the environment variables using the &lt;code&gt;get_tool_config&lt;/code&gt; method inherited from &lt;code&gt;BaseTool&lt;/code&gt;.&lt;/p&gt;
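&lt;p&gt;For example, inside &lt;code&gt;_execute&lt;/code&gt; you might read the variable defined in Step 7 like this (assuming &lt;code&gt;get_tool_config&lt;/code&gt; takes the key name as its argument):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def _execute(self, message: str = None):
    # Read a value from config.yaml / the toolkit settings.
    # Assumption: get_tool_config takes the key name as its argument.
    my_value = self.get_tool_config("MY_ENV_VAR")
    return f"{my_value}: {message}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If your tool reads such a key, remember to return it from the toolkit's &lt;code&gt;get_env_keys&lt;/code&gt; method (e.g. &lt;code&gt;["MY_ENV_VAR"]&lt;/code&gt;) so it shows up in the toolkit settings.&lt;/p&gt;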

&lt;h2&gt;
  
  
  Step 9: Testing Your Tool
&lt;/h2&gt;

&lt;p&gt;After you've built your tool, you need to write test cases to verify its functionality. This step ensures that your tool works as expected.&lt;/p&gt;
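&lt;p&gt;A minimal unit test, assuming pytest as the test runner, could look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# test_my_tool.py: a minimal sketch of a unit test for the tool.
from my_tool import MyTool


def test_execute_returns_processed_message():
    tool = MyTool()
    result = tool._execute(message="hello")
    assert "hello" in result
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;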

&lt;h2&gt;
  
  
  Step 10: Listing Your Tool Dependencies
&lt;/h2&gt;

&lt;p&gt;If your tool requires additional dependencies, list them in a &lt;code&gt;requirements.txt&lt;/code&gt; file. This way, anyone who uses your tool can easily install the necessary dependencies.&lt;/p&gt;
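&lt;p&gt;For instance, if your tool called an HTTP API via the &lt;code&gt;requests&lt;/code&gt; library (an illustrative dependency), the file would simply contain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;requests
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;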

&lt;h2&gt;
  
  
  Step 11: Creating a GitHub Repository
&lt;/h2&gt;

&lt;p&gt;Create a new GitHub Repository with your Toolkit's name and upload all the files you've worked on so far.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 12: Linking Your GitHub Repository to SuperAGI
&lt;/h2&gt;

&lt;p&gt;Start SuperAGI using &lt;code&gt;docker-compose up --build&lt;/code&gt;. Add your GitHub repository link to SuperAGI’s front end by clicking the “Add Custom Tool” button on the home screen, or by navigating to the Toolkits section. Paste your toolkit repository link and save the changes.&lt;/p&gt;

&lt;p&gt;The SuperAGI tool manager will take care of the installation of your tool along with its dependencies.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 13: Rebuilding SuperAGI
&lt;/h2&gt;

&lt;p&gt;Re-build SuperAGI using Docker and start using your tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker compose down
docker compose up --build
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At this point, you should be able to configure your tool’s settings from the Toolkit section and start using it during agent provisioning.&lt;/p&gt;

&lt;p&gt;This concludes our step-by-step guide to building your own custom tool in SuperAGI.&lt;/p&gt;

&lt;p&gt;If you face any challenges building or &lt;a href="https://superagi.com/adding-your-own-custom-tool-toolkit-to-superagi/"&gt;adding your custom toolkit&lt;/a&gt; to SuperAGI, don't hesitate to join our &lt;a href="https://discord.gg/dXbRe5BHJC"&gt;Discord&lt;/a&gt; and work with the community to resolve them.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>ai</category>
      <category>superagi</category>
      <category>tools</category>
    </item>
  </channel>
</rss>
