Stefan πŸš€

Posted on • Originally published at wundergraph.com

Beyond Functions: Seamlessly build AI enhanced APIs with OpenAI

Today we're announcing the WunderGraph OpenAI integration / Agent SDK to simplify the creation of AI enhanced APIs and AI Agents for systems integration on autopilot.
At a high level, this integration enables two things:

  1. Build AI enhanced APIs with OpenAI that return structured data (JSON) instead of plain text

  2. Build AI Agents that can perform complex tasks leveraging your existing REST, GraphQL and SOAP APIs, as well as your databases and other systems

Examples

Before we dive deep into the problem and technical details, let's have a look at two examples.

Example 1: AI Agent creation with OpenAI

Here's a simple example that shows how we can use OpenAI to create an Agent that can call multiple APIs and return structured data (JSON) conforming to our defined API schema.

// .wundergraph/operations/openai/GetWeatherByCountry.ts
import { createOperation, z } from '../../generated/wundergraph.factory';

export default createOperation.query({
    input: z.object({
        country: z.string(),
    }),
    description: 'This operation returns the weather of the capital of the given country',
    handler: async ({ input, openAI, log }) => {

        // we cannot trust the user input, so we've got a helper function
        // that parses the user input and validates it against a schema
        const parsed = await openAI.parseUserInput({
            userInput: input.country,
            // we can use zod to define the schema
            // if OpenAI cannot parse the user input,
            // or zod validation fails, an error is thrown
            schema: z.object({
                country: z.string().nonempty(),
            }),
        });

        // it's optional to use the parseUserInput helper function
        // but it's recommended if you cannot trust the user input
        // e.g. the user could have entered "Germany" or "DE",
        // or just another prompt that is not a country at all and would confuse OpenAI

        // next we create an agent to perform the actual task
        const agent = openAI.createAgent({
            // functions takes an array of functions that the agent can use
            // these are our existing WunderGraph Operations that we've previously defined
            // A WunderGraph Operation can interact with your APIs and databases
            // You can use GraphQL and TypeScript to define Operations
            // TypeScript Operations (like this one right here) can host Agents
            // So you can also call other Agents from within an Agent
            functions: [{ name: 'CountryByCode' }, { name: 'weather/GetCityByName' }],
            // We want to get structured data (JSON) back from the Agent
            // so we define the output schema using zod again
            structuredOutputSchema: z.object({
                city: z.string(),
                country: z.string(),
                temperature: z.number(),
            }),
        });
        // Finally, we execute the agent with a prompt
        // The Agent will automatically fetch country data from the CountryByCode Operation
        // and the weather data from the weather/GetCityByName Operation
        // It will then generate a response using the schema we've defined
        return agent.execWithPrompt({
            prompt: `What's the weather like in the capital of ${parsed.country}?`,
        });
    },
});

Example 2: OpenAI enhanced API

How about extracting metadata from a website and exposing the functionality as a JSON API?
Sounds simple enough, right?

// .wundergraph/operations/openai/GetWebsiteInfo.ts
import { createOperation, z } from '../../generated/wundergraph.factory';

export default createOperation.query({
    input: z.object({
        url: z.string(),
    }),
    description: 'This operation returns the title, description, h1 and a summary of the given website',
    handler: async ({ input, openAI, log }) => {
            const agent = openAI.createAgent({
                model: 'gpt-3.5-turbo-16k-0613',
                functions: [
                    {
                        name: 'web/load_url',
                        // we're using the web/load_url function to load the content (HTML) of a website
                        // our model is only capable of processing 16k tokens at once
                        // so we need to paginate the content and process it in chunks
                        // the Agent SDK will automatically split the content and merge the responses
                        pagination: {
                            // we set the page size to 15kb, you can play around with this value
                            pageSize: 1024 * 15,
                            // we also set a max page limit to prevent excessive usage
                            maxPages: 3,
                        },
                    },
                    {
                        // we can use another Operation to summarize the content
                        // as the path suggests, it's using an Agent as well under the hood
                        // meaning that we're composing Agents here
                        name: 'openai/summarize_url_content',
                    },
                ],
                // we define the output schema using zod again
                // without this, our API would return plain text
                // which would make it hard to consume for other systems
                structuredOutputSchema: z.object({
                    title: z.string(),
                    description: z.string(),
                    h1: z.string(),
                    summary: z.string(),
                }),
            });
            // we execute the agent with a prompt
            return agent.execWithPrompt({
                prompt: `Load the content of the URL: ${input.url}
                You're an HTML parser. Your job is to extract the title, description and h1 from the HTML.
                Do not include the HTML tags in the result.
                Don't change the content, just extract the information.

                Once this is done, add a summary of the website.
                `,
            });
    },
});

The second example is a bit more complex, but it shows how you can describe more complex tasks with a prompt and have the AI Agent execute it for you.

Additionally, we're passing an Operation as a function to the Agent, which is another Agent under the hood,
meaning that this API is actually composed of multiple Agents.

With these two examples, you should get a good idea of what's possible with the WunderGraph OpenAI integration.

Let's now rewind a bit and talk about the problems we're trying to solve here.

The Problem: Building AI enhanced APIs and Agents is challenging

When trying to build AI enhanced APIs and Agents, you'll quickly realize that there are a couple of challenges that you need to overcome.

Let's quickly define what we mean by AI enhanced APIs and Agents and then talk about the challenges.

What are AI enhanced APIs?

An AI enhanced API is an API that accepts an input in a predefined format and returns structured data (e.g. JSON),
allowing it to be described using a schema (e.g. OpenAPI, GraphQL, etc.).

Tools like ChatGPT are fun to play with, but they're not very useful when you want to build APIs that can be consumed by other systems.

So, the bare minimum for an AI enhanced API is that we can describe it using a schema. In our case, we're using JSON Schema, which plays nicely with OpenAPI and OpenAI, as you'll see later.

What are AI Agents?

An AI Agent is a dialog between a large language model (e.g. GPT-3) and a computer program (e.g. a WunderGraph Operation) that is capable of performing a task.

The dialog is initiated by a prompt (e.g. a question or a task description).

We can provide additional functionality to the Agent by passing functions to it which we have to describe using a schema as well.

Once the dialog is initiated, the Agent can come back to us, asking to execute one of the functions we've provided.
It will provide the input to call the function, which will follow the schema we've defined.

We execute the function and add the result to the dialog and the Agent will continue performing the task until it's done.

Once the Agent is done, it will return the result to us, ideally in a format that we can describe using a schema.
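The dialog described above can be sketched as a simple loop. This is a simplified illustration with a stubbed model and hypothetical function names, not the Agent SDK's or OpenAI client's actual API:

```typescript
// A toy model response: either a request to call a function, or a final answer.
type ModelResponse =
  | { type: 'function_call'; name: string; args: Record<string, unknown> }
  | { type: 'final'; content: string };

// The functions we expose to the model (hypothetical stand-ins for Operations).
const functions: Record<string, (args: Record<string, unknown>) => unknown> = {
  CountryByCode: (args) => ({ code: args.code, capital: 'Berlin' }),
};

// A stand-in for the LLM: first it asks for a function call, then it answers.
function stubModel(dialog: string[]): ModelResponse {
  const hasFunctionResult = dialog.some((m) => m.startsWith('function_result:'));
  if (!hasFunctionResult) {
    return { type: 'function_call', name: 'CountryByCode', args: { code: 'DE' } };
  }
  return { type: 'final', content: 'The capital of DE is Berlin.' };
}

// The agent loop: feed function results back into the dialog until
// the model returns a final answer (with an iteration cap as a safety net).
function runAgent(prompt: string): string {
  const dialog: string[] = [`prompt: ${prompt}`];
  for (let i = 0; i < 10; i++) {
    const res = stubModel(dialog);
    if (res.type === 'final') return res.content;
    // The model asked us to execute one of the provided functions.
    const result = functions[res.name](res.args);
    dialog.push(`function_result: ${JSON.stringify(result)}`);
  }
  throw new Error('agent did not terminate');
}
```

The real dialog uses OpenAI's chat messages and JSON Schema function definitions, but the control flow is the same: execute requested functions, append results, repeat until done.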

Challenges

1. LLMs don't usually return structured data, but plain text

If you've used ChatGPT before, you'll know that it's fun to play with as long as a powerful enough "Agent" sits in front of it, like a human (you).

But what if you want to build an API that can be consumed by other systems?

How are services supposed to consume plain text without any structure?

2. Prompt Injection: We cannot trust user input

When building an API, we usually have to deal with user input.

We can ask the user to provide a country name as the input to our API, but what if the user provides a prompt instead of a country name that is designed to trick the AI?
This is called prompt injection and it's a real problem when building AI enhanced APIs.

3. Pagination & Batching: LLMs can only process a limited amount of tokens at once

LLMs are powerful, but they're not infinitely powerful.
They can only process a limited amount of tokens at once.
This means that we have to paginate the input, process it in chunks, and then merge the results back together,
all in a structured way so that we can parse the result later.
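The split-and-merge idea can be illustrated with a small helper. This is a sketch; the function names are made up, and only `pageSize` and `maxPages` mirror the options shown in the examples above:

```typescript
// Split a large text into pages of at most pageSize characters,
// capped at maxPages to bound token usage and cost.
function splitIntoPages(text: string, pageSize: number, maxPages: number): string[] {
  const pages: string[] = [];
  for (let i = 0; i < text.length && pages.length < maxPages; i += pageSize) {
    pages.push(text.slice(i, i + pageSize));
  }
  return pages;
}

// Each page is processed individually (e.g. summarized by the LLM),
// and the per-page results are merged back into one result.
function processInChunks(text: string, summarize: (page: string) => string): string {
  return splitIntoPages(text, 1024 * 15, 3).map(summarize).join('\n');
}
```

A production implementation would count tokens rather than characters and merge results in a schema-aware way, but the shape of the solution is the same.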

4. Composing Agents: We need to be able to compose Agents

You will usually start building lower level Agents that perform a specific task, like loading the content of a website or summarizing the content of a website.

Once you have these Agents, you want to be able to compose them to build more powerful higher-level Agents.
How can we make it easy to compose AI Agents?

5. LLMs like OpenAI cannot call external APIs and Databases directly

OpenAI allows you to describe functions that the Agent can call.

The challenge is that you have to describe the functions using plain JSON Schema.

This means that you cannot directly call REST, GraphQL or SOAP APIs, or even databases.

You have to describe the function using JSON Schema and then implement a mechanism that calls APIs and databases on behalf of the Agent.

LLMs can generate GraphQL Operations or even SQL statements, but keep in mind that these need to be validated and sanitized before they can be executed.

In addition, requiring an LLM to manually generate GraphQL Operations, REST API calls or SQL statements comes with another problem:
You have to describe the GraphQL Schema, REST API or database schema, and all of this input counts towards the token limit of the LLM.

This means that if you provide a GraphQL Schema with 16k tokens to a 16k-limited LLM, there's no space left for the actual prompt.

Wouldn't it be nice if we could describe just a few "Operations" that are useful to a specific Agent?

Yes, absolutely! But then there's another problem:
How can we describe Operations in a unified way that is compatible with OpenAI but works across different APIs like REST, SOAP, GraphQL and databases?

The Solution: The WunderGraph OpenAI Integration / Agent SDK

Let's now talk about the solution to these problems using the WunderGraph OpenAI integration.

If you're not yet familiar with WunderGraph, it's an Open Source API Integration / BFF (Backend for Frontend) / Programmable API Gateway toolkit.

At the core of WunderGraph is the concept of "API Dependency Management / API Composition".

WunderGraph allows you to describe a set of heterogeneous APIs (REST, GraphQL, SOAP, Databases, etc.) using a single schema.
From this description, WunderGraph will generate a unified API that you can define "Operations" for.

Operations are the core building blocks of exposing functionality on top of your APIs.
An Operation is essentially a function that can be called by a client.

Both the input and the output of an Operation are described using JSON Schema.

All Operations exposed by a WunderGraph Application are described using an OpenAPI Specification (OAS) document or a Postman Collection, so it's easy to consume them from any programming language.

Having the "Operations" abstraction on top of your API Dependency Graph allowed us to keep the Agent SDK as simple as it is.

All you need to do is add your API dependencies,
define a couple of Operations that are useful to your Agent, and pass them along with a prompt to the Agent.

It doesn't matter if you're using REST, GraphQL, SOAP, a Database or just another TypeScript function as an Operation, they all look the same to the Agent, they all follow the same semantics.

Let's now talk about the challenges we've mentioned earlier and how the WunderGraph OpenAI integration solves them.

How the WunderGraph Agent SDK helps you to return structured data from OpenAI

By default, OpenAI will return plain text.
So, when OpenAI is done processing our prompt, we'll get back a string of text.

How can we turn this into structured data?

Let's recall the Agent definition from earlier:

const agent = openAI.createAgent({
    functions: [{ name: 'CountryByCode' }, { name: 'weather/GetCityByName' }],
    structuredOutputSchema: z.object({
        city: z.string(),
        country: z.string(),
        temperature: z.number(),
    }),
});
const out = await agent.execWithPrompt({
    prompt: `What's the weather like in ${country}?`, // e.g. Germany
});
console.log(out.structuredOutput.city); // Berlin

We pass two functions to the Agent and define a schema that describes the output we expect from the Agent using the zod library.

Internally, we will compile the schema to JSON Schema.
Once the Agent is done, we'll create a new "dialog" asking the Agent to call our "out" function and pass the result to it.

To describe the input we're expecting to receive from the Agent, we'll use the generated JSON Schema.
This will prompt the Agent to call our "out" function and pass the result to it in a structured way that we can parse.

We can then use the zod library to parse the result and raise an error if the result doesn't match the schema we've defined.

As WunderGraph Operations are using TypeScript, we can infer the TypeScript types from the zod schema description,
which means that the result of "out" will be typed automatically.

More importantly, we're also using the TypeScript compiler to infer the response type of Operations in general.
So if you're returning out.structuredOutput from an Operation, another Operation can call our Operation in a type-safe way, or even use our Operation as a function for another Agent.
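The "out" function trick can be sketched as follows: describe the expected result as a JSON Schema, receive the model's final function-call arguments as a JSON string, and validate them before returning. This is a hand-rolled illustration of the idea, not the SDK's internals (which use zod for the validation step):

```typescript
// The JSON Schema we'd hand to the model for the final "out" function call.
const outSchema = {
  type: 'object',
  properties: {
    city: { type: 'string' },
    country: { type: 'string' },
    temperature: { type: 'number' },
  },
  required: ['city', 'country', 'temperature'],
} as const;

// Minimal validation of the model's function-call arguments against the schema.
// (zod would do this more thoroughly; this is a simplified stand-in.)
function parseStructuredOutput(argsJson: string): { city: string; country: string; temperature: number } {
  const parsed = JSON.parse(argsJson);
  for (const key of outSchema.required) {
    const expected = outSchema.properties[key].type;
    if (typeof parsed[key] !== expected) {
      throw new Error(`field "${key}" is missing or not a ${expected}`);
    }
  }
  return parsed;
}
```

If the model returns something that doesn't match the schema, we fail loudly instead of passing malformed data downstream.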

How the WunderGraph Agent SDK helps you to prevent prompt injection

Let's recall another example from earlier:

export default createOperation.query({
    input: z.object({
        country: z.string(),
    }),
    description: 'This operation returns the weather of the capital of the given country',
    handler: async ({ input, openAI, log }) => {
        const parsed = await openAI.parseUserInput({
            userInput: input.country,
            schema: z.object({
                country: z.string().nonempty(),
            }),
        });
        // Agent code goes here
    },
});

If we passed the user input directly to our Agent,
we would be vulnerable to prompt injection.

This means that a malicious user could pass a prompt that would cause the Agent to execute arbitrary code.

To prevent this, we're first running the user input through the openAI.parseUserInput function.

This function parses the input into our desired schema and validates it.

Furthermore, it will check for prompt injection attacks and throw an error if it detects one.
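The extract-then-validate pattern behind parseUserInput can be illustrated like this. It's a toy sketch: the real helper uses the LLM to extract the value and zod to validate it, while here a trivial allowlist lookup stands in for the extraction step:

```typescript
// A stand-in for the LLM step that tries to extract a country name
// from arbitrary user input (here: a trivial allowlist lookup).
const knownCountries = new Set(['germany', 'france', 'spain']);

function extractCountry(userInput: string): string | null {
  const candidate = userInput.trim().toLowerCase();
  return knownCountries.has(candidate) ? candidate : null;
}

// Validate the extracted value and reject anything that doesn't look
// like a plain country name, e.g. an injected prompt.
function parseUserInput(userInput: string): { country: string } {
  const country = extractCountry(userInput);
  if (country === null || country.length === 0) {
    throw new Error('input does not look like a country name');
  }
  return { country };
}
```

The key point is that the raw user input never reaches the Agent's prompt; only the extracted, validated value does.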

How the WunderGraph Agent SDK helps you to process large amounts of data

Let's say you'd like to summarize the content of a website.
Websites can be of arbitrary length, so we cannot just pass the content of the website to the Agent, because LLMs like GPT have a token limit.

Instead, what we can do is split the content into pages, process each page individually, and then combine the results.

Here's an abbreviated example of how you can apply pagination to your Agent:

const agent = openAI.createAgent({
    model: 'gpt-3.5-turbo-16k-0613',
    functions: [
        {
            name: 'web/load_url',
            // we're using the web/load_url function to load the content (HTML) of a website
            // our model is only capable of processing 16k tokens at once
            // so we need to paginate the content and process it in chunks
            // the Agent SDK will automatically split the content and merge the responses
            pagination: {
                // we set the page size to 15kb, you can play around with this value
                pageSize: 1024 * 15,
                // we also set a max page limit to prevent excessive usage
                maxPages: 3,
            },
        },
    ],
});

In this case, we're dividing the website content into up to 3 pages of 15kb each.

The Agent will process each page individually and then combine the results.

How the WunderGraph Agent SDK helps you to compose multiple Agents

If you recall the second example, we were passing a function named openai/summarize_url_content to our Agent.
This Operation contains the logic to summarize the content of a website, itself using an Agent under the hood.

In the prompt to our metadata extraction Agent, we ask it to summarize the content of the website,
so our Agent will use the openai/summarize_url_content function to do so.

As you can wrap Agents in an Operation, you can easily compose multiple Agents together.

The recommended way to do so is to start creating low-level Agents that are capable of doing a single thing.

You can then compose these low-level Agents into higher-level Agents that perform two or more tasks,
and so on.

How the WunderGraph Agent SDK helps you to integrate OpenAI with your existing APIs like REST, GraphQL, SOAP or Databases

As explained earlier, WunderGraph Operations are an abstraction on top of your API Dependency Graph,
allowing you to integrate any API into an AI Agent.

You can provide Operations to the Agent in two ways: either as a GraphQL Operation against your API Graph,
or as a custom TypeScript Operation, which might contain custom business logic, call other APIs or even other Agents.

Most importantly, we need a way to describe the input and functionality of an Operation to the LLM Agent.

All of this is abstracted away by the WunderGraph Agent SDK and works out of the box.

All you need to do is add a description to your Operation and the Agent SDK will take care of the rest.

Here's an example using a GraphQL Operation:

# .wundergraph/operations/CountryByCode.graphql

# Loads country information by code, the code needs to be in capital letters, e.g. DE for Germany
query ($code: ID!) {
    countries_country(code: $code) {
        code
        name
        currencies
        capital
    }
}

The Agent SDK will automatically parse the GraphQL Operation and generate a JSON Schema for the input including the description.

Here's an example using a custom TypeScript Operation:

// .wundergraph/operations/openai/summarize_url_content.ts
import { createOperation, z } from '../../generated/wundergraph.factory';

export default createOperation.query({
    input: z.object({
        url: z.string(),
    }),
    response: z.object({
        summary: z.string(),
    }),
    description: 'Summarize the content of a URL',
    handler: async ({ operations, input, log, openAI }) => {
        // agent code goes here
    },
});

Again, the Agent SDK will parse the TypeScript Operation as well and generate a JSON Schema from the zod schema, adding the description (Summarize the content of a URL) so that the LLM Agent understands what the Operation is doing.
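Conceptually, each Operation's name, description, and input schema end up as an OpenAI function definition, roughly of this shape. This is illustrative: the field names follow OpenAI's function-calling API, but the JSON Schema here is hand-written rather than generated from zod, and the helper function is made up:

```typescript
// The shape OpenAI's function-calling API expects for each function.
type FunctionDefinition = {
  name: string;
  description: string;
  parameters: Record<string, unknown>; // a JSON Schema object
};

// Build a function definition from an Operation's name, description and input schema.
// Note: OpenAI restricts function names to [a-zA-Z0-9_-], so an SDK would need to
// map Operation paths like 'openai/summarize_url_content' to valid names.
function toFunctionDefinition(
  name: string,
  description: string,
  inputSchema: Record<string, unknown>
): FunctionDefinition {
  return { name, description, parameters: inputSchema };
}

const summarizeFn = toFunctionDefinition(
  'openai/summarize_url_content',
  'Summarize the content of a URL',
  {
    type: 'object',
    properties: { url: { type: 'string' } },
    required: ['url'],
  }
);
```

This is why the description on your Operation matters: it's the only hint the LLM gets about what the function does and when to call it.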

Getting started with the WunderGraph Agent SDK

If you need more info on how to get started with WunderGraph and OpenAI, check out the OpenAI Integration Docs.

PS: make sure you're not leaking your API key in your GitHub repo!

Conclusion

In this article, we've learned how to use the WunderGraph Agent SDK to build AI enhanced APIs and AI Agents that can integrate any of your existing APIs.
We've tackled some of the most common problems when building AI Agents, like prompt injection, pagination, and Agent composition.

If you like the work we're doing and want to support us, give us a star on GitHub.

I'd love to hear your thoughts on this topic, so feel free to reach out to me on Twitter
or join our Discord server to chat about it.
