DEV Community

daniel-octomind

Posted on • Originally published at octomind.dev

On the unpredictable nature of LLM output and type safety in LangChain TS

At Octomind, we are using Large Language Models (LLMs) to interact with web app UIs and extract test case steps that we want to generate. We use the LangChain library to build interaction chains with LLMs. The LLM receives a task prompt, and we as developers provide tools the model can utilize to solve the task.

The unpredictable and non-deterministic nature of the LLM output makes ensuring type safety quite a challenge. LangChain's approach to parsing input and handling errors often leads to unexpected and inconsistent outcomes within the type system. I’d like to share what I learned about parsing and error handling of LangChain.

I will explain:

  • Why we chose TypeScript in the first place
  • The issue with LLM output
  • How a type error can go unnoticed
  • What consequences this can have

*** All code examples use LangChain TS from the main branch as of September 22nd, 2023 (roughly version 0.0.153).


Why LangChain TS instead of Python?

There are two languages supported by LangChain - Python and JS/TypeScript.

There were some pros and some cons with TypeScript:

  • On the con side - we have to live with the fact that the TypeScript implementation is somewhat lagging behind the Python version - in code and even more so in documentation. This is a solvable issue if you are willing to trade documentation for reading the source code.
  • On the pro side - we don't have to write another service in a different language since we are using TypeScript elsewhere, and we allegedly get guaranteed type safety, of which we are big fans here.

We decided to go for the TypeScript version of LangChain to implement parts of our AI-based test discoveries.

Full disclosure, I didn’t look into how the Python version handles the issues described below. Have you found similar issues in the Python version? Feel free to share them directly in the GitHub issue I created; find the link at the end of the article.

The issue with types in LLMs

In LangChain, you can provide a set of tools that may be called by the model if it deems it necessary. For our purposes, a tool is simply a class with a _call function that does something that the model can not do on its own, like click on a button on a web page. The arguments to that function are provided by the model.

When your tool implementation depends on the developer knowing the input format (in contrast to just doing something with text generated by the model), LangChain provides a class called StructuredTool.

The StructuredTool adds a zod schema to the tool, which is used to parse whatever the model decides to call the tool with, so that we can use this knowledge in our code.

Let's build our "click" example under the assumption that we want the model to give us a query selector to click on:
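```typescript
import { z } from "zod";
import { StructuredTool } from "langchain/tools";

const clickSchema = z.object({
  selector: z.string().describe("The query selector to click on."),
});

type ClickSchema = z.infer<typeof clickSchema>;

class ClickTool extends StructuredTool<typeof clickSchema> {
  schema = clickSchema;
  name = "click";
  description =
    "left click on an element on a web page represented by a query selector";

  protected _call(arg: ClickSchema): Promise<string> {
    // We need to know that arg.selector is a thing here
    return this.click(arg.selector);
  }

  private click(selector: string): Promise<string> {
    return Promise.resolve(`Clicked on ${selector}`);
  }
}
```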

Now when you look at this class, it seems reasonably simple without a lot of potential for things to go wrong. But how does the model actually know what schema to supply? It has no intrinsic functionality for this, it just generates a string response to a prompt.

When LangChain informs the model about the tools at its disposal, it will generate format instructions for each tool. These instructions explain what JSON is, and what specific input schema the model should generate to use each tool.

For this, LangChain will generate an addition to your own prompt that looks something like this:

You have access to the following tools.
You must format your inputs to these tools to match their "JSON schema" definitions below.

"JSON Schema" is a declarative language that allows you to annotate and validate JSON documents.

For example, the example "JSON Schema" instance {"properties": {"foo": {"description": "a list of test words", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
would match an object with one required property, "foo". The "type" property specifies "foo" must be an "array", and the "description" property semantically describes it as "a list of test words". The items within "foo" must be strings.
Thus, the object {"foo": ["bar", "baz"]} is a well-formatted instance of this example "JSON Schema". The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here are the JSON Schema instances for the tools you have access to:
click: left click on an element on a web page represented by a query selector, args: {"selector":{"type":"string","description":"The query selector to click on."}}


Don't trust the LLM

Now we have a best-effort way to make the model call our tool with inputs in the correct schema. Best effort unfortunately does not guarantee anything. It is entirely possible that the model generates input that does not adhere to the schema.

So let's take a look at the implementation of StructuredTool to see how it deals with that issue. StructuredTool.call is the function that eventually calls our _call method from above.

It starts like this:
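```typescript
async call(
    arg: (z.output<T> extends string ? string : never) | z.input<T>,
    configArg?: Callbacks | RunnableConfig,
    /** @deprecated */
    tags?: string[]
  ): Promise<string> {
    let parsed;
    try {
      parsed = await this.schema.parseAsync(arg); // Only now we know that arg is z.input<T>!
    } catch (e) {
      throw new ToolInputParsingException(
        `Received tool input did not match expected schema`,
        JSON.stringify(arg)
      );
    }
    // ... more code
```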

The signature of arg is interpreted as follows:

If the tool's schema can parse to a plain string, the argument may also be a string; otherwise it is whatever object the schema defines as input. The string case applies if you define your schema as schema = z.string().

In our case, our schema can not be parsed to a string, so this simplifies to the type { selector: string }, or ClickSchema.

But is this actually the case?
According to the implementation, we only check that the input actually adheres to the schema inside of call. The signature reads like we have already made some assumptions about the input.

So one might replace the signature with something like:
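(Here Json stands for a type covering any parsed JSON value; LangChain does not actually export such a type.)

```typescript
call: async (arg: (z.output<T> extends string ? string : never) | Json, /*...*/) => Promise<string>;
```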

But looking at it further, even this has issues. The only thing we know for certain is that the model will give us a string. This means there are two options:

1. call really should accept the raw model output, i.e. have a signature along the lines of call(arg: string, ...)

2. There is another element to this:

Something must have already decided that the string returned by the model is valid JSON and have parsed it.
In case that z.output<T> extends string, something somewhere must have already decided that string is an acceptable input format for the tool, and we do not need to parse JSON. (A string by itself is not valid JSON, JSON.parse("foo") will result in a SyntaxError).
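The JSON.parse behavior is easy to check directly. A small standalone sketch (plain Node, no LangChain involved):

```typescript
// JSON.parse expects a complete JSON document. A bare word like foo is not
// one, but a quoted string literal or an object is.
function tryParse(text: string): { ok: boolean; value?: unknown } {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch {
    return { ok: false }; // JSON.parse threw a SyntaxError
  }
}

const bare = tryParse("foo"); // not valid JSON
const quoted = tryParse('"foo"'); // valid: a JSON string literal
const object = tryParse('{"selector": "myCoolButton"}'); // valid: a JSON object
```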

Introducing the OutputParser class

Of course, the second option is what is happening. For this use case, LangChain provides a concept called OutputParser.

Let's take a look at the default one (StructuredChatOutputParser) and its parse method (https://github.com/langchain-ai/langchainjs/blob/main/langchain/src/agents/structured_chat/outputParser.ts#L112) in particular.

We don't need to understand every detail, but we can see that this is where the string that the model produces is parsed to JSON, and errors are thrown if it is not valid JSON.

So, from this we either get AgentAction or AgentFinish. We don't need to concern ourselves with AgentFinish, since it is just a special case to indicate that the interaction with the model is done.

AgentAction is defined as:
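```typescript
export type AgentAction = {
  tool: string;
  toolInput: string;
  log: string;
};
```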

By now you might have already seen - neither AgentAction nor the StructuredChatOutputParserWithRetries is generic, and there is no way to connect the type of toolInput with our ClickSchema.

Since we don't know which tool the agent has actually selected, we can not (easily) use generics to represent the actual type, so this is expected. But worse, toolInput is typed as string, even though we just used JSON.parse to get it!

Consider the positive case where the model produced output that matches our schema, let's say the string "{\"selector\": \"myCoolButton\"}" (wrapped in all the extra fluff LangChain requires to correctly parse). Using JSON.parse, this will deserialize to an object { selector: "myCoolButton" } and not a string.

But because JSON.parse's return type is any, the TypeScript compiler has no chance of realizing this. Unfortunately for us, this also means that we, as developers, have a hard time realizing it.
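A minimal illustration of how the any escape hatch hides the mismatch (standalone TypeScript, not LangChain code):

```typescript
// JSON.parse is declared as returning `any`, so assigning its result to a
// string-typed variable compiles, even though the runtime value is an object.
const modelOutput = '{"selector": "myCoolButton"}';
const toolInput: string = JSON.parse(modelOutput); // type-checks, but lies

const runtimeType = typeof toolInput; // "object", not "string"
```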

The impact on our production code

To understand why this is troublesome, we need to look into the execution loop where the AgentActions are used to actually invoke the tool.

This happens here in AgentExecutor._call. We don't really need to understand everything that this class does. Think of it as the wrapper that handles the interaction of the model with the tool implementations to actually call them.

The _call method is quite long, so here is a reduced version that only contains parts relevant for our problem (these methods are simplified parts of _call and not in the actual code base of LangChain).

The first thing that happens in the loop is to look for the next action to execute. This is where the parsing using the OutputParser comes in, and where its exceptions are handled.
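```typescript
  // originally part of AgentExecutor._call (simplified)
  async safePlanNextStep(
    steps: AgentStep[],
    inputs: ChainValues
  ): Promise<AgentAction | AgentFinish> {
    let output;
    try {
      // This will return either an AgentAction or AgentFinish through outputParser.parse
      output = await this.agent.plan(steps, inputs);
    } catch (e) {
      if (e instanceof OutputParserException) {
        let observation;
        const text = e.message;
        if (this.handleParsingErrors === true) {
          observation = "Invalid or incomplete response";
        } else if (typeof this.handleParsingErrors === "string") {
          observation = this.handleParsingErrors;
        } else if (typeof this.handleParsingErrors === "function") {
          observation = this.handleParsingErrors(e);
        } else {
          throw e;
        }
        output = {
          tool: "_Exception",
          toolInput: observation,
          log: text,
        } as AgentAction;
      } else {
        throw e;
      }
    }
    return output;
  }
```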

You can see that in the case of an error, the toolInput field will always be a string (if this.handleParsingErrors is a function, the return type is also string).

But we have just seen above that in the non-error case, toolInput will be parsed JSON! This is inconsistent behavior: we never parse the output of handleParsingErrors to JSON.

Let's look at how the loop continues. The next step is to call the selected tool with the given input:
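```typescript
// Also part of AgentExecutor._call in the real code (simplified: callbacks omitted)
async safeCallTool(action: AgentAction): Promise<AgentStep> {
  const tool =
    action.tool === "_Exception"
      ? new ExceptionTool()
      : toolsByName[action.tool?.toLowerCase()];
  let observation;
  try {
    observation = tool
      ? await tool.call(action.toolInput)
      : `${action.tool} is not a valid tool, try another one.`;
  } catch (e) {
    if (e instanceof ToolInputParsingException) {
      if (this.handleParsingErrors === true) {
        observation = "Invalid or incomplete tool input. Please try again.";
      } else if (typeof this.handleParsingErrors === "string") {
        observation = this.handleParsingErrors;
      } else if (typeof this.handleParsingErrors === "function") {
        observation = this.handleParsingErrors(e);
      } else {
        throw e;
      }
      observation = await new ExceptionTool().call(observation);
      return { action, observation: observation ?? "" };
    }
    throw e;
  }

  return { action, observation: observation ?? "" };
}
```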

We only pass the previously computed output on to the tool in tool.call(action.toolInput)!

And in case this causes another error, we re-use the same function to handle parsing errors that will return a string that is supposed to be the tool output in the error case.

Let's summarize all the issues:

  • We parse the model's output to JSON and use that parsed result to call a tool
  • If the parsing succeeds, we call the tool with any valid JSON
  • If the parsing fails, we call the tool with a string
  • The tool parses the input with zod, which will only work in the error case if the schema is just a const stringSchema = z.string()
  • We have not covered this, but using const stringSchema = z.string() as the tool schema will not type check at all, since the generic argument of StructuredTool is T extends z.ZodObject<any, any, any, any>, and typeof stringSchema does not fulfil that constraint
  • The signature of tool.call allows this to type check, since we don't know specifically which tool we have at the moment, so string and any json is potentially valid
  • The actual type check for this happens at runtime inside this function
  • The developer implementing the tool has no idea about this. Since only StructuredTool._call is abstract, it looks like you will always get what the schema indicates, but StructuredTool.call will fail, even if you have supplied a function handleParsingErrors.
  • Whatever the tool gets called with is serialized into AgentAction.toolInput: string, which is not correctly typed
  • The library user has access to the AgentSteps with wrongly typed AgentActions, since it is possible to request them as a return value of the overall loop using returnIntermediateSteps=true. Whatever the developer does now is definitely not type safe!

How did we run into this problem?

At Octomind, we are using the AgentSteps to extract the test case steps that we want to generate. We noticed that the model often makes the same errors with the tool input format.

Recall our ClickSchema, which is just { selector: string }.

In our clicking example, it would either generate input according to the schema, or { element: string }, or just a string containing the value we want, like "myCoolButton".

So we built an auto-fixer for these common error cases. The fixer basically just checks whether it can fix the input using either of the options above. The earliest we can inject this code without overwriting a lot of the planning logic that LangChain provides is in StructuredTool.call.

We can not handle it using handleParsingErrors, since that receives only the error as input, and not the original input. Once you are overwriting StructuredTool.call, you are relying on the signature of that function to be correct, which we just saw is not the case.
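The core of the fixer can be sketched without any dependencies (names here are hypothetical; the real implementation lives in an override of StructuredTool.call and validates with zod):

```typescript
// Normalize the three input shapes we observed the model produce:
// the correct schema, { element: string }, or a bare selector string.
type ClickInput = { selector: string };

function fixClickInput(input: unknown): ClickInput {
  if (typeof input === "string") {
    return { selector: input }; // the model sent just the value, e.g. "myCoolButton"
  }
  if (typeof input === "object" && input !== null) {
    const obj = input as Record<string, unknown>;
    if (typeof obj.selector === "string") return { selector: obj.selector };
    if (typeof obj.element === "string") return { selector: obj.element };
  }
  throw new Error(`Cannot auto-fix tool input: ${JSON.stringify(input)}`);
}
```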

At this point, I was stuck having to figure out all of the above to see why I am getting wrongly typed inputs.

The solution to type safety

While these hurdles can be frustrating, they also present opportunities to take a deep dive into the library and come up with possible solutions instead of complaining.

I have opened two issues at LangChain JS/TS to discuss ideas on how to solve these problems:

Typing of tool input in AgentAction is broken for StructuredTool, input error handling and StructuredChatOutputParser #2710

This issue requires a bit of a lengthy explanation, but the overall problem is:

TLDR:

The types of StructuredTool, AgentAction, parsing error handling in AgentExecutor and StructuredChatOutputParser don't fit together, and it only typechecks kind of by accident at the moment. See also the summary at the bottom.

Explanation

I am going to explain the issue with an example. Let's assume we have a StructuredTool that can click on an element on a web page:

import { z } from "zod";
import { StructuredTool } from "langchain/tools";

const clickSchema = z.object({
  selector: z.string().describe("The query selector to click on."),
});

type ClickSchema = z.infer<typeof clickSchema>;

class ClickTool extends StructuredTool<typeof clickSchema> {
  schema = clickSchema;
  name = "click";
  description =
    "left click on an element on a web page represented by a query selector";

  protected _call(arg: ClickSchema): Promise<string> {
    // We need to know that arg.selector is a thing here
    return this.click(arg.selector);
  }

  private click(selector: string): Promise<string> {

    return Promise.resolve(`Clicked on ${selector}`);
  }
}

When you look at the signature and implementation of StructuredTool.call, it seems like we already know what the input is, but in reality, the validation only happens inside of that function:

async call(
    arg: (z.output<T> extends string ? string : never) | z.input<T>,
    configArg?: Callbacks | RunnableConfig,
    /** @deprecated */
    tags?: string[]
  ): Promise<string> {
    let parsed;
    try {
      parsed = await this.schema.parseAsync(arg); // Only now we know that arg is z.input<T>!
    } catch (e) {
      throw new ToolInputParsingException(
        `Received tool input did not match expected schema`,
        JSON.stringify(arg)
      );
    }
    // ... more code

In our case, our schema can not be string, so this simplifies to the type { selector: string }. The signature reads like we have already made some assumptions about the input, where in reality we are only at a signature that looks like this:

call: async (arg: (z.output<T> extends string ? string : never) | Json, /*...*/) => Promise<string>;

But even that has more issues:

  • Something must have already decided that the string returned by the model is valid JSON and have parsed it.
  • In case that z.output<T> extends string, something somewhere must have already decided that string is an acceptable input format for the tool, and we do not need to parse JSON. (A string by itself is not valid JSON). This actually does not happen anywhere, so the case z.output<T> extends string can never be true. Also, z.string() is not a child of z.ZodObject, which is required by the generic in StructuredTool.

This is where the OutputParser comes in. The part that we really care about is the parse method:

This is where the string that the model produces is parsed to JSON, and errors are thrown if it is not valid JSON. A non-JSON string passed into JSON.parse will throw a SyntaxError.

 /**
   * Parses the given text and returns an `AgentAction` or `AgentFinish`
   * object. If an `OutputFixingParser` is provided, it is used for parsing;
   * otherwise, the base parser is used.
   * @param text The text to parse.
   * @param callbacks Optional callbacks for asynchronous operations.
   * @returns A Promise that resolves to an `AgentAction` or `AgentFinish` object.
   */
  async parse(text: string): Promise<AgentAction | AgentFinish> {
    try {
      const regex = /```(?:json)?(.*)(```)/gs;
      const actionMatch = regex.exec(text);
      if (actionMatch === null) {
        throw new OutputParserException(
          `Could not parse an action. The agent action must be within a markdown code block, and "action" must be a provided tool or "Final Answer"`
        );
      }
      const response = JSON.parse(actionMatch[1].trim());
      const { action, action_input } = response;

      if (action === "Final Answer") {
        return { returnValues: { output: action_input }, log: text };
      }
      return { tool: action, toolInput: action_input || {}, log: text };
    } catch (e) {
      throw new OutputParserException(
        `Failed to parse. Text: "${text}". Error: ${e}`
      );
    }
  }

From parsing, we get an AgentAction (we can ignore AgentFinish for now) that looks like this:

export type AgentAction = {
    tool: string;
    toolInput: string;
    log: string;
};

toolInput is typed as string, even though we just used JSON.parse to get it! Consider the positive case where the model produced output that matches our schema, let's say the string "{\"selector\": \"myCoolButton\"}". Using JSON.parse, this will deserialize to an object { selector: "myCoolButton" }, and not a string. But because JSON.parse's return type is any, the TypeScript compiler has no chance of realizing this.

To understand why this is troublesome, we need to look into the execution loop where the AgentActions are used to actually invoke the tool. This happens here in AgentExecutor._call. I've split the relevant parts of the method into these two smaller methods and simplified a bit to show my point:

  // originally part of AgentExecutor._call (simplified)
  async safePlanNextStep(
    steps: AgentStep[],
    inputs: ChainValues
  ): Promise<AgentAction | AgentFinish> {
    let output;
    try {
      // This will return either an AgentAction or AgentFinish through outputParser.parse
      output = await this.agent.plan(steps, inputs);
    } catch (e) {
      if (e instanceof OutputParserException) {
        let observation;
        const text = e.message;
        if (this.handleParsingErrors === true) {
          observation = "Invalid or incomplete response";
        } else if (typeof this.handleParsingErrors === "string") {
          observation = this.handleParsingErrors;
        } else if (typeof this.handleParsingErrors === "function") {
          observation = this.handleParsingErrors(e);
        } else {
          throw e;
        }
        output = {
          tool: "_Exception",
          toolInput: observation,
          log: text,
        } as AgentAction;
      } else {
        throw e;
      }
    }
    return output;
  }

This is where the parsing using the OutputParser comes in, and where its exceptions are handled. You can see that in the case of an error, the toolInput field will always be a string (if this.handleParsingErrors is a function, its return type is also string). But we have just seen above that in the non-error case, toolInput will be parsed JSON! This is inconsistent behavior: we never parse the output of handleParsingErrors to JSON, so we are now in a state where toolInput is sometimes a string, and sometimes parsed JSON.

The next step is to call the selected tool with the given input:

// Also part of AgentExecutor._call in the real code (simplified: callbacks omitted)
async safeCallTool(action: AgentAction): Promise<AgentStep> {
  const tool =
    action.tool === "_Exception"
      ? new ExceptionTool()
      : toolsByName[action.tool?.toLowerCase()];
  let observation;
  try {
    observation = tool
      ? await tool.call(action.toolInput)
      : `${action.tool} is not a valid tool, try another one.`;
  } catch (e) {
    if (e instanceof ToolInputParsingException) {
      if (this.handleParsingErrors === true) {
        observation = "Invalid or incomplete tool input. Please try again.";
      } else if (typeof this.handleParsingErrors === "string") {
        observation = this.handleParsingErrors;
      } else if (typeof this.handleParsingErrors === "function") {
        observation = this.handleParsingErrors(e);
      } else {
        throw e;
      }
      observation = await new ExceptionTool().call(observation);
      return { action, observation: observation ?? "" };
    }
    throw e;
  }

  return { action, observation: observation ?? "" };
}

We only pass the previously computed output on to the tool in tool.call(action.toolInput)! We do not actually have any guarantees for the input types to the tool! And in case this causes another error, we re-use the same function to handle parsing errors that will return a string that is supposed to be the tool output in the error case.

Summary

  • We parse the model's output to JSON and use that parsed result to call a tool
  • If the parsing succeeds, we call the tool with any valid JSON
  • If the parsing fails, we call the tool with a string
  • The tool parses the input with zod, which will only work in the error case if the schema is just a const stringSchema = z.string()
  • using const stringSchema = z.string() as the tool schema will not type check at all, since the generic argument of StructuredTool is T extends z.ZodObject<any, any, any, any>, and typeof stringSchema does not fulfill that constraint
  • The signature of tool.call allows this to type check, since we don't know specifically which tool we have at the moment, so string and any json is potentially valid
  • The actual type check for this happens at runtime inside this function that has a signature assuming we already have the parsed data
  • The developer implementing the tool has no idea about this. Since only StructuredTool._call is abstract, it looks like you will always get what the schema indicates, but StructuredTool.call will fail, even if you have supplied a function handleParsingErrors.
  • Whatever the tool gets called with is serialized into AgentAction.toolInput: string, which is not correctly typed, it is actually either string or JSON at the moment
  • The library user has access to the AgentSteps with wrongly typed AgentActions, since it is possible to request them as a return value of the overall loop using returnIntermediateSteps=true. Whatever the user now does with the AgentSteps is not type safe.

When is that an actual issue?

We noticed that the model often makes the same errors with the tool input format. Recall our ClickSchema, which is just { selector: string }. In our clicking example it would either generate according to the schema, or { element: string }, or just a string containing the value we want, like "myCoolButton".

So we built an auto-fixer for these common error cases. The fixer basically just checks whether it can fix the input using either of the options above. The earliest we can inject this code without overwriting a lot of the planning logic that LangChain provides is in StructuredTool.call. We can not handle it using handleParsingErrors, since that receives only the error as input, and not the causing text.

Once you are overwriting StructuredTool.call, you are relying on the signature of that function to be correct, which we just saw is not the case. It would also be great if the corrected tool input could be serialized in the intermediate steps, which we can only do through some hacks at the moment, because the steps are not part of the error handling process. Separate issue for this: #2711.

At this point, you are stuck having to figure out all of the above to see why you are getting wrongly typed inputs to call and in the resulting intermediateSteps.

Improvement ideas

Unfortunately, anything that really fixes this is a breaking change. Nonetheless, this is what I would propose:

  • Change the signature of StructuredTool.call, so that the option of this being a string at all is gone and we only get json:
    call: async (arg: Json, /*...*/) => Promise<string>;
    and move the parsing of the input inside StructuredTool.call into its own method that can be overwritten in specific implementations of StructuredTool
  • Either change the type of AgentAction.toolInput to Json (probably problematic with the non-structured agents, I have not looked into those. Could be solved by making the toolInput type generic), or keep it as string but use explicit JSON.stringify when creating an AgentAction.
  • The handleParsingErrors type needs to match the type of toolInput. I want to be able to provide parsed JSON as a solution to a parsing error, so either add a JSON.parse around that or change the type of the callback and string case for that argument to Json.
  • Extract the methods for error handling similar to above from AgentExecutor._call into their own methods for easier customization and improved readability ( see also my other issue #2711 for some improvement ideas here).

I would be willing to contribute here if we can find a good solution. I have not looked into the python code for this, but assume it has the same problem.


Feature suggestion: Improve customizability of handling tool input parsing error handling #2711

This ticket is related to #2710 in the sense that I stumbled into that issue when trying to implement what I am proposing here, and some of the improvements go hand in hand.

I was trying to build an error correction for structured tool inputs that automatically fixes common errors the model makes over and over, without calling the model again. Since each tool has different such errors, it needs to happen within StructuredTool.call so that I can customize it per tool.

At the moment it looks like this:

async call(
    arg: (z.output<T> extends string ? string : never) | z.input<T>,
    configArg?: Callbacks | RunnableConfig,
    /** @deprecated */
    tags?: string[]
  ): Promise<string> {
    let parsed;
    try {
      parsed = await this.schema.parseAsync(arg);
    } catch (e) {
      throw new ToolInputParsingException(
        `Received tool input did not match expected schema`,
        JSON.stringify(arg)
      );
    }
    // ... more code

My first proposal is to extract the parsing part into its own method that can be overwritten in subclasses, and only pass that to call afterward. That would improve the typing issues in #2710.

Then comes the next issue:

Inside AgentExecutor._call, we aggregate the AgentSteps that get returned to the user using returnIntermediateSteps=true. We can not modify these steps at the moment. So if it is possible to auto-correct the input to the tool, I would like this to be reflected in the toolInput field of AgentAction. For this we have two options: Either pass the current step into the parsing method to be modified in-place, or overwrite the field inside AgentExecutor._call with the return value of the parsing method. This would also help to solve the typing issue described in #2710.

In case we detect that we can not use the auto-fixing in a specific error case, we need to fall back to the default error handling, which is currently done inside AgentExecutor._call as well. For this, we need to be able to throw ToolInputParsingException, which can currently not be imported. It is marked as export in tools/base, but when you try to import it, it only semi-works from the dist-folder and results in an immediate crash:

import { ToolInputParsingException } from "langchain/dist/tools/base";

throw new ToolInputParsingException("foo");

Will result in

node:internal/errors:490
    ErrorCaptureStackTrace(err);
    ^

Error [ERR_PACKAGE_PATH_NOT_EXPORTED]: Package subpath './dist/tools/base' is not defined by "exports" in /home/veith/projects/automagically/ts/apps/next-automagically/node_modules/langchain/package.json
    at __node_internal_captureLargerStackTrace (node:internal/errors:490:5)
    at new NodeError (node:internal/errors:399:5)
    at exportsNotFound (node:internal/modules/esm/resolve:361:10)
    at packageExportsResolve (node:internal/modules/esm/resolve:697:9)
    at resolveExports (node:internal/modules/cjs/loader:567:36)
    at Module._findPath (node:internal/modules/cjs/loader:636:31)
    at Module._resolveFilename (node:internal/modules/cjs/loader:1063:27)
    at u.default._resolveFilename (/home/veith/projects/automagically/ts/node_modules/.pnpm/@esbuild-kit+cjs-loader@2.4.2/node_modules/@esbuild-kit/cjs-loader/dist/index.js:1:1519)
    at Module._load (node:internal/modules/cjs/loader:922:27)
    at Module.require (node:internal/modules/cjs/loader:1143:19)
    at require (node:internal/modules/cjs/helpers:110:18)
    at <anonymous> (/home/veith/projects/automagically/ts/apps/next-automagically/debug/debug.ts:20:43)
    at Object.<anonymous> (/home/veith/projects/automagically/ts/apps/next-automagically/debug/debug.ts:165:1)
    at Module._compile (node:internal/modules/cjs/loader:1256:14)
    at Object.F (/home/veith/projects/automagically/ts/node_modules/.pnpm/@esbuild-kit+cjs-loader@2.4.2/node_modules/@esbuild-kit/cjs-loader/dist/index.js:1:941)
    at Module.load (node:internal/modules/cjs/loader:1119:32)
    at Module._load (node:internal/modules/cjs/loader:960:12)
    at ModuleWrap.<anonymous> (node:internal/modules/esm/translators:169:29)
    at ModuleJob.run (node:internal/modules/esm/module_job:194:25) {
  code: 'ERR_PACKAGE_PATH_NOT_EXPORTED'
}

So this needs to be exposed so that it can actually be used by the user. This would also be beneficial for AgentExecutor.handleParsingErrors. At the moment, the callback you can pass in receives the error, so you can do something different depending on whether it is an output parser error or a tool input error. But you can not use instanceof ToolInputParsingException as a switch, because you can't import the error type. You also can not do anything meaningful here, because you only receive the error, but not the input that caused it.

So another option to implement this kind of custom error fix would be to change the input arguments for handleParsingErrors to

const handleParsingErrors: (error: ToolInputParsingException | OutputParserException, causingData: Json | string, action?: AgentAction) => Json | string;

Please provide some feedback, I would be willing to contribute here. This should be solved together with #2710 to get the typing correct.

Feel free to jump in!

Veith Röthlingshöfer
ML engineer at Octomind
