At the core of Microsoft.Extensions.AI is IChatClient, which serves as a unified abstraction for working with various AI service providers.
It takes a list of messages (IList<ChatMessage>) as the input to CompleteAsync().
Each message has a property called Contents, allowing a single message to carry multiple content items in sequence, which makes it a multi-modal message. Consider the following conversation (a code sketch follows the list):
IList<ChatMessage>
1- ChatMessage
   - Role: User
   - Contents:
     - Text: "Hello, what is in my image?"
     - Image: "[Image1.jpg]"
2- ChatMessage
   - Role: Assistant
   - Contents:
     - Text: "There is a BMW car in your image."
     - Audio: "[Voice of the above sentence.]"
3- ChatMessage
   - Role: User
   - Contents:
     - Text: "What's its price?"
4- ChatMessage
   - Role: Assistant
   - Contents:
     - FunctionCall: GetPrice("BMW")
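In code, a conversation like this could be assembled and sent roughly as follows. This is a minimal sketch: it assumes client is some concrete IChatClient implementation (for example, one obtained from an OpenAI or Ollama adapter package), and the image URL is a placeholder.

using Microsoft.Extensions.AI;

IList<ChatMessage> messages =
[
    new ChatMessage(ChatRole.User,
    [
        new TextContent("Hello, what is in my image?"),
        new ImageContent(new Uri("https://www.example.com/image"), "image/png")
    ])
];

// `client` is assumed to be an IChatClient from a provider package.
ChatCompletion completion = await client.CompleteAsync(messages);
Console.WriteLine(completion.Message.Text);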
In this blog, I will clarify the types of content currently supported. All of these content types derive from a base class called AIContent.
The inheritance hierarchy of AIContent is as follows:
Different AI Contents

AIContent
├─ TextContent
├─ DataContent
│  ├─ ImageContent
│  └─ AudioContent
├─ UsageContent
├─ FunctionCallContent
└─ FunctionResultContent
AIContent
Every content type is an AIContent, which holds the functionality shared between the different types:
- AdditionalProperties: A set of additional key-value properties attached to the content.
- RawRepresentation: The original raw representation of the content from the underlying implementation. It holds the provider-specific object of the underlying technology, whether that is OpenAI, Azure OpenAI, Ollama, or another provider.
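For example, here is a hedged sketch of using these shared members; the "source" key and its value are illustrative, not part of the library:

using Microsoft.Extensions.AI;

AIContent content = new TextContent("Hello!");

// Attach arbitrary key-value metadata to the content item.
content.AdditionalProperties = new AdditionalPropertiesDictionary
{
    ["source"] = "my-app" // illustrative key/value
};

// RawRepresentation exposes the provider-specific object, if one exists.
object? raw = content.RawRepresentation;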
TextContent
It's a simple text content with a single property: Text.
new TextContent("Hello, what can you do for me?");
ImageContent & AudioContent
Both of these types derive from DataContent and are very similar to each other.
new ImageContent(
    uri: new Uri("https://www.example.com/image"),
    mediaType: "image/png"
);
new AudioContent(
    uri: new Uri("https://www.example.com/voice"),
    mediaType: "audio/wav"
);
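Because both types derive from DataContent, they can also carry their bytes inline rather than referencing a URL. The following is a hedged sketch: it assumes the in-memory data constructor exposed by the preview packages, and photo.png is a hypothetical local file.

using System.IO;
using Microsoft.Extensions.AI;

// Read the image from disk and embed the bytes directly in the content.
byte[] bytes = File.ReadAllBytes("photo.png"); // hypothetical file
var inlineImage = new ImageContent(bytes, mediaType: "image/png");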
UsageContent
This type of content includes data about the token consumption of the request.
new UsageContent(
    new UsageDetails
    {
        InputTokenCount = 10,
        OutputTokenCount = 20,
        TotalTokenCount = 30
    }
);
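In practice, you rarely construct UsageContent yourself; it typically arrives from the provider, often in the final update of a streamed response. A hedged sketch, reusing the client and messages from the first example and the streaming method name from the preview API this post describes:

await foreach (var update in client.CompleteStreamingAsync(messages))
{
    foreach (var item in update.Contents)
    {
        // Usage data, when present, arrives as a UsageContent item.
        if (item is UsageContent usage)
        {
            Console.WriteLine(
                $"Input: {usage.Details.InputTokenCount}, " +
                $"Output: {usage.Details.OutputTokenCount}, " +
                $"Total: {usage.Details.TotalTokenCount}");
        }
    }
}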
FunctionCallContent
It represents a function call that the AI requests the client to evaluate.
new FunctionCallContent(
    callId: "fx12",
    name: "GetFoodMenu",
    arguments: new Dictionary<string, object?>
    {
        ["mood"] = "Happy"
    }
);
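On the client side, you can detect such a request by pattern-matching the response contents. A short sketch, reusing completion from the first example; the dispatch itself is left illustrative:

foreach (var item in completion.Message.Contents)
{
    if (item is FunctionCallContent call)
    {
        Console.WriteLine($"AI requested: {call.Name} (call id {call.CallId})");
        // e.g. look up a local function by call.Name and invoke it
        // with call.Arguments here.
    }
}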
FunctionResultContent
It indicates that a function call has been invoked by the client and that the result is ready to be reported back to the AI:
new FunctionResultContent(
    callId: "fx12",
    name: "GetFoodMenu",
    result: "Pizza, Burger, Ice Cream"
);
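The result travels back to the model in a ChatRole.Tool message, which is where FunctionResultContent belongs. A sketch continuing the earlier examples:

// Report the function result back to the AI in a Tool-role message.
messages.Add(new ChatMessage(ChatRole.Tool,
[
    new FunctionResultContent(callId: "fx12", name: "GetFoodMenu",
        result: "Pizza, Burger, Ice Cream")
]));

var followUp = await client.CompleteAsync(messages);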
Summary
In conclusion, the Microsoft.Extensions.AI library offers a versatile and unified approach to working with various AI service providers through its IChatClient abstraction. By utilizing a list of messages (IList<ChatMessage>), each with a Contents property, it supports multi-modal messages that can include text, images, audio, and function calls. This flexibility allows for more dynamic and interactive AI-driven applications.
The different types of content supported by Microsoft.Extensions.AI all inherit from the base class AIContent, which provides shared functionalities. These content types include TextContent, ImageContent, AudioContent, UsageContent, FunctionCallContent, and FunctionResultContent, each with specific properties and functionalities. The provided code examples demonstrate how these content types can be effectively used in practice, making the library a powerful tool for developers looking to integrate AI capabilities into their applications.
