M. James Harmon

Creating a Zelda Chat Assistant using Semantic Kernel

"The Legend of Zelda: Breath of the Wild" was one of the first games I bought after getting a Nintendo Switch. Let me just say that although I'd give it 5 stars, the open-endedness of it gave me a bit of anxiety. It's hard enough figuring out my own life, and now I also have to do it in Zelda?! Regardless, it's still a favorite in my house, and both of my kids love it. Yes, they also love "Tears of the Kingdom", and no, I haven't started it yet :(

I'm teaching them a bit of coding this summer, and what better way to give them a little inspiration and test out Semantic Kernel, Microsoft's new library for building AI-enabled applications, than by building a "Breath of the Wild" assistant?

The Technology

This project is built using the Retrieval-Augmented Generation (RAG) architecture. I chose to host an embedding model (mxbai-embed-large) and a chat completion model (PHI3) locally using Ollama. For data, I'm using the Hyrule Compendium API.
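If you want to follow along, both models can be pulled into a local Ollama install first. A quick setup sketch (model names as they appear in the Ollama registry):

```shell
# Pull the embedding and chat models used in this post
ollama pull mxbai-embed-large
ollama pull phi3
```

Ollama then serves both models from its local API (http://localhost:11434 by default).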

Chat Assistant

The solution consists of two parts: a trainer application that generates embeddings for the Zelda knowledge base and stores the facts in a database, and the chat assistant itself. The chat assistant performs a semantic search using an embedding generated from the user's question. It then provides the question and the relevant set of facts to the chat completion model to produce an answer.

Extending Semantic Kernel to Support Ollama Embeddings

To generate embeddings with Ollama using Semantic Kernel, I implemented a service class based on the ITextEmbeddingGenerationService interface from Semantic Kernel. This service class integrates Ollama's capabilities with Semantic Kernel, allowing for embeddings to be generated whenever new information is stored or when search queries are submitted.
The Trainer

using System.Net.Http.Json;
using System.Text.Json;
using Microsoft.Extensions.Logging;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Embeddings;

public class OllamaEmbeddingGenerationService : ITextEmbeddingGenerationService
{
    public IReadOnlyDictionary<string, object?> Attributes => new Dictionary<string, object?>();

    private readonly HttpClient _client;
    private readonly ILogger<OllamaEmbeddingGenerationService> _logger;
    private readonly OllamaEmbeddingServiceOptions _options;

    public OllamaEmbeddingGenerationService(HttpClient? httpClient,
      OllamaEmbeddingServiceOptions? config, ILogger<OllamaEmbeddingGenerationService> logger) {

        _client = httpClient ?? throw new ArgumentNullException(nameof(httpClient));
        _logger = logger;
        _options = config ?? throw new ArgumentNullException(nameof(config));
    }

    public async Task<IList<ReadOnlyMemory<float>>> GenerateEmbeddingsAsync(IList<string> data,
      Kernel? kernel = null, CancellationToken cancellationToken = default)
    {
        // Ollama's embeddings endpoint takes one input at a time, so issue
        // one request per string and collect the results.
        List<ReadOnlyMemory<float>> results = new List<ReadOnlyMemory<float>>();
        foreach (string text in data) {
            results.Add(await GetEmbeddingsAsync(text, cancellationToken));
        }
        return results;
    }

    private async Task<ReadOnlyMemory<float>> GetEmbeddingsAsync(string text, CancellationToken token) {

        token.ThrowIfCancellationRequested();
        var response = await _client.PostAsJsonAsync(RequestUri, RequestBody(text), token);
        string json = await EnsureSuccessAndReadResultAsync(response, token);
        var result = DeserializeOrDefault(json);

        return new ReadOnlyMemory<float>(result.Embeddings
          ?? throw new InvalidOperationException("Embedding response contained no data"));
    }

    private async Task<string> EnsureSuccessAndReadResultAsync(HttpResponseMessage response,
      CancellationToken token) {
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync(token).ConfigureAwait(false);
    }

    private OllamaEmbeddingResponse DeserializeOrDefault(string json) {
        OllamaEmbeddingResponse? returnValue = null;
        try {
            returnValue = JsonSerializer.Deserialize<OllamaEmbeddingResponse>(json);
        } catch (JsonException exception) {
            _logger.LogError(exception, "Embedding request failed to return a valid json result");
        }
        return returnValue ??
          new OllamaEmbeddingResponse { Embeddings = Array.Empty<float>() };
    }

    private string RequestUri => $"{Host}/api/embeddings";

    private OllamaEmbeddingRequest RequestBody(string text) => new() { Model = ModelId, Text = text };

    private string ModelId => _options.ModelId
      ?? throw new InvalidOperationException("ModelId is not configured");

    private string Host => _options.Host
      ?? throw new InvalidOperationException("Host is not configured");
}
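The `OllamaEmbeddingRequest` and `OllamaEmbeddingResponse` types referenced above aren't shown in the post. Here's a minimal sketch of what they could look like, assuming Ollama's `/api/embeddings` contract: the request carries `model` and `prompt` fields, and the response returns a single `embedding` array.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// Request body for Ollama's /api/embeddings endpoint. The JSON property
// names ("model", "prompt") follow the documented Ollama API.
public class OllamaEmbeddingRequest
{
    [JsonPropertyName("model")]
    public string? Model { get; set; }

    [JsonPropertyName("prompt")]
    public string? Text { get; set; }
}

// Response body: a single "embedding" array of floats for the input text.
public class OllamaEmbeddingResponse
{
    [JsonPropertyName("embedding")]
    public float[]? Embeddings { get; set; }
}
```

Mapping the C# property `Text` onto the wire name `prompt` keeps the service code readable while matching what Ollama expects.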

Acquiring and Ingesting the Data

Data is acquired by making API calls to the Hyrule Compendium API. Semantic Kernel handles generating the embeddings, which are provided by Ollama. These embeddings, along with each fact's description and any relevant metadata, are stored using the SaveInformationAsync method. This keeps data ingestion simple and ensures all the necessary information is organized and accessible for retrieval.

public class HyruleCompendiumEnumerator : IZeldaEnumerator
{
    private const string SERVICE_ENDPOINT =
      "https://botw-compendium.herokuapp.com/api/v3/compendium/all";

    private readonly HttpClient _http;

    public HyruleCompendiumEnumerator(HttpClient http) {
        _http = http ?? throw new ArgumentNullException(nameof(http));
    }

    public async Task<IEnumerable<ZeldaItem>> EnumerateAsync(
      Func<ZeldaItem, bool>? filter = null)
    {
        string jsonListing = await GetListingAsStringAsync();
        ZeldaListing zeldaListing = DeserializeOrDefault(jsonListing);
        // When no filter is supplied, pass every item through.
        return FilteredListing(zeldaListing, filter ?? (item => true));
    }

    private IEnumerable<ZeldaItem> FilteredListing(ZeldaListing listing,
      Func<ZeldaItem, bool> filter) =>
        listing.Items?.Where(filter) ?? Enumerable.Empty<ZeldaItem>();

    private async Task<string> GetListingAsStringAsync() =>
        await _http.GetStringAsync(SERVICE_ENDPOINT);

    private ZeldaListing DeserializeOrDefault(string json) {
        ZeldaListing? returnValue = null;
        try {
            returnValue = JsonSerializer.Deserialize<ZeldaListing>(json);
        } catch (JsonException exception) {
            Console.Out.WriteLine("Request for resource failed to produce valid data");
            Console.Out.WriteLine(exception.Message);
        }
        return returnValue ?? new ZeldaListing { Items = new() };
    }
}

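The `ZeldaListing` and `ZeldaItem` types aren't shown in the post either. A minimal sketch, assuming the compendium API's v3 shape, where every entry comes back under a top-level `data` array with fields such as `name`, `description`, and `category` (only the fields this project needs are mapped):

```csharp
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Top-level payload from /api/v3/compendium/all; entries live under "data".
public class ZeldaListing
{
    [JsonPropertyName("data")]
    public List<ZeldaItem>? Items { get; set; }
}

// A single compendium entry. The property names here are assumptions
// based on the API's documented response shape.
public class ZeldaItem
{
    [JsonPropertyName("id")]
    public int Id { get; set; }

    [JsonPropertyName("name")]
    public string? Name { get; set; }

    [JsonPropertyName("description")]
    public string? Description { get; set; }

    [JsonPropertyName("category")]
    public string? Category { get; set; }
}
```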
public class AsyncZeldaIngester : IZeldaIngester
{
    private readonly IZeldaEnumerator _enumerator;
    private readonly ISemanticTextMemory _memory;
    private readonly string _collection;

    public AsyncZeldaIngester(IZeldaEnumerator zeldaEnumerator, ISemanticTextMemory memory,
        IConfiguration configuration) {
        _enumerator = zeldaEnumerator ?? throw new ArgumentNullException(nameof(zeldaEnumerator));
        _memory = memory ?? throw new ArgumentNullException(nameof(memory));
        _collection = configuration["collection"]
          ?? throw new InvalidOperationException("Missing 'collection' configuration value");
    }

    public async Task ExecuteAsync()
    {
        // SaveInformationAsync generates the embedding for each fact and
        // stores it, with the text and metadata, in the memory collection.
        foreach (var fact in await _enumerator.EnumerateAsFactsAsync()) {
            await _memory.SaveInformationAsync(_collection, fact.Text ?? string.Empty,
              fact.Id, fact.Description);
        }
    }
}


Using Memories to Assist the Chat Assistant

To give the chat assistant access to relevant information, the Zelda facts created by the Trainer are retrieved from the memory collection and then given to the chat response generation process.

private async Task<IEnumerable<string>> SearchForFactsAsync(string? question) {
    var results = _memory.SearchAsync(
        collection, question ?? string.Empty,
        limit: size,
        minRelevanceScore: relevance,
        cancellationToken: _shutdownToken.Token);

    // Collect the text of each matching memory record.
    List<string> values = new();
    await foreach (var result in results.WithCancellation(_shutdownToken.Token)) {
        values.Add(result.Metadata.Text);
    }

    return values;
}

private async Task QueryChatAssistantAsync(string question,
  IEnumerable<string> facts) {

    // Seed the chat with the system prompt plus the retrieved facts, then
    // stream the model's answer to the console as it arrives.
    var corpus = string.Join(" ", facts);
    var ai = _kernel.GetRequiredService<IChatCompletionService>();
    ChatHistory chat = new($"{prompt}{corpus}");

    chat.AddUserMessage(question);
    await foreach (var message in ai.GetStreamingChatMessageContentsAsync(chat, kernel: _kernel)) {
        Console.Write(message);
    }
    await Console.Out.WriteLineAsync(string.Empty);
}

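The `prompt` variable that seeds the `ChatHistory` isn't shown in the post. A hypothetical system prompt in the same spirit, instructing the model to ground its answers in the retrieved facts only:

```csharp
// Hypothetical system prompt; the author's actual prompt is not shown.
// The retrieved facts are appended after this text before the chat starts.
const string prompt =
    "You are a helpful assistant for The Legend of Zelda: Breath of the Wild. " +
    "Answer the user's question using only the following facts. " +
    "If the facts do not contain the answer, say you cannot answer. Facts: ";
```

Grounding instructions like this are what make the model decline to answer when the semantic search returns nothing relevant.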

Chatting with the Assistant

Here are some interesting or notable questions and answers:

  1. Starting with something easy. The compendium lists items that monsters drop, and this response uses that fact directly. Response 1
  2. Locations are simple facts the model can access. This one mixes it up a little by asking where you can fight, rather than find, Fireblight Ganon. Response 2
  3. Going a little further to see if it understands bows that fire multiple arrows. Response 3
  4. Seeing if it can synthesize an answer from several facts. Response 4
  5. Seeing if it can offer an informed opinion about a rusted weapon. Response 5
  6. Can it measure and form an opinion about concepts like strength and durability? Response 6

Overall, I was surprised at how well the chat assistant was able to answer most of the questions I threw at it. When there were no facts in the memory to support an answer, it simply stated that it couldn't provide an answer. I'm pretty impressed with how well the mxbai embeddings work and will certainly consider using that model again. I really enjoyed its response to the question about the strongest shield, which leads to a nice feature of Semantic Kernel that would be fun to implement as a follow-up: Plugins.

If we want the chat assistant to answer comparative questions like "Which shield is more durable, A or B?", Plugins allow you to write code that computes the answer. The functions are annotated and made available to Semantic Kernel, enabling the chat assistant to use computed facts in its responses. If I ever get around to it, I'll post a follow-up.
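A rough sketch of what such a plugin's logic could look like. The class, method names, and durability values below are all hypothetical; in Semantic Kernel you would decorate the method with the `[KernelFunction]` attribute and register the class on the kernel so the model can invoke it.

```csharp
using System.Collections.Generic;

// Hypothetical comparison plugin. The [KernelFunction] attribute is shown
// commented out so this sketch compiles without the Semantic Kernel package.
public class ShieldPlugin
{
    // Illustrative durability table; real values would come from game data.
    private static readonly Dictionary<string, int> Durability = new()
    {
        ["Hylian Shield"] = 800,
        ["Royal Shield"] = 45,
    };

    // [KernelFunction] // exposes this method to the model in Semantic Kernel
    public string MoreDurable(string shieldA, string shieldB)
    {
        int a = Durability.GetValueOrDefault(shieldA);
        int b = Durability.GetValueOrDefault(shieldB);
        if (a == 0 && b == 0) return "I don't have durability data for those shields.";
        return a >= b ? shieldA : shieldB;
    }
}
```

With a function like this registered, the assistant could answer comparative questions with computed facts instead of guessing from descriptions.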
