DEV Community

siliang4dev

A Quick Walkthrough of Semantic Kernel's Kusto Connector for Vector Database Integration

Introduction

Semantic Kernel is Microsoft's open-source SDK for building applications on top of large language models (LLMs), and it gives developers a flexible platform to work with.

Because Semantic Kernel is open source, I recently spent some time reading its code and found that it uses connectors to integrate with external services such as OpenAI, Azure AI Search, and various vector databases. One of these connectors targets Kusto DB.

In this post, I'll explore this Kusto connector.

Semantic Kernel is evolving rapidly, so the code examples here, based on the 1.0.1 release that was current at the time of writing, may need changes for later versions.

Understanding Kusto Databases

Kusto DB excels at ingesting and querying large data streams, which makes it a common choice for extensive datasets such as logs and telemetry. If you're new to Kusto DB, the Kusto Query Language (KQL) documentation is a good starting point.

Setting Up the Environment

Prerequisites

For this walkthrough, I used a local Jupyter Notebook with a .NET interactive kernel. To replicate this setup, you'll need the following packages:

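// Semantic Kernel core plus the Kusto memory connector (alpha at the time of writing)
#r "nuget: Microsoft.SemanticKernel, 1.0.1"
#r "nuget: Microsoft.KernelMemory.Core, 0.25.240103.1"
#r "nuget: Microsoft.SemanticKernel.Connectors.Kusto, 1.0.1-alpha"
using Microsoft.SemanticKernel;
using Microsoft.KernelMemory;
using Microsoft.SemanticKernel.Memory;
using Microsoft.SemanticKernel.Connectors.Kusto;
using Microsoft.SemanticKernel.Connectors.OpenAI; // AzureOpenAITextEmbeddingGenerationService

// Kusto client library, which provides KustoConnectionStringBuilder
#r "nuget: Azure.Core, 1.36.0"
#r "nuget: Microsoft.Azure.Kusto.Data, 12.0.0"
using Kusto.Data;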

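Each package has a specific role: Microsoft.SemanticKernel supplies the core SDK, Microsoft.SemanticKernel.Connectors.Kusto supplies KustoMemoryStore, and Microsoft.Azure.Kusto.Data supplies KustoConnectionStringBuilder.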

Establishing the Connection

The first step is to build a connection string for Kusto DB, here with interactive AAD user-prompt authentication:

var connectionString = new KustoConnectionStringBuilder("<connection string>")
        .WithAadUserPromptAuthentication();

Embedding Generator Setup

Next, we set up the embedding generator, which turns text into embedding vectors. This example uses an Azure OpenAI deployment:

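var embeddingGenerator = new AzureOpenAITextEmbeddingGenerationService(
    "<deployment>",  // name of the Azure OpenAI embedding deployment
    "<endpoint>",    // Azure OpenAI endpoint URL
    "<api key>"      // Azure OpenAI API key
);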

Configuring Text Memory

To finalize the setup, configure the text memory using the connection string and embedding generator:

// The memory APIs are marked experimental, so suppress the resulting diagnostics
#pragma warning disable
KustoMemoryStore memoryStore = new(connectionString, "<kusto db name>");
SemanticTextMemory textMemory = new(memoryStore, embeddingGenerator);
#pragma warning restore

Storing Information

With everything set up, it's time to store some data:

// Await each call so the record is written before the next step runs
await textMemory.SaveInformationAsync("meDef", id: "doc1", text: "My name is Andrea.");
await textMemory.SaveInformationAsync("meDef", id: "doc2", text: "I am 30 years old.");
await textMemory.SaveInformationAsync("meDef", id: "doc3", text: "I live in South America.");

This step creates a new table in the specified Kusto database if it doesn't already exist. Each SaveInformationAsync call then inserts one record with the columns "Key", "Metadata", "Timestamp", and "Embedding"; the embedding is produced by the embedding generator.
[Image: KQL screenshot]

Retrieving Answers

To perform a similarity search and get relevant answers, run:

// Stream the matches for the query from the "meDef" collection
await foreach (var answer in textMemory.SearchAsync(
    collection: "meDef",
    query: "What's my name?",
    limit: 2,                  // return at most two matches
    minRelevanceScore: 0.79,   // drop matches below this similarity score
    withEmbeddings: true))     // also return each match's embedding vector
{
    Console.WriteLine($"Answer: {answer.Metadata.Text}");
}

[Image: answer]

Internally, SearchAsync generates a KQL query that uses functions such as series_cosine_similarity_fl to score each record against the query embedding and rank the records by similarity.
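
To make the ranking step concrete, here is a minimal C# sketch of what a cosine-similarity search does conceptually. It is not the connector's code: the real scoring runs inside Kusto through the generated KQL query. The helper names CosineSimilarity and RankBySimilarity and the toy three-dimensional vectors are assumptions made up for illustration; only the limit and minRelevanceScore parameters mirror the SearchAsync call above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cosine similarity between two equal-length vectors: dot(a, b) / (|a| * |b|).
static double CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same length.");

    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    // Treat a zero vector as having no similarity to anything.
    if (normA == 0 || normB == 0) return 0;
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Hypothetical helper: score every record, drop those below the threshold,
// and keep the `limit` best matches in descending score order.
static List<(string Key, double Score)> RankBySimilarity(
    IEnumerable<(string Key, float[] Embedding)> records,
    float[] query, int limit, double minRelevanceScore)
{
    return records
        .Select(r => (r.Key, Score: CosineSimilarity(r.Embedding, query)))
        .Where(r => r.Score >= minRelevanceScore)
        .OrderByDescending(r => r.Score)
        .Take(limit)
        .ToList();
}

// Toy 3-dimensional embeddings; real embeddings have far more dimensions.
var records = new List<(string Key, float[] Embedding)>
{
    ("doc1", new float[] { 1f, 0f, 0f }),
    ("doc2", new float[] { 0f, 1f, 0f }),
    ("doc3", new float[] { 1f, 1f, 0f }),
};

var top = RankBySimilarity(records, new float[] { 1f, 0f, 0f }, limit: 2, minRelevanceScore: 0.5);
foreach (var (key, score) in top)
    Console.WriteLine($"{key}: {score:F3}");
```

With these toy vectors, doc1 matches the query exactly and ranks first, doc3 ranks second, and doc2 is filtered out because its score falls below the threshold. Every record is scored on each query, which is the cost that matters when no similarity index is available.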

Conclusion and Thoughts

  1. Semantic Kernel is evolving rapidly, so its connectors, including the Kusto one, might change.
  2. Kusto DB is powerful for stream data processing, but it doesn't specialize in similarity search indexing, which can slow down retrieval as the number of stored records grows.
  3. Kusto DB isn't inherently a semantic search tool and lacks chunking functionality, so additional tools might be needed to split text into chunks before storing it as semantic memory.
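
As an illustration of the chunking mentioned in point 3, the sketch below packs whole sentences into chunks under a size limit before they would be saved. ChunkBySentence is a hypothetical helper written for this post, not part of Semantic Kernel or the Kusto connector, and its split on ". " is deliberately naive.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: packs whole sentences into chunks of at most maxChars characters.
// A single sentence longer than maxChars is kept whole as its own chunk.
static List<string> ChunkBySentence(string text, int maxChars)
{
    var chunks = new List<string>();
    var current = new StringBuilder();

    // Naive sentence split on ". "; real text needs a more careful splitter.
    foreach (var raw in text.Split(new[] { ". " }, StringSplitOptions.RemoveEmptyEntries))
    {
        var sentence = raw.Trim();
        if (sentence.Length == 0) continue;
        // Restore the period that the split removed.
        if (!sentence.EndsWith('.')) sentence += ".";

        // Flush the current chunk if adding this sentence would exceed the limit.
        if (current.Length > 0 && current.Length + 1 + sentence.Length > maxChars)
        {
            chunks.Add(current.ToString());
            current.Clear();
        }
        if (current.Length > 0) current.Append(' ');
        current.Append(sentence);
    }
    if (current.Length > 0) chunks.Add(current.ToString());
    return chunks;
}

var text = "My name is Andrea. I am 30 years old. I live in South America.";
var chunks = ChunkBySentence(text, maxChars: 40);
for (int i = 0; i < chunks.Count; i++)
    Console.WriteLine($"doc{i + 1}: {chunks[i]}");
```

Each resulting chunk could then be passed to SaveInformationAsync with its own id, so that every stored record carries a piece of text small enough to embed meaningfully.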

All comments and insights are welcome.
