siliang4dev

Posted on Jan 17, 2024

A Quick Walkthrough of Semantic Kernel's Kusto Connector for Vector Database Integration

#vectordatabase #kusto #semantickernel

Introduction

Semantic Kernel is an open-source SDK from Microsoft that helps developers integrate large language models (LLMs) into their applications.

Because Semantic Kernel is open source, I recently took some time to read its code. I found that it uses connectors to integrate with external services such as OpenAI, Azure AI Search, and various vector databases. One of these connectors is for Kusto DB.

In this post, I'll explore this Kusto connector.

Semantic Kernel is evolving rapidly. The code examples here are based on its 1.0 release and may change in future versions.

Understanding Kusto Databases

Kusto is the query engine behind Azure Data Explorer. It is designed for ingesting and querying large volumes of streaming data, such as logs and telemetry. If you're new to Kusto DB, a good starting point is the Kusto Query Language (KQL) documentation.

Setting Up the Environment

Prerequisites

For this walkthrough, I used a local Jupyter Notebook with a .NET interactive kernel. To replicate this setup, you'll need the following packages:

#r "nuget: Microsoft.SemanticKernel, 1.0.1"
#r "nuget: Microsoft.KernelMemory.Core, 0.25.240103.1"
#r "nuget: Microsoft.SemanticKernel.Connectors.Kusto, 1.0.1-alpha"
using Microsoft.SemanticKernel;
using Microsoft.KernelMemory;
using Microsoft.SemanticKernel.Memory;
using Microsoft.SemanticKernel.Connectors.OpenAI;   // AzureOpenAITextEmbeddingGenerationService
using Microsoft.SemanticKernel.Connectors.Kusto;    // KustoMemoryStore

#r "nuget: Azure.Core, 1.36.0"
#r "nuget: Microsoft.Azure.Kusto.Data, 12.0.0"
using Kusto.Data;                                   // KustoConnectionStringBuilder

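Here is what the main packages provide:

Microsoft.SemanticKernel supplies the core SDK and, through its bundled OpenAI connector, the Azure OpenAI embedding service.

Microsoft.SemanticKernel.Connectors.Kusto supplies KustoMemoryStore.

Microsoft.Azure.Kusto.Data supplies KustoConnectionStringBuilder, which describes how to reach and authenticate to the Kusto cluster.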

Establishing the Connection

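The first step is to build a connection string for the Kusto cluster. WithAadUserPromptAuthentication prompts you to sign in interactively with your Azure AD (Microsoft Entra ID) account: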

var connectionString = new KustoConnectionStringBuilder("<connection string>")
        .WithAadUserPromptAuthentication();

Embedding Generator Setup

Next, set up the embedding generator, which converts text into embedding vectors using an Azure OpenAI embedding deployment:

var embeddingGenerator = new AzureOpenAITextEmbeddingGenerationService( 
    "<deployment>",
    "<endpoint>",
    "<api key>"
);

Configuring Text Memory

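To finish the setup, create a KustoMemoryStore from the connection string and the database name. Then wrap it, together with the embedding generator, in a SemanticTextMemory. The memory APIs are marked experimental in this release, so the #pragma directives suppress the resulting compiler diagnostics: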

#pragma warning disable
KustoMemoryStore memoryStore = new(connectionString, "<kusto db name>");
SemanticTextMemory textMemory = new(memoryStore, embeddingGenerator);
#pragma warning restore

Storing Information

With everything set up, it's time to store some data:

// Await each save so the records are stored before any search runs.
await textMemory.SaveInformationAsync("meDef", id: "doc1", text: "My name is Andrea.");
await textMemory.SaveInformationAsync("meDef", id: "doc2", text: "I am 30 years old.");
await textMemory.SaveInformationAsync("meDef", id: "doc3", text: "I live in South America.");

If it doesn't already exist, this step creates a table for the collection ("meDef") in the specified Kusto database. Each SaveInformationAsync call then inserts a new record with the columns "Key", "Metadata", "Timestamp", and "Embedding". The "Embedding" column holds the vector produced by the embedding generator.
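To make the storage behavior concrete, here is a minimal in-memory sketch of what the paragraph above describes: one table per collection, created on first use, with one row appended per save.

This is not the connector's code. The following are all hypothetical stand-ins for the Kusto table and the Azure OpenAI embedding service:

- the `tables` dictionary
- the `Save` helper
- the JSON shape of `Metadata`
- the letter-frequency `ToyEmbed` function

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical in-memory stand-in for the Kusto database: one "table" per collection.
var tables = new Dictionary<string, List<(string Key, string Metadata, DateTimeOffset Timestamp, float[] Embedding)>>();

// Hypothetical toy embedding: counts of letters a-z.
// A real embedding service returns a learned vector instead.
static float[] ToyEmbed(string text)
{
    var v = new float[26];
    foreach (var c in text.ToLowerInvariant())
        if (c >= 'a' && c <= 'z') v[c - 'a']++;
    return v;
}

// Create the collection's table on first use, then append one row per save.
void Save(string collection, string id, string text)
{
    if (!tables.TryGetValue(collection, out var table))
    {
        table = new();
        tables[collection] = table;
    }
    var metadata = JsonSerializer.Serialize(new { id, text }); // hypothetical metadata shape
    table.Add((id, metadata, DateTimeOffset.UtcNow, ToyEmbed(text)));
}

Save("meDef", "doc1", "My name is Andrea.");
Save("meDef", "doc2", "I am 30 years old.");
Save("meDef", "doc3", "I live in South America.");
Console.WriteLine($"{tables.Count} table(s), {tables["meDef"].Count} row(s)");
```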

Retrieving Answers

To perform a similarity search and get relevant answers, run:

await foreach (var answer in textMemory.SearchAsync(
    collection: "meDef",
    query: "What's my name?",
    limit: 2,                  // return at most two matches
    minRelevanceScore: 0.79,   // drop matches scoring below this threshold
    withEmbeddings: true))
{
    Console.WriteLine($"Answer: {answer.Metadata.Text} (relevance: {answer.Relevance:F2})");
}

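Under the hood, SearchAsync first embeds the query text with the same embedding generator. It then runs a KQL query that ranks stored records by the cosine similarity between the query embedding and each record's "Embedding" column, using functions such as series_cosine_similarity_fl.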
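The ranking step can be illustrated without Kusto at all. The sketch below is a brute-force cosine-similarity search in plain C#, using toy 3-dimensional embeddings. It mimics the effect of the `limit` and `minRelevanceScore` parameters, but it is not the KQL the connector generates. `CosineSimilarity` and `Search` are hypothetical helper names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1.
static double CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same dimension.");
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += (double)a[i] * b[i];
        normA += (double)a[i] * a[i];
        normB += (double)b[i] * b[i];
    }
    return (normA == 0 || normB == 0) ? 0 : dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Score every stored record (a full scan), drop those below the threshold,
// and keep the top `limit` by descending similarity.
static List<(string Key, double Score)> Search(
    IEnumerable<(string Key, float[] Embedding)> store,
    float[] query, int limit, double minRelevanceScore) =>
    store
        .Select(r => (r.Key, Score: CosineSimilarity(r.Embedding, query)))
        .Where(r => r.Score >= minRelevanceScore)
        .OrderByDescending(r => r.Score)
        .Take(limit)
        .ToList();

// Toy embeddings; real ones have hundreds or thousands of dimensions.
var storedRecords = new List<(string Key, float[] Embedding)>
{
    ("doc1", new[] { 0.9f, 0.1f, 0.0f }),
    ("doc2", new[] { 0.1f, 0.9f, 0.0f }),
    ("doc3", new[] { 0.0f, 0.2f, 0.9f }),
};
var queryEmbedding = new[] { 1.0f, 0.0f, 0.0f };

var results = Search(storedRecords, queryEmbedding, limit: 2, minRelevanceScore: 0.79);
foreach (var (key, score) in results)
    Console.WriteLine($"{key}: {score:F3}");
```

With these vectors only doc1 clears the 0.79 threshold. This shows how minRelevanceScore can return fewer results than limit allows.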

Conclusion and Thoughts

Semantic Kernel is evolving rapidly, and its connectors, including the Kusto connector, might change.
Kusto DB is powerful for stream data processing, but it doesn't provide a dedicated similarity-search (vector) index. As far as I can tell, the connector's query scores every stored embedding against the query embedding, so retrieval can slow down as the table grows.
Kusto DB also isn't a semantic search tool in its own right, and it doesn't chunk documents. Additional tools may therefore be needed to split long texts into pieces before storing them as semantic memory.
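Here is an example of the kind of preprocessing meant above: a minimal word-based chunker. The chunk size, the overlap, and the `ChunkByWords` helper are illustrative assumptions, not part of Semantic Kernel or the Kusto connector. Production chunkers usually split on tokens or sentences instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Split text into chunks of at most maxWords words. Consecutive chunks share
// `overlap` words so that context isn't lost at a chunk boundary.
static List<string> ChunkByWords(string text, int maxWords, int overlap)
{
    if (maxWords <= 0) throw new ArgumentOutOfRangeException(nameof(maxWords));
    if (overlap < 0 || overlap >= maxWords) throw new ArgumentOutOfRangeException(nameof(overlap));

    var words = text.Split(new[] { ' ', '\t', '\n', '\r' }, StringSplitOptions.RemoveEmptyEntries);
    var chunks = new List<string>();
    int step = maxWords - overlap;
    for (int start = 0; start < words.Length; start += step)
    {
        chunks.Add(string.Join(" ", words.Skip(start).Take(maxWords)));
        if (start + maxWords >= words.Length) break; // this chunk already reached the end
    }
    return chunks;
}

var chunks = ChunkByWords("one two three four five six seven", maxWords: 3, overlap: 1);
foreach (var chunk in chunks)
    Console.WriteLine(chunk);
```

Each chunk could then be stored with its own SaveInformationAsync call under a distinct id. For example, "doc1-0", "doc1-1", and so on (a hypothetical naming scheme).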

All comments and insights are welcome.

DEV Community