Azure OpenAI Error Handling in Semantic Kernel

In real-world systems, it's crucial to handle HTTP errors effectively, especially when interacting with Large Language Models (LLMs) like Azure OpenAI. Rate-limit errors (tokens per minute or requests per minute) inevitably occur at some point, surfacing as HTTP 429 responses. This blog post explores different approaches to HTTP error handling with Semantic Kernel and Azure OpenAI.

Default

var builder = Kernel.CreateBuilder();
builder.AddAzureOpenAIChatCompletion(
    deploymentName: "gpt-4o-2024-08-06",
    endpoint: "https://resource-name.openai.azure.com",
    apiKey: "api-key"); // Or DefaultAzureCredential

The default setup for Semantic Kernel with Azure OpenAI uses AddAzureOpenAIChatCompletion. This approach comes with a built-in retry policy that automatically retries failed requests up to three times with exponential backoff. Additionally, it respects HTTP headers like 'retry-after' to time retries more precisely.
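
If all built-in retries are exhausted, the failure surfaces to your code as an HttpOperationException. A minimal sketch of handling it (the prompt and the fallback logic are illustrative):

var kernel = builder.Build();

try
{
    var result = await kernel.InvokePromptAsync("Summarize this document.");
    Console.WriteLine(result);
}
catch (HttpOperationException ex) when (ex.StatusCode == HttpStatusCode.TooManyRequests)
{
    // Still rate limited after the default three attempts; queue, fail over, or back off further.
    Console.WriteLine($"Rate limited: {ex.Message}");
}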

HttpClient

var factory = provider.GetRequiredService<IHttpClientFactory>();
var httpClient = factory.CreateClient("azure:gpt-4o");

var builder = Kernel.CreateBuilder();
builder.AddAzureOpenAIChatCompletion(
    deploymentName: "gpt-4o-2024-08-06",
    endpoint: "https://resource-name.openai.azure.com",
    apiKey: "api-key",  // Or DefaultAzureCredential
    httpClient: httpClient);

By configuring an HttpClient instance, you gain more control over HTTP error handling: Semantic Kernel disables its default retry policy when an HttpClient is provided. This allows you to implement custom retry logic, for example with the Microsoft.Extensions.Http.Resilience library, defining the number of retry attempts, timeouts, and how specific status codes like 429 (rate limit exceeded) are handled. Since the built-in policy is disabled, it is strongly recommended to add retry policies to handle transient errors with HttpClient.

services.AddHttpClient("auzre:gpt-4o")
    // 'standard' automatically handle transient errors inlcuding '429'
    .AddStandardResilienceHandler() 
    .Configure(options =>
        {
            // Options for attempts and time out etc
            options.Retry.MaxRetryAttempts = 5;
        });

An important benefit of using HttpClient is that it's not limited to Azure OpenAI. This approach works with other AI connectors like OpenAI as well.
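
If the standard handler's defaults are not enough, Microsoft.Extensions.Http.Resilience also lets you define a dedicated pipeline. Here is a minimal sketch; the pipeline name and the 429-only predicate are illustrative, and in practice you would likely handle other transient status codes as well:

services.AddHttpClient("azure:gpt-4o")
    .AddResilienceHandler("azure-openai-retry", resilience =>
    {
        resilience.AddRetry(new HttpRetryStrategyOptions
        {
            MaxRetryAttempts = 5,
            BackoffType = DelayBackoffType.Exponential,
            UseJitter = true, // spread retries out to avoid synchronized bursts
            // Retry only when the service responds with 429 (rate limit exceeded)
            ShouldHandle = args => ValueTask.FromResult(
                args.Outcome.Result?.StatusCode == HttpStatusCode.TooManyRequests)
        });
    });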

AzureOpenAIClient

var azureOpenAIClient = new AzureOpenAIClient(
    endpoint: new Uri("https://resource-name.openai.azure.com"),
    new ApiKeyCredential("api-key"), // Or DefaultAzureCredential
    new AzureOpenAIClientOptions());

var builder = Kernel.CreateBuilder();
builder.AddAzureOpenAIChatCompletion(
    deploymentName: "gpt-4o-2024-08-06",
    azureOpenAIClient);

This approach offers similar functionality to the default setup, including the built-in retry policy. In addition, AzureOpenAIClient provides more flexibility through AzureOpenAIClientOptions.

var clientOptions = new AzureOpenAIClientOptions
{
    Transport = new HttpClientPipelineTransport(httpClient),
    RetryPolicy = new ClientRetryPolicy(maxRetries: 5)
};

This configuration enables you to combine HTTP retry policies from HttpClient with custom pipeline policy-based retries from the Azure OpenAI SDK.
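
For even more control, you can plug your own PipelinePolicy into AzureOpenAIClientOptions via AddPolicy. The policy below is a hypothetical example that simply logs 429 responses while the built-in ClientRetryPolicy keeps handling the retries:

// Hypothetical observer policy built on System.ClientModel.Primitives types.
public sealed class RateLimitLoggingPolicy : PipelinePolicy
{
    public override void Process(PipelineMessage message,
        IReadOnlyList<PipelinePolicy> pipeline, int currentIndex)
    {
        ProcessNext(message, pipeline, currentIndex);
        LogIfThrottled(message);
    }

    public override async ValueTask ProcessAsync(PipelineMessage message,
        IReadOnlyList<PipelinePolicy> pipeline, int currentIndex)
    {
        await ProcessNextAsync(message, pipeline, currentIndex);
        LogIfThrottled(message);
    }

    private static void LogIfThrottled(PipelineMessage message)
    {
        if (message.Response?.Status == 429)
            Console.WriteLine("429 received; the retry policy will back off and retry.");
    }
}

// Register it so it runs on every attempt:
clientOptions.AddPolicy(new RateLimitLoggingPolicy(), PipelinePosition.PerTry);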

Recommendations

The default setup might not be suitable for scenarios where you frequently hit token or request rate limits, since its retry behavior cannot be tuned.
For most applications, a named HttpClient with a resilience handler offers a good balance of control and simplicity, and it works with other AI connectors too.
If you already have AzureOpenAIClient registered and require maximum control, that approach allows you to leverage both HTTP client policies and Azure OpenAI pipeline policy-based retries.

Please feel free to reach out on Twitter @roamingcode.
