DEV Community: alinabi19

Running Local AI Models in .NET with Ollama (Step-by-Step Guide)

alinabi19 — Fri, 13 Mar 2026 15:59:27 +0000

Most developers who start experimenting with AI tend to follow the same path.

You integrate a cloud AI API into your application. The prototype works beautifully. Responses are fast, integration is simple, and everything feels almost magical.

Then the production questions start appearing.

How much will this cost at scale?

Do we really want sensitive data leaving our infrastructure?

What happens if the API rate limits us?

And the big one many developers eventually ask:

Can we run AI models locally instead?

The answer is yes. And tools like Ollama make it much easier than most developers expect.

Ollama allows you to run powerful language models directly on your machine and access them through a simple HTTP API. This means you can integrate local AI into ASP.NET Core APIs, background services, or internal tools without relying on external providers.

In this guide we will walk through:

why local AI models are becoming popular
how Ollama works
how to run models locally
how to call Ollama from a .NET application
how to build a simple AI-powered ASP.NET Core endpoint

If you are a .NET developer curious about integrating AI without depending entirely on cloud APIs, this is a great place to start.

Why Run AI Models Locally?

Cloud AI APIs are extremely powerful, but they are not always the best solution for every scenario.

Running models locally offers a few advantages that become very attractive in production environments.

No API Usage Costs

Most cloud AI providers charge based on tokens or requests.

That works well for prototypes. But once usage grows, costs can scale quickly.

Running models locally removes the per-request cost entirely, which makes a big difference for internal tools or heavy workloads.

Full Data Privacy

Many enterprise systems process sensitive information such as:

internal documentation
support tickets
logs
customer records

Sending this data to external AI APIs can raise security and compliance concerns.

Local models keep everything inside your infrastructure.

Lower Latency

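Cloud inference requires a network round trip.

Local inference removes that round trip, along with any dependency on an external service being reachable.

For internal assistants, dashboards, or developer tooling, this can result in noticeably faster responses, provided the local hardware runs the model quickly enough (see Resource Considerations below).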

Offline AI Capabilities

Local models can run without internet access.

This is useful in environments like:

secure enterprise networks
air-gapped systems
developer tools running locally

Ideal for Internal Tools

Local AI is especially useful for building tools such as:

internal chat assistants
log summarization tools
documentation search
developer copilots
AI-powered dashboards

This is exactly the kind of scenario where Ollama shines.

What is Ollama?

Ollama is a tool that allows developers to run LLMs (large language models) locally with minimal setup.

Instead of manually managing model weights, runtime environments, and inference servers, Ollama handles the heavy lifting.

It manages:

model downloads
model execution
memory handling
inference APIs

Once installed, Ollama exposes a local HTTP API.

That means any language capable of making HTTP requests can interact with it, including:

C#
Python
JavaScript
Go
Java

For backend developers, this makes integration extremely straightforward.

Supported Models

Ollama supports many popular open-source models, including:

Llama 3
Mistral
Gemma
Code Llama
various other community models

Different models are optimized for different tasks.

For example:

Model	Best Use Case
llama3	General AI tasks
mistral	Fast responses
codellama	Code generation
gemma	Lightweight inference

One of the biggest advantages of Ollama is how easy it is to switch between models.
Before choosing a model, it's worth understanding the hardware requirements involved in running these models locally.

Resource Considerations

Before running models locally, it is important to understand that LLMs still require system resources.

One thing you'll quickly notice when running local models is that inference latency can vary significantly depending on hardware.

During development, it’s common for responses to take several seconds when running on CPU-only machines.

For internal tools this is usually acceptable, but it’s worth keeping in mind when designing user-facing APIs.

For example:

Llama 3 8B models typically require 8–16GB RAM

CPU inference works but may be slower

GPUs significantly improve performance

For many internal tools, smaller or quantized models provide the best balance between performance and resource usage.
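
A rough way to see where memory figures like these come from is to multiply the parameter count by the storage used per weight. The sketch below does only that arithmetic. The helper name is hypothetical, and the estimate ignores the context cache and runtime overhead, so real usage will be higher.

```csharp
using System;

public static class ModelMemory
{
    // Approximate size of the weights alone, in gigabytes (10^9 bytes).
    // billionsOfParameters: for example 8 for an 8B model.
    // bitsPerWeight: 16 for half precision, 4 for a typical 4-bit quantization.
    public static double EstimateWeightsGb(double billionsOfParameters, double bitsPerWeight)
    {
        if (billionsOfParameters <= 0 || bitsPerWeight <= 0)
            throw new ArgumentException("Both inputs must be positive.");

        // parameters * bits / 8 = bytes; the "billions" factor cancels against "giga".
        return billionsOfParameters * bitsPerWeight / 8.0;
    }
}
```

For an 8B model this gives 16 GB at 16 bits per weight and 4 GB at 4 bits per weight, which is why quantized variants fit on machines where the full-precision weights would not.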

Architecture of a .NET Application Using Ollama

Before writing code, it helps to understand where Ollama fits into the architecture.

A typical integration looks like this:

Client
   ↓
ASP.NET Core API
   ↓
AI Service Layer
   ↓
Ollama Local API
   ↓
Local LLM Model

Request flow:

Client sends a request to the ASP.NET Core API
The API calls an AI service layer
The service sends a prompt to Ollama
Ollama runs the model locally
The generated response returns to the client

This separation keeps the architecture clean and maintainable.

Step 1: Installing Ollama

First, install Ollama on your machine.

Go to the official website:

https://ollama.com

Download the installer for your operating system. Ollama currently supports:

macOS
Linux
Windows

Run the installer and complete the setup.

After installation, open a terminal or command prompt and verify that Ollama is installed correctly by running:

ollama --version

If Ollama is installed properly, you should see the installed version printed in the terminal.

Download a Model

Next, download a language model that Ollama will run locally.

For this guide, we will use Llama 3.

Run the following command:

ollama pull llama3

This command downloads the model weights and prepares them for local inference.

Depending on your internet speed, the download may take a few minutes because the model weights are several gigabytes in size.

Run the Model

Once the model is downloaded, you can start it using:

ollama run llama3

Ollama will load the model and open an interactive prompt.

Try entering a simple question:

Explain what REST APIs are

If the model responds with an answer, your local AI environment is working correctly.

Local API Endpoint

Behind the scenes, Ollama also exposes an HTTP API that applications can call.

By default, the API runs at:

http://localhost:11434

This is the endpoint your ASP.NET Core application will communicate with.

Step 2: Creating a .NET 8 Web API

Next create a new ASP.NET Core API project.

dotnet new webapi -n LocalAIApi

A simple project structure might look like this:

Controllers
Services
Models
Program.cs

Keeping AI logic separated into services helps maintain clean architecture.

Step 3: Calling the Ollama API from .NET

Ollama exposes a simple endpoint for generating responses.

POST http://localhost:11434/api/generate

Example request payload:

{
  "model": "llama3",
  "prompt": "Explain dependency injection in ASP.NET Core",
  "stream": false
}

Example .NET Call

public async Task<string> GenerateAsync(
    string prompt,
    CancellationToken cancellationToken)
{
    var request = new
    {
        model = "llama3",
        prompt,
        stream = false
    };

    var response = await _httpClient.PostAsJsonAsync(
        "api/generate",
        request,
        cancellationToken);

    response.EnsureSuccessStatusCode();

    var result = await response.Content
        .ReadFromJsonAsync<OllamaResponse>(cancellationToken: cancellationToken);

    return result?.Response ?? string.Empty;
}

Response model:

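using System.Text.Json.Serialization;

public class OllamaResponse
{
    // Maps the "response" field of Ollama's JSON reply; null if the field is absent.
    [JsonPropertyName("response")]
    public string? Response { get; set; }
}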

Using a typed model instead of dynamic makes the code safer and easier to maintain.

Step 4: Creating an AI Service Layer

One design rule worth following:

Avoid putting AI logic directly inside controllers.

Instead, isolate it inside a service layer.

Service Interface

public interface IAiService
{
    Task<string> GenerateAsync(string prompt, CancellationToken cancellationToken);
}

Implementation

public class OllamaService : IAiService
{
    private readonly HttpClient _httpClient;

    public OllamaService(HttpClient httpClient)
    {
        _httpClient = httpClient;
    }

    public async Task<string> GenerateAsync(
        string prompt,
        CancellationToken cancellationToken)
    {
        var request = new
        {
            model = "llama3",
            prompt,
            stream = false
        };

        var response = await _httpClient.PostAsJsonAsync(
            "api/generate",
            request,
            cancellationToken);

        response.EnsureSuccessStatusCode();

        var result = await response.Content
            .ReadFromJsonAsync<OllamaResponse>(cancellationToken: cancellationToken);

        return result?.Response ?? string.Empty;
    }
}

Step 5: Registering the Service

In Program.cs:

builder.Services.AddHttpClient<IAiService, OllamaService>(client =>
{
    client.BaseAddress = new Uri("http://localhost:11434");
    client.Timeout = TimeSpan.FromMinutes(2);
});

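Registering the service through IHttpClientFactory (AddHttpClient) lets the framework pool and recycle the underlying connection handlers. The two-minute timeout leaves room for slow CPU-only inference.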

Step 6: Building an AI Endpoint

Now expose the AI functionality through a controller.

Request model:

public class AiRequest
{
    public string Prompt { get; set; }
}

Controller:

[ApiController]
[Route("api/ai")]
public class AiController : ControllerBase
{
    private readonly IAiService _aiService;

    public AiController(IAiService aiService)
    {
        _aiService = aiService;
    }

    [HttpPost("generate")]
    public async Task<IActionResult> Generate(
        AiRequest request,
        CancellationToken cancellationToken)
    {
        if (string.IsNullOrWhiteSpace(request.Prompt))
            return BadRequest("Prompt is required.");

        var result = await _aiService.GenerateAsync(
            request.Prompt,
            cancellationToken);

        return Ok(new { response = result });
    }
}

Your ASP.NET Core API now exposes a local AI-powered endpoint.

Step 7: Switching Models

One of the nicest things about Ollama is how easy it is to switch between models.

For example:

ollama pull mistral
ollama pull codellama

Then change the model field in the request (the prompt and stream fields stay as they were):

{
  "model": "mistral"
}

In practice, testing a few models usually produces better results than simply choosing the largest one.
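
One way to keep switching painless is to avoid hardcoding the model name where the request is built. A minimal sketch, assuming the name arrives from configuration or a method argument; OllamaRequests and BuildGenerateBody are hypothetical names, not part of Ollama or .NET.

```csharp
using System.Text.Json;

public static class OllamaRequests
{
    // Builds the JSON body for POST /api/generate with a caller-supplied model.
    public static string BuildGenerateBody(string model, string prompt)
    {
        // Anonymous-type property names are serialized exactly as written.
        var body = new { model, prompt, stream = false };
        return JsonSerializer.Serialize(body);
    }
}
```

Trying another model then means changing one string, for example BuildGenerateBody("mistral", prompt) instead of BuildGenerateBody("llama3", prompt).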

Step 8: Improving Performance

Local models can still be resource intensive.

A few practical optimizations can help significantly.

Use Streaming
Streaming responses improve perceived latency for longer outputs.

Reduce Prompt Size
Large prompts increase inference time.
Send only the context that the model truly needs.

Cache Repeated Requests
If prompts repeat frequently, caching responses can reduce compute usage.

Keep Calls Asynchronous
Always use async APIs when calling models to keep your backend scalable.
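
To make the streaming suggestion concrete, here is a sketch of reading a streamed reply. It assumes that with "stream": true the /api/generate endpoint returns one JSON object per line, each carrying a response fragment and a done flag. OllamaStreamReader is a hypothetical helper name.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;

public static class OllamaStreamReader
{
    // Yields each text fragment as soon as its line has been read.
    public static async IAsyncEnumerable<string> ReadTokensAsync(Stream body)
    {
        using var reader = new StreamReader(body);
        string? line;
        while ((line = await reader.ReadLineAsync()) is not null)
        {
            if (string.IsNullOrWhiteSpace(line)) continue;

            using var doc = JsonDocument.Parse(line);
            var root = doc.RootElement;

            if (root.TryGetProperty("response", out var piece))
                yield return piece.GetString() ?? string.Empty;

            // Assumed: the final line is marked with "done": true.
            if (root.TryGetProperty("done", out var done)
                && done.ValueKind == JsonValueKind.True)
                yield break;
        }
    }
}
```

In a real call, the stream would come from an HttpClient request sent with HttpCompletionOption.ResponseHeadersRead, so the body is read as it arrives rather than buffered in full.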

When Local Models Are Not the Best Choice

Local models are powerful, but they are not always the right solution.

For example, cloud AI services may still be better when:

you need extremely large models
you require massive scaling
GPU infrastructure is unavailable
inference workloads are very high

In practice, many teams use a hybrid approach, combining cloud models and local models depending on the use case.

Final Thoughts

The local AI ecosystem is moving incredibly fast right now.

Just a few years ago, running large language models required specialized machine learning environments. Today tools like Ollama make it possible for everyday backend developers to experiment with local LLMs using familiar technologies.

From a .NET perspective, integrating Ollama is actually much simpler than it looks at first.

Instead of relying entirely on external APIs, you can build AI-powered systems that are:

private
cost-efficient
low latency
fully controlled by your infrastructure

For internal tools, developer assistants, and AI-powered APIs, local models are quickly becoming a practical and powerful alternative to cloud AI services.

A Few Things Worth Remembering

Ollama makes running local LLMs simple
.NET applications can interact with Ollama using HTTP APIs
Use a dedicated AI service layer to keep architecture clean
Choose models based on your use case, not just size
Optimize prompts and responses for better performance

Fixing CORS Errors in ASP.NET Core APIs (The Real Reasons)

alinabi19 — Thu, 12 Mar 2026 11:38:04 +0000

If you've worked on APIs for a while, you've probably run into this situation.

Your frontend calls your ASP.NET Core API.

Everything looks correct.

But the browser console suddenly throws this error:

Access to fetch at 'https://api.example.com' from origin 'http://localhost:3000' has been blocked by CORS policy

So you try the exact same request in Postman.

It works perfectly.

Now you're confused.

You check the API.
You check the frontend.
Eventually you start adding random CORS settings hoping something finally works.

Before long, your Program.cs starts looking like a CORS experiment lab.

Almost every backend developer hits this problem at some point.

What makes CORS errors frustrating is that:

They only appear in browsers
API testing tools like Postman work fine
Small configuration mistakes can break everything
Middleware order matters in ASP.NET Core

After debugging this issue across multiple APIs, the same root causes show up again and again. Let’s walk through what’s actually happening and how to fix it properly.

What CORS Actually Is (In Simple Terms)

CORS stands for Cross-Origin Resource Sharing.

Browsers enforce a security rule called the Same-Origin Policy.

It means scripts on a web page can normally only read responses from APIs on the same origin.

An origin is defined as:

Protocol + Domain + Port

For example:

http://localhost:3000
https://api.myapp.com
https://app.myapp.com

Even small differences create a different origin.

Example:

http://localhost:3000
http://localhost:5000

Same machine, different port - still considered cross-origin.
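
The comparison the browser makes can be sketched in a few lines. This is an illustration only, using System.Uri to split a URL into its parts; OriginCheck is a hypothetical helper, and browsers apply the rule internally rather than through any such API.

```csharp
using System;

public static class OriginCheck
{
    // Two URLs share an origin only if scheme, host, and port all match.
    public static bool SameOrigin(string first, string second)
    {
        var a = new Uri(first);
        var b = new Uri(second);

        // Uri.Port falls back to the scheme's default (80 for http, 443 for https).
        return string.Equals(a.Scheme, b.Scheme, StringComparison.OrdinalIgnoreCase)
            && string.Equals(a.Host, b.Host, StringComparison.OrdinalIgnoreCase)
            && a.Port == b.Port;
    }
}
```

SameOrigin("http://localhost:3000", "http://localhost:5000") is false because the ports differ, while a path or query string makes no difference to the result.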

When a frontend application calls an API on another origin, the browser checks whether the API explicitly allows it.

If the response doesn't include the correct CORS headers, the browser blocks the request.

Important point many developers miss:

Your API is usually not rejecting the request.
The browser is refusing to expose the response.

This is also why the request works in Postman - Postman doesn't enforce browser security rules.

Why CORS Errors Happen in ASP.NET Core APIs

After debugging CORS issues in several projects, these are the most common causes.

Missing CORS middleware
CORS support was never enabled in the API.

Wrong middleware order
CORS middleware must run before endpoints execute.

Incorrect origin configuration
The frontend origin isn't included in allowed origins.

Preflight request failure
Browsers sometimes send an OPTIONS request before the real request.

If this fails, the real request never happens.

Credential configuration mistakes
Sending cookies or authorization headers without configuring the policy to allow them.

Using AllowAnyOrigin() with credentials
The CORS specification forbids this combination for security reasons.

Step 1: The Correct Way to Enable CORS in ASP.NET Core

CORS configuration starts in Program.cs.

Register the CORS policy

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddCors(options =>
{
    options.AddPolicy("FrontendPolicy", policy =>
    {
        policy.WithOrigins("http://localhost:3000")
              .AllowAnyHeader()
              .AllowAnyMethod();
    });
});

This creates a named CORS policy.

Enable the middleware

var app = builder.Build();

app.UseCors("FrontendPolicy");

app.MapControllers();

app.Run();

Now requests from http://localhost:3000 will be allowed.

A common mistake is registering CORS but forgetting to enable the middleware.

Step 2: Understanding Preflight Requests (OPTIONS)

Sometimes browsers send a preflight request before the real request.

Example:

OPTIONS /api/orders

This usually happens when:

The request uses custom headers, including Authorization

The request method is PUT, PATCH, or DELETE

The request sends a Content-Type such as application/json

Cookies on their own do not trigger a preflight.

The browser is essentially asking:

"Is it okay if I send this request?"

If the server doesn't respond with the correct CORS headers, the browser blocks the request before the real API call even happens.

The good news is that ASP.NET Core handles preflight requests automatically when the CORS middleware is configured correctly.

You normally don't need to manually implement OPTIONS endpoints.
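
The browser's decision can be approximated in code. The sketch below is a simplified version of the "simple request" rules and leaves out several details of the Fetch specification; PreflightCheck is a hypothetical helper, not an ASP.NET Core API.

```csharp
using System;
using System.Collections.Generic;

public static class PreflightCheck
{
    private static readonly HashSet<string> SimpleMethods =
        new(StringComparer.OrdinalIgnoreCase) { "GET", "HEAD", "POST" };

    private static readonly HashSet<string> SafelistedHeaders =
        new(StringComparer.OrdinalIgnoreCase)
        { "Accept", "Accept-Language", "Content-Language", "Content-Type" };

    private static readonly HashSet<string> SimpleContentTypes =
        new(StringComparer.OrdinalIgnoreCase)
        { "application/x-www-form-urlencoded", "multipart/form-data", "text/plain" };

    // Returns true when a browser would send an OPTIONS request first.
    public static bool NeedsPreflight(string method, IReadOnlyDictionary<string, string> headers)
    {
        if (!SimpleMethods.Contains(method)) return true;

        foreach (var (name, value) in headers)
        {
            if (!SafelistedHeaders.Contains(name)) return true;

            if (name.Equals("Content-Type", StringComparison.OrdinalIgnoreCase))
            {
                // Ignore parameters such as "; charset=utf-8".
                var mediaType = value.Split(';')[0].Trim();
                if (!SimpleContentTypes.Contains(mediaType)) return true;
            }
        }
        return false;
    }
}
```

Under these rules a plain GET needs no preflight, but a POST with Content-Type: application/json does, which is why most JSON APIs see OPTIONS requests from browsers.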

Step 3: The Most Common CORS Configuration Mistakes

Mistake 1: Using `AllowAnyOrigin()` With Credentials

This is extremely common.

Example:

policy.AllowAnyOrigin()
      .AllowCredentials();

This will not work.

The CORS specification does not allow a wildcard origin on credentialed requests, and current ASP.NET Core versions throw an exception when a policy combining the two is built.

Correct approach:

policy.WithOrigins("https://app.example.com")
      .AllowCredentials()
      .AllowAnyHeader()
      .AllowAnyMethod();

Always specify explicit origins when credentials are involved.

Mistake 2: Forgetting to Call `UseCors()`

Developers often register CORS but forget to enable the middleware.

Registration alone is not enough:

builder.Services.AddCors();

The middleware must also be added to the pipeline:

app.UseCors("FrontendPolicy");

Without the middleware, the policy never runs.

Mistake 3: Wrong Middleware Order

Middleware order matters in ASP.NET Core.

Correct order:

app.UseRouting();

app.UseCors("FrontendPolicy");

app.UseAuthentication();
app.UseAuthorization();

app.MapControllers();

If CORS runs after endpoints, it won't apply to the response.

Step 4: Handling CORS for Frontend Frameworks

During development, frontend frameworks usually run on different ports.

Examples:

React  → http://localhost:3000
Vite   → http://localhost:5173
Angular → http://localhost:4200

Your CORS policy must allow these origins.

Example:

policy.WithOrigins(
    "http://localhost:3000",
    "http://localhost:5173",
    "http://localhost:4200"
)
.AllowAnyHeader()
.AllowAnyMethod();

For production environments, you should only allow trusted domains.

Example:

https://app.mycompany.com

Avoid using AllowAnyOrigin() in production APIs.

Step 5: Debugging CORS Errors Like a Pro

When a CORS error appears, guessing usually wastes time. It's better to check a few things first.

1. Check the Browser Console
Look for errors like:

No 'Access-Control-Allow-Origin' header present

This means the response didn't include the required CORS headers.

2. Inspect the Network Tab
Open DevTools and inspect the OPTIONS request.

If the preflight request fails, the real request will never be sent.

3. Compare Browser vs Postman
If Postman works but the browser fails, it's almost always a CORS issue.

4. Check Response Headers
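You can also test the API response headers. ASP.NET Core only adds CORS headers when the request carries an Origin header, so send one:

curl -i -H "Origin: http://localhost:3000" https://api.example.com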

Look for:

Access-Control-Allow-Origin

If it's missing, the browser will block the request.

Step 6: Production Best Practices

A few simple practices make CORS configuration easier to manage in real systems.

Avoid Hardcoding Origins

Instead, load them from configuration.

Example:

var allowedOrigins = builder.Configuration
    .GetSection("Cors:AllowedOrigins")
    .Get<string[]>() ?? Array.Empty<string>();

policy.WithOrigins(allowedOrigins)
      .AllowAnyHeader()
      .AllowAnyMethod();

Configuration example:

"Cors": {
  "AllowedOrigins": [
    "http://localhost:3000",
    "https://app.mycompany.com"
  ]
}

This keeps deployments clean across development, staging, and production.

Be Aware of Reverse Proxies

In production environments your API may sit behind:

Nginx
Cloudflare
Azure Application Gateway
Kubernetes ingress

Sometimes these proxies strip or override headers, which can make CORS appear broken even when the API is configured correctly.

If CORS works locally but fails in production, checking the proxy configuration is often the next step.

Common Pitfall

One subtle issue happens when the frontend port changes.

Example:

Frontend running on:

localhost:5173

But the API only allows:

localhost:3000

Different port = different origin.

Even though it's the same machine, the browser treats it as cross-origin.

Quick CORS Debugging Checklist

Problem	Cause	Fix
Browser CORS error	Missing middleware	Add `AddCors()` and `UseCors()`
Preflight request fails	OPTIONS blocked	Enable CORS policy
Credentials error	Using `AllowAnyOrigin()`	Specify allowed origins
Works in Postman only	Browser enforcing CORS	Configure proper headers
Production requests blocked	Missing frontend domain	Add production origin

Final Thoughts

CORS errors can feel confusing at first because the problem usually shows up far away from the real cause.

The key thing to remember is this:

Browsers enforce CORS. Your API usually doesn't.

Most issues come down to a few predictable causes:

Missing CORS middleware
Wrong middleware order
Incorrect origin configuration
Preflight request failures

Once you understand how browsers handle cross-origin requests and how ASP.NET Core middleware works, debugging CORS becomes much more straightforward.

And if you've ever spent hours chasing a CORS error before a deployment, you're definitely not alone.

Building AI-Powered APIs with ASP.NET Core and OpenAI (.NET 8 Guide)

alinabi19 — Thu, 12 Mar 2026 07:43:16 +0000

AI features are slowly becoming part of normal backend work.

A few years ago, most APIs were simple CRUD endpoints. They fetched data, updated records, and returned JSON. Today it is common to see requests like:

“Can we add a chatbot endpoint?”
“Can the API summarize user feedback?”
“Can we classify support tickets automatically?”

At that point your backend suddenly needs to talk to an AI model.

If you're already comfortable building APIs with ASP.NET Core, the first instinct is usually simple: just call the OpenAI API from an endpoint and return the result.

That works for a quick prototype. But once the system starts growing, a few questions appear pretty quickly:

Where should the AI logic live?
Should controllers call OpenAI directly?
How do we protect API keys?
What happens if the AI call takes several seconds?
How do we deal with rate limits or retries?

AI integrations behave like any other external service dependency in your backend architecture. Treat them that way and the system stays clean and maintainable.

In this article we’ll build a simple AI-powered API using ASP.NET Core and OpenAI, while following patterns that hold up well in real applications.

Why Expose AI Through an API

Most production systems expose AI features through backend APIs rather than directly from the frontend.

A typical setup might look like this:

A web app calls an endpoint to generate content
A mobile app sends text to be summarized
An internal tool calls an API for classification

Centralizing AI logic inside your API gives you a few advantages.

Security
Your OpenAI key stays in the backend. The client never sees it.

Reuse
Multiple clients (web, mobile, internal tools) can use the same AI capability.

Cost control
Since AI calls cost money, the API layer can enforce limits and validation.

Consistency
Prompts, models, and safety rules live in one place.

Think of the API as the gateway between your application and AI services.

A Simple Architecture That Works Well

When adding AI to a backend, separating responsibilities helps a lot.

A typical flow looks like this:

Client
   ↓
API Endpoint
   ↓
AI Service
   ↓
OpenAI API

Each layer does a specific job.

Layer	Responsibility
API	Handles HTTP requests
Service Layer	Contains AI logic
HTTP Client	Calls OpenAI
Configuration	Stores API keys

A mistake I’ve seen more than once is calling OpenAI directly inside controllers.

That approach usually leads to:

duplicated logic
hard-to-test endpoints
messy controllers

Moving AI logic into a service keeps things cleaner and easier to maintain.

Step 1: Create the ASP.NET Core API

Start by creating a standard .NET 8 Web API project.

dotnet new webapi -n AiApiDemo

ASP.NET Core supports both controllers and Minimal APIs.

For simple AI endpoints, Minimal APIs work nicely because the code stays compact.

Example setup:

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddEndpointsApiExplorer();
builder.Services.AddSwaggerGen();

var app = builder.Build();

app.UseSwagger();
app.UseSwaggerUI();

app.Run();

Now we have a basic API ready to host endpoints.

Step 2: Store the OpenAI API Key

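Never hardcode API keys directly in code.

A simple approach is to read the key from configuration. The appsettings.json entry below shows the shape only; keep the real value out of committed files, for example by using the Secret Manager (dotnet user-secrets) during development.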

appsettings.json

{
  "OpenAI": {
    "ApiKey": "YOUR_API_KEY"
  }
}

In production environments you would usually use something like:

environment variables
Azure Key Vault
AWS Secrets Manager

The goal is simple: keep secrets outside source control.

Step 3: Register an HTTP Client

ASP.NET Core includes IHttpClientFactory, which is the recommended way to make HTTP calls.

It helps avoid common issues like socket exhaustion and centralizes configuration.

builder.Services.AddHttpClient<IAiService, OpenAiService>(client =>
{
    client.BaseAddress = new Uri("https://api.openai.com/");
    client.Timeout = TimeSpan.FromSeconds(30);
});

Typed clients also work well with dependency injection.

Step 4: Create the AI Service

Instead of calling OpenAI from endpoints, create a dedicated service.

This keeps the API layer thin and makes the integration easier to test.

Service Interface

public interface IAiService
{
    Task<string> GenerateResponseAsync(
        string prompt,
        CancellationToken cancellationToken);
}

Service Implementation

public class OpenAiService : IAiService
{
    private readonly HttpClient _httpClient;
    private readonly IConfiguration _config;
    private readonly ILogger<OpenAiService> _logger;

    public OpenAiService(
        HttpClient httpClient,
        IConfiguration config,
        ILogger<OpenAiService> logger)
    {
        _httpClient = httpClient;
        _config = config;
        _logger = logger;
    }

    public async Task<string> GenerateResponseAsync(
        string prompt,
        CancellationToken cancellationToken)
    {
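        // Fail fast with a clear message instead of sending an empty bearer token.
        var apiKey = _config["OpenAI:ApiKey"]
            ?? throw new InvalidOperationException("OpenAI:ApiKey is not configured.");

        _httpClient.DefaultRequestHeaders.Authorization =
            new System.Net.Http.Headers.AuthenticationHeaderValue("Bearer", apiKey);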

        var request = new
        {
            model = "gpt-4o-mini",
            input = prompt
        };

        var response = await _httpClient.PostAsJsonAsync(
            "v1/responses",
            request,
            cancellationToken);

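        if (!response.IsSuccessStatusCode)
        {
            var error = await response.Content.ReadAsStringAsync(cancellationToken);

            _logger.LogError("OpenAI request failed with {StatusCode}: {Error}",
                (int)response.StatusCode, error);

            // Keep the upstream error body in the log, not in the exception message.
            throw new HttpRequestException(
                $"OpenAI request failed with status {(int)response.StatusCode}.");
        }

        var result = await response.Content.ReadFromJsonAsync<OpenAiResponse>(cancellationToken);

        // The output list can be empty or start with an item that has no content,
        // so look for the first text instead of indexing blindly.
        return result?.Output?
            .SelectMany(o => o.Content ?? new List<Content>())
            .Select(c => c.Text)
            .FirstOrDefault(t => !string.IsNullOrEmpty(t)) ?? "";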
    }
}

Response Model

Using strongly typed models is safer than parsing dynamic JSON.

public class OpenAiResponse
{
    public List<Output> Output { get; set; }
}

public class Output
{
    public List<Content> Content { get; set; }
}

public class Content
{
    public string Text { get; set; }
}

Step 5: Create an AI Endpoint

Now we expose an endpoint that clients can call.

Request model:

public class ChatRequest
{
    public string Prompt { get; set; }
}

Minimal API endpoint:

app.MapPost("/api/ai/chat", async (
    ChatRequest request,
    IAiService aiService,
    CancellationToken cancellationToken) =>
{
    if (string.IsNullOrWhiteSpace(request.Prompt))
        return Results.BadRequest("Prompt cannot be empty.");

    if (request.Prompt.Length > 2000)
        return Results.BadRequest("Prompt too large.");

    var response = await aiService.GenerateResponseAsync(
        request.Prompt,
        cancellationToken);

    return Results.Ok(new { result = response });
});

Clients can now send prompts and receive AI-generated responses.

Handling Failures and Rate Limits

AI services are external dependencies, so failures are normal.

Some common issues include:

network timeouts
rate limits (429 responses)
temporary service errors

In production systems it’s usually worth adding retry policies.

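Libraries like Polly integrate well with IHttpClientFactory through the Microsoft.Extensions.Http.Polly package.

Example retry setup. AddTransientHttpErrorPolicy alone covers network failures, 5xx, and 408 responses but not 429, so this version adds the rate-limit status explicitly. In a real Program.cs, chain it onto the registration from Step 3 rather than registering the client twice.

// requires: using Polly; using Polly.Extensions.Http;
builder.Services.AddHttpClient<IAiService, OpenAiService>()
    .AddPolicyHandler(HttpPolicyExtensions
        .HandleTransientHttpError()                  // network failures, 5xx, 408
        .OrResult(r => (int)r.StatusCode == 429)     // rate limits
        .WaitAndRetryAsync(3, retry =>
            TimeSpan.FromSeconds(Math.Pow(2, retry))));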

This helps smooth over temporary failures.
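
The delay formula in the retry setup is easier to reason about once the numbers are written out. This small sketch reproduces only the arithmetic, on the understanding that Polly passes 1 for the first retry; RetryBackoff is a hypothetical helper name.

```csharp
using System;
using System.Linq;

public static class RetryBackoff
{
    // Delay before each retry when using TimeSpan.FromSeconds(Math.Pow(2, retry)).
    public static TimeSpan[] Delays(int retryCount) =>
        Enumerable.Range(1, retryCount)
            .Select(retry => TimeSpan.FromSeconds(Math.Pow(2, retry)))
            .ToArray();
}
```

With three retries the waits are 2, 4, and 8 seconds, so a request that keeps failing spends about 14 seconds waiting on top of the time taken by the attempts themselves. That is worth comparing against the client timeout.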

A Few Performance Considerations

AI requests are typically slower than database queries.

A few small changes can improve responsiveness.

Use async calls
Blocking threads during AI calls will hurt scalability.

Validate prompt size
Large prompts increase both latency and cost.

Cache repeated responses
If users frequently ask the same question, caching results can reduce API calls.

Consider streaming responses
Streaming works well for chat-style applications where users expect gradual output.
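
Caching fits naturally as a wrapper around the AI service. The sketch below is one possible shape, not a production cache: CachingAiService is a hypothetical class, the interface is repeated so the example compiles on its own, and the dictionary never expires or evicts entries.

```csharp
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Repeated here so the sketch is self-contained.
public interface IAiService
{
    Task<string> GenerateResponseAsync(string prompt, CancellationToken cancellationToken);
}

public sealed class CachingAiService : IAiService
{
    private readonly IAiService _inner;
    private readonly ConcurrentDictionary<string, string> _cache = new();

    public CachingAiService(IAiService inner) => _inner = inner;

    public async Task<string> GenerateResponseAsync(
        string prompt,
        CancellationToken cancellationToken)
    {
        // Trimming is a deliberately simple normalization; real keys may need more.
        var key = prompt.Trim();

        if (_cache.TryGetValue(key, out var cached))
            return cached;

        var result = await _inner.GenerateResponseAsync(prompt, cancellationToken);
        _cache[key] = result;
        return result;
    }
}
```

In an application, IMemoryCache or a distributed cache with an expiry would usually replace the dictionary, and two concurrent callers with the same new prompt may both reach the model before the first result is stored.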

Securing AI Endpoints

AI endpoints can easily become expensive if left unprotected.

A few safeguards help prevent abuse.

Authentication
Use JWT or API keys to restrict access.

Rate limiting
Limit how frequently a client can call the AI endpoint.

Prompt validation
Always validate user input before sending it to a model.
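
As an illustration of the rate-limiting idea, here is a minimal fixed-window counter per client. It is a sketch under stated assumptions: FixedWindowLimiter is a hypothetical class, the client id is whatever the API uses to identify callers, and the current time is passed in so the behavior is easy to test.

```csharp
using System;
using System.Collections.Concurrent;

public sealed class FixedWindowLimiter
{
    private readonly int _limit;
    private readonly TimeSpan _window;
    private readonly ConcurrentDictionary<string, (DateTimeOffset Start, int Count)> _state = new();

    public FixedWindowLimiter(int limit, TimeSpan window)
    {
        _limit = limit;
        _window = window;
    }

    // Returns true if the call is allowed, false if the client is over its limit.
    public bool TryAcquire(string clientId, DateTimeOffset now)
    {
        var entry = _state.AddOrUpdate(
            clientId,
            _ => (now, 1),
            (_, current) => now - current.Start >= _window
                ? (now, 1)                                  // window elapsed: start over
                : (current.Start, current.Count + 1));      // same window: count the call

        return entry.Count <= _limit;
    }
}
```

An endpoint would return 429 when TryAcquire is false. ASP.NET Core 7 and later also ship rate limiting middleware (AddRateLimiter and UseRateLimiter), which is usually the better choice than a hand-written limiter.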

One Small Tip That Reduces AI Costs

AI pricing usually depends on the number of tokens processed.

Better prompts often produce better responses with fewer tokens.

Instead of sending large context blocks, try using structured prompts with clear instructions.

It improves both response quality and cost efficiency.

Lessons from Building AI APIs

If you're planning to add AI features to your backend, a few patterns make life easier:

Keep AI logic inside a dedicated service layer
Treat AI like any other external dependency
Add retries, validation, and logging early
Protect API keys and enforce usage limits
Monitor token usage to avoid unexpected costs

Once the architecture is set up properly, adding new AI capabilities becomes much simpler.

Chatbots, summarization, and classification all become just another API endpoint.

ASP.NET Core Request Pipeline Explained: What Happens When an API Receives a Request

alinabi19 — Wed, 11 Mar 2026 18:49:24 +0000

You send a request to an API endpoint.

Milliseconds later, a response comes back.

Most of the time, we don’t think much about what happens in between. We write controllers, configure middleware, run the application, and everything works.

Until it doesn’t.

Maybe authentication suddenly stops working.
Maybe a middleware behaves differently than expected.
Maybe performance drops under load.
Or routing starts sending requests to the wrong endpoint.

When that happens, the question becomes unavoidable:

What actually happens inside ASP.NET Core when a request hits your API?

Understanding the request pipeline is what turns ASP.NET Core from a black box into something you can actually debug and optimize.

In this article, we'll walk through the lifecycle of a request in ASP.NET Core—from the moment it reaches your server to the moment the response is sent back.

The Big Picture: The ASP.NET Core Request Flow

If you trace a request from the network all the way to your controller or endpoint, it roughly goes through this path:

Client
   ↓
Reverse Proxy (optional)
   ↓
Kestrel Web Server
   ↓
ASP.NET Core Hosting Layer
   ↓
Middleware Pipeline
   ↓
Endpoint Routing
   ↓
Endpoint Execution (Controller / Minimal API)
   ↓
Middleware (Response Flow)
   ↓
Kestrel
   ↓
Client Response

Each stage gets a chance to process the request before it reaches your application logic.

Once you understand this flow, debugging strange behavior becomes much easier.

Step 1: The Request Reaches Kestrel

The first component inside your application that receives the request is Kestrel.

Kestrel is the default high-performance web server used by ASP.NET Core. Its job is to:

Listen for incoming HTTP requests
Parse HTTP messages
Forward the request into the ASP.NET Core application pipeline

Kestrel is designed for high throughput and low latency. It uses asynchronous I/O and efficient networking primitives to handle thousands of concurrent connections.

In production environments, Kestrel usually sits behind a reverse proxy such as:

Nginx
Apache
IIS
Azure App Service infrastructure

The reverse proxy handles things like TLS termination, load balancing, and security filtering, while Kestrel still processes the request inside the application.

Once Kestrel receives the request, it passes it into the ASP.NET Core pipeline.

Step 2: The ASP.NET Core Hosting Layer

Before the request reaches middleware, ASP.NET Core’s hosting layer has already done some important work.

When the application starts, the hosting layer:

Builds the dependency injection container
Configures logging
Loads configuration
Constructs the middleware pipeline

This setup happens during application startup in Program.cs.

By the time a request arrives, the middleware pipeline has already been assembled and is ready to process incoming requests.

Step 3: The Request Enters the Middleware Pipeline

Most of the interesting work in ASP.NET Core happens inside the middleware pipeline.

Middleware are small components that can:

Inspect the request
Modify the request
Stop the request from continuing
Pass the request to the next component
Modify the response before it leaves

Middleware are configured in Program.cs.

Example:

app.Use(async (context, next) =>
{
    var logger = context.RequestServices
        .GetRequiredService<ILoggerFactory>()
        .CreateLogger("RequestLogger");

    logger.LogInformation("Request started: {Path}", context.Request.Path);

    await next();

    logger.LogInformation("Response finished with status {StatusCode}",
        context.Response.StatusCode);
});

Here’s what happens during execution:

The request enters the middleware
Code before await next() runs
The request moves to the next middleware
Eventually an endpoint executes
The response travels back through middleware
Code after await next() runs

This creates a two-way pipeline:

Request → Middleware → Endpoint
Response ← Middleware ← Endpoint

One thing that surprises many developers when debugging middleware is that responses travel back through the pipeline in reverse order.

Middleware Can Short-Circuit the Pipeline

Middleware can also stop the pipeline entirely.

For example:

app.Use(async (context, next) =>
{
    if (context.User.Identity?.IsAuthenticated != true)
    {
        context.Response.StatusCode = StatusCodes.Status401Unauthorized;
        return;
    }

    await next();
});

In this case, the request never reaches later middleware or the endpoint.

This behavior is commonly used for:

authentication checks
rate limiting
request filtering

Step 4: Built-in Middleware Components

ASP.NET Core provides several built-in middleware components that most applications rely on.

Common examples include:

Routing Middleware
Determines which endpoint matches the request.

app.UseRouting();

Authentication Middleware
Validates the user identity.

app.UseAuthentication();

Authorization Middleware
Checks whether the authenticated user has permission.

app.UseAuthorization();

Exception Handling Middleware
Handles unhandled exceptions globally.

app.UseExceptionHandler();

Other Common Production Middleware

Real-world APIs often include additional middleware such as:

CORS (UseCors)
Response compression (UseResponseCompression)
HTTPS redirection (UseHttpsRedirection)
Rate limiting (UseRateLimiter)

Each middleware adds a delegate to the request pipeline. Individually they’re lightweight, but extremely long middleware chains can introduce small overhead in very high-throughput systems.

Middleware Order Matters

One of the most common sources of bugs in ASP.NET Core applications is incorrect middleware ordering.

Consider this configuration:

app.UseAuthorization();
app.UseAuthentication();

This breaks authorization checks because authorization runs before authentication has established the user identity.

The correct order is:

app.UseAuthentication();
app.UseAuthorization();

When debugging strange authentication behavior, middleware order is often the first thing worth checking.
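To put the two calls in context, here is a minimal sketch of a full Program.cs with a commonly used ordering. It assumes a .NET 8 web project with implicit usings; the cookie scheme and the /error and /secure routes are illustrative choices, not something the article prescribes.

```csharp
// Program.cs: a sketch of a commonly used middleware order (assumes .NET 8, implicit usings).
var builder = WebApplication.CreateBuilder(args);

// Cookie authentication is used here only because it ships with the framework;
// swap in whatever scheme your API actually uses.
builder.Services.AddAuthentication().AddCookie();
builder.Services.AddAuthorization();

var app = builder.Build();

app.UseExceptionHandler("/error"); // outermost, so it can catch failures from everything below
app.UseHttpsRedirection();
app.UseRouting();                  // selects the endpoint
app.UseAuthentication();           // establishes context.User
app.UseAuthorization();            // evaluates policies against context.User

// Hypothetical endpoints for illustration.
app.MapGet("/error", () => Results.Problem());
app.MapGet("/secure", () => "Only for signed-in users").RequireAuthorization();

app.Run();
```

Reading the file top to bottom mirrors the path a request takes, which makes ordering mistakes easier to spot in code review.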

Step 5: Endpoint Routing

As the request moves through the pipeline, ASP.NET Core needs to determine which endpoint should handle it.

This is handled by endpoint routing.

Routing examines:

HTTP method (GET, POST, etc.)
request path
route parameters

Example Minimal API:

app.MapGet("/products/{id}", (int id) =>
{
    return Results.Ok($"Product {id}");
});

If the request is:

GET /products/10

Routing selects this endpoint and prepares it for execution.

UseRouting() identifies the matching endpoint, while the endpoint delegate itself executes later in the pipeline.

ASP.NET Core’s routing system is highly optimized and capable of efficiently matching large numbers of routes.

Step 6: Endpoint Execution

Once routing selects the correct endpoint, ASP.NET Core executes the endpoint logic.

This could be:

a controller action
a minimal API handler
a Razor page
a gRPC service

For controller-based APIs, ASP.NET Core performs several additional steps automatically.

Model Binding

ASP.NET Core maps incoming request data into method parameters.

Example:

[HttpPost]
public IActionResult CreateOrder(OrderDto order)

Data can be bound from multiple sources:

request body
route values
query parameters
headers
form data

Validation

If validation attributes are used, ASP.NET Core validates the model automatically.

Example:

public class OrderDto
{
    [Required]
    public string CustomerEmail { get; set; }
}

For controllers marked with [ApiController], invalid models automatically produce a 400 Bad Request response.

Business Logic

This is where your application code runs.

Typical tasks include:

database queries
calling services
performing calculations
invoking external APIs

Returning a Result

The endpoint returns a result such as:

return Ok(order);

ASP.NET Core then converts this result into an HTTP response.

For example:

objects → JSON
status codes → HTTP response codes
headers → HTTP headers

Step 7: The Response Travels Back Through Middleware

Once the endpoint finishes execution, the response begins its return journey.

The response flows back through the middleware pipeline in reverse order.

This allows middleware to:

modify response headers
compress responses
log execution time
transform output

Finally, the response reaches Kestrel, which sends it back to the client.
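One detail worth knowing on the return path: once the response body has started streaming, headers can no longer be changed. A minimal sketch of a safe way to add a header on the way out, assuming a .NET 8 web project with implicit usings (the X-Elapsed-Ms header name is hypothetical):

```csharp
// Program.cs: adding a response header from middleware (assumes .NET 8, implicit usings).
var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

app.Use(async (context, next) =>
{
    var stopwatch = System.Diagnostics.Stopwatch.StartNew();

    // OnStarting runs just before headers are sent, so the header can still be set
    // even if the endpoint writes directly to the response body.
    context.Response.OnStarting(() =>
    {
        context.Response.Headers["X-Elapsed-Ms"] =
            stopwatch.ElapsedMilliseconds.ToString(); // hypothetical header name
        return Task.CompletedTask;
    });

    await next();
});

app.MapGet("/", () => "Hello");

app.Run();
```

Setting the header after await next() instead can throw or be silently ignored when the endpoint has already begun writing the body.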

A Simple Performance Debugging Trick

When diagnosing slow requests, a small timing middleware can quickly identify bottlenecks.

Example:

// Requires: using System.Diagnostics; at the top of Program.cs
app.Use(async (context, next) =>
{
    var stopwatch = Stopwatch.StartNew();

    await next();

    stopwatch.Stop();

    var logger = context.RequestServices
        .GetRequiredService<ILoggerFactory>()
        .CreateLogger("Performance");

    logger.LogInformation("Request completed in {Elapsed} ms",
        stopwatch.ElapsedMilliseconds);
});

This simple middleware can reveal slow endpoints or middleware components almost instantly.

Visual Summary of the Request Flow

Request Flow Summary

Client
  ↓
Kestrel
  ↓
Middleware Pipeline
  ↓
Routing
  ↓
Endpoint Execution
  ↓
Middleware (response)
  ↓
Client

Key Takeaways

ASP.NET Core processes requests through a middleware pipeline

Kestrel is the web server that receives HTTP requests

Middleware can inspect, modify, or terminate requests

Middleware order directly affects application behavior

Endpoint routing determines which API logic executes

Responses travel back through the same middleware pipeline

Once you understand this flow, ASP.NET Core stops feeling like a black box. Debugging becomes easier, middleware behavior makes more sense, and performance issues are much easier to track down.

Have you ever spent hours debugging an ASP.NET Core API only to realize the issue was caused by middleware order?

10 ASP.NET Core API Performance Mistakes That Hurt Scalability

alinabi19 — Wed, 11 Mar 2026 09:53:21 +0000

ASP.NET Core is one of the fastest web frameworks available today.

Benchmarks regularly show it outperforming many other platforms. Yet in real production systems, I’ve seen ASP.NET Core APIs struggle under load — even when the infrastructure was solid.

Response times that were 50–100ms during development suddenly climb to 800ms or more in production. CPU usage spikes, database calls slow down, and suddenly everyone is asking the same question:

“Why is our API so slow?”

In most cases, the problem isn’t ASP.NET Core.

It’s small design decisions made during development that quietly add overhead over time.

Things like:

Returning too much data
Inefficient database queries
Blocking threads
Missing caching
Large payloads

Individually these issues may seem harmless. Combined, they can dramatically reduce API performance and scalability.

After working on several production APIs, I’ve noticed the same performance issues appear again and again.

Let’s walk through 10 of the most common mistakes developers make when building ASP.NET Core APIs - and how to fix them.

Why API Performance Matters

API performance affects far more than just response time.

Slow APIs lead to:

Higher CPU and memory consumption
Increased cloud infrastructure costs
Poor scalability under load
Frustrated users waiting for responses

A well-designed ASP.NET Core API can handle thousands of requests per second with minimal infrastructure.

But that only happens when performance is considered early in the design.

1. Returning Too Much Data

One of the most common API performance issues is returning entire database entities instead of only the fields needed by the client.

Large payloads increase:

serialization time
network transfer time
client parsing time

Bad Example

app.MapGet("/users", async (AppDbContext db) =>
{
    return await db.Users.ToListAsync();
});

If your User table has 20 columns but the frontend only needs three, you're wasting bandwidth and compute.

Better Approach: Use DTO Projection

app.MapGet("/users", async (AppDbContext db, CancellationToken ct) =>
{
    return await db.Users
        .AsNoTracking()
        .Select(u => new UserDto
        {
            Id = u.Id,
            Name = u.Name,
            Email = u.Email
        })
        .ToListAsync(ct);
});

Benefits:

smaller SQL queries
less memory usage
faster serialization
smaller responses

This simple change can reduce payload sizes dramatically.

2. Blocking Threads Instead of Using Async

ASP.NET Core is designed for asynchronous I/O.

When database calls or external requests are made synchronously, threads become blocked while waiting for results.

Blocking Example

var users = db.Users.ToList();

Under load, blocked threads reduce throughput and can cause thread pool starvation.

Correct Approach

var users = await db.Users.ToListAsync(ct);

Async operations allow ASP.NET Core to:

free threads while waiting for I/O
handle more concurrent requests
scale efficiently

A common mistake I still see in production code is calling .Result or .Wait(). These should almost always be avoided in ASP.NET Core APIs.
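A minimal sketch of the difference, using a hypothetical FetchCountAsync method as a stand-in for a database or HTTP call:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncDemo
{
    // Hypothetical stand-in for a database or HTTP call.
    public static async Task<int> FetchCountAsync(CancellationToken ct = default)
    {
        await Task.Delay(50, ct); // simulated I/O latency
        return 42;
    }

    // Anti-pattern: .Result blocks the calling thread until the task completes.
    public static int GetCountBlocking() => FetchCountAsync().Result;

    // Preferred: the thread returns to the pool while the I/O is in flight.
    public static async Task<int> GetCountAsync(CancellationToken ct = default)
        => await FetchCountAsync(ct);
}
```

Both methods return the same value. The difference only shows up under load, when each blocked thread is one that cannot serve another request.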

3. Inefficient Database Queries

In most real-world APIs, the database is the primary performance bottleneck.

Common issues include:

N+1 queries
missing indexes
unnecessary joins
over-fetching data

N+1 Query Problem

var orders = await db.Orders.ToListAsync();

foreach (var order in orders)
{
    var items = await db.OrderItems
        .Where(i => i.OrderId == order.Id)
        .ToListAsync();
}

If there are 100 orders, this produces 101 database queries.

Better Approach

var orders = await db.Orders
    .Include(o => o.Items)
    .AsNoTracking()
    .ToListAsync(ct);

Always review your generated SQL queries. Small EF query mistakes can cause massive database load.

4. Ignoring Caching

If the same data is requested repeatedly, hitting the database every time is wasteful.

Caching can reduce response times dramatically.

In-Memory Cache Example

app.MapGet("/products", async (
    AppDbContext db,
    IMemoryCache cache,
    CancellationToken ct) =>
{
    if (!cache.TryGetValue("products", out List<Product> products))
    {
        products = await db.Products
            .AsNoTracking()
            .ToListAsync(ct);

        cache.Set("products", products,
            new MemoryCacheEntryOptions
            {
                AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5),
                SlidingExpiration = TimeSpan.FromMinutes(2)
            });
    }

    return products;
});

Caching is particularly effective for:

configuration data
product catalogs
reference tables

For multi-instance deployments, a distributed cache like Redis is usually a better choice.

5. Missing Pagination on Large Endpoints

Returning thousands of rows in a single API response can create serious performance issues.

Problems include:

large payload sizes
high memory consumption
slow serialization

Proper Pagination

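app.MapGet("/orders", async (
    AppDbContext db,
    CancellationToken ct,
    int page = 1,
    int pageSize = 20) =>
{
    // Clamp inputs so callers cannot request negative offsets or huge pages
    page = Math.Max(page, 1);
    pageSize = Math.Clamp(pageSize, 1, 100);

    return await db.Orders
        .AsNoTracking()
        .OrderBy(o => o.Id) // stable ordering keeps pages deterministic
        .Skip((page - 1) * pageSize)
        .Take(pageSize)
        .ToListAsync(ct);
});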

Always enforce a maximum page size to prevent abuse.

6. Overloading the Middleware Pipeline

Middleware runs on every request, so unnecessary middleware can add latency.

Common mistakes include:

heavy logging middleware
redundant request parsing
complex logic inside middleware

A clean pipeline might look like:

app.UseRouting();
app.UseAuthentication();
app.UseAuthorization();

Each additional middleware adds overhead, so keep the pipeline intentional and minimal.

7. Not Using Response Compression

Large JSON responses can significantly increase network latency.

ASP.NET Core provides built-in response compression.

Enable Compression

builder.Services.AddResponseCompression();

app.UseResponseCompression();

Compression is especially helpful for:

large JSON responses
mobile networks
APIs returning lists or datasets
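One thing to be aware of: the response compression middleware does not compress HTTPS responses unless you opt in. A minimal sketch, assuming a .NET 8 web project with implicit usings (the /data endpoint is hypothetical):

```csharp
// Program.cs: opting in to compression over HTTPS (assumes .NET 8, implicit usings).
using Microsoft.AspNetCore.ResponseCompression;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddResponseCompression(options =>
{
    // Off by default. Review the security guidance on compressing
    // HTTPS responses before enabling this for sensitive payloads.
    options.EnableForHttps = true;
    options.Providers.Add<BrotliCompressionProvider>();
    options.Providers.Add<GzipCompressionProvider>();
});

var app = builder.Build();

app.UseResponseCompression(); // register before anything that writes the response

app.MapGet("/data", () => Enumerable.Range(1, 1000)); // hypothetical large payload

app.Run();
```

If a reverse proxy such as Nginx or IIS already compresses responses, doing it again in the application is usually unnecessary.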

8. Excessive Logging in Production

Logging is essential, but logging too much can hurt performance.

Common mistakes include:

logging entire request bodies
debug-level logs in production
logging inside loops

Better Logging Approach

logger.LogInformation(
    "Order created for user {UserId}",
    userId
);

Structured logging keeps logs useful while minimizing overhead.
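When a log message is expensive to build, such as a serialized payload, it can help to check the log level first. A minimal sketch, assuming the Microsoft.Extensions.Logging abstractions are referenced; OrderLogging and describePayload are hypothetical names:

```csharp
using System;
using Microsoft.Extensions.Logging;

public static class OrderLogging
{
    public static void LogOrderCreated(ILogger logger, int userId, Func<string> describePayload)
    {
        // Cheap, structured, always useful in production.
        logger.LogInformation("Order created for user {UserId}", userId);

        // Only pay for building the payload description when Debug is actually enabled.
        if (logger.IsEnabled(LogLevel.Debug))
        {
            logger.LogDebug("Order payload: {Payload}", describePayload());
        }
    }
}
```

With the production log level set to Information, describePayload is never invoked.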

9. Mismanaging Database Connections

Creating database connections is expensive, which is why connection pooling exists.

However, performance issues still occur when:

DbContext lifetimes are misconfigured

long-running queries hold connections

transactions stay open too long

Best practices:

use DbContext with scoped lifetime

avoid long transactions

keep queries efficient
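As a sketch of the first point: AddDbContext registers the context with a scoped lifetime by default, so each request gets its own instance. The example assumes the Microsoft.EntityFrameworkCore.SqlServer package and a connection string named "Default", both of which are illustrative choices.

```csharp
// Program.cs: scoped DbContext registration (assumes .NET 8, implicit usings,
// and the Microsoft.EntityFrameworkCore.SqlServer package).
using Microsoft.EntityFrameworkCore;

var builder = WebApplication.CreateBuilder(args);

// Scoped by default: one AppDbContext per HTTP request.
builder.Services.AddDbContext<AppDbContext>(options =>
    options.UseSqlServer(builder.Configuration.GetConnectionString("Default"))); // hypothetical name

var app = builder.Build();

// The context is injected per request and disposed when the request ends.
app.MapGet("/orders/count", async (AppDbContext db, CancellationToken ct) =>
    await db.Orders.CountAsync(ct));

app.Run();

// Minimal model so the sketch is self-contained.
public class Order
{
    public int Id { get; set; }
}

public class AppDbContext : DbContext
{
    public AppDbContext(DbContextOptions<AppDbContext> options) : base(options) { }

    public DbSet<Order> Orders => Set<Order>();
}
```

For very high request rates, AddDbContextPool is an alternative worth measuring, since it reuses context instances instead of creating a new one per request.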

10. Skipping Load Testing

Many APIs perform well during development but fail under real traffic.

Performance problems often appear only when:

hundreds of requests run concurrently
database contention increases
thread pools become saturated

Good load testing tools include:

k6
Apache JMeter
NBomber
Azure Load Testing

Testing under realistic traffic conditions helps reveal bottlenecks before production users experience them.

Pro Tip: Measure Before Optimizing

Optimization without measurement is mostly guesswork.

Useful performance tools include:

MiniProfiler
Application Insights
OpenTelemetry

These tools help identify the real bottleneck instead of optimizing blindly.

A Common Performance Trap

One mistake I frequently see is developers trying to optimize application code first while ignoring the database.

In many real systems, database queries account for 70–90% of total API response time.

Start your investigation there.

Key Takeaways

To keep ASP.NET Core APIs fast and scalable:

Return only the data clients actually need
Use asynchronous I/O consistently
Optimize database queries early
Cache frequently requested data
Implement pagination on large endpoints
Keep middleware pipelines lean
Enable response compression
Avoid excessive logging in production
Use proper DbContext lifetimes
Load test before production traffic

Performance rarely comes from one big optimization.

It usually comes from avoiding dozens of small mistakes that accumulate over time.

If you're building high-traffic APIs, these small decisions can make the difference between an API that struggles under load and one that scales effortlessly.

ASP.NET Core Caching Explained: In-Memory, Redis, and Response Caching for High-Performance APIs

alinabi19 — Wed, 04 Mar 2026 08:24:57 +0000

Modern APIs rarely fail because of logic.
They fail because of performance.

Picture this.

You deploy a clean ASP.NET Core API. Everything works perfectly during testing. But once real traffic arrives, things start getting ugly:

Database CPU spikes
Queries that took 30 ms now take 600 ms
Your API latency slowly creeps past 1 second

The problem usually isn’t the database itself.

It’s that your API is doing the same expensive work repeatedly.

The same product list.
The same configuration settings.
The same reference data.

Over and over again.

This is where caching becomes one of the most powerful tools in backend engineering.

Yet many developers either:

Avoid caching because it feels complicated
Or implement it incorrectly and create stale data bugs

In this guide, you'll learn how caching actually works in ASP.NET Core, when to use each type, and how to implement it properly in production APIs.

By the end, you'll know how to:

Improve API performance dramatically
Reduce database load
Build APIs that scale without infrastructure explosions

Let’s start with the fundamentals.

Why API Performance Matters

API performance directly impacts:

- User experience
- Infrastructure cost
- Scalability
- System stability

Consider this example:

Without caching:

10,000 requests/min
→ Each request hits the database
→ 10,000 database queries/min

With caching:

10,000 requests/min
→ Only 200 database queries/min (a 98% cache hit ratio)
→ Remaining requests served from cache

The result:

Faster responses
Lower DB load
Better scalability

Caching is often the highest ROI optimization you can implement.

What Is Caching

Caching means storing expensive data temporarily so it can be reused.

Instead of recomputing or querying the database repeatedly, the API retrieves the result from a fast storage layer.

Typical cache flow:

Request arrives
      ↓
Check Cache
      ↓
Cache Hit → return data immediately
Cache Miss → fetch from DB → store in cache → return

The key idea:

Cache the result of expensive operations, not everything.

Types of Caching in ASP.NET Core

ASP.NET Core provides several caching mechanisms:

1. In-Memory Caching
2. Distributed Caching
3. Response Caching

Each solves a different problem.

In-Memory Caching

In-memory caching stores data inside the application server's memory.

It is:

Extremely fast
Easy to implement
Best for single-instance APIs

However, it has a limitation.

If you run multiple API instances (load balancing), each instance has its own separate cache.

Use cases:

Product catalogs
Configuration values
Reference data

Distributed Caching (Redis / SQL Server)

Distributed caching stores data in external cache systems like:

Redis
SQL Server
NCache

All application instances share the same cache.

Benefits:

Works with multiple API instances
Ideal for cloud and microservices architectures
Handles large-scale traffic

This is the standard choice for production systems.

Response Caching

Response caching stores entire HTTP responses.

Instead of executing the controller again, ASP.NET Core returns the cached HTTP response.

This is useful for:

Public APIs
GET endpoints
Static data endpoints

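However, it only applies to responses that are safe to cache: by default the middleware caches only GET or HEAD requests that return 200 OK, and it skips requests that carry an Authorization header.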

When to Use Each Type

Cache Type	Best Use Case	Speed	Scalability
In-Memory Cache	Single instance apps	Very fast	Low
Distributed Cache	Multi-server production APIs	Fast	High
Response Cache	Cache entire HTTP responses	Very fast	Medium

Rule of thumb:

Small apps → In-Memory
Scalable APIs → Redis Distributed Cache
Static GET responses → Response Caching

Step-by-Step Implementation

Let's implement each caching strategy.

In-Memory Cache Example

First, register the memory cache service.

Program.cs (.NET 8)

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddMemoryCache();
builder.Services.AddControllers();

var app = builder.Build();

app.MapControllers();
app.Run();

Using IMemoryCache

[ApiController]
[Route("api/products")]
public class ProductsController : ControllerBase
{
    private readonly IMemoryCache _cache;

    public ProductsController(IMemoryCache cache)
    {
        _cache = cache;
    }

    [HttpGet]
    public async Task<IActionResult> GetProducts()
    {
        const string cacheKey = "product_list";

        if (!_cache.TryGetValue(cacheKey, out List<string> products))
        {
            // Simulate expensive DB call
            await Task.Delay(500);

            products = new List<string>
            {
                "Laptop",
                "Keyboard",
                "Mouse"
            };

            var cacheOptions = new MemoryCacheEntryOptions
            {
                AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(10),
                SlidingExpiration = TimeSpan.FromMinutes(2)
            };

            _cache.Set(cacheKey, products, cacheOptions);
        }

        return Ok(products);
    }
}
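The same check-then-set logic can be written more compactly with the GetOrCreateAsync extension method. A minimal sketch, assuming the Microsoft.Extensions.Caching.Memory package; ProductCatalog and LoadCount are hypothetical names used only to show how often the loader runs:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Memory;

public class ProductCatalog
{
    private readonly IMemoryCache _cache;

    public ProductCatalog(IMemoryCache cache) => _cache = cache;

    // Counts how often the expensive load actually ran (illustration only, not thread-safe).
    public int LoadCount { get; private set; }

    public Task<List<string>?> GetProductsAsync() =>
        _cache.GetOrCreateAsync("product_list", async entry =>
        {
            // Expiration is configured on the entry being created.
            entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(10);
            entry.SlidingExpiration = TimeSpan.FromMinutes(2);

            LoadCount++;
            await Task.Delay(50); // simulate expensive DB call
            return new List<string> { "Laptop", "Keyboard", "Mouse" };
        });
}
```

The factory delegate only runs on a cache miss, so two sequential calls load the data once.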

Expiration Strategies

Absolute Expiration

Cache expires after a fixed time.

AbsoluteExpirationRelativeToNow = 10 minutes

Sliding Expiration

Expiration resets every time the cache is accessed.

SlidingExpiration = 2 minutes

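Combining both bounds staleness: sliding expiration keeps frequently used entries alive, while absolute expiration guarantees the entry is refreshed after at most 10 minutes.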

Cache Invalidation Example

Cache invalidation is crucial when data changes.

[HttpPost]
public IActionResult AddProduct(string name)
{
    // Save to database

    _cache.Remove("product_list");

    return Ok();
}

This ensures fresh data is fetched on the next request.

Redis Distributed Cache Example

First install the package:

Microsoft.Extensions.Caching.StackExchangeRedis

Configure Redis

builder.Services.AddStackExchangeRedisCache(options =>
{
    options.Configuration = "localhost:6379";
    options.InstanceName = "MyApiCache";
});

Using IDistributedCache

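// Requires: using System.Text.Json; and using Microsoft.Extensions.Caching.Distributed;
[ApiController]
[Route("api/products")]
public class ProductsController : ControllerBase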
{
    private readonly IDistributedCache _cache;

    public ProductsController(IDistributedCache cache)
    {
        _cache = cache;
    }

    [HttpGet("redis")]
    public async Task<IActionResult> GetProductsRedis()
    {
        var cacheKey = "products";

        var cachedData = await _cache.GetStringAsync(cacheKey);

        if (cachedData != null)
        {
            return Ok(JsonSerializer.Deserialize<List<string>>(cachedData));
        }

        var products = new List<string>
        {
            "Laptop",
            "Keyboard",
            "Mouse"
        };

        var options = new DistributedCacheEntryOptions
        {
            AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(15)
        };

        await _cache.SetStringAsync(
            cacheKey,
            JsonSerializer.Serialize(products),
            options
        );

        return Ok(products);
    }
}

Now all API instances share the same cache.

Response Caching Example

First enable response caching.

Program.cs

builder.Services.AddResponseCaching();

app.UseResponseCaching();

Controller Example

[HttpGet("catalog")]
[ResponseCache(Duration = 60)]
public IActionResult GetCatalog()
{
    var data = new
    {
        Message = "Cached Response",
        Time = DateTime.UtcNow
    };

    return Ok(data);
}

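Now the response carries a Cache-Control: public, max-age=60 header, and the response caching middleware can serve repeat GET requests from its cache for 60 seconds.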

Production Insight: Cache Stampede

A common issue in high-traffic systems is cache stampede.

When a cache entry expires, many concurrent requests may attempt to recompute the same value simultaneously.

This can overwhelm your database.

Typical mitigation strategies include:

Using SemaphoreSlim locking
Using Lazy<T> caching
Refreshing cache in the background
Using stale-while-revalidate patterns

Without protection, a popular endpoint can trigger thousands of database queries the moment a cache expires.
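A minimal sketch of the first strategy, SemaphoreSlim locking with a double check. To stay self-contained it uses a plain ConcurrentDictionary instead of IMemoryCache and has no expiration; SingleFlightCache is a hypothetical name, and the single global gate is a simplification (a per-key lock scales better).

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public class SingleFlightCache<T> where T : class
{
    private readonly ConcurrentDictionary<string, T> _values = new();
    private readonly SemaphoreSlim _gate = new(1, 1);

    public async Task<T> GetOrAddAsync(string key, Func<Task<T>> factory)
    {
        // Fast path: no locking on a cache hit.
        if (_values.TryGetValue(key, out var cached))
        {
            return cached;
        }

        await _gate.WaitAsync();
        try
        {
            // Double check: another caller may have filled the entry while we waited.
            if (_values.TryGetValue(key, out cached))
            {
                return cached;
            }

            var value = await factory(); // only one caller reaches the expensive load
            _values[key] = value;
            return value;
        }
        finally
        {
            _gate.Release();
        }
    }
}
```

If many requests arrive for the same missing key at once, the first one loads the value and the rest wait on the gate and then read it from the cache.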

Real-World Use Cases

Caching shines in scenarios like:

Product catalog APIs

Product data changes rarely but is requested constantly.

Configuration endpoints

Feature flags, settings, metadata.

Dashboard APIs

Expensive aggregations or analytics queries.

Reference data

Countries, currencies, categories.

Common Mistakes Developers Make

Caching everything

Not all data should be cached. Only cache expensive operations.

Forgetting cache invalidation

Stale data bugs happen when cache isn't cleared after updates.

Using in-memory cache in scaled systems

Multiple servers = multiple caches = inconsistent data.

Using very long expiration times

Leads to stale responses.

Performance & Scalability Considerations

When designing caching strategies:

Think about:

- Cache eviction policies
- Memory consumption
- Cache hit ratio
- Invalidation strategy

Monitor:

Cache hits vs misses
Redis latency
Database query reductions

A good cache system should dramatically reduce database load.

Best Practices for Production APIs

Follow these principles:

Cache read-heavy endpoints
Use Redis for distributed systems
Always define expiration policies
Implement cache invalidation
Monitor cache performance

Pro Tip

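Cache DTO results, not raw entities.

This keeps cached payloads small and avoids sharing EF Core entity instances between requests. With an in-memory cache, treat cached objects as read-only, since every request receives the same instance.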

Common Pitfall

Never cache user-specific data globally.

Example:

GET /api/orders

If cached improperly, users might receive other users’ data.

Always include user context in cache keys when needed.
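One simple way to do that is to build keys through a small helper so the user identifier can never be forgotten. A minimal sketch; CacheKeys and the key format are hypothetical:

```csharp
public static class CacheKeys
{
    // Scopes a cache entry to a single user, e.g. "user:42:orders".
    public static string ForUser(string userId, string resource)
        => $"user:{userId}:{resource}";
}
```

An orders endpoint would then read and write the cache under CacheKeys.ForUser(userId, "orders") instead of a shared "orders" key.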

Why Caching Is a Game Changer for APIs

Caching is one of the simplest ways to improve API performance.

A well-designed caching layer can:

Reduce database load dramatically
Improve API latency
Increase scalability

And the best part?

You often get 10x performance improvements with minimal code changes.

Start simple:

Add in-memory caching
Move to Redis when scaling
Use response caching for public endpoints

Small optimizations like these are what separate average APIs from high-performance systems.

Quick Recap

Caching reduces repeated expensive operations
ASP.NET Core supports multiple caching strategies
Redis enables scalable distributed caching
Cache invalidation is critical
Always implement expiration policies

One Question for You

What caching strategy are you currently using
in your ASP.NET Core APIs?

In-memory? Redis? Something else?

I'd love to hear your experience.
