Codexlancers

Posted on Jul 3

Running Local AI Models in .NET with Ollama (Step-by-Step Guide)

#ai #dotnet #csharp #machinelearning

Build AI-powered ASP.NET Core applications without relying entirely on cloud providers.

Introduction

If you've been experimenting with AI recently, your journey probably looked something like this:

Sign up for an AI provider
Generate an API key
Send a prompt
Get a response

It's simple, fast, and honestly, it's amazing how quickly you can build something useful.

But once AI moves beyond a proof of concept, a different set of questions starts to appear.

How much will this cost at scale?
Do we really want sensitive data leaving our infrastructure?
What happens if we hit API limits?
Do we want our internal tools to depend entirely on an external service?

This is exactly where local AI becomes interesting.

Recently, I started experimenting with Ollama and was surprised by how easy it has become to run AI models locally and integrate them into a regular ASP.NET Core application.

The best part?

From a .NET developer's perspective, it mostly feels like integrating another HTTP service.

In this guide, we'll:

Install Ollama
Download and run a local model
Integrate it into ASP.NET Core
Build a simple AI-powered endpoint

Why Run AI Models Locally?

Cloud AI is fantastic.

I'm not here to replace it.

But local AI solves a different set of problems.

Better Privacy

Many business applications work with sensitive information.

Things like:

Internal documentation
Support tickets
Customer records
Application logs

Sending that information to an external AI provider may introduce security, compliance, or governance concerns.

Running models locally keeps everything inside your own infrastructure.

Predictable Costs

Cloud providers charge based on usage.

That works well initially, but costs can grow quickly as applications scale.

Local AI removes per-request pricing entirely. You pay for hardware and electricity instead, which is much easier to budget for.

Fewer Dependencies

Your application no longer depends on:

Internet connectivity
External outages
API rate limits

Once the model has been downloaded, everything runs on your own machine or server.

What Is Ollama?

Ollama is a tool that allows developers to run Large Language Models (LLMs) locally.

Instead of manually configuring machine learning environments, Ollama handles:

Model downloads
Runtime management
Memory handling
HTTP APIs

Once installed, interacting with AI becomes as simple as making an HTTP request.

Popular models include:

Model	Best Use Case
llama3	General AI tasks
mistral	Fast responses
codellama	Code generation
gemma	Lightweight applications

Step 1: Install Ollama

Download Ollama from:

https://ollama.com

Verify the installation.

ollama --version

Download a model.

ollama pull llama3

Run it.

ollama run llama3

Ask it something simple.

Explain dependency injection in ASP.NET Core

If you get a response, you're ready to go.

By default, Ollama exposes a local API at:

http://localhost:11434

This is what our ASP.NET Core application will communicate with.
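Before wiring anything into ASP.NET Core, it can help to confirm which models the local API can see. The sketch below assumes that GET /api/tags returns a JSON object with a "models" array whose entries carry a "name" string; OllamaTags and ParseModelNames are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

public static class OllamaTags
{
    // Extracts model names from the JSON body returned by GET /api/tags.
    // Assumed shape: { "models": [ { "name": "llama3:latest", ... } ] }
    public static IReadOnlyList<string> ParseModelNames(string json)
    {
        using var doc = JsonDocument.Parse(json);

        // Missing or malformed "models" property: report no models.
        if (!doc.RootElement.TryGetProperty("models", out var models) ||
            models.ValueKind != JsonValueKind.Array)
        {
            return new List<string>();
        }

        return models.EnumerateArray()
            .Where(m => m.ValueKind == JsonValueKind.Object &&
                        m.TryGetProperty("name", out var n) &&
                        n.ValueKind == JsonValueKind.String)
            .Select(m => m.GetProperty("name").GetString()!)
            .ToList();
    }
}
```

With Ollama running, something like `OllamaTags.ParseModelNames(await httpClient.GetStringAsync("api/tags"))` should include the model you pulled earlier.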

Step 2: Create an ASP.NET Core API

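Create a new project. Step 5 uses a controller, and on the .NET 8 SDK and later the webapi template defaults to minimal APIs, so pass --use-controllers. On older SDKs, omit the flag.

dotnet new webapi -n LocalAIApi --use-controllers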

Recommended structure:

Controllers
Services
Models
Program.cs

Keep your AI logic separate from controllers.

It'll make your application easier to maintain.

Step 3: Register HttpClient

Inside Program.cs:

builder.Services.AddHttpClient<IAiService, OllamaService>(client =>
{
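    client.BaseAddress = new Uri("http://localhost:11434");

    // Local generation can be slow on modest hardware, so allow more than
    // HttpClient's default 100-second timeout.
    client.Timeout = TimeSpan.FromMinutes(2);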
});
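Hard-coding localhost is fine on a dev machine, but the address usually changes once Ollama runs in a container or on another server. Here is a minimal sketch of a helper that falls back to the default address when nothing is configured. OllamaEndpoint, Resolve, and the "Ollama:BaseUrl" configuration key are all hypothetical names, not something Ollama or ASP.NET Core defines.

```csharp
using System;

public static class OllamaEndpoint
{
    // Default taken from the article; used when nothing is configured.
    public const string DefaultBaseUrl = "http://localhost:11434";

    // Returns an absolute base address with a trailing slash, so that a
    // relative path such as "api/generate" is appended to the base path
    // rather than replacing its last segment.
    public static Uri Resolve(string? configured)
    {
        var raw = string.IsNullOrWhiteSpace(configured)
            ? DefaultBaseUrl
            : configured.Trim();

        if (!raw.EndsWith("/", StringComparison.Ordinal))
        {
            raw += "/";
        }

        if (!Uri.TryCreate(raw, UriKind.Absolute, out var uri) ||
            (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps))
        {
            throw new ArgumentException(
                $"Invalid Ollama base URL: '{configured}'", nameof(configured));
        }

        return uri;
    }
}
```

In Program.cs this could be used as `client.BaseAddress = OllamaEndpoint.Resolve(builder.Configuration["Ollama:BaseUrl"]);`, with the value supplied through appsettings.json or an environment variable.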

Step 4: Create an AI Service

Interface:

public interface IAiService
{
    Task<string> GenerateAsync(
        string prompt,
        CancellationToken cancellationToken);
}

Implementation:

public class OllamaService : IAiService
{
    private readonly HttpClient _httpClient;

    public OllamaService(HttpClient httpClient)
    {
        _httpClient = httpClient;
    }

    public async Task<string> GenerateAsync(
        string prompt,
        CancellationToken cancellationToken)
    {
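        // Anonymous object serialized as the JSON body for /api/generate.
        var request = new
        {
            model = "llama3",   // must match a model pulled in Step 1
            prompt,
            stream = false      // ask for one JSON object instead of a stream of chunks
        };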

        var response = await _httpClient.PostAsJsonAsync(
            "api/generate",
            request,
            cancellationToken);

        response.EnsureSuccessStatusCode();

        var result = await response.Content
            .ReadFromJsonAsync<OllamaResponse>(
                cancellationToken: cancellationToken);

        return result?.Response ?? string.Empty;
    }
}

Response model:

public class OllamaResponse
{
    public string Response { get; set; } = string.Empty;
}

At this point, AI integration starts looking very similar to integrating any third-party API.

That's probably the biggest surprise when working with Ollama for the first time.

Most of the complexity is already handled for you.
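The service above sets stream = false, so the whole answer arrives in a single JSON object. If you switch streaming on, my understanding is that /api/generate sends one JSON object per line, each carrying a "response" fragment and a "done" flag that turns true on the last one. Treat that shape as an assumption to verify against the Ollama API docs. OllamaStream and Collect are hypothetical names, and this is a rough sketch rather than a production reader.

```csharp
using System.IO;
using System.Text;
using System.Text.Json;

public static class OllamaStream
{
    // Concatenates the "response" fragments from a streamed /api/generate body.
    // Assumption: one JSON object per line, with "done": true on the final one.
    public static string Collect(TextReader reader)
    {
        var text = new StringBuilder();
        string? line;

        while ((line = reader.ReadLine()) != null)
        {
            // Skip blank lines between chunks.
            if (string.IsNullOrWhiteSpace(line)) continue;

            using var chunk = JsonDocument.Parse(line);
            var root = chunk.RootElement;

            if (root.TryGetProperty("response", out var part) &&
                part.ValueKind == JsonValueKind.String)
            {
                text.Append(part.GetString());
            }

            // Stop reading once the final chunk has been seen.
            if (root.TryGetProperty("done", out var done) &&
                done.ValueKind == JsonValueKind.True)
            {
                break;
            }
        }

        return text.ToString();
    }
}
```

The synchronous TextReader keeps the example short. In a real endpoint you would more likely read the response stream asynchronously and forward fragments to the caller as they arrive.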

Step 5: Expose an AI Endpoint

Request model:

public class AiRequest
{
    public string Prompt { get; set; } = string.Empty;
}

Controller:

[ApiController]
[Route("api/ai")]
public class AiController : ControllerBase
{
    private readonly IAiService _aiService;

    public AiController(IAiService aiService)
    {
        _aiService = aiService;
    }

    [HttpPost("generate")]
    public async Task<IActionResult> Generate(
        AiRequest request,
        CancellationToken cancellationToken)
    {
        if (string.IsNullOrWhiteSpace(request.Prompt))
        {
            return BadRequest("Prompt is required.");
        }

        var result = await _aiService.GenerateAsync(
            request.Prompt,
            cancellationToken);

        return Ok(new
        {
            response = result
        });
    }
}

Your API can now expose AI functionality without relying on a cloud provider.
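To try the endpoint, POST a JSON body with a prompt property to api/ai/generate. The sketch below shows what that body looks like on the wire, assuming the default ASP.NET Core JSON settings (camelCase names, case-insensitive matching). AiRequest is repeated from above so the snippet compiles on its own; AiRequestJson is a hypothetical helper.

```csharp
using System.Text.Json;

public class AiRequest
{
    public string Prompt { get; set; } = string.Empty;
}

public static class AiRequestJson
{
    // JsonSerializerDefaults.Web mirrors ASP.NET Core's defaults:
    // camelCase property names and case-insensitive matching on read.
    private static readonly JsonSerializerOptions WebOptions =
        new JsonSerializerOptions(JsonSerializerDefaults.Web);

    // Builds the JSON body a client would send to the endpoint.
    public static string ToBody(string prompt) =>
        JsonSerializer.Serialize(new AiRequest { Prompt = prompt }, WebOptions);

    // Parses a body the same way model binding would see it.
    public static AiRequest? FromBody(string json) =>
        JsonSerializer.Deserialize<AiRequest>(json, WebOptions);
}
```

Any HTTP client works here: send the output of ToBody with a Content-Type of application/json, and the controller answers with an object containing a response property.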

Architecture Overview

Client
   ↓
ASP.NET Core API
   ↓
AI Service Layer
   ↓
Ollama HTTP API
   ↓
Local LLM

Keeping AI behind a dedicated service layer makes it easier to swap providers later if needed.
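One quick way to see that benefit is a stand-in implementation for tests and local development. This is a minimal sketch: the interface is repeated from Step 4 so the snippet compiles alone, and CannedAiService is a hypothetical class that never calls a model.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Same contract as the interface in Step 4.
public interface IAiService
{
    Task<string> GenerateAsync(
        string prompt,
        CancellationToken cancellationToken);
}

// Hypothetical stand-in: returns a fixed answer without calling any model.
public class CannedAiService : IAiService
{
    private readonly string _answer;

    public CannedAiService(string answer) => _answer = answer;

    public Task<string> GenerateAsync(
        string prompt,
        CancellationToken cancellationToken)
    {
        // Honor cancellation the same way a real HTTP call would.
        cancellationToken.ThrowIfCancellationRequested();
        return Task.FromResult(_answer);
    }
}
```

Registering it with `builder.Services.AddSingleton<IAiService>(new CannedAiService("stub"));` in place of the AddHttpClient call leaves the controller untouched, which is the point of the service layer.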

Is Local AI Always Better?

No.

Cloud AI is still the better choice in many scenarios.

For example:

Use cloud AI when:

You need state-of-the-art reasoning models
You have thousands of concurrent users
You need globally distributed infrastructure

Use local AI when:

Privacy matters
Cost control matters
You're building internal business tools

In reality, many teams will likely use a hybrid approach.
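A hybrid setup needs some rule for deciding where each prompt goes. Here is one possible rule as a sketch, built from the two lists above. AiBackend, AiRouting, and both flags are hypothetical, and how you detect sensitive data is a separate problem this example does not solve.

```csharp
public enum AiBackend
{
    Local,
    Cloud
}

public static class AiRouting
{
    // Privacy wins over capability: sensitive prompts stay local even when
    // a stronger cloud model would otherwise be preferred.
    public static AiBackend Choose(
        bool containsSensitiveData,
        bool needsFrontierModel)
    {
        if (containsSensitiveData)
        {
            return AiBackend.Local;
        }

        return needsFrontierModel ? AiBackend.Cloud : AiBackend.Local;
    }
}
```

Defaulting to Local when neither flag is set reflects the cost-control argument; a team that cares more about answer quality might flip that default.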

Final Thoughts

A few years ago, running large language models felt like something only machine learning engineers could do.

Today, tools like Ollama have changed that.

As a .NET developer, integrating local AI is often just another HTTP integration.

And that's what makes this so exciting.

Local AI isn't replacing cloud AI.

It's simply giving developers another architectural option.

For internal tools, private assistants, documentation search, and AI-powered business applications, that option is becoming increasingly practical every day.

DEV Community