DEV Community

Codexlancers

Running Local AI Models in .NET with Ollama (Step-by-Step Guide)

Build AI-powered ASP.NET Core applications without relying entirely on cloud providers.

Introduction

If you've been experimenting with AI recently, your journey probably looked something like this:

  1. Sign up for an AI provider
  2. Generate an API key
  3. Send a prompt
  4. Get a response

It's simple, fast, and honestly, it's amazing how quickly you can build something useful.

But once AI moves beyond a proof of concept, a different set of questions starts to appear.

  • How much will this cost at scale?
  • Do we really want sensitive data leaving our infrastructure?
  • What happens if we hit API limits?
  • Do we want our internal tools to depend entirely on an external service?

This is exactly where local AI becomes interesting.

Recently, I started experimenting with Ollama and was surprised by how easy it has become to run AI models locally and integrate them into a regular ASP.NET Core application.

The best part?

From a .NET developer's perspective, it mostly feels like integrating another HTTP service.

In this guide, we'll:

  • Install Ollama
  • Download and run a local model
  • Integrate it into ASP.NET Core
  • Build a simple AI-powered endpoint

Why Run AI Models Locally?

Cloud AI is fantastic.

I'm not here to replace it.

But local AI solves a different set of problems.

Better Privacy

Many business applications work with sensitive information.

Things like:

  • Internal documentation
  • Support tickets
  • Customer records
  • Application logs

Sending that information to an external AI provider may introduce security, compliance, or governance concerns.

Running models locally keeps everything inside your own infrastructure.

Predictable Costs

Cloud providers charge based on usage.

That works well initially, but costs can grow quickly as applications scale.

Local AI removes per-request pricing. You pay for hardware, power, and maintenance instead, which makes costs easier to forecast.

Fewer Dependencies

Once the model has been downloaded, your application no longer depends on:

  • Internet connectivity
  • External outages
  • API rate limits

Everything runs on your own machine or server.


What Is Ollama?

Ollama is a tool that allows developers to run Large Language Models (LLMs) locally.

Instead of manually configuring machine learning environments, Ollama handles:

  • Model downloads
  • Runtime management
  • Memory handling
  • HTTP APIs

Once installed, interacting with AI becomes as simple as making an HTTP request.

Popular models include:

Model       Best Use Case
llama3      General AI tasks
mistral     Fast responses
codellama   Code generation
gemma       Lightweight applications

Step 1: Install Ollama

Download Ollama from:

https://ollama.com

Verify the installation.

ollama --version

Download a model.

ollama pull llama3

Run it.

ollama run llama3

Ask it something simple.

Explain dependency injection in ASP.NET Core

If you get a response, you're ready to go.

By default, Ollama exposes a local API at:

http://localhost:11434

This is what our ASP.NET Core application will communicate with.
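
Before wiring anything into ASP.NET Core, it can help to confirm from C# that the server is reachable and the model is installed. The sketch below assumes Ollama's `/api/tags` endpoint, which lists locally available models. The class names (`TagsResponse`, `ModelInfo`, `OllamaProbe`) are made up for this example, and only the `name` field of each model entry is read.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Text.Json;
using System.Text.Json.Serialization;
using System.Threading.Tasks;

// Only the parts of the /api/tags response that this sketch reads.
public sealed class TagsResponse
{
    [JsonPropertyName("models")]
    public List<ModelInfo> Models { get; set; } = new List<ModelInfo>();
}

public sealed class ModelInfo
{
    [JsonPropertyName("name")]
    public string Name { get; set; } = string.Empty;
}

public static class OllamaProbe
{
    // Pure helper: extracts model names from a /api/tags JSON body.
    public static IReadOnlyList<string> ParseModelNames(string json)
    {
        var tags = JsonSerializer.Deserialize<TagsResponse>(json);
        var models = tags?.Models ?? new List<ModelInfo>();
        return models.Select(m => m.Name).ToList();
    }

    // Calls the local server. Throws HttpRequestException if Ollama is not running.
    public static async Task<IReadOnlyList<string>> ListModelsAsync(HttpClient client)
    {
        var json = await client.GetStringAsync("http://localhost:11434/api/tags");
        return ParseModelNames(json);
    }
}
```

Calling `await OllamaProbe.ListModelsAsync(new HttpClient())` from a console app should return names such as `llama3:latest` once the pull from the previous step has finished.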


Step 2: Create an ASP.NET Core API

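Create a new project. On .NET 8 or later, the webapi template defaults to minimal APIs, so append --use-controllers to the command below to get the controller setup that Step 5 relies on.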

dotnet new webapi -n LocalAIApi

Recommended structure:

Controllers
Services
Models
Program.cs

Keep your AI logic separate from controllers.

It'll make your application easier to maintain.


Step 3: Register HttpClient

Inside Program.cs:

builder.Services.AddHttpClient<IAiService, OllamaService>(client =>
{
    client.BaseAddress = new Uri("http://localhost:11434");

    client.Timeout = TimeSpan.FromMinutes(2);
});

Step 4: Create an AI Service

Interface:

public interface IAiService
{
    Task<string> GenerateAsync(
        string prompt,
        CancellationToken cancellationToken);
}

Implementation:

public class OllamaService : IAiService
{
    private readonly HttpClient _httpClient;

    public OllamaService(HttpClient httpClient)
    {
        _httpClient = httpClient;
    }

    public async Task<string> GenerateAsync(
        string prompt,
        CancellationToken cancellationToken)
    {
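        // stream = false asks Ollama for one complete JSON reply
        // instead of a sequence of partial chunks.
        var request = new
        {
            model = "llama3",
            prompt,
            stream = false
        };

        // Relative path, resolved against the BaseAddress set in Program.cs.
        var response = await _httpClient.PostAsJsonAsync(
            "api/generate",
            request,
            cancellationToken);

        // Throws HttpRequestException on a non-success status code.
        response.EnsureSuccessStatusCode();

        var result = await response.Content
            .ReadFromJsonAsync<OllamaResponse>(
                cancellationToken: cancellationToken);

        return result?.Response ?? string.Empty;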
    }
}

Response model:

public class OllamaResponse
{
    public string Response { get; set; } = string.Empty;
}
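
The response model above maps only the generated text. Ollama's reply to `api/generate` also carries metadata such as `done`, `eval_count` (tokens generated), and `eval_duration` (nanoseconds). Here is a minimal sketch of a richer model, assuming those field names. The `OllamaGenerateResult` and `OllamaJson` names and the `TokensPerSecond` helper are additions for illustration, not part of the guide's code.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

public sealed class OllamaGenerateResult
{
    [JsonPropertyName("model")]
    public string Model { get; set; } = string.Empty;

    [JsonPropertyName("response")]
    public string Response { get; set; } = string.Empty;

    [JsonPropertyName("done")]
    public bool Done { get; set; }

    // Number of tokens generated.
    [JsonPropertyName("eval_count")]
    public int EvalCount { get; set; }

    // Time spent generating, in nanoseconds.
    [JsonPropertyName("eval_duration")]
    public long EvalDuration { get; set; }

    // Tokens per second, or 0 when timing data is missing.
    [JsonIgnore]
    public double TokensPerSecond =>
        EvalDuration > 0 ? EvalCount / (EvalDuration / 1_000_000_000.0) : 0;
}

public static class OllamaJson
{
    // Explicit property names are needed because snake_case fields such as
    // eval_count do not match C# property names by case-insensitivity alone.
    public static OllamaGenerateResult? Parse(string json) =>
        JsonSerializer.Deserialize<OllamaGenerateResult>(json);
}
```

Logging `TokensPerSecond` per request is a cheap way to see how a given model performs on your hardware.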

At this point, AI integration starts looking very similar to integrating any third-party API.

That's probably the biggest surprise when working with Ollama for the first time.

Most of the complexity is already handled for you.
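
The service above sets `stream = false`, so the caller waits for the whole answer. With `stream = true`, Ollama instead sends a series of JSON objects, one per line, each carrying a fragment in `response` and ending with an object where `done` is true. The following is a rough sketch of reading that format. `OllamaStream` and `ReadChunksAsync` are hypothetical names, and error handling is left out.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Runtime.CompilerServices;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

public static class OllamaStream
{
    // Reads newline-delimited JSON objects and yields each "response" fragment
    // until an object with "done": true arrives or the stream ends.
    public static async IAsyncEnumerable<string> ReadChunksAsync(
        Stream body,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        using var reader = new StreamReader(body);
        string? line;
        while ((line = await reader.ReadLineAsync()) is not null)
        {
            cancellationToken.ThrowIfCancellationRequested();
            if (string.IsNullOrWhiteSpace(line)) continue;

            using var doc = JsonDocument.Parse(line);
            var root = doc.RootElement;

            if (root.TryGetProperty("response", out var text))
            {
                yield return text.GetString() ?? string.Empty;
            }

            if (root.TryGetProperty("done", out var done)
                && done.ValueKind == JsonValueKind.True)
            {
                yield break;
            }
        }
    }
}
```

In a real service you would send the request with `HttpCompletionOption.ResponseHeadersRead`, pass the result of `ReadAsStreamAsync` to this method, and forward each fragment to the client as it arrives.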


Step 5: Expose an AI Endpoint

Request model:

public class AiRequest
{
    public string Prompt { get; set; } = string.Empty;
}

Controller:

[ApiController]
[Route("api/ai")]
public class AiController : ControllerBase
{
    private readonly IAiService _aiService;

    public AiController(IAiService aiService)
    {
        _aiService = aiService;
    }

    [HttpPost("generate")]
    public async Task<IActionResult> Generate(
        AiRequest request,
        CancellationToken cancellationToken)
    {
        if (string.IsNullOrWhiteSpace(request.Prompt))
        {
            return BadRequest("Prompt is required.");
        }

        var result = await _aiService.GenerateAsync(
            request.Prompt,
            cancellationToken);

        return Ok(new
        {
            response = result
        });
    }
}

Your API can now expose AI functionality without relying on a cloud provider.
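
To try the endpoint from code, a small client is enough. This sketch posts a prompt to the `api/ai/generate` route defined above and reads the `response` field from the reply. `AiEndpointClient` and `CannedHandler` are hypothetical names. The handler is a test double that lets you exercise the client without the API or Ollama running.

```csharp
using System.Net;
using System.Net.Http;
using System.Net.Http.Json;
using System.Text;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

public static class AiEndpointClient
{
    // Posts { "prompt": ... } to api/ai/generate (relative to the client's
    // BaseAddress) and returns the "response" field of the JSON reply.
    public static async Task<string> AskAsync(
        HttpClient client,
        string prompt,
        CancellationToken cancellationToken = default)
    {
        using var reply = await client.PostAsJsonAsync(
            "api/ai/generate", new { prompt }, cancellationToken);
        reply.EnsureSuccessStatusCode();

        var body = await reply.Content.ReadFromJsonAsync<JsonElement>(
            cancellationToken: cancellationToken);
        return body.GetProperty("response").GetString() ?? string.Empty;
    }
}

// Test double: answers every request with a fixed JSON body.
public sealed class CannedHandler : HttpMessageHandler
{
    private readonly string _json;

    public CannedHandler(string json) => _json = json;

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var response = new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StringContent(_json, Encoding.UTF8, "application/json")
        };
        return Task.FromResult(response);
    }
}
```

Against the real API, create the client with `new HttpClient { BaseAddress = new Uri("http://localhost:5000") }`. The port here is an assumption, so check your project's launchSettings.json for the actual one.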


Architecture Overview

Client
   ↓
ASP.NET Core API
   ↓
AI Service Layer
   ↓
Ollama HTTP API
   ↓
Local LLM

Keeping AI behind a dedicated service layer makes it easier to swap providers later if needed.
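
One way to picture that swap is a decorator that tries one provider and falls back to another. The sketch below is only an illustration. `FallbackAiService` and `DelegateAiService` are hypothetical names, and the `IAiService` interface is repeated from Step 4 so the snippet compiles on its own (drop that copy if you paste this into the project).

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Same contract as in Step 4, repeated so this sketch is self-contained.
public interface IAiService
{
    Task<string> GenerateAsync(string prompt, CancellationToken cancellationToken);
}

// Tries the primary provider first, then the secondary one.
public sealed class FallbackAiService : IAiService
{
    private readonly IAiService _primary;
    private readonly IAiService _secondary;

    public FallbackAiService(IAiService primary, IAiService secondary)
    {
        _primary = primary;
        _secondary = secondary;
    }

    public async Task<string> GenerateAsync(
        string prompt, CancellationToken cancellationToken)
    {
        try
        {
            return await _primary.GenerateAsync(prompt, cancellationToken);
        }
        catch (HttpRequestException)
        {
            // Primary was unreachable or returned a non-success status.
            return await _secondary.GenerateAsync(prompt, cancellationToken);
        }
    }
}

// Minimal stand-in provider for demos and tests.
public sealed class DelegateAiService : IAiService
{
    private readonly Func<string, string> _generate;

    public DelegateAiService(Func<string, string> generate) => _generate = generate;

    public Task<string> GenerateAsync(
        string prompt, CancellationToken cancellationToken)
        => Task.FromResult(_generate(prompt));
}
```

Because controllers depend only on `IAiService`, wrapping `OllamaService` this way would not require changes to `AiController`. Which provider is primary is a policy decision. For sensitive data, falling back to a cloud service may not be acceptable at all.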


Is Local AI Always Better?

No.

Cloud AI is still the better choice in many scenarios.

Use cloud AI when:

  • You need state-of-the-art reasoning models
  • You have thousands of concurrent users
  • You need globally distributed infrastructure

Use local AI when:

  • Privacy matters
  • Cost control matters
  • You're building internal business tools

In reality, many teams will likely use a hybrid approach.


Final Thoughts

A few years ago, running large language models felt like something only machine learning engineers could do.

Today, tools like Ollama have changed that.

As a .NET developer, integrating local AI is often just another HTTP integration.

And that's what makes this so exciting.

Local AI isn't replacing cloud AI.

It's simply giving developers another architectural option.

For internal tools, private assistants, documentation search, and AI-powered business applications, that option is becoming increasingly practical every day.
