Paul Robertson

Create a Local AI Chatbot with Ollama: No API Keys Required




Tired of API rate limits, monthly bills, and sending your data to third-party services? What if I told you that you could run a capable AI chatbot entirely on your own machine - for free?

With Ollama, you can download and run open-source language models locally. No internet required after setup, no usage limits, and complete privacy. In this tutorial, we'll build a functional chatbot from scratch using Python and Ollama.

Why Choose Local AI?

Before diving in, let's understand when local AI makes sense:

Advantages:

  • 🔒 Complete privacy - your data never leaves your machine
  • 💰 No ongoing costs after initial setup
  • 🚀 No rate limits or API quotas
  • 🌐 Works offline once models are downloaded
  • 🎛️ Full control over model behavior

Trade-offs:

  • Requires decent hardware (8GB+ RAM recommended)
  • Smaller models may be less capable than GPT-4
  • Initial setup and model downloads take time
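To put the RAM guidance in numbers: Ollama's default model tags ship 4-bit quantized weights, so a rough back-of-the-envelope estimate of the memory footprint looks like this (the figures below are an approximation, not an official spec):

```python
# Rough weight-memory estimate for a 7B model at 4-bit quantization
# (Ollama's default tags, e.g. llama2:7b, use 4-bit quantized weights)
params = 7e9                # 7 billion parameters
bytes_per_param = 0.5       # 4 bits = half a byte
weights_gb = params * bytes_per_param / 1e9
print(f"~{weights_gb:.1f} GB for weights alone")  # plus KV cache and runtime overhead
```

That ~3.5 GB of weights (plus cache and overhead) is why a 7B model fits comfortably in 8GB of RAM, and why the download in Step 2 is about 3.8GB.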

Step 1: Installing Ollama

Ollama makes running local language models surprisingly simple. Let's get it installed:

macOS

brew install ollama

Linux

curl -fsSL https://ollama.ai/install.sh | sh

Windows

Download the installer from ollama.ai and run it.

After installation, start the Ollama service:

ollama serve

This starts a local server on http://localhost:11434 that we'll use to communicate with our models.
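Before going further, it's worth checking that the server is actually reachable. Here's a small helper (the name `ollama_running` is my own) using only the standard library, since we haven't installed anything yet. Ollama's root endpoint simply replies "Ollama is running" when the server is up:

```python
from urllib.request import urlopen
from urllib.error import URLError

def ollama_running(base_url="http://localhost:11434", timeout=2):
    """Return True if an Ollama server answers at base_url."""
    try:
        with urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        # Connection refused or timed out - server isn't up
        return False

print(ollama_running())  # True once `ollama serve` is active
```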

Step 2: Download a Language Model

Let's start with Llama 2 7B - it's lightweight enough for most machines while still being quite capable:

ollama pull llama2:7b

This downloads about 3.8GB, so grab a coffee! You can also try other models:

  • llama2:13b - More capable but requires more RAM
  • codellama:7b - Specialized for code generation
  • mistral:7b - Fast and efficient alternative

Test your model:

ollama run llama2:7b

You should see a chat interface. Type a message and watch your local AI respond! Type /bye or press Ctrl+D to exit.

Step 3: Building a Python Chat Interface

Now let's create a proper chat interface. First, install the requests library:

pip install requests

Create chatbot.py:

import requests
import json

class LocalChatbot:
    def __init__(self, model_name="llama2:7b"):
        self.model_name = model_name
        self.base_url = "http://localhost:11434"
        self.conversation_history = []

    def chat(self, message):
        """Send a message to the local AI model"""
        url = f"{self.base_url}/api/generate"

        payload = {
            "model": self.model_name,
            "prompt": message,
            "stream": False
        }

        try:
            response = requests.post(url, json=payload)
            response.raise_for_status()

            result = response.json()
            return result.get('response', 'No response received')

        except requests.exceptions.RequestException as e:
            return f"Error communicating with Ollama: {e}"

    def interactive_chat(self):
        """Start an interactive chat session"""
        print("🤖 Local AI Chatbot (type 'quit' to exit)")
        print("-" * 40)

        while True:
            user_input = input("You: ").strip()

            if user_input.lower() in ['quit', 'exit', 'bye']:
                print("👋 Goodbye!")
                break

            if not user_input:
                continue

            print("🤖 Thinking...")
            response = self.chat(user_input)
            print(f"AI: {response}\n")

if __name__ == "__main__":
    bot = LocalChatbot()
    bot.interactive_chat()

Run your chatbot:

python chatbot.py

Step 4: Adding Conversation Memory

Right now, our bot has no memory of previous messages. Let's fix that:

class LocalChatbot:
    def __init__(self, model_name="llama2:7b"):
        self.model_name = model_name
        self.base_url = "http://localhost:11434"
        self.conversation_history = []

    def build_context(self, new_message):
        """Build conversation context from history"""
        context = "Previous conversation:\n"

        # Include the last 10 exchanges to stay within the context window
        recent_history = self.conversation_history[-10:]

        for entry in recent_history:
            context += f"Human: {entry['human']}\n"
            context += f"Assistant: {entry['ai']}\n\n"

        context += f"Human: {new_message}\n"
        context += "Assistant: "

        return context

    def chat(self, message):
        """Send a message with conversation context"""
        url = f"{self.base_url}/api/generate"

        # Build context if we have conversation history
        if self.conversation_history:
            prompt = self.build_context(message)
        else:
            prompt = message

        payload = {
            "model": self.model_name,
            "prompt": prompt,
            "stream": False
        }

        try:
            response = requests.post(url, json=payload)
            response.raise_for_status()

            result = response.json()
            ai_response = result.get('response', 'No response received')

            # Store in conversation history
            self.conversation_history.append({
                'human': message,
                'ai': ai_response
            })

            return ai_response

        except requests.exceptions.RequestException as e:
            return f"Error: {e}"
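The history trimming is easy to sanity-check without a running model. Here's the same context-building logic as a standalone function (mirroring the method above), fed with fake history:

```python
def build_context(history, new_message, max_exchanges=10):
    """Assemble the prompt the model will see from recent exchanges."""
    context = "Previous conversation:\n"
    for entry in history[-max_exchanges:]:
        context += f"Human: {entry['human']}\n"
        context += f"Assistant: {entry['ai']}\n\n"
    context += f"Human: {new_message}\n"
    context += "Assistant: "
    return context

# Fifteen fake exchanges; only the last ten should survive trimming
history = [{'human': f'question {i}', 'ai': f'answer {i}'} for i in range(15)]
prompt = build_context(history, 'What did I just ask?')
print('question 4' in prompt)   # False - trimmed away
print('question 5' in prompt)   # True  - inside the window
```

Printing the prompt for a few turns is also a quick way to see exactly what the model receives - often the fastest way to debug odd responses.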

Step 5: Basic Prompt Engineering

Let's add some personality and improve responses with better prompting:

class LocalChatbot:
    def __init__(self, model_name="llama2:7b", system_prompt=None):
        self.model_name = model_name
        self.base_url = "http://localhost:11434"
        self.conversation_history = []

        # Default system prompt
        self.system_prompt = system_prompt or """
You are a helpful, friendly AI assistant. You provide clear, concise answers 
and ask follow-up questions when appropriate. You admit when you don't know 
something rather than making things up.
"""

    def build_context(self, new_message):
        """Build conversation context with system prompt"""
        context = self.system_prompt + "\n\n"

        # Include recent conversation history
        recent_history = self.conversation_history[-8:]

        for entry in recent_history:
            context += f"Human: {entry['human']}\n"
            context += f"Assistant: {entry['ai']}\n\n"

        context += f"Human: {new_message}\n"
        context += "Assistant: "

        return context

You can customize the system prompt for different use cases:

# Code assistant
code_prompt = """
You are an expert programming assistant. Provide clear, working code examples 
with explanations. Always include error handling where appropriate.
"""

# Creative writing helper
creative_prompt = """
You are a creative writing assistant. Help users brainstorm ideas, improve 
their writing, and overcome writer's block with encouraging, constructive feedback.
"""

bot = LocalChatbot(system_prompt=code_prompt)

Step 6: Performance Tips

To get the best performance from your local setup:

  1. Model Selection: Start with 7B models, upgrade to 13B if you have 16GB+ RAM
  2. Hardware: Use GPU acceleration if available (Ollama supports NVIDIA GPUs)
  3. Context Management: Limit conversation history to prevent slowdowns
  4. Streaming: For real-time responses, enable streaming:
def chat_stream(self, message):
    """Stream response for real-time output"""
    payload = {
        "model": self.model_name,
        "prompt": message,
        "stream": True
    }

    response = requests.post(f"{self.base_url}/api/generate", 
                           json=payload, stream=True)

    full_response = ""
    for line in response.iter_lines():
        if line:
            chunk = json.loads(line)
            if 'response' in chunk:
                print(chunk['response'], end='', flush=True)
                full_response += chunk['response']

    return full_response
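Streaming works because Ollama returns newline-delimited JSON: each line is a small object carrying a `response` fragment, and the final one has `done: true`. You can exercise the parsing loop with canned chunks - no server required (the byte strings below are fabricated to match that shape):

```python
import json

# Canned lines shaped like Ollama's streaming output (fake data, no server)
fake_lines = [
    b'{"response": "Hel", "done": false}',
    b'{"response": "lo!", "done": false}',
    b'{"response": "", "done": true}',
]

full_response = ""
for line in fake_lines:
    chunk = json.loads(line)
    if 'response' in chunk:
        full_response += chunk['response']
    if chunk.get('done'):
        break

print(full_response)  # Hello!
```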

Local vs Cloud: When to Choose What

Choose Local AI when:

  • Privacy is critical (medical, legal, personal data)
  • You need consistent, predictable costs
  • Working with sensitive code or proprietary information
  • Building prototypes or learning AI concepts
  • Internet connectivity is unreliable

Choose Cloud AI when:

  • You need cutting-edge model capabilities
  • Handling complex reasoning or specialized tasks
  • Building production applications with high uptime requirements
  • Working with multiple languages or specialized domains
  • Team collaboration requires shared model access

Wrapping Up

You now have a fully functional local AI chatbot! This setup gives you:

  • Complete privacy and control
  • No ongoing costs
  • A foundation for more complex AI applications
  • Hands-on experience with language models


Next steps to explore:

  • Try different models (Mistral, CodeLlama, etc.)
  • Add a web interface with Flask or FastAPI
  • Implement RAG (Retrieval-Augmented Generation) with your own documents
  • Experiment with fine-tuning for specific tasks

Local AI isn't just about avoiding costs - it's about understanding how these systems work and maintaining control over your data. As models continue improving and hardware becomes more powerful, local AI will only get better.

What will you build with your new local AI assistant?
