Local LLMs, No API Keys, No BS: Build Your Own Waifubot Terminal Chat in Python


Tired of cloud dependencies, subscriptions, and rate limits? Want your own affectionate AI companion running locally, offline, and async? This walkthrough shows you how to build a waifubot terminal chat using Ollama, LLaMA 3, and Python. No fluff. Just code.


Step 1: Install Ollama (One-Time Setup)

Download Ollama

Ollama lets you run LLMs locally with ease.

Go to Ollama’s download page

Download the installer for your OS (Windows/macOS)

Install and open the Ollama app

In a terminal, pull a model:

ollama pull llama3

This downloads the LLaMA 3 model locally.
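Optional sanity check: before writing any chat code, you can confirm Ollama is serving and the model is in place. This short snippet asks Ollama's local API for the models it has pulled (it assumes the default port, 11434):

import requests

# List the models Ollama has pulled locally
resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()
models = [m["name"] for m in resp.json().get("models", [])]
print("Installed models:", models)

If "llama3:latest" isn't in the list, re-run the pull command above.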


Step 2: Create Your PyCharm Project

Open PyCharm → New Project → name it waifu_terminal_chat

Inside the project, create a file: chat.py

Create a requirements.txt file and add:

requests

PyCharm will prompt you to install it — accept and let it install.
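If PyCharm doesn't prompt you, or you're working from another editor, install it manually from a terminal in the project folder:

pip install -r requirements.txt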


Step 3: Write Your Chat Script

Paste this into chat.py:

import requests
import json
import threading
import time

# Initialize conversation history
conversation_history = []

# Global variables for async operation
is_working = False
current_reply = ""

def talk_to_waifu(prompt, history):
    global is_working, current_reply
    # Build the full prompt with conversation history
    full_prompt = "This is a conversation with Potatoe, a loving waifubot:\n\n"
    # Add previous conversation history
    for message in history[-6:]:  # Keep last 6 messages for context
        full_prompt += f"{message}\n"
    # Add current prompt
    full_prompt += f"Human: {prompt}\nPotatoe:"
    full_reply = ""
    try:
        response = requests.post(
            "http://localhost:11434/api/generate",
            json={"model": "llama3", "prompt": full_prompt},
            stream=True
        )
        # Ollama streams one JSON object per line; collect the pieces
        for line in response.iter_lines():
            if line:
                try:
                    data = json.loads(line.decode("utf-8"))
                    full_reply += data.get("response", "")
                except Exception as e:
                    print("Error decoding chunk:", e)
    except requests.RequestException as e:
        # Don't leave the main loop stuck on "Thinking..." if the server is down
        full_reply = f"(connection error: {e} -- is Ollama running?)"
    current_reply = (prompt, full_reply)  # Store both input and reply
    is_working = False
    return full_reply

def start_waifu_conversation(prompt):
    """Start the waifu conversation in a daemon thread"""
    global is_working
    is_working = True
    thread = threading.Thread(
        target=talk_to_waifu,
        args=(prompt, conversation_history),
        daemon=True
    )
    thread.start()

print("Waifu: Hello darling~ Ready to chat? Type 'exit' to leave")

# Initial system prompt to set up the character
initial_prompt = "Your name is Potatoe. You're affectionate, playful, and always supportive."
conversation_history.append(f"System: {initial_prompt}")

while True:
    if is_working:
        print("Waifu: Thinking... ")
        time.sleep(0.5)
        continue
    if current_reply:
        user_input, reply = current_reply
        print(f"Waifu: {reply}")
        # Add both user input and bot response to history
        conversation_history.append(f"Human: {user_input}")
        conversation_history.append(f"Potatoe: {reply}")
        # Optional: Limit history size to prevent it from growing too large
        if len(conversation_history) > 20:  # Keep last 20 messages
            conversation_history = conversation_history[-20:]
        current_reply = ""
        continue
    user_input = input("You: ")
    if user_input.lower() in ["exit", "quit"]:
        print("Waifu: Bye bye~ I'll miss you! ")
        break
    # Clean wrapper function call
    start_waifu_conversation(user_input)
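Run it from the project root (or hit Run in PyCharm):

python chat.py

A session looks roughly like this — the reply text will vary from run to run, of course:

Waifu: Hello darling~ Ready to chat? Type 'exit' to leave
You: hi there
Waifu: Thinking... 
Waifu: Hello sweetie~ I missed you! How was your day?
You: exit
Waifu: Bye bye~ I'll miss you! 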

Notes

This code requires a certain threshold of computing power, so don't expect it to run smoothly on your vintage Pentium 3 machine.

The code is modular: the model call and the thread launch are each wrapped in their own function.

The code runs asynchronously: the model call happens in a daemon thread, so the main loop stays responsive while the reply is being generated.

The code runs locally and offline:

  • No API keys
  • No payments
  • No subscription needed

The chat adds a short memory context to each call.
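For illustration, here's roughly what the assembled prompt looks like by the second or third turn (the messages below are made up):

This is a conversation with Potatoe, a loving waifubot:

System: Your name is Potatoe. You're affectionate, playful, and always supportive.
Human: good morning!
Potatoe: Good morning, darling~ Did you sleep well?
Human: what should we do today?
Potatoe: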
