Build a Local Waifubot Terminal Chat in Python — No API Keys, No Cloud, No Bullshit
Tired of cloud dependencies, subscriptions, and rate limits? Want your own affectionate AI companion running locally, offline, and async? This walkthrough shows you how to build a waifubot terminal chat using Ollama, LLaMA 3, and Python. No fluff. Just code.
Step 1: Install Ollama (One-Time Setup)
Ollama lets you run LLMs locally with ease.
Go to Ollama's download page
Download the installer for your OS (Windows/macOS)
Install and open the Ollama app
In the Ollama terminal, pull a model:
ollama pull llama3
This downloads the LLaMA 3 model locally.
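Optional: before writing any project code, you can confirm the model actually responds. The snippet below is a minimal sketch that uses only Python's standard library and assumes Ollama's default port (11434); the test prompt is arbitrary.

import json
import urllib.request

# One-shot, non-streamed request to the local Ollama server
# (assumes the default port 11434 and that `ollama pull llama3` has finished)
payload = json.dumps({
    "model": "llama3",
    "prompt": "Say hello in five words.",
    "stream": False,
}).encode("utf-8")

request = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    print(json.loads(response.read())["response"])

If you get a short greeting back, the model is installed and the local API is reachable.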
🧰 Step 2: Create Your PyCharm Project
Open PyCharm → New Project → name it waifu_terminal_chat
Inside the project, create a file: chat.py
Create a requirements.txt file and add:
requests
PyCharm will prompt you to install it — accept and let it install.
Step 3: Write Your Chat Script
Paste this into chat.py:
import requests
import json
import threading
import time

# Conversation history shared between the main loop and the worker thread
conversation_history = []

# Global flags for the async request
is_working = False      # True while a reply is being generated
current_reply = ""      # Holds (user_input, reply) once the thread finishes


def talk_to_waifu(prompt, history):
    global is_working, current_reply

    # Build the full prompt with the character framing and recent history
    full_prompt = "This is a conversation with Potatoe, a loving waifubot:\n\n"

    # Keep only the last 6 messages for context
    for message in history[-6:]:
        full_prompt += f"{message}\n"

    # Append the current user prompt
    full_prompt += f"Human: {prompt}\nPotatoe:"

    # Stream the response from the local Ollama server
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3", "prompt": full_prompt},
        stream=True,
    )

    full_reply = ""
    for line in response.iter_lines():
        if line:
            try:
                data = json.loads(line.decode("utf-8"))
                full_reply += data.get("response", "")
            except Exception as e:
                print("Error decoding chunk:", e)

    current_reply = (prompt, full_reply)  # Store both input and reply
    is_working = False
    return full_reply


def start_waifu_conversation(prompt):
    """Start the waifu request in a daemon thread so the main loop stays responsive."""
    global is_working
    is_working = True
    thread = threading.Thread(
        target=talk_to_waifu,
        args=(prompt, conversation_history),
        daemon=True,
    )
    thread.start()


print("Waifu: Hello darling~ Ready to chat? Type 'exit' to leave")

# Initial system prompt to set up the character
initial_prompt = "Your name is Potatoe. You're affectionate, playful, and always supportive."
conversation_history.append(f"System: {initial_prompt}")

while True:
    if is_working:
        # Poll twice a second while the model is generating
        print("Waifu: Thinking... ")
        time.sleep(0.5)
        continue

    if current_reply:
        user_input, reply = current_reply
        print(f"Waifu: {reply}")

        # Add both the user input and the bot response to the history
        conversation_history.append(f"Human: {user_input}")
        conversation_history.append(f"Potatoe: {reply}")

        # Optional: cap the history so it doesn't grow forever
        if len(conversation_history) > 20:  # keep the last 20 messages
            conversation_history = conversation_history[-20:]

        current_reply = ""
        continue

    user_input = input("You: ")
    if user_input.lower() in ["exit", "quit"]:
        print("Waifu: Bye bye~ I'll miss you! ")
        break

    # Kick off the async call
    start_waifu_conversation(user_input)
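Run chat.py from PyCharm (or with your Python interpreter of choice). A session looks roughly like this; the "Thinking..." line repeats while the model is generating, and the reply itself depends entirely on the model:

Waifu: Hello darling~ Ready to chat? Type 'exit' to leave
You: hi!
Waifu: Thinking... 
Waifu: (Potatoe's reply prints here once generation finishes)
You: exit
Waifu: Bye bye~ I'll miss you!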
Notes
This code requires a certain threshold of computing power, so don't expect it to run smoothly on your vintage Pentium 3 machine.
The code is modular and wrapped into functions.
The code runs asynchronously: the model call runs in a daemon thread (started by start_waifu_conversation), so the main loop never blocks on it.
The code runs locally and offline:
- No API keys
- No payments
- No subscriptions needed
The chat adds a short memory context (the last few messages) to each call, as sketched below.
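To make the memory mechanism concrete, here is a standalone sketch of the prompt assembly that talk_to_waifu performs on each call (illustrative history, same logic as the script above):

# Standalone illustration of the prompt assembly in talk_to_waifu
history = [
    "System: Your name is Potatoe. You're affectionate, playful, and always supportive.",
    "Human: hi!",
    "Potatoe: Hello darling~",
]

full_prompt = "This is a conversation with Potatoe, a loving waifubot:\n\n"
for message in history[-6:]:  # only the last 6 messages are sent as context
    full_prompt += f"{message}\n"
full_prompt += "Human: What's your name?\nPotatoe:"

print(full_prompt)  # this is the exact text the model receives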