AI is everywhere, and now you can run powerful AI models on your own computer for free!
No need to pay for cloud APIs or send your data to others.
In this post, I’ll show you how to:
- Run an open-source AI model on your own PC using Ollama
- Connect this AI to your MERN app (MongoDB, Express, React, Node.js)
- Make a simple chat app that talks to the AI
Why Run AI Locally?
- Privacy: Your data stays on your machine
- Faster: No internet delays
- Free: No cloud costs
- Control: You decide what the AI does
What You Need
- Linux or Mac (Windows users can use WSL)
- Terminal / command line
- Node.js and npm installed
- MongoDB (local or cloud)
- React for frontend
- Ollama (easy tool to run AI models locally)
Step 1: Install Ollama
Open your terminal and run this:
curl -fsSL https://ollama.com/install.sh | sh
This installs Ollama and sets it up to run automatically on your computer.
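To confirm the install worked, you can check the version:
ollama --version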
Step 2: Check if Ollama is Running
systemctl status ollama
If the output shows active (running), Ollama is up and ready.
If it’s not running, start the service with:
sudo systemctl start ollama
You can also enable it to start automatically on boot:
sudo systemctl enable ollama
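Note: the systemctl commands above apply to Linux. On macOS, Ollama runs via its desktop app instead; if the server isn't running, you can also start it manually from a terminal:
ollama serve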
Step 3: Download an AI Model
Popular AI Models You Can Download & Run Locally
Model Name | Size (Params or GB) | RAM Needed (GB) | GPU VRAM Needed (GB) | Difficulty / Hardware | License | Framework | Typical Use Cases |
---|---|---|---|---|---|---|---|
GPT4All-J | ~7B params | 6-8 | Optional (CPU OK) | Easy: runs on most laptops/PCs; simple setup | MIT | PyTorch (CPU/GPU) | Lightweight chatbots, demos, experimentation |
Alpaca 7B | ~7B params | 8-12 | 6-8 | Easy: mid-range laptops, desktops | Apache 2.0 | PyTorch | Instruction-following chatbots |
MiniGPT | Small (a few hundred MB) | 2-4 | Optional | Easy: low-end laptops, any PC | MIT | PyTorch | Simple chatbots, demo apps |
TinyBERT | ~100 MB | 1-2 | None | Easy: very lightweight NLP | Apache 2.0 | TensorFlow, PyTorch | Text classification, NLP |
Mistral 7B | 7B params | 8-12 | 6-8 | Medium: mid to high-end laptops/servers | Apache 2.0 | PyTorch | General-purpose language model |
LLaMA 2 7B | 7B params | 8-12 | 6-8 | Medium: mid to high-end desktops | Meta License | PyTorch | Chatbots, text generation |
Falcon 7B | 7B params | 8-12 | 6-8 | Medium: mid to high-end desktops | Apache 2.0 | PyTorch | NLP tasks, chatbots |
GPT4All 13B | 13B params | 12-16 | 8-12 | Medium: powerful desktops / small servers | MIT | PyTorch | Chatbots, demos |
Stable Diffusion | 4-6 GB (model size) | 6-8 | 6-8 | Medium: GPU-enabled desktops | CreativeML Open RAIL-M | PyTorch | Image generation |
Whisper | ~1 GB | 4-6 | Optional | Medium: moderate machines | MIT | PyTorch | Speech-to-text transcription |
LLaMA 2 13B | 13B params | 16-24 | 12-16 | Medium: 16-24 GB RAM, GPU with 12-16 GB VRAM | Meta License | PyTorch | Chatbots, text generation; good accuracy/speed balance |
Falcon 40B | 40B params | 32+ | 20+ | Hard: powerful GPU servers | Apache 2.0 | PyTorch | High-quality NLP, large-scale tasks |
LLaMA 2 70B | 70B params | 40+ | 24+ | Hard: 40+ GB RAM, high-end multi-GPU | Meta License | PyTorch | Complex NLP, deep reasoning |
BLOOM 176B | 176B params | 80-100+ | 40+ | Hard: multi-GPU server setups | RAIL License | PyTorch | Multilingual text generation |
To download a model, run ollama pull followed by its tag:
ollama pull <model-name>
Note that not every model in the table above is available through Ollama (Stable Diffusion and Whisper, for example, are not text-generation models that Ollama serves). Browse the catalog at https://ollama.com/library for available models and tags.
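For example, to download LLaMA 2 7B and then confirm it's installed locally:
ollama pull llama2
ollama list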
Step 4: Try Chatting with the AI
Run this to start an interactive chat session (using llama2 as an example):
ollama run llama2
Type a question like:
What is the capital of India?
You'll get an answer right in the terminal. When you're done, type /bye to exit the chat.
Step 5: Make a POST Request to Generate Text with Ollama API
You can interact with your local Ollama model by sending a POST request to:
http://localhost:11434/api/generate
Request Headers:
{
"Content-Type": "application/json"
}
Request Body Parameters
Parameter | Type | Required | Description |
---|---|---|---|
model | string | ✅ Yes | Name of the model to use (e.g. "llama2", "mistral") |
prompt | string | ✅ Yes | Input message or question for the AI to respond to |
system | string | ❌ No | System-level prompt that defines behavior (e.g. "You are a helpful assistant.") |
context | array of ints | ❌ No | Token context returned by a previous response; send it back to maintain conversation memory |
template | string | ❌ No | Custom template to structure the prompt (advanced use) |
stream | boolean | ❌ No | If true, the response is streamed token by token (default: true) |
raw | boolean | ❌ No | If true, disables formatting/template logic (default: false) |
images | array of base64 strings | ❌ No | Image inputs for multimodal models (if supported) |
temperature | float | ❌ No | Controls randomness: 0 = deterministic, higher = more creative (default: 0.8) |
top_p | float | ❌ No | Controls diversity via nucleus sampling (default: 0.9) |
top_k | int | ❌ No | Limits sampling to the top-k most likely tokens |
repeat_penalty | float | ❌ No | Penalty for repeating tokens (e.g. 1.1) |
presence_penalty | float | ❌ No | Penalizes tokens that have already appeared in the text so far |
frequency_penalty | float | ❌ No | Penalizes tokens in proportion to how often they have appeared |
seed | int | ❌ No | Random seed for reproducible results |
stop | array of strings | ❌ No | Sequences that stop generation (e.g. ["\n", "User:"]) |
num_predict | int | ❌ No | Maximum number of tokens to generate (comparable to max_tokens in other APIs) |
Note: the generation settings (temperature through num_predict above) are passed inside a nested options object in the request body, not at the top level, as the examples below show.
Example POST Request (JSON body):
{
  "model": "llama2",
  "prompt": "Explain how photosynthesis works.",
  "system": "You are an expert biology teacher.",
  "stream": false,
  "options": {
    "temperature": 0.6,
    "num_predict": 150,
    "stop": ["\n"]
  }
}
How to Make the Request (Example with curl):
curl -X POST http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama2",
    "prompt": "Explain how photosynthesis works.",
    "system": "You are an expert biology teacher.",
    "stream": false,
    "options": {
      "temperature": 0.6,
      "num_predict": 150,
      "stop": ["\n"]
    }
  }'
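Because stream is false, Ollama replies with a single JSON object rather than a token stream. An abbreviated example of the response shape (fields trimmed):
{
  "model": "llama2",
  "created_at": "...",
  "response": "Photosynthesis is the process by which plants convert light energy...",
  "done": true
}
To hook this into the Express layer of a MERN app, here is a minimal sketch of a proxy route. It assumes Node 18+ (for the built-in fetch) and express installed via npm; the route path, port, and system prompt are placeholders to adapt to your project.

// server.js: a minimal Express route that forwards a prompt to the local Ollama API
// Assumes Node 18+ (built-in fetch) and `npm install express`
const express = require("express");

const app = express();
app.use(express.json());

// POST /api/chat with body { "prompt": "..." } -> responds with { "reply": "..." }
app.post("/api/chat", async (req, res) => {
  try {
    const ollamaRes = await fetch("http://localhost:11434/api/generate", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "llama2", // any model you have pulled with `ollama pull`
        prompt: req.body.prompt,
        system: "You are a helpful assistant.", // placeholder system prompt
        stream: false // return one JSON object instead of a token stream
      })
    });
    const data = await ollamaRes.json();
    res.json({ reply: data.response }); // Ollama returns the generated text in `response`
  } catch (err) {
    res.status(500).json({ error: "Could not reach Ollama" });
  }
});

app.listen(5000, () => console.log("API ready on http://localhost:5000"));

Your React frontend can then POST to /api/chat and render data.reply. Going through your own Express route keeps the browser away from Ollama's port and gives you one place to add authentication, logging, or MongoDB chat history later.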