How to Build and Run Open Source AI Models Locally and Integrate Them into Your MERN Stack App

AI is everywhere, and now you can run powerful AI models on your own computer for free!

No need to pay for cloud APIs or send your data to a third party.

In this post, I’ll show you how to:

  • Run an open-source AI model on your own PC using Ollama
  • Connect this AI to your MERN app (MongoDB, Express, React, Node.js)
  • Make a simple chat app that talks to the AI

Why Run AI Locally?

  • Privacy: Your data stays on your machine
  • Speed: No network latency
  • Free: No cloud costs
  • Control: You decide what the AI does

What You Need

  • Linux or macOS (Windows users can use WSL or the native Windows installer)
  • Terminal / command line
  • Node.js and npm installed
  • MongoDB (local or cloud)
  • React for frontend
  • Ollama (easy tool to run AI models locally)

Step 1: Install Ollama

Open your terminal and run this:

curl -fsSL https://ollama.com/install.sh | sh

This installs Ollama and, on Linux, registers it as a systemd service so it runs in the background automatically.
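You can verify the install by printing the version (your version number will differ):

ollama --version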

Step 2: Check if Ollama is Running

systemctl status ollama

Check the output:

  • If it shows active (running), Ollama is up and ready.

  • If it’s not running, start the service with:

sudo systemctl start ollama

You can also enable it to start automatically on boot:

sudo systemctl enable ollama

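Note: systemctl only exists on Linux. On macOS the Ollama desktop app keeps the server running in the background; you can also start it manually from a terminal:

ollama serve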

Step 3: Download an AI Model

Popular AI Models You Can Download & Run Locally

| Model | Size (params or GB) | RAM Needed (GB) | GPU VRAM Needed (GB) | Difficulty / Hardware | License | Framework | Typical Use Cases |
|---|---|---|---|---|---|---|---|
| GPT4All-J | ~7B params | 6-8 | Optional (CPU OK) | Easy: most laptops/PCs | MIT | PyTorch (CPU/GPU) | Lightweight chatbots, demos, experimentation |
| Alpaca 7B | ~7B params | 8-12 | 6-8 | Easy: mid-range laptops, desktops | Apache 2.0 | PyTorch | Instruction-following chatbots |
| MiniGPT | Few hundred MB | 2-4 | Optional | Easy: low-end laptops, any PC | MIT | PyTorch | Simple chatbots, demo apps |
| TinyBERT | ~100 MB | 1-2 | None | Easy: very lightweight NLP | Apache 2.0 | TensorFlow, PyTorch | Text classification, NLP |
| Mistral 7B | 7B params | 8-12 | 6-8 | Medium: mid to high-end laptops/servers | Apache 2.0 | PyTorch | General-purpose language model |
| LLaMA 2 7B | 7B params | 8-12 | 6-8 | Medium: mid to high-end desktops | Meta License | PyTorch | Chatbots, text generation |
| Falcon 7B | 7B params | 8-12 | 6-8 | Medium: mid to high-end desktops | Apache 2.0 | PyTorch | NLP tasks, chatbots |
| GPT4All 13B | 13B params | 12-16 | 8-12 | Medium: powerful desktops, small servers | MIT | PyTorch | Chatbots, demos |
| Stable Diffusion | 4-6 GB | 6-8 | 6-8 | Medium: GPU-enabled desktops | CreativeML Open RAIL-M | PyTorch | Image generation |
| Whisper | ~1 GB | 4-6 | Optional | Medium: moderate machines | MIT | PyTorch | Speech-to-text transcription |
| LLaMA 2 13B | 13B params | 16-24 | 12-16 | Medium: GPU with 12-16 GB VRAM | Meta License | PyTorch | Chatbots, text generation; good accuracy/speed balance |
| Falcon 40B | 40B params | 32+ | 20+ | Hard: powerful GPU servers | Apache 2.0 | PyTorch | High-quality NLP, large-scale tasks |
| LLaMA 2 70B | 70B params | 40+ | 24+ | Hard: high-end multi-GPU setups | Meta License | PyTorch | Complex NLP, deep reasoning |
| BLOOM 176B | 176B params | 80-100+ | 40+ | Hard: supercomputers, multi-GPU clusters | RAIL License | PyTorch | Multilingual generation |
Not every model above ships in the Ollama library, so check ollama.com/library for the exact names Ollama accepts. Then pull the one that fits your hardware:

ollama pull <model-name>

This downloads the model weights to your machine.
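For example, to download LLaMA 2 (the 7B variant is the default tag) and then confirm it’s available locally:

ollama pull llama2
ollama list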

Step 4: Try Chatting with the AI

Run this to start chatting:

ollama run <model-name>

Type a question like:

What is the capital of India?

You’ll get an answer right in the terminal.
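When you’re done chatting, type /bye (or press Ctrl+D) to exit the session.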

Step 5: Make a POST Request to Generate Text with Ollama API

You can interact with your local Ollama model by sending a POST request to:

http://localhost:11434/api/generate

Request Headers:


{
  "Content-Type": "application/json"
}


Request Body Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | ✅ Yes | Name of the model to use (e.g. "llama2", "mistral") |
| prompt | string | ✅ Yes | Input message or question for the AI to respond to |
| system | string | ❌ No | System-level prompt to define behavior (e.g. "You are a helpful assistant.") |
| context | array of ints | ❌ No | Token context returned by a previous response; pass it back to maintain conversation memory |
| template | string | ❌ No | Custom template to structure the prompt (advanced use) |
| stream | boolean | ❌ No | If true, the response is streamed token by token (default: true) |
| raw | boolean | ❌ No | If true, disables formatting/template logic (default: false) |
| images | array of base64 strings | ❌ No | Image inputs for multimodal models (if supported) |

Sampling and decoding settings are all optional and go inside a nested options object:

| Option | Type | Description |
|---|---|---|
| temperature | float | Controls randomness: 0 = deterministic, higher = more creative (default: 0.8) |
| top_p | float | Controls diversity via nucleus sampling (default: 0.9) |
| top_k | int | Limits the number of top tokens to sample from |
| repeat_penalty | float | Penalty for repeating tokens (e.g. 1.1) |
| presence_penalty | float | Penalizes new tokens based on their presence in the text so far |
| frequency_penalty | float | Penalizes tokens by how frequently they have appeared |
| seed | int | Random seed for reproducible results |
| stop | array of strings | Sequences that stop the generation (e.g. ["\n", "User:"]) |
| num_predict | int | Maximum number of tokens to generate |

Example POST Request (JSON body):


{
  "model": "llama2",
  "prompt": "Explain how photosynthesis works.",
  "system": "You are an expert biology teacher.",
  "stream": false,
  "options": {
    "temperature": 0.6,
    "num_predict": 150,
    "stop": ["\n"]
  }
}


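When you send this body with "stream": false, Ollama replies with a single JSON object. Trimmed to the most useful fields, it looks roughly like this (the response text and context token values here are illustrative):

{
  "model": "llama2",
  "created_at": "2024-01-01T00:00:00Z",
  "response": "Photosynthesis is the process by which plants convert light energy...",
  "done": true,
  "context": [1, 2, 3]
}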

How to Make the Request (Example with curl):

curl -X POST http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llama2",
        "prompt": "Explain how photosynthesis works.",
        "system": "You are an expert biology teacher.",
        "stream": false,
        "options": {
          "temperature": 0.6,
          "num_predict": 150,
          "stop": ["\n"]
        }
      }'


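To call the same endpoint from the Node/Express side of your MERN app, a minimal route might look like the sketch below. The /api/chat path, port 5000, and the hard-coded llama2 model are illustrative choices, not part of the Ollama API, and Node 18+ is assumed so the built-in fetch is available:

const express = require("express");

const app = express();
app.use(express.json());

// POST /api/chat  with body { "message": "..." }
app.post("/api/chat", async (req, res) => {
  try {
    // Forward the user's message to the local Ollama server
    const ollamaRes = await fetch("http://localhost:11434/api/generate", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "llama2",
        prompt: req.body.message,
        system: "You are a helpful assistant.",
        stream: false, // one JSON object instead of a token stream
      }),
    });
    const data = await ollamaRes.json();
    // Ollama puts the generated text in the "response" field
    res.json({ reply: data.response });
  } catch (err) {
    res.status(500).json({ error: "Ollama request failed: " + err.message });
  }
});

app.listen(5000, () => console.log("API listening on http://localhost:5000"));

Test the route the same way you tested Ollama directly:

curl -X POST http://localhost:5000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What is the capital of India?"}'

From here, the React frontend can POST the chat input to /api/chat and render reply, and you can save each exchange to MongoDB to complete the MERN loop.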
