How to Build and Run Open Source AI Models Locally and Integrate Them into Your MERN Stack App

AI is everywhere, and now you can run powerful AI models on your own computer for free!

No need to pay for cloud APIs or send your data to a third party.

In this post, I’ll show you how to:

  • Run an open-source AI model on your own PC using Ollama
  • Connect this AI to your MERN app (MongoDB, Express, React, Node.js)
  • Make a simple chat app that talks to the AI

Why Run AI Locally?

  • Privacy: Your data stays on your machine
  • Speed: No network latency
  • Free: No cloud costs
  • Control: You decide what the AI does

What You Need

  • Linux or macOS (Windows users can use WSL or the native Windows installer)
  • Terminal / command line
  • Node.js and npm installed
  • MongoDB (local or cloud)
  • React for frontend
  • Ollama (easy tool to run AI models locally)

Step 1: Install Ollama

Open your terminal and run this:

curl -fsSL https://ollama.com/install.sh | sh

This installs Ollama and, on Linux, registers it as a systemd service so it runs in the background automatically.
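You can verify the install by printing the version (your version number will differ):

ollama --version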

Step 2: Check if Ollama is Running

systemctl status ollama

Check the output:

  • If it shows active (running), Ollama is up and ready.

  • If it’s not running, start the service with:

sudo systemctl start ollama

You can also enable it to start automatically on boot:

sudo systemctl enable ollama

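Note: systemctl only exists on Linux. On macOS the Ollama desktop app keeps the server running in the background; you can also start it manually from a terminal:

ollama serve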

Step 3: Download an AI Model

Popular AI Models You Can Download & Run Locally

| Model | Size (params or GB) | RAM Needed (GB) | GPU VRAM Needed (GB) | Difficulty / Hardware | License | Framework | Typical Use Cases |
|---|---|---|---|---|---|---|---|
| GPT4All-J | ~7B params | 6-8 | Optional (CPU OK) | Easy: most laptops/PCs | MIT | PyTorch (CPU/GPU) | Lightweight chatbots, demos, experimentation |
| Alpaca 7B | ~7B params | 8-12 | 6-8 | Easy: mid-range laptops, desktops | Apache 2.0 | PyTorch | Instruction-following chatbots |
| MiniGPT | Few hundred MB | 2-4 | Optional | Easy: low-end laptops, any PC | MIT | PyTorch | Simple chatbots, demo apps |
| TinyBERT | ~100 MB | 1-2 | None | Easy: very lightweight NLP | Apache 2.0 | TensorFlow, PyTorch | Text classification, NLP |
| Mistral 7B | 7B params | 8-12 | 6-8 | Medium: mid to high-end laptops/servers | Apache 2.0 | PyTorch | General-purpose language model |
| LLaMA 2 7B | 7B params | 8-12 | 6-8 | Medium: mid to high-end desktops | Meta License | PyTorch | Chatbots, text generation |
| Falcon 7B | 7B params | 8-12 | 6-8 | Medium: mid to high-end desktops | Apache 2.0 | PyTorch | NLP tasks, chatbots |
| GPT4All 13B | 13B params | 12-16 | 8-12 | Medium: powerful desktops, small servers | MIT | PyTorch | Chatbots, demos |
| Stable Diffusion | 4-6 GB | 6-8 | 6-8 | Medium: GPU-enabled desktops | CreativeML Open RAIL-M | PyTorch | Image generation |
| Whisper | ~1 GB | 4-6 | Optional | Medium: moderate machines | MIT | PyTorch | Speech-to-text transcription |
| LLaMA 2 13B | 13B params | 16-24 | 12-16 | Medium: GPU with 12-16 GB VRAM | Meta License | PyTorch | Chatbots, text generation; good accuracy/speed balance |
| Falcon 40B | 40B params | 32+ | 20+ | Hard: powerful GPU servers | Apache 2.0 | PyTorch | High-quality NLP, large-scale tasks |
| LLaMA 2 70B | 70B params | 40+ | 24+ | Hard: high-end multi-GPU setups | Meta License | PyTorch | Complex NLP, deep reasoning |
| BLOOM 176B | 176B params | 80-100+ | 40+ | Hard: supercomputers, multi-GPU clusters | RAIL License | PyTorch | Multilingual generation |
Not every model above ships in the Ollama library, so check ollama.com/library for the exact names Ollama accepts. Then pull the one that fits your hardware:

ollama pull <model-name>

This downloads the model weights to your machine.
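For example, to download LLaMA 2 (the 7B variant is the default tag) and then confirm it’s available locally:

ollama pull llama2
ollama list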

Step 4: Try Chatting with the AI

Run this to start chatting:

ollama run <model-name>

Type a question like:

What is the capital of India?

You’ll get an answer right in the terminal.
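When you’re done chatting, type /bye (or press Ctrl+D) to exit the session.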

Step 5: Make a POST Request to Generate Text with Ollama API

You can interact with your local Ollama model by sending a POST request to:

http://localhost:11434/api/generate

Request Headers:


{
  "Content-Type": "application/json"
}


Request Body Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | ✅ Yes | Name of the model to use (e.g. "llama2", "mistral") |
| prompt | string | ✅ Yes | Input message or question for the AI to respond to |
| system | string | ❌ No | System-level prompt to define behavior (e.g. "You are a helpful assistant.") |
| context | array of ints | ❌ No | Token context returned by a previous response; pass it back to maintain conversation memory |
| template | string | ❌ No | Custom template to structure the prompt (advanced use) |
| stream | boolean | ❌ No | If true, the response is streamed token by token (default: true) |
| raw | boolean | ❌ No | If true, disables formatting/template logic (default: false) |
| images | array of base64 strings | ❌ No | Image inputs for multimodal models (if supported) |

Sampling and decoding settings are all optional and go inside a nested options object:

| Option | Type | Description |
|---|---|---|
| temperature | float | Controls randomness: 0 = deterministic, higher = more creative (default: 0.8) |
| top_p | float | Controls diversity via nucleus sampling (default: 0.9) |
| top_k | int | Limits the number of top tokens to sample from |
| repeat_penalty | float | Penalty for repeating tokens (e.g. 1.1) |
| presence_penalty | float | Penalizes new tokens based on their presence in the text so far |
| frequency_penalty | float | Penalizes tokens by how frequently they have appeared |
| seed | int | Random seed for reproducible results |
| stop | array of strings | Sequences that stop the generation (e.g. ["\n", "User:"]) |
| num_predict | int | Maximum number of tokens to generate |

Example POST Request (JSON body):


{
  "model": "llama2",
  "prompt": "Explain how photosynthesis works.",
  "system": "You are an expert biology teacher.",
  "stream": false,
  "options": {
    "temperature": 0.6,
    "num_predict": 150,
    "stop": ["\n"]
  }
}


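When you send this body with "stream": false, Ollama replies with a single JSON object. Trimmed to the most useful fields, it looks roughly like this (the response text and context token values here are illustrative):

{
  "model": "llama2",
  "created_at": "2024-01-01T00:00:00Z",
  "response": "Photosynthesis is the process by which plants convert light energy...",
  "done": true,
  "context": [1, 2, 3]
}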

How to Make the Request (Example with curl):

curl -X POST http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llama2",
        "prompt": "Explain how photosynthesis works.",
        "system": "You are an expert biology teacher.",
        "stream": false,
        "options": {
          "temperature": 0.6,
          "num_predict": 150,
          "stop": ["\n"]
        }
      }'


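To call the same endpoint from the Node/Express side of your MERN app, a minimal route might look like the sketch below. The /api/chat path, port 5000, and the hard-coded llama2 model are illustrative choices, not part of the Ollama API, and Node 18+ is assumed so the built-in fetch is available:

const express = require("express");

const app = express();
app.use(express.json());

// POST /api/chat  with body { "message": "..." }
app.post("/api/chat", async (req, res) => {
  try {
    // Forward the user's message to the local Ollama server
    const ollamaRes = await fetch("http://localhost:11434/api/generate", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "llama2",
        prompt: req.body.message,
        system: "You are a helpful assistant.",
        stream: false, // one JSON object instead of a token stream
      }),
    });
    const data = await ollamaRes.json();
    // Ollama puts the generated text in the "response" field
    res.json({ reply: data.response });
  } catch (err) {
    res.status(500).json({ error: "Ollama request failed: " + err.message });
  }
});

app.listen(5000, () => console.log("API listening on http://localhost:5000"));

Test the route the same way you tested Ollama directly:

curl -X POST http://localhost:5000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What is the capital of India?"}'

From here, the React frontend can POST the chat input to /api/chat and render reply, and you can save each exchange to MongoDB to complete the MERN loop.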
