Julien Prugne
Stop paying the OpenAI API for your chat app!

I love tinkering with AI, but I'm no AI farmer (those peeps are hot right now).

Piping everything through the OpenAI API kind of creeps me out,
because 'merica is known for spying and using populations' private data for imperialist skulduggery.
And foremost, 'cause I don't wanna pay! Oh yeah!

Solution:

  • Install Ollama
  • Choose a lightweight model
  • Call the local API
  • Make some suppositions on how to deploy

Install Ollama

You follow the link and do the install: THE Link (https://ollama.com/download).

Choose a lightweight model

I chose deepseek-r1:7b:

ollama pull deepseek-r1:7b

(ollama pull just downloads the weights; if you start chatting with ollama run instead, exit the prompt with /bye or Ctrl+D.)

My old laptop doesn't quite catch fire while running completions; feel free to use beefier models if your hardware can take it.

Pull as many as your disk space allows, if you want.
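
To see what's piling up on disk, the standard Ollama CLI has you covered:

ollama list

And ollama rm <model> frees the space back up if you over-pulled.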

Call the API

  • The Ollama daemon must be running: ollama serve
  • No need to be inside a model's prompt.
  • Detailed documentation: https://github.com/ollama/ollama/blob/main/docs/api.md
  • You can set any model you pulled in the model param.

Easy-to-use generate endpoint:

curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:7b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
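With "stream": false you get a single JSON object back; the generated text lives in its response field.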

OpenAI-style chat:

curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1:7b",
  "messages": [
    {
      "role": "user",
      "content": "why is the sky blue?"
    }
  ]
}'
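Heads-up: unlike the generate call above, this one doesn't set "stream": false, so Ollama streams the answer back as newline-delimited JSON chunks. Add "stream": false if you'd rather get one JSON object with the whole message.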

And now, be a good softwarer and go implement it in your projects using the HTTP libs you usually use.
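
For example, in plain JavaScript with the built-in fetch, a minimal sketch might look like this (same model and question as above; Node 18+ running as an ES module, for top-level await):

// Minimal sketch: hit the local Ollama chat endpoint with fetch.
// Assumes the daemon is running on its default port, 11434.
const res = await fetch("http://localhost:11434/api/chat", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "deepseek-r1:7b", // any model you pulled
    messages: [{ role: "user", content: "why is the sky blue?" }],
    stream: false, // one JSON reply instead of NDJSON chunks
  }),
});

const data = await res.json();
console.log(data.message.content); // the assistant's answer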

What? You're too lazy and just want to use the OpenAI client?
Well, of course you can! Documentation: https://github.com/ollama/ollama/blob/main/docs/openai.md

import OpenAI from "openai";

// Ollama serves an OpenAI-compatible API under /v1.
const client = new OpenAI({
  baseURL: "http://localhost:11434/v1",
  apiKey: "ollama", // required by the client, ignored by Ollama
});
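Then chat as if you were talking to OpenAI (a quick sketch, reusing the question from above):

const response = await client.chat.completions.create({
  model: "deepseek-r1:7b", // any model you pulled
  messages: [{ role: "user", content: "Why is the sky blue?" }],
});

console.log(response.choices[0].message.content);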

Go play with that now!

#AI #OpenAI #Ollama #BreakFree #deepseek
