Quick and Dirty Guide to Running a Local LLM and Making API Requests

Alright, buckle up, because we’re diving into a quick and dirty solution for running a local LLM (Large Language Model) and making API requests to it, much like the fancy commercial solutions do. Why? Well, why not? In about three minutes you can have a perfectly decent system running locally for most of your tests, and if you ever feel the need to scale back up to the cloud, switching is practically effortless.

Here’s the documentation we’ll be following (the OpenAI Chat Completions API reference at https://platform.openai.com/docs/api-reference/chat), mostly so you can claim you’ve read it.

In particular, we’ll focus on making a request like this:

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
     "model": "gpt-4o-mini",
     "messages": [{"role": "user", "content": "Say this is a test!"}],
     "temperature": 0.7
   }'

So far, so good, right? Nothing groundbreaking. But here’s where it gets fun…

Enter LM Studio

There's this gem of a tool called LM Studio, which makes local LLMs much easier to handle. After installing it and loading a model, you’ll notice a tab with a console icon called Developer. I know, it doesn’t sound too exciting at first, but hold on, because it gets better: this is where you start the local server, and the tab comes with a handy curl example that shows you exactly how to use your model. And, wouldn't you know it, it looks pretty familiar!

curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b-lexi-uncensored-v2",
    "messages": [
      { "role": "system", "content": "Always answer in rhymes. Today is Thursday" },
      { "role": "user", "content": "What day is it today?" }
    ],
    "temperature": 0.7,
    "max_tokens": -1,
    "stream": false
}'

Looks pretty familiar, right? This is the local version of what we just saw. You get the same setup as the OpenAI API request, except it’s running on your local machine. Plus, it's got a little flair — like the "Always answer in rhymes" system prompt. Poetry, anyone?
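
Run it and you get back the same response shape the OpenAI API uses. The values below are made up for illustration (your id, content, and token counts will differ), but the structure is what you should see:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "llama-3.1-8b-lexi-uncensored-v2",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The day is Thursday, have no fear, the weekend's drawing ever near!"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 18,
    "total_tokens": 42
  }
}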

What About Python? We Got You.

If you prefer working with Python (and let’s be real, who doesn’t?), here’s how you’d send the same request using Python’s requests module:

import requests
import json

# LM Studio's local server speaks the OpenAI chat completions protocol
url = "http://localhost:1234/v1/chat/completions"

headers = {
    "Content-Type": "application/json"
}

data = {
    "model": "llama-3.1-8b-lexi-uncensored-v2",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7,
    "max_tokens": -1,  # -1 lets the model generate until it decides to stop
    "stream": False
}

response = requests.post(url, headers=headers, data=json.dumps(data))

if response.status_code == 200:
    result = response.json()
    # the reply lives in the same place as in the OpenAI response
    print(result["choices"][0]["message"]["content"])
else:
    print(f"Error: {response.status_code}")
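
Bonus round: both requests above set "stream" to false, which means you sit and wait for the whole reply. Flip it to true and the server streams OpenAI-style server-sent events instead: one JSON chunk per line, prefixed with "data: " and finished off with "data: [DONE]". Here's a rough sketch of how you might consume that with the same requests module. It assumes LM Studio sticks to the OpenAI streaming format (its server is OpenAI-compatible, but double-check against the example in your Developer tab):

import requests
import json

url = "http://localhost:1234/v1/chat/completions"

data = {
    "model": "llama-3.1-8b-lexi-uncensored-v2",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "temperature": 0.7,
    "max_tokens": -1,
    "stream": True  # ask for server-sent events instead of one JSON body
}

# stream=True tells requests not to buffer the whole response
with requests.post(url, json=data, stream=True) as response:
    for line in response.iter_lines():
        if not line:
            continue  # SSE messages are separated by blank lines
        payload = line.decode("utf-8").removeprefix("data: ")
        if payload == "[DONE]":
            break  # the server signals the end of the stream
        delta = json.loads(payload)["choices"][0]["delta"]
        # each chunk carries the next little fragment of the reply
        print(delta.get("content") or "", end="", flush=True)
print()

Each chunk's delta only carries a few characters, which is exactly what gives you that typewriter effect the commercial chat UIs have.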

And voilà! You’re now ready to send requests to your local LLM just like you would with a commercial API. Go ahead, test it, break it, make it rhyme — the world (or at least your model) is your oyster.
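
Remember that line in the intro about switching back to the cloud being practically effortless? Here's what that looks like in code. The official openai package (pip install openai) lets you override the base URL, so flipping between LM Studio and OpenAI is a two-argument change. A minimal sketch, assuming an openai 1.x client; the api_key string for the local server is a dummy, since LM Studio doesn't check it, but the client wants one:

from openai import OpenAI

# Point the official client at the local server; swap these two arguments
# back to the defaults (and a real key) to talk to OpenAI instead.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="llama-3.1-8b-lexi-uncensored-v2",
    messages=[{"role": "user", "content": "Say this is a test!"}],
    temperature=0.7,
)

print(response.choices[0].message.content)

Point base_url back at the default and drop in a real key, and the exact same script runs against the cloud.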

Enjoy!
