Yemi Adejumobi

Posted on Aug 14, 2024 • Edited on Aug 30, 2024

Run & Debug your LLM Apps locally using Ollama & Llama 3.1

#ai #ollama #llama31 #llm

In the rapidly evolving landscape of AI and ML, large language models (LLMs) have become increasingly powerful and ubiquitous. However, the costs and complexities associated with running these models in cloud environments can be prohibitive, especially for developers and small teams looking to experiment and innovate.

Enter Ollama, a game-changing tool that brings the power of LLMs to your local machine. This blog post will explore how Ollama can simplify your development process, allowing you to run LLM applications locally with ease and efficiency while adding Langtrace, an open-source observability tool that complements Ollama perfectly, providing crucial insights into your LLM application's performance and behavior.

Whether you're a seasoned AI developer or just starting your journey with language models, this guide will equip you with the knowledge and tools to take your LLM projects to the next level. Let’s dive in.

What is Ollama?

Ollama is an innovative tool that enables running large language models (LLMs) locally, providing a cost-effective solution for testing and development. By running LLMs locally, you can experiment and refine your ideas without incurring significant production costs.

By running LLMs locally, you can:

Reduce cloud costs: Save on cloud computing expenses by running LLMs on your local machine.
Faster experimentation: Quickly test and iterate on your ideas without relying on remote servers.
Improved data privacy: Keep your data local and secure, reducing the risk of data breaches.

Setting up Ollama and running LLMs locally

For this step, we will be using Meta’s latest open source model, Llama3.1. For most optimal performance with Ollama ensure your laptop has at least 16GB of RAM. If you do then follow these steps:

Download and install Ollama https://ollama.com/download
Download the desired LLM model (e.g., Llama3.1 or other open-source models). In a terminal window run the following to run llama3.1 locally for example

ollama run llama3.1

This is similar to docker commands, it will pull and run llama3.1

Once it is done pulling, you should have a terminal prompt you can start chatting from.

For further customization and to use Modelfile to create your own custom system prompt, refer to Ollama documentation here.

Instrumenting Ollama with Langtrace

Now that you have a local LLM, let’s say you are building a customer service bot and you would like to view detailed traces on the LLM requests, this is where Langtrace shines. Langtrace provides a Python SDK that enables observability for Ollama, allowing you to trace LLM calls and gain valuable insights into your application's performance. To instrument Ollama with Langtrace:

Generate an API key from langtrace.ai - you can also self-host.
Install the Langtrace Python or Typescript SDK.
Import the SDK and initialize the SDK.
Start tracing!

Example code snippet:

from langtrace_python_sdk import langtrace, with_langtrace_root_span
import ollama
from dotenv import load_dotenv

load_dotenv()

# langtrace.init(write_spans_to_console=False)
langtrace.init(api_key = 'YOUR_API_KEY', write_spans_to_console=False)

@with_langtrace_root_span()
def give_recs():
  response = ollama.chat(model='llama3.1', messages=[
    {
      'role': 'user',
      'content': 'You are an AI assistant with expertise in mens clothing. Help me pick clothing for a black tie dinner at work.',
    },
  ])
  print(response['message']['content'])

if __name__ == "__main__":
  print("Running fashionista bot...")
  give_recs()

Here is what the trace looks like in Langtrace UI

Here is a link to a reference cookbook for Ollama integration with Langtrace.

Tracing LLM call

With Langtrace, you can now trace LLM calls and capture essential metadata, such as:

Input, Output and Total tokens
Latency
Error rates

This data provides valuable insights into your application's performance, helping you optimize and improve it over time.

In the next blog in this series, we will cover how to use Langtrace to perform evaluations on your application’s accuracy and optimize its behavior.

Quick Update

I added a UI option to this bot. Feel free to check out the code here. I use Streamlit for the UI but you can swap it out for Gradio or any other library.

To see this in action, install Streamlit

pip install streamlit

Then run the code using

streamlit run ollama-fashionistav2.py

Next steps

In conclusion, combining Ollama's local LLM capabilities with Langtrace's observability features unlocks a powerful toolset for building and optimizing LLM applications. By following the steps outlined in this post, you can leverage the benefits of running LLMs locally with Ollama, including reduced cloud costs, accelerated experimentation, and improved data privacy.

With Langtrace, you can gain valuable insights into your application's performance, identify bottlenecks, and optimize its behavior. By integrating Ollama and Langtrace, you can build more efficient, effective, and innovative LLM applications. Try out Ollama and Langtrace today and discover the advantages of local LLM development and open-source observability for yourself!

DEV Community

Run & Debug your LLM Apps locally using Ollama & Llama 3.1

Setting up Ollama and running LLMs locally

Instrumenting Ollama with Langtrace

Tracing LLM call

Quick Update

Next steps

Top comments (0)

Read next

New Training Method Makes AI Decision-Making More Transparent and Logical

Step-by-Step AI Reasoning System Improves Language Model Accuracy by 8.5%

This engineer uses LLMs

AI Language Models Use Hidden Geometry to Add Numbers