
Patrick Londa for Bronto

Originally published at bronto.io

Logging Your AI Events (from Ollama) in Bronto

Authored by David Tracey

Many software companies are investigating the use of Large Language Models (LLMs) in their products. At Bronto we've announced our Bronto Labs initiative, with AI features including auto-parsing, AI dashboard creation, and Bronto Scope for error investigation.

This post explores a different angle: using logs in the development of AI applications. We'll focus on Ollama — an open source tool for running LLMs locally — and show how to pipe its logs into Bronto for search and analysis.

LLMs are complex, non-deterministic systems. Beyond traditional logging use cases (performance monitoring, API usage), their unpredictable nature increases the need for logging — particularly to record and track responses to prompts. Individual log events can be large when they include a full prompt or response. Meta found this problem significant enough at their scale to build a dedicated Meta AI Logging Engine.

The fundamental requirements for logging AI applications are:

  • Ability to handle large log events
  • Ability to handle high volumes at low cost
  • Ability to search across high volumes quickly

These are exactly the requirements Bronto was designed to meet.


Setting Up Ollama

Recommended specs:

  • 16GB RAM (8GB works for smaller models)
  • 12GB disk space for Ollama and basic models
  • Modern CPU with at least 4 cores (8 preferred)
  • Optional: GPU for improved performance

Install and Run the Server

Install from ollama.com/download for your OS, then start the server:

ollama serve

You'll see output including the default port it's listening on (11434).
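
If you would rather confirm from a script that the server is reachable, one option is to request its root URL. This is a minimal sketch that assumes the default address http://localhost:11434 mentioned above. The function name is_ollama_up is hypothetical, and the check only confirms that something answers with HTTP 200 at that address.

```python
import urllib.request


def is_ollama_up(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP server answers with status 200 at base_url.

    Hypothetical helper: it does not verify that the responder is Ollama.
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError):
        # URLError, HTTPError, timeouts and refused connections are OSError
        # subclasses; ValueError covers a malformed URL.
        return False
```

Calling is_ollama_up() before sending prompts gives a clearer failure than a stack trace from the first API request.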

Download and Run a Model

# Pull a model from the registry
ollama pull gemma:2b

# List downloaded models
ollama list

# Run a model interactively
ollama run gemma:2b

The run command opens an interactive session with a >>> prompt, where you can type a prompt for the model or /help to list the available commands.


Sending Ollama Logs to Bronto

Step 1: Configure Ollama Logging to File

Stop the server and restart it, redirecting its output to a log file (the log directory must already exist):

ollama serve > /your_log_path/.ollama/logs/server.log 2>&1

For more detailed debug logs, add the following to your shell profile (for example .zprofile) and restart the server:

export OLLAMA_LOG_LEVEL=DEBUG
export OLLAMA_DEBUG=true

To redirect model client logs:

# stderr only (keeps console interactive)
ollama run gemma:2b 2>>/your_log_path/.ollama/logs/gemma.log

# both stdout and stderr (API use only — disables console input)
ollama run gemma:2b > /your_log_path/.ollama/logs/gemma.log 2>&1

Verify logs are flowing:

tail -f /your_log_path/.ollama/logs/server.log

Step 2: Install OpenTelemetry Collector

Download for your platform from opentelemetry.io. Example for Mac ARM64:

curl --proto '=https' --tlsv1.2 -fOL \
  https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.114.0/otelcol-contrib_0.114.0_darwin_arm64.tar.gz

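# Unpack the archive to get the otelcol-contrib binary
tar -xzf otelcol-contrib_0.114.0_darwin_arm64.tar.gz

chmod +x otelcol-contrib
# Moving into /usr/local/bin may require sudo, depending on your permissions
mv otelcol-contrib /usr/local/bin/otelcol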

# Verify
otelcol --version

Step 3: Configure OpenTelemetry to Forward to Bronto

Create /etc/otelcol/config.yaml:

receivers:
  filelog/Ollama_Server:
    include:
      - /your_log_path/.ollama/logs/server.log
    resource:
      service.name: LaptopServer
      service.namespace: Ollama

  filelog/Ollama_Gemma:
    include:
      - /your_log_path/.ollama/logs/gemma.log
    resource:
      service.name: LaptopGemma
      service.namespace: Ollama

processors:
  batch:

exporters:
  otlphttp/brontobytes:
    logs_endpoint: "https://ingestion.us.bronto.io/v1/logs"
    compression: none
    headers:
      x-bronto-api-key: replace_this_with_your_bronto_apikey

service:
  pipelines:
    logs:
      receivers: [filelog/Ollama_Server, filelog/Ollama_Gemma]
      processors: [batch]
      exporters: [otlphttp/brontobytes]
  # Useful for debugging:
  # telemetry:
  #   logs:
  #     level: "debug"
  #     output_paths: [/your_log_path/otelcol/debug.log]

Validate and run:

otelcol validate --config=/etc/otelcol/config.yaml
otelcol --config=/etc/otelcol/config.yaml

A Simple Ollama API Program

The Python script in the appendix (ollama-log-demo.py) reads a log file, sends its contents together with a prompt to the Ollama API, and prints the response. Example usage:

# Summarize 100 lines of CDN logs
python3 ollama-log-demo.py 100lines-CDN-log.csv \
  --model "gemma:2b" \
  --prompt "You have been given 100 lines from a CDN log in CSV format. Summarise the logs provided."

# Find errors and suggest fixes
python3 ollama-log-demo.py 100lines-search-log.csv \
  --model "gemma:2b" \
  --prompt "Find errors in this log and suggest how to fix them"

The final JSON object of each Ollama response (the last line when streaming) includes useful performance metadata:

| Field | Description |
| --- | --- |
| total_duration | Total time spent generating the response (nanoseconds) |
| load_duration | Time spent loading the model (nanoseconds) |
| prompt_eval_count | Number of tokens in the prompt |
| prompt_eval_duration | Time spent evaluating the prompt (nanoseconds) |
| eval_count | Number of tokens in the response |
| eval_duration | Time spent generating the response (nanoseconds) |
| context | Encoding of the conversation; pass it in the next request to maintain memory |
| response | Empty if streamed; full response if not streamed |
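
Since the duration fields are reported in nanoseconds, a small conversion helps when reading them. The sketch below derives generation throughput in tokens per second from eval_count and eval_duration. The helper name and the sample values are made up for illustration and are not measurements from our tests.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000


def tokens_per_second(stats):
    """Generation throughput from the final stats object of a response.

    Returns None when either field is missing or zero.
    """
    eval_count = stats.get("eval_count")
    eval_duration = stats.get("eval_duration")  # nanoseconds
    if not eval_count or not eval_duration:
        return None
    seconds = eval_duration / NANOSECONDS_PER_SECOND
    return eval_count / seconds


# Illustrative values only.
sample = {"eval_count": 120, "eval_duration": 4_000_000_000}
print(tokens_per_second(sample))  # 30.0
```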

Model notes from testing: gemma:2b is good for summarizing but tends to give high-level summaries even when asked for specifics. mistral takes longer but produces more detailed, data-specific responses. Defining the right prompt for your use case is key.
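
When comparing models and prompts like this, it can help to record each run as one structured log event. The following is a minimal sketch and not part of the original script: the function names, the event field names and the output path you pass in are all assumptions. The idea is simply to append one JSON object per line to a file that a filelog receiver could tail.

```python
import json
import time

# Timing and count fields that Ollama returns in its final response object.
STAT_FIELDS = (
    "total_duration", "load_duration", "prompt_eval_count",
    "prompt_eval_duration", "eval_count", "eval_duration",
)


def build_event(model, prompt, response_text, stats):
    """Assemble one log event for a prompt/response pair (field names are assumed)."""
    return {
        "timestamp": time.time(),
        "model": model,
        "prompt": prompt,
        "response": response_text,
        # Keep only the stats fields; the large "context" array is dropped.
        "stats": {k: stats[k] for k in STAT_FIELDS if k in stats},
    }


def append_event(path, event):
    """Append the event as a single JSON line so each line is one log event."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(event, ensure_ascii=False) + "\n")
```

Because json.dumps escapes embedded newlines, a multi-line prompt or response still ends up as a single line, which keeps each event intact when the file is tailed.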


Searching Ollama Logs in Bronto

Ollama server logs include a mix of structured and unstructured entries:

Standard log levels:

INFO [main] HTTP server listening | hostname="127.0.0.1" port="11434"
level=INFO source=sched.go:714 msg="new model will fit in available VRAM"
level=DEBUG source=memory.go:103 msg=evaluating library=metal gpu_count=1

Model and resource logs:

llm_load_print_meta: max token length = 93
llama_model_loader: - kv 0: general.architecture str = gemma
level=INFO source=server.go:105 msg="system memory" total="8.0 GiB" free="1.2 GiB"

Even a small test with short prompts generates surprisingly large log volumes — 244 events totaling ~2MB in our test. Bronto handles these unstructured and semi-structured formats natively, and you can add a custom parser to make them more convenient to search and view.
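
As a rough illustration of what such a parser has to do, the sketch below pulls key=value pairs out of the sample lines shown above. It is a plain-Python approximation for local experimentation, not Bronto's parser. The name parse_kv is hypothetical, and lines without any key=value pairs simply yield an empty dictionary.

```python
import shlex


def parse_kv(line):
    """Extract key=value pairs from a log line; quoted values may contain spaces."""
    try:
        tokens = shlex.split(line)
    except ValueError:
        # Unbalanced quotes: treat the line as unstructured.
        return {}
    fields = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if sep and key:  # ignore bare words and a standalone "="
            fields[key] = value
    return fields


line = 'level=INFO source=sched.go:714 msg="new model will fit in available VRAM"'
print(parse_kv(line)["msg"])  # new model will fit in available VRAM
```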

Example searches in Bronto:

Fig. 1: Searching for log events containing "tokens"

Fig. 2: Searching for log events containing "prompt"

Fig. 3: Grouping by prompt evaluation time per task_id


Conclusion

This post introduced Ollama as an example of an LLM system and explained why AI applications create unique logging challenges — large events, high volumes, non-deterministic outputs, and distributed agents. We walked through setting up Ollama locally, configuring OpenTelemetry to forward logs to Bronto, and writing a simple Python API program to experiment with prompts against log data.

Future posts will develop the theme further with other AI systems including AWS Bedrock.


Appendix: ollama-log-demo.py

import argparse
import json
import requests


def print_ollama_stats(json_response):
    """Print the timing and token-count fields from the final response object."""
    stat_fields = (
        "load_duration",
        "total_duration",
        "eval_duration",
        "prompt_eval_duration",
        "prompt_eval_count",
        "eval_count",
    )
    for field in stat_fields:
        value = json_response.get(field)
        if value:  # skip fields that are missing or zero
            print(f"\n--- {field} = ", value)


def examine_log_with_prompt(file_path, input_prompt, input_model):
    with open(file_path, 'r') as file:
        log_data = file.read()

    req_params = {
        "model": input_model,
        "prompt": f"{input_prompt}\n\n{log_data}"
    }

    try:
        # Update localhost URL to match your Ollama API endpoint
        response = requests.post(
            "http://localhost:11434/api/generate",
            headers={"Content-Type": "application/json"},
            data=json.dumps(req_params),
            stream=True
        )
        if response.status_code == 200:
            print("\n--- Processing Successful Ollama Response ---")
            line_count = 0
            json_response = None  # last decoded chunk; the final one carries the stats
            for line in response.iter_lines():
                if line:
                    line_count += 1
                    try:
                        json_line = line.decode('utf-8')
                        json_response = json.loads(json_line)
                        # Streamed chunks carry a piece of the answer in "response"
                        print(json_response.get("response", ""), end='', flush=True)
                    except json.JSONDecodeError as e:
                        print(f"Error decoding JSON on line {line_count}: {e}")
                    except UnicodeDecodeError as e:
                        print(f"Error decoding line to UTF-8 on line {line_count}: {e}")
            if line_count == 0:
                print("No JSON lines found or response was empty.")
            print("\n--------------------------------------------------")
            # Only print stats if at least one chunk was decoded successfully
            if json_response is not None:
                print_ollama_stats(json_response)
            print("\n--------------------------------------------------")
        else:
            print(f"\nError - Response Status code: {response.status_code}")
            print(response.text)
    except Exception as e:
        print(e)


def main():
    parser = argparse.ArgumentParser(description="Ollama API Demo for Logs")
    parser.add_argument('file', type=str, help='Path to the log file to be examined')
    # Both options are required: the API request cannot be built without them
    parser.add_argument('--model', type=str, required=True, help='Model to use in analysis, e.g. gemma:2b')
    parser.add_argument('--prompt', type=str, required=True, help='Prompt to send to model')
    args = parser.parse_args()
    examine_log_with_prompt(args.file, args.prompt, args.model)


if __name__ == "__main__":
    main()

