Patrick Londa for Bronto

Posted on May 19 • Originally published at bronto.io

Logging Your AI Events (from Ollama) in Bronto

#ai #logging #ollama #observability

Authored by David Tracey

Many software companies are investigating the use of Large Language Models (LLMs) in their products. At Bronto we've announced our Bronto Labs initiative, with AI features including auto-parsing, AI dashboard creation, and Bronto Scope for error investigation.

This post explores a different angle: using logs in the development of AI applications. We'll focus on Ollama — an open source tool for running LLMs locally — and show how to pipe its logs into Bronto for search and analysis.

LLMs are complex, non-deterministic systems. Beyond traditional logging use cases (performance monitoring, API usage), their unpredictable nature increases the need for logging — particularly to record and track responses to prompts. Individual log events can be large when they include a full prompt or response. Meta found this problem significant enough at their scale to build a dedicated Meta AI Logging Engine.

The fundamental requirements for logging AI applications are:

Ability to handle large log events
Ability to handle high volumes at low cost
Ability to search across high volumes quickly

These are exactly the requirements Bronto was designed to meet.

Setting Up Ollama

Recommended specs:

16GB RAM (8GB works for smaller models)
12GB disk space for Ollama and basic models
Modern CPU with at least 4 cores (8 preferred)
Optional: GPU for improved performance

Install and Run the Server

Install from ollama.com/download for your OS, then start the server:

ollama serve

You'll see output including the default port it's listening on (11434).

Download and Run a Model

# Pull a model from the registry
ollama pull gemma:2b

# List downloaded models
ollama list

# Run a model interactively
ollama run gemma:2b

The run command opens an interactive >>> prompt where you can type a prompt for the model, or /help to list the available commands.
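The same information that ollama list prints can also be read from the server's HTTP API. The sketch below assumes the server is running on the default port shown earlier and that the /api/tags endpoint returns a JSON object with a "models" array whose entries have a "name" field. The helper names (model_names, list_local_models) are made up for this illustration.

```python
import json
import urllib.request

# Default address reported by `ollama serve`; change it if your server differs.
OLLAMA_URL = "http://localhost:11434"


def model_names(tags_payload):
    """Extract model names from a decoded /api/tags response body."""
    return [model["name"] for model in tags_payload.get("models", [])]


def list_local_models(base_url=OLLAMA_URL, timeout=5):
    """Ask a running Ollama server which models it has downloaded."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return model_names(json.load(resp))
```

Calling list_local_models() after the pull above should include gemma:2b in the returned list, provided the server is running.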

Sending Ollama Logs to Bronto

Step 1: Configure Ollama Logging to File

Stop the server, then restart it with its output redirected to a log file (the log directory must already exist):

ollama serve > /your_log_path/.ollama/logs/server.log 2>&1

For more detailed debug logs, add the following to your shell profile (.zprofile or similar), then open a new shell and restart the server:

export OLLAMA_LOG_LEVEL=DEBUG
export OLLAMA_DEBUG=true

To redirect model client logs:

# stderr only (keeps console interactive)
ollama run gemma:2b 2>>/your_log_path/.ollama/logs/gemma.log

# both stdout and stderr (API use only — disables console input)
ollama run gemma:2b > /your_log_path/.ollama/logs/gemma.log 2>&1

Verify logs are flowing:

tail -f /your_log_path/.ollama/logs/server.log
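If you would rather check the file from Python (for example inside a test script), a rough equivalent of a one-off tail is sketched below. The function name last_lines is hypothetical, and the path you pass in is whatever you used in the redirect above.

```python
from collections import deque
from pathlib import Path


def last_lines(log_path, count=10):
    """Return the last `count` lines of a log file.

    The file is read sequentially, but only `count` lines are held in
    memory at a time, so this stays cheap for large server logs.
    """
    with Path(log_path).open("r", encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]
```

Unlike tail -f, this does not follow the file; it returns a snapshot and exits.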

Step 2: Install OpenTelemetry Collector

Download for your platform from opentelemetry.io. Example for Mac ARM64:

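curl --proto '=https' --tlsv1.2 -fOL \
  https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.114.0/otelcol-contrib_0.114.0_darwin_arm64.tar.gz

# Unpack the archive to get the otelcol-contrib binary
tar -xzf otelcol-contrib_0.114.0_darwin_arm64.tar.gz

chmod +x otelcol-contrib
# May need sudo, depending on the permissions of /usr/local/bin
mv otelcol-contrib /usr/local/bin/otelcol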

# Verify
otelcol --version

Step 3: Configure OpenTelemetry to Forward to Bronto

Create /etc/otelcol/config.yaml:

receivers:
  filelog/Ollama_Server:
    include:
      - /your_log_path/.ollama/logs/server.log
    resource:
      service.name: LaptopServer
      service.namespace: Ollama

  filelog/Ollama_Gemma:
    include:
      - /your_log_path/.ollama/logs/gemma.log
    resource:
      service.name: LaptopGemma
      service.namespace: Ollama

processors:
  batch:

exporters:
  otlphttp/brontobytes:
    logs_endpoint: "https://ingestion.us.bronto.io/v1/logs"
    compression: none
    headers:
      x-bronto-api-key: replace_this_with_your_bronto_apikey

service:
  pipelines:
    logs:
      receivers: [filelog/Ollama_Server, filelog/Ollama_Gemma]
      processors: [batch]
      exporters: [otlphttp/brontobytes]
  # Useful for debugging:
  # telemetry:
  #   logs:
  #     level: "debug"
  #     output_paths: [/your_log_path/otelcol/debug.log]

Validate and run:

otelcol validate --config=/etc/otelcol/config.yaml
otelcol --config=/etc/otelcol/config.yaml
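Validation checks that the configuration is well formed, but it has no way of knowing that a placeholder value was left in by mistake. As a small pre-flight check, the sketch below scans the config text for the two placeholder strings used in the example above. The function name and the PLACEHOLDERS tuple are assumptions made for this illustration, not part of the collector.

```python
# Placeholder strings taken from the example config in this post.
PLACEHOLDERS = ("/your_log_path/", "replace_this_with_your_bronto_apikey")


def unfilled_placeholders(config_text):
    """Return (line_number, line) pairs that still contain a placeholder."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # commented-out lines are not active config
        if any(marker in stripped for marker in PLACEHOLDERS):
            hits.append((number, stripped))
    return hits
```

Reading /etc/otelcol/config.yaml and passing its contents to unfilled_placeholders should return an empty list once the log paths and API key have been filled in.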

A Simple Ollama API Program

The Python script in the appendix (ollama-log-demo.py) sends a prompt, followed by the contents of a log file, to the Ollama API and prints the response. Example usage:

# Summarize 100 lines of CDN logs
python3 ollama-log-demo.py 100lines-CDN-log.csv \
  --model "gemma:2b" \
  --prompt "You have been given 100 lines from a CDN log in CSV format. Summarise the logs provided."

# Find errors and suggest fixes
python3 ollama-log-demo.py 100lines-search-log.csv \
  --model "gemma:2b" \
  --prompt "Find errors in this log and suggest how to fix them"

The final JSON object of each Ollama response (the only one, if streaming is disabled) includes useful performance metadata:

Field	Description
`total_duration`	Total time spent generating the response (nanoseconds)
`load_duration`	Time spent loading the model (nanoseconds)
`prompt_eval_count`	Number of tokens in the prompt
`prompt_eval_duration`	Time spent evaluating the prompt (nanoseconds)
`eval_count`	Number of tokens in the response
`eval_duration`	Time spent generating the response (nanoseconds)
`context`	Conversation encoding — pass in next request to maintain memory
`response`	Empty if streamed; full response if not streamed

Model notes from testing: gemma:2b is good for summarizing but tends to give high-level summaries even when asked for specifics. mistral takes longer but produces more detailed, data-specific responses. Defining the right prompt for your use case is key.

Searching Ollama Logs in Bronto

Ollama server logs include a mix of structured and unstructured entries:

Standard log levels:

INFO [main] HTTP server listening | hostname="127.0.0.1" port="11434"
level=INFO source=sched.go:714 msg="new model will fit in available VRAM"
level=DEBUG source=memory.go:103 msg=evaluating library=metal gpu_count=1

Model and resource logs:

llm_load_print_meta: max token length = 93
llama_model_loader: - kv 0: general.architecture str = gemma
level=INFO source=server.go:105 msg="system memory" total="8.0 GiB" free="1.2 GiB"

Even a small test with short prompts generates surprisingly large log volumes — 244 events totaling ~2MB in our test. Bronto handles these unstructured and semi-structured formats natively, and you can add a custom parser to make them more convenient to search and view.
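To make the idea of parsing these lines concrete, here is a minimal sketch that pulls key=value pairs out of a single server log line. It is only an illustration of the log format, not Bronto's parser, and the function name parse_key_values is hypothetical. It assumes values are either bare tokens or double-quoted strings, as in the samples above, and it leaves any escaped quotes inside a value as they are.

```python
import re

# key=value pairs where the value is a double-quoted string or a bare token
_PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_key_values(line):
    """Parse key=value pairs from one Ollama server log line into a dict."""
    fields = {}
    for key, value in _PAIR.findall(line):
        # Strip the surrounding quotes from quoted values
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        fields[key] = value
    return fields
```

Lines from the model loader, such as "llm_load_print_meta: max token length = 93", contain no key=value pairs in this sense, so the function returns an empty dict for them; they would need a separate rule.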

Example searches in Bronto:

Fig. 1: Searching for log events containing "tokens"

Fig. 2: Searching for log events containing "prompt"

Fig. 3: Grouping by prompt evaluation time per task_id

Conclusion

This post introduced Ollama as an example of an LLM system and explained why AI applications create distinctive logging challenges: large events, high volumes, and non-deterministic outputs. We walked through setting up Ollama locally, configuring OpenTelemetry to forward logs to Bronto, and writing a simple Python API program to experiment with prompts against log data.

Future posts will develop the theme further with other AI systems including AWS Bedrock.

Appendix: `ollama-log-demo.py`

import argparse
import json
import requests


def print_ollama_stats(json_response):
    load_duration = json_response.get("load_duration")
    if load_duration:
        print("\n--- load_duration = ", load_duration)

    total_duration = json_response.get("total_duration")
    if total_duration:
        print("\n--- total_duration = ", total_duration)

    eval_duration = json_response.get("eval_duration")
    if eval_duration:
        print("\n--- eval_duration = ", eval_duration)

    prompt_eval_duration = json_response.get("prompt_eval_duration")
    if prompt_eval_duration:
        print("\n--- prompt_eval_duration = ", prompt_eval_duration)

    prompt_eval_count = json_response.get("prompt_eval_count")
    if prompt_eval_count:
        print("\n--- prompt_eval_count = ", prompt_eval_count)

    eval_count = json_response.get("eval_count")
    if eval_count:
        print("\n--- eval_count = ", eval_count)


def examine_log_with_prompt(file_path, input_prompt, input_model):
    with open(file_path, 'r') as file:
        log_data = file.read()

    req_params = {
        "model": input_model,
        "prompt": f"{input_prompt}\n\n{log_data}"
    }

    try:
        # Update localhost URL to match your Ollama API endpoint
        response = requests.post(
            "http://localhost:11434/api/generate",
            headers={"Content-Type": "application/json"},
            data=json.dumps(req_params),
            stream=True
        )
        if response.status_code == 200:
            print("\n--- Processing Successful Ollama Response ---")
            line_count = 0
            # Keep the last successfully decoded chunk; it carries the stats
            final_response = None
            for line in response.iter_lines():
                if line:
                    try:
                        json_line = line.decode('utf-8')
                        line_count += 1
                        json_response = json.loads(json_line)
                        print(json_response.get("response", ""), end='', flush=True)
                        final_response = json_response
                    except json.JSONDecodeError as e:
                        print(f"Error decoding JSON on line {line_count}: {e}")
                    except UnicodeDecodeError as e:
                        print(f"Error decoding line to UTF-8 on line {line_count + 1}: {e}")
            if final_response is None:
                print("No JSON lines found or response was empty.")
            else:
                print("\n--------------------------------------------------")
                print_ollama_stats(final_response)
                print("\n--------------------------------------------------")
        else:
            print(f"\nError - Response Status code: {response.status_code}")
            print(response.text)
    except Exception as e:
        print(e)


def main():
    parser = argparse.ArgumentParser(description="Ollama API Demo for Logs")
    parser.add_argument('file', type=str, help='Path to the log file to be examined')
    parser.add_argument('--model', type=str, help='Model to use in analysis', required=True)
    parser.add_argument('--prompt', type=str, help='Prompt to send to model', required=True)
    args = parser.parse_args()
    examine_log_with_prompt(args.file, args.prompt, args.model)


if __name__ == "__main__":
    main()
