Authored by David Tracey
Many software companies are investigating the use of Large Language Models (LLMs) in their products. At Bronto we've announced our Bronto Labs initiative, with AI features including auto-parsing, AI dashboard creation, and Bronto Scope for error investigation.
This post explores a different angle: using logs in the development of AI applications. We'll focus on Ollama — an open source tool for running LLMs locally — and show how to pipe its logs into Bronto for search and analysis.
LLMs are complex, non-deterministic systems. Beyond traditional logging use cases (performance monitoring, API usage), their unpredictable nature increases the need for logging — particularly to record and track responses to prompts. Individual log events can be large when they include a full prompt or response. Meta found this problem significant enough at their scale to build a dedicated Meta AI Logging Engine.
The fundamental requirements for logging AI applications are:
- Ability to handle large log events
- Ability to handle high volumes at low cost
- Ability to search across high volumes quickly
These are exactly the requirements Bronto was designed to meet.
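To make the "large log events" point concrete, here is a minimal sketch of recording one prompt/response pair as a single JSON log line. The helper name `make_llm_log_event` and the field names are hypothetical, chosen for illustration only; they are not an Ollama or Bronto schema.

```python
import json
import time


def make_llm_log_event(model, prompt, response):
    """Build one JSON log line for a prompt/response pair (hypothetical schema)."""
    event = {
        "timestamp": time.time(),  # seconds since the epoch
        "model": model,
        "prompt": prompt,
        "response": response,
        # Recording sizes makes unusually large events easy to find later.
        "prompt_chars": len(prompt),
        "response_chars": len(response),
    }
    return json.dumps(event)


if __name__ == "__main__":
    line = make_llm_log_event("gemma:2b", "Summarise the logs provided.", "(model output)")
    print(line)
    print(f"event size: {len(line.encode('utf-8'))} bytes")
```

Because the full prompt and response travel inside the event, its size grows with the input, which is why a prompt that embeds a whole log file produces a very large event.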
Setting Up Ollama
Recommended specs:
- 16GB RAM (8GB works for smaller models)
- 12GB disk space for Ollama and basic models
- Modern CPU with at least 4 cores (8 preferred)
- Optional: GPU for improved performance
Install and Run the Server
Install from ollama.com/download for your OS, then start the server:
ollama serve
You'll see output including the default port it's listening on (11434).
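If you would rather confirm from a script that the server is up, a simple approach is to request its root URL. This is a minimal sketch using only the standard library; it assumes the default address http://localhost:11434, and the helper name `is_ollama_up` is ours rather than part of Ollama.

```python
import urllib.error
import urllib.request


def is_ollama_up(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP GET on the server root answers with status 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("Ollama reachable:", is_ollama_up())
```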
Download and Run a Model
# Pull a model from the registry
ollama pull gemma:2b
# List downloaded models
ollama list
# Run a model interactively
ollama run gemma:2b
The run command opens an interactive session with a >>> prompt, where you can type a prompt for the model or /help to list the available commands.
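The same list of downloaded models is available over HTTP from the server's /api/tags endpoint. The sketch below assumes the default address; the helper names `model_names` and `fetch_tags` are illustrative.

```python
import json
import urllib.error
import urllib.request


def model_names(tags_payload):
    """Extract model names from a decoded /api/tags response body."""
    return [m["name"] for m in tags_payload.get("models", []) if "name" in m]


def fetch_tags(base_url="http://localhost:11434", timeout=5.0):
    """GET /api/tags and return the decoded JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        print(model_names(fetch_tags()))
    except (urllib.error.URLError, OSError, ValueError) as exc:
        # Server not running, unreachable, or the reply was not JSON.
        print(f"Could not list models: {exc}")
```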
Sending Ollama Logs to Bronto
Step 1: Configure Ollama Logging to File
Stop the server, then restart it with its output redirected to a log file (the target directory must already exist):
ollama serve > /your_log_path/.ollama/logs/server.log 2>&1
For more detailed debug logs, add the following to your shell profile (for example .zprofile), then restart the server so it picks up the new environment:
export OLLAMA_LOG_LEVEL=DEBUG
export OLLAMA_DEBUG=true
To redirect model client logs:
# stderr only (keeps console interactive)
ollama run gemma:2b 2>>/your_log_path/.ollama/logs/gemma.log
# both stdout and stderr (API use only — disables console input)
ollama run gemma:2b > /your_log_path/.ollama/logs/gemma.log 2>&1
Verify logs are flowing:
tail -f /your_log_path/.ollama/logs/server.log
Step 2: Install OpenTelemetry Collector
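Download the Collector for your platform from opentelemetry.io. This example uses the contrib distribution, which includes the filelog receiver configured in Step 3. Example for Mac ARM64:
curl --proto '=https' --tlsv1.2 -fOL \
https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.114.0/otelcol-contrib_0.114.0_darwin_arm64.tar.gz
# Extract the binary from the downloaded archive
tar -xzf otelcol-contrib_0.114.0_darwin_arm64.tar.gz
chmod +x otelcol-contrib
# Prefix with sudo if /usr/local/bin is not writable by your user
mv otelcol-contrib /usr/local/bin/otelcol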
# Verify
otelcol --version
Step 3: Configure OpenTelemetry to Forward to Bronto
Create /etc/otelcol/config.yaml:
receivers:
  filelog/Ollama_Server:
    include:
      - /your_log_path/.ollama/logs/server.log
    resource:
      service.name: LaptopServer
      service.namespace: Ollama
  filelog/Ollama_Gemma:
    include:
      - /your_log_path/.ollama/logs/gemma.log
    resource:
      service.name: LaptopGemma
      service.namespace: Ollama
processors:
  batch:
exporters:
  otlphttp/brontobytes:
    logs_endpoint: "https://ingestion.us.bronto.io/v1/logs"
    compression: none
    headers:
      x-bronto-api-key: replace_this_with_your_bronto_apikey
service:
  pipelines:
    logs:
      receivers: [filelog/Ollama_Server, filelog/Ollama_Gemma]
      processors: [batch]
      exporters: [otlphttp/brontobytes]
  # Useful for debugging:
  # telemetry:
  #   logs:
  #     level: "debug"
  #     output_paths: [/your_log_path/otelcol/debug.log]
Validate and run:
otelcol validate --config=/etc/otelcol/config.yaml
otelcol --config=/etc/otelcol/config.yaml
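To see what the resource settings in the receiver config amount to, the sketch below builds the JSON form of an OTLP logs payload for a single line, carrying the same service.name and service.namespace attributes. It only constructs the structure and sends nothing. The helper name `build_otlp_logs_payload` is hypothetical, and the Collector's exporter uses protobuf encoding by default, so treat this as an illustration of the data model rather than of the exact bytes on the wire.

```python
import json
import time


def build_otlp_logs_payload(body, service_name, service_namespace, time_ns=None):
    """Wrap one log line in the OTLP JSON structure for logs."""
    if time_ns is None:
        time_ns = time.time_ns()
    return {
        "resourceLogs": [{
            "resource": {
                "attributes": [
                    {"key": "service.name",
                     "value": {"stringValue": service_name}},
                    {"key": "service.namespace",
                     "value": {"stringValue": service_namespace}},
                ]
            },
            "scopeLogs": [{
                "logRecords": [{
                    # OTLP JSON encodes 64-bit integers as strings.
                    "timeUnixNano": str(time_ns),
                    "body": {"stringValue": body},
                }]
            }],
        }]
    }


if __name__ == "__main__":
    payload = build_otlp_logs_payload(
        'level=INFO source=sched.go:714 msg="new model will fit in available VRAM"',
        service_name="LaptopServer",
        service_namespace="Ollama",
    )
    print(json.dumps(payload, indent=2))
```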
A Simple Ollama API Program
The Python script below (ollama-log-demo.py) uses the Ollama API to send prompts against a log file and print the response. Example usage:
# Summarize 100 lines of CDN logs
python3 ollama-log-demo.py 100lines-CDN-log.csv \
--model "gemma:2b" \
--prompt "You have been given 100 lines from a CDN log in CSV format. Summarise the logs provided."
# Find errors and suggest fixes
python3 ollama-log-demo.py 100lines-search-log.csv \
--model "gemma:2b" \
--prompt "Find errors in this log and suggest how to fix them"
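Behind each of these invocations is a single POST to Ollama's /api/generate endpoint. The sketch below shows how the request body could be assembled, joining the prompt and the log contents with a blank line as the appendix script does. The helper name `build_generate_payload` is illustrative; `stream` and `context` are optional fields of the generate request.

```python
import json


def build_generate_payload(model, prompt, log_data, stream=True, context=None):
    """Build the JSON body for POST /api/generate."""
    payload = {
        "model": model,
        # The instruction comes first, followed by the raw log contents.
        "prompt": f"{prompt}\n\n{log_data}",
        # stream=False asks for one JSON object instead of a stream of chunks.
        "stream": stream,
    }
    if context is not None:
        # Context returned by a previous response, to keep conversational memory.
        payload["context"] = context
    return payload


if __name__ == "__main__":
    body = build_generate_payload(
        "gemma:2b",
        "Find errors in this log and suggest how to fix them",
        "(log file contents)",
        stream=False,
    )
    print(json.dumps(body, indent=2))
```

Note that the entire log file becomes part of the prompt, so input size is bounded by the model's context window.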
The final JSON object of each Ollama response includes useful performance metadata:
| Field | Description |
|---|---|
| total_duration | Total time spent generating the response (nanoseconds) |
| load_duration | Time spent loading the model (nanoseconds) |
| prompt_eval_count | Number of tokens in the prompt |
| prompt_eval_duration | Time spent evaluating the prompt (nanoseconds) |
| eval_count | Number of tokens in the response |
| eval_duration | Time spent generating the response (nanoseconds) |
| context | Conversation encoding; pass it in the next request to maintain memory |
| response | Empty if streamed; full response if not streamed |
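Since Ollama reports durations in nanoseconds, the raw numbers are hard to read at a glance. A small sketch like the following converts them into seconds and tokens per second; the function names and the keys of the returned summary are our own.

```python
NANOS_PER_SECOND = 1_000_000_000


def tokens_per_second(token_count, duration_ns):
    """Convert a token count and a nanosecond duration into tokens per second."""
    if not duration_ns:
        return 0.0  # avoid dividing by zero when a duration is missing
    return token_count / (duration_ns / NANOS_PER_SECOND)


def summarize_stats(final_chunk):
    """Derive readable figures from the final JSON object of a response."""
    return {
        "total_seconds": final_chunk.get("total_duration", 0) / NANOS_PER_SECOND,
        "prompt_tokens_per_s": tokens_per_second(
            final_chunk.get("prompt_eval_count", 0),
            final_chunk.get("prompt_eval_duration", 0),
        ),
        "response_tokens_per_s": tokens_per_second(
            final_chunk.get("eval_count", 0),
            final_chunk.get("eval_duration", 0),
        ),
    }
```

Tokens per second is a handy figure for comparing models on the same hardware, for example gemma:2b against mistral.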
Model notes from testing: gemma:2b is good for summarizing but tends to give high-level summaries even when asked for specifics. mistral takes longer but produces more detailed, data-specific responses. Defining the right prompt for your use case is key.
Searching Ollama Logs in Bronto
Ollama server logs include a mix of structured and unstructured entries:
Standard log levels:
INFO [main] HTTP server listening | hostname="127.0.0.1" port="11434"
level=INFO source=sched.go:714 msg="new model will fit in available VRAM"
level=DEBUG source=memory.go:103 msg=evaluating library=metal gpu_count=1
Model and resource logs:
llm_load_print_meta: max token length = 93
llama_model_loader: - kv 0: general.architecture str = gemma
level=INFO source=server.go:105 msg="system memory" total="8.0 GiB" free="1.2 GiB"
Even a small test with short prompts generates surprisingly large log volumes — 244 events totaling ~2MB in our test. Bronto handles these unstructured and semi-structured formats natively, and you can add a custom parser to make them more convenient to search and view.
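To give a feel for what a parser for these lines has to do, here is a minimal local sketch that pulls key=value pairs out of a line and ignores everything else. It uses `shlex` to respect the quoted values; the function name `parse_key_values` is hypothetical, and this is not how Bronto's parsers are implemented.

```python
import shlex


def parse_key_values(line):
    """Return the key=value pairs found in a log line as a dict (empty if none)."""
    try:
        tokens = shlex.split(line)
    except ValueError:
        # Unbalanced quotes: treat the line as unstructured.
        return {}
    fields = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if sep and key:  # skip bare words and a standalone "="
            fields[key] = value
    return fields


if __name__ == "__main__":
    sample = 'level=INFO source=server.go:105 msg="system memory" total="8.0 GiB" free="1.2 GiB"'
    print(parse_key_values(sample))
```

Lines such as the llama_model_loader entries contain no key=value tokens and come back as an empty dict, which is a reasonable signal to keep them as plain text.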
Example searches in Bronto:
Fig.1 — Searching for log events containing "tokens"

Fig.2 — Searching for log events containing "prompt"

Fig.3 — Grouping by prompt evaluation time per task_id
Conclusion
This post introduced Ollama as an example of an LLM system and explained why AI applications create unique logging challenges — large events, high volumes, non-deterministic outputs, and distributed agents. We walked through setting up Ollama locally, configuring OpenTelemetry to forward logs to Bronto, and writing a simple Python API program to experiment with prompts against log data.
Future posts will develop the theme further with other AI systems including AWS Bedrock.
Appendix: ollama-log-demo.py
import argparse
import json

import requests

# Update this URL to match your Ollama API endpoint
OLLAMA_URL = "http://localhost:11434/api/generate"

# Statistics reported in the final JSON object of a response
STAT_FIELDS = (
    "load_duration",
    "total_duration",
    "eval_duration",
    "prompt_eval_duration",
    "prompt_eval_count",
    "eval_count",
)


def print_ollama_stats(json_response):
    """Print each statistic that is present and non-zero."""
    for field in STAT_FIELDS:
        value = json_response.get(field)
        if value:
            print(f"\n--- {field} = ", value)


def examine_log_with_prompt(file_path, input_prompt, input_model):
    """Send the prompt plus the log file contents to Ollama and stream the reply."""
    with open(file_path, 'r') as file:
        log_data = file.read()

    req_params = {
        "model": input_model,
        "prompt": f"{input_prompt}\n\n{log_data}"
    }

    try:
        response = requests.post(
            OLLAMA_URL,
            headers={"Content-Type": "application/json"},
            data=json.dumps(req_params),
            stream=True
        )
        if response.status_code == 200:
            print("\n--- Processing Successful Ollama Response ---")
            line_count = 0
            json_response = None  # last successfully decoded JSON object
            for line in response.iter_lines():
                if not line:
                    continue
                line_count += 1
                try:
                    json_response = json.loads(line.decode('utf-8'))
                    # Error objects have no "response" key, so use .get()
                    print(json_response.get("response", ""), end='', flush=True)
                except json.JSONDecodeError as e:
                    print(f"Error decoding JSON on line {line_count}: {e}")
                except UnicodeDecodeError as e:
                    print(f"Error decoding line to UTF-8 on line {line_count}: {e}")
            if json_response is None:
                print("No JSON lines found or response was empty.")
            else:
                print("\n--------------------------------------------------")
                print_ollama_stats(json_response)
                print("\n--------------------------------------------------")
        else:
            print(f"\nError - Response Status code: {response.status_code}")
            print(response.text)
    except requests.RequestException as e:
        print(f"Error - request to Ollama failed: {e}")


def main():
    parser = argparse.ArgumentParser(description="Ollama API Demo for Logs")
    parser.add_argument('file', type=str, help='Path to the log file to be examined')
    # Both options are required: without them the request would carry "None"
    parser.add_argument('--model', type=str, required=True, help='Model to use in analysis')
    parser.add_argument('--prompt', type=str, required=True, help='Prompt to send to model')
    args = parser.parse_args()
    examine_log_with_prompt(args.file, args.prompt, args.model)


if __name__ == "__main__":
    main()
