Log Parsing with AI at Bronto

#logging #ai #devops #observability

Authored by Gary Nicholls

This post follows on from our AWS Nova log benchmarking article, where we explored how smaller LLMs perform on log analysis tasks. That earlier post highlighted that LLMs are surprisingly good at parsing logs. While that work focused on understanding logs, this post tackles an earlier step: automatically structuring logs using AI.

Logging Origins

Logs are one of the oldest — and still most valuable — forms of observability. Mainframes and early Unix systems were already using logs to record system activity, with tools like syslog dating back to the early 1980s.

Even as systems have become more distributed and complex, logs remain foundational, especially for investigating issues when things go wrong. Logs are typically written to local files before being shipped to modern observability platforms using agents like OpenTelemetry or Fluent Bit.

Why So Many Formats?

The OpenTelemetry (OTel) project is encouraging the adoption of structured JSON logs — and that's a good thing. Structured logs are easier to search, more human-readable, safer to manipulate, and more cloud-native.

But the reality isn't that simple. Many systems still generate unstructured or semi-structured logs where key=value pairs are embedded inside free-text messages. And even among structured formats, things vary wildly — timestamps alone appear in dozens of different formats.

Logs reflect the unique fingerprint of each tech stack:

syslog — still widely used, with quirks in its timestamp formatting
Apache — uses the Common Log Format
nginx — has its own custom variant
Java apps — use logback, log4j, or slf4j
AWS services — often emit structured JSON or a hybrid of text and JSON

With no single standard, Bronto set out to solve the problem in an innovative way — using AI to generate parsers automatically, reducing the toil and complexity that users typically face.

Automated Log Parsing

Parsing logs in real time is a performance-critical operation. When ingesting millions of events per second, every millisecond counts. Regex-based parsing can be complex and hard to maintain, requiring expertise in tools like Grok or Dissect — and can become a bottleneck at scale when applied indiscriminately.

At Bronto, we use a multi-layered approach that separates offline detection from online parsing. Online parsing happens in real time as part of the ingestion pipeline; offline detection occurs outside the pipeline with a short delay. This hybrid approach ensures speed without sacrificing flexibility, while reducing user toil.

Layer 1: Curated Java Parsers

We maintain a library of high-performance Java-based parsers, optimized for the most common formats seen at high volumes across multiple customers. These are purpose-built for speed and designed to fail fast if they encounter a log that doesn't match their expected format.

After applying a Java parser, we run additional lightweight processors to normalize key fields:

Timestamp parser — auto-detects and normalizes varied timestamp formats
Log level parser — maps diverse severity keywords into five standard levels
KVP parser — extracts key=value pairs from the message or body, even if only present in some events

Layer 2: Dissect and Grok Fallback

For less common but still important formats, we fall back to Dissect or Grok:

Dissect — fast and great for structured, delimiter-based logs
Grok — more flexible, supports regex-based parsing, but comes at a performance cost

Bronto maintains a large database of both dissect and grok patterns. Due to their runtime cost, we don't attempt to apply every pattern to every event online. Instead:

We sample log events offline
Match them against our full pattern library
If a match is found, we automatically assign a parser hint to the dataset
Future events in that dataset are parsed using the matched pattern
We gather metrics on parse quality per dataset and periodically revalidate hints

Layer 3: AI-Generated Parsing

When we encounter unknown or proprietary formats, other tools might require users to handcraft regexes through a UI. At Bronto, we let AI do the work.

When enabled, we send a sample of the dataset to an internal AI engine that analyzes the log structure and generates a custom dissect pattern. We test the pattern against a wider sample. If it matches a high percentage of events, we present the pattern and sample results to the user — they can tweak field names if desired — and once approved, the parser is saved and applied automatically to all future events in that dataset.

A Worked Example

Suppose your application logs look like this:

After analyzing hundreds of lines, the AI generates a dissect pattern:

The parsed result looks like this:

`app_name`	`timestamp`	`log_level`	`message`
APP01	2024-07-03 12:50:59	WARN	Invalid sessionId: sessionId=expired
APP01	2024-07-03 12:49:41	INFO	Authentication token issued
APP01	2024-07-03 12:48:27	INFO	User logout: userId=14141
APP01	2024-07-03 12:47:10	INFO	New login attempt
APP01	2024-07-03 12:45:37	INFO	Form submitted: formId=contact-us

The KVP parser then further extracts fields like sessionId, userId, and formId from the message value.

Under the Hood: AWS Bedrock

We use AWS Bedrock as a managed service to access LLMs (including Claude). Our infrastructure chooses the most appropriate model for each application and sends structured prompts — for example, instructing the LLM which patterns to avoid and how to handle keys like timestamps. The user doesn't have to worry about models or prompts; they just use the application.

Bedrock also provides important SaaS-grade guarantees:

Built-in safeguards to detect and filter harmful content
Never stores or uses our data to train models
All data remains within the AWS network
Works seamlessly with Lambda and S3 — no platform rearchitecting required

Looking Ahead

At Bronto we believe parsing should be fast, accurate, and hands-free. Today we generate dissect patterns using AI. Soon we'll be generating Grok patterns too — bringing AI to even more complex and less structured formats.

As OTel continues to push for JSON-based structured logging, the hope is that log parsing becomes a less painful problem over time. But until then, automated, adaptive parsing isn't just a convenience — it's a necessity.

Summary

Bronto combines curated Java parsers, flexible Dissect/Grok matching, and AI-powered pattern generation into a unified pipeline for parsing any log format, structured or otherwise.

If your logs are weird or messy — we've got you.

Explore Bronto's AI Features