<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Patrick Londa</title>
    <description>The latest articles on DEV Community by Patrick Londa (@patrick_londa_1477353d65e).</description>
    <link>https://dev.to/patrick_londa_1477353d65e</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3933240%2Fd3977bd7-c0bf-43fb-8f47-e2b395a8d5b9.jpeg</url>
      <title>DEV Community: Patrick Londa</title>
      <link>https://dev.to/patrick_londa_1477353d65e</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/patrick_londa_1477353d65e"/>
    <language>en</language>
    <item>
      <title>Introduction to A2A and Agent Search</title>
      <dc:creator>Patrick Londa</dc:creator>
      <pubDate>Tue, 26 May 2026 18:32:08 +0000</pubDate>
      <link>https://dev.to/bronto_io/introduction-to-a2a-and-agent-search-2lj2</link>
      <guid>https://dev.to/bronto_io/introduction-to-a2a-and-agent-search-2lj2</guid>
      <description>&lt;p&gt;&lt;em&gt;Authored by David Tracey&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;AI is rapidly evolving from simple tools into complex agents capable of reasoning and decision making. As agents take on more tasks, the ability to combine multiple cooperating agents will become increasingly important, particularly for tasks requiring specialised knowledge or access across domains.&lt;/p&gt;

&lt;p&gt;Large-scale, flexible solutions will need a common, trusted way for agents to advertise their capabilities to each other and collaborate. That's what the &lt;strong&gt;Agent2Agent (A2A) protocol&lt;/strong&gt; is designed to provide.&lt;/p&gt;

&lt;p&gt;This is the first post in a three-part series:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Part 1 (this post):&lt;/strong&gt; Introduction to A2A concepts and a "Hello World" agent&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 2:&lt;/strong&gt; Using MCP to query multiple data sources including Bronto's logging platform&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 3:&lt;/strong&gt; Combining A2A and MCP with a "SuperAgent" orchestrating multiple agents&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Why A2A?
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://a2a-protocol.org/latest/" rel="noopener noreferrer"&gt;Agent2Agent protocol&lt;/a&gt; — originally developed by Google and now under the &lt;a href="https://www.linuxfoundation.org/press/linux-foundation-launches-the-agent2agent-protocol-project-to-enable-secure-intelligent-communication-between-ai-agents" rel="noopener noreferrer"&gt;Linux Foundation&lt;/a&gt; as an open source project — gives agents an open protocol for interaction built on HTTP. It standardizes how agents exchange messages, requests, and data, allowing agents to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Discover each other's capabilities without exposing internal state, memory, or implementation&lt;/li&gt;
&lt;li&gt;Negotiate interaction details (text, forms, media, etc.)&lt;/li&gt;
&lt;li&gt;Collaborate on running tasks securely&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;IBM's definition of agentic AI captures it well: &lt;em&gt;"Agentic AI is an artificial intelligence system that can accomplish a specific goal with limited supervision"&lt;/em&gt;, with each agent in a multi-agent system performing a specific subtask coordinated through AI orchestration.&lt;/p&gt;




&lt;h2&gt;
  
  
  A2A and MCP: Complementary Protocols
&lt;/h2&gt;

&lt;p&gt;MCP and A2A serve different but complementary roles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MCP&lt;/strong&gt; exposes tools and structured data sources to an agent. It extends an agent's capabilities by standardizing how the agent accesses resources such as databases or product APIs. It focuses on a single agent accessing tools and does not itself cover agent-to-agent discovery or authentication.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A2A&lt;/strong&gt; provides structured agent-to-agent communication — allowing multiple autonomous agents to collaborate, delegate tasks, and exchange information using capabilities exposed via Agent Cards. It enables multi-agent interactions with secure communication and capability discovery.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;MCP Tools are generally basic operations with defined inputs/outputs. A2A is designed for autonomous agents that reason, use tools, and collaborate on complex problems. The two protocols are not mutually exclusive — agents can use MCP for specific tool access and A2A for broader agent collaboration.&lt;/p&gt;

&lt;p&gt;Part 3 of this series will show an emerging pattern where A2A agents sit in front of MCP servers.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key A2A Concepts
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Agent Card&lt;/strong&gt; — public metadata describing an agent's capabilities, skills, URL, and authentication requirements. Other agents retrieve this card to discover what an agent can do.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A2A Server&lt;/strong&gt; — exposes an HTTP API endpoint implementing A2A methods and executes tasks on behalf of other agents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A2A Client&lt;/strong&gt; — an application or agent that sends requests to an A2A server to initiate tasks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Task&lt;/strong&gt; — initiated by a client sending an A2A message (role: &lt;code&gt;user&lt;/code&gt;). Each task has a unique ID and moves through states such as &lt;code&gt;submitted&lt;/code&gt;, &lt;code&gt;working&lt;/code&gt;, &lt;code&gt;input-required&lt;/code&gt;, and &lt;code&gt;completed&lt;/code&gt;; the specification also defines other terminal states such as &lt;code&gt;failed&lt;/code&gt; and &lt;code&gt;canceled&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Message&lt;/strong&gt; — contains a role (&lt;code&gt;user&lt;/code&gt; or &lt;code&gt;agent&lt;/code&gt;), optional metadata, and an array of parts (&lt;code&gt;TextPart&lt;/code&gt;, &lt;code&gt;FilePart&lt;/code&gt;, or &lt;code&gt;DataPart&lt;/code&gt; for structured JSON data).&lt;/p&gt;
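
&lt;p&gt;As a rough illustration of how tasks and messages relate, the sketch below models them with plain Python dataclasses. It is not the A2A SDK's own type system: the class names, the &lt;code&gt;text_message&lt;/code&gt; helper, and the &lt;code&gt;kind&lt;/code&gt; field on each part are assumptions made for this example, and only the states &lt;code&gt;submitted&lt;/code&gt;, &lt;code&gt;working&lt;/code&gt;, &lt;code&gt;input-required&lt;/code&gt;, and &lt;code&gt;completed&lt;/code&gt; are modelled.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from enum import Enum
import uuid


class TaskState(str, Enum):
    # Illustrative subset of task states; not the full list from the spec.
    SUBMITTED = "submitted"
    WORKING = "working"
    INPUT_REQUIRED = "input-required"
    COMPLETED = "completed"


@dataclass
class Message:
    role: str                                   # "user" or "agent"
    parts: list                                 # list of part dictionaries
    metadata: dict = field(default_factory=dict)


@dataclass
class Task:
    history: list                               # messages exchanged so far
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    state: TaskState = TaskState.SUBMITTED


def text_message(role, text):
    """Build a message holding a single text part (hypothetical helper)."""
    if role not in ("user", "agent"):
        raise ValueError("role must be 'user' or 'agent'")
    return Message(role=role, parts=[{"kind": "text", "text": text}])


# A client message starts the task; the agent replies and the task completes.
task = Task(history=[text_message("user", "hello")])
task.state = TaskState.WORKING
task.history.append(text_message("agent", "Hello World"))
task.state = TaskState.COMPLETED
print(task.id, task.state.value, len(task.history))
```

&lt;p&gt;In a real deployment the SDK's own models would replace these classes; the point here is only that a task carries an ID, a state, and a history of role-tagged messages.&lt;/p&gt;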

&lt;p&gt;&lt;strong&gt;Communication flows:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Discovery&lt;/strong&gt; — client fetches the Agent Card via &lt;code&gt;GET /.well-known/agent-card.json&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Initiation&lt;/strong&gt; — client sends a request with a task ID to the server agent&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Processing&lt;/strong&gt; — server processes the task, streaming intermediate updates via Server-Sent Events if needed&lt;/li&gt;
&lt;/ol&gt;
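
&lt;p&gt;The discovery step above can be sketched without touching the network. The example below only builds the well-known URL and reads skill IDs out of an Agent Card that has already been fetched; the function names and the sample card are assumptions for illustration, not part of any A2A library.&lt;/p&gt;

```python
import json
from urllib.parse import urljoin

WELL_KNOWN_PATH = "/.well-known/agent-card.json"


def agent_card_url(base_url):
    """Return the discovery URL for an agent hosted at base_url."""
    return urljoin(base_url, WELL_KNOWN_PATH)


def skill_ids(card_json):
    """Extract the skill IDs advertised in an Agent Card JSON document."""
    card = json.loads(card_json)
    return [skill["id"] for skill in card.get("skills", [])]


# Hypothetical card, trimmed to the fields this sketch reads.
sample_card = json.dumps({
    "name": "Hello World Agent",
    "url": "http://localhost:9999/",
    "skills": [{"id": "hello_world", "name": "Returns hello world"}],
})

print(agent_card_url("http://localhost:9999/"))
print(skill_ids(sample_card))
```

&lt;p&gt;A client would typically issue an HTTP GET against the URL returned by &lt;code&gt;agent_card_url&lt;/code&gt; and pass the response body to a parser like &lt;code&gt;skill_ids&lt;/code&gt; before deciding whether to start a task.&lt;/p&gt;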

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyqnk3yejx0nb66f0uwjj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyqnk3yejx0nb66f0uwjj.png" alt="A2A protocol diagram" width="636" height="377"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Building a Hello World A2A Agent
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Installation and Setup
&lt;/h3&gt;

&lt;p&gt;These examples use the official &lt;a href="https://github.com/a2aproject/a2a-samples" rel="noopener noreferrer"&gt;A2A samples repository&lt;/a&gt;. Requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.12 or higher&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.astral.sh/uv/" rel="noopener noreferrer"&gt;UV&lt;/a&gt; (recommended Python package manager)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Running the Sample
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clone the repo&lt;/span&gt;
git clone https://github.com/a2aproject/a2a-samples.git
&lt;span class="nb"&gt;cd &lt;/span&gt;a2a-samples/samples/python/agents/helloworld

&lt;span class="c"&gt;# Install UV and run the server&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;uv
uv run &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This runs &lt;code&gt;__main__.py&lt;/code&gt; in the helloworld directory, which defines an &lt;code&gt;AgentCard&lt;/code&gt;, an &lt;code&gt;AgentSkill&lt;/code&gt;, and an extended card, then calls uvicorn to run a server on &lt;code&gt;localhost:9999&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In a separate terminal, run the client:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv run test_client.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Agent Card Response
&lt;/h3&gt;

&lt;p&gt;The server returns its public Agent Card:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"capabilities"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"streaming"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"defaultInputModes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"defaultOutputModes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Just a hello world agent"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Hello World Agent"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"preferredTransport"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"JSONRPC"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"protocolVersion"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0.3.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"skills"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"just returns hello world"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"examples"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"hi"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"hello world"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"hello_world"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Returns hello world"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"tags"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"hello world"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"supportsAuthenticatedExtendedCard"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://localhost:9999/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1.0.0"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Authenticated clients additionally receive an extended card with a &lt;code&gt;super_hello_world&lt;/code&gt; skill — demonstrating how A2A supports capability tiers based on authentication level.&lt;/p&gt;
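
&lt;p&gt;One way to picture these capability tiers is a function that returns either the public card or an extended copy of it. This is a minimal sketch, not how the sample server is implemented: the &lt;code&gt;card_for&lt;/code&gt; function and the boolean &lt;code&gt;authenticated&lt;/code&gt; flag are assumptions, and a real server would derive that flag from the request's credentials.&lt;/p&gt;

```python
import copy

# Trimmed-down public card; only the fields this sketch needs.
PUBLIC_CARD = {
    "name": "Hello World Agent",
    "supportsAuthenticatedExtendedCard": True,
    "skills": [{"id": "hello_world"}],
}

# Skills that only authenticated clients get to see.
EXTENDED_SKILLS = [{"id": "super_hello_world"}]


def card_for(authenticated):
    """Return the card a caller should see (hypothetical helper)."""
    card = copy.deepcopy(PUBLIC_CARD)   # never mutate the shared public card
    if authenticated:
        card["skills"].extend(copy.deepcopy(EXTENDED_SKILLS))
    return card


print([s["id"] for s in card_for(False)["skills"]])
print([s["id"] for s in card_for(True)["skills"]])
```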




&lt;h2&gt;
  
  
  What's Coming in Parts 2 and 3
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Part 2&lt;/strong&gt; will show how to build simple MCP servers for Bronto's REST API and a SQLite database, then use Claude to query both simultaneously with a natural language prompt — no data movement required.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Part 3&lt;/strong&gt; will extend this into a full A2A scenario with a "SuperAgent" orchestrating two A2A agents (one per data store), demonstrating how A2A and MCP work together.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bigger Picture: Challenges A2A Still Needs to Solve
&lt;/h2&gt;

&lt;p&gt;A2A is an evolving protocol. Current areas requiring further development include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multi-agent orchestration&lt;/strong&gt; — standardized support for conflict handling and failure recovery across agents from different organizations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shared vocabulary&lt;/strong&gt; — no agreed-upon standard definitions for common items like invoices, policies, or receipts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trust&lt;/strong&gt; — establishing trust between agents across organizational boundaries is key to adoption&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security&lt;/strong&gt; — authentication is provided, but richer methods are needed for privacy across organizational/national boundaries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise readiness&lt;/strong&gt; — standardized usage management, SLOs, SLAs, and automated negotiation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The separate open source project &lt;a href="https://agntcy.org/" rel="noopener noreferrer"&gt;AGNTCY&lt;/a&gt;, backed by Cisco, LangChain, Galileo and others, provides the Open Agent Schema Framework (OASF) as another approach to standardizing agent capability descriptions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.bronto.io/bronto-labs" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;Explore Bronto's AI Features&lt;/a&gt;
&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>mcp</category>
      <category>devops</category>
    </item>
    <item>
      <title>Bronto for Fastly: Real-Time CDN Logging That Actually Scales</title>
      <dc:creator>Patrick Londa</dc:creator>
      <pubDate>Tue, 26 May 2026 17:04:31 +0000</pubDate>
      <link>https://dev.to/bronto_io/bronto-for-fastly-real-time-cdn-logging-that-actually-scales-b1k</link>
      <guid>https://dev.to/bronto_io/bronto-for-fastly-real-time-cdn-logging-that-actually-scales-b1k</guid>
      <description>&lt;p&gt;&lt;em&gt;Authored by Mike Neville-O'Neill&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A company processing tens of TBs of Fastly logs daily through traditional logging vendors faces an impossible choice: pay enterprise prices for comprehensive logging, or fly blind with minimal retention.&lt;/p&gt;

&lt;p&gt;That's why most teams default to a 7-day retention window. Not because they don't need historical data — because the economics don't work.&lt;/p&gt;

&lt;p&gt;That was the exact situation for &lt;a href="https://www.contentstack.com/" rel="noopener noreferrer"&gt;Contentstack&lt;/a&gt;, a leading global platform for digital experiences. With Bronto, they now keep a full year of logs, run unlimited queries, and report 50% cost savings versus their previous solution.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Cost of Restricted Logging
&lt;/h2&gt;

&lt;p&gt;Processing 10TB of Fastly logs daily can cost ~$30,000/month for ingestion alone on traditional platforms. To reduce costs, teams resort to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Abbreviating field names (&lt;code&gt;EdgeResponseStatus&lt;/code&gt; → &lt;code&gt;ers&lt;/code&gt;) to save bytes&lt;/li&gt;
&lt;li&gt;Sampling 1 in 100 requests and hoping issues appear in that sample&lt;/li&gt;
&lt;li&gt;Keeping only 7 days of data when seasonal patterns span months&lt;/li&gt;
&lt;li&gt;Dropping entire log categories to keep storage costs down&lt;/li&gt;
&lt;/ul&gt;
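
&lt;p&gt;To see why 1-in-100 sampling is risky for rare problems, the sketch below estimates the chance that at least one occurrence of an issue lands in the sample. It assumes each request is sampled independently and uniformly at random, which real samplers may not do; the function name is made up for this example.&lt;/p&gt;

```python
def chance_of_seeing(occurrences, sample_rate=0.01):
    """Probability that at least one of `occurrences` events survives sampling.

    Assumes every event is kept independently with probability `sample_rate`.
    """
    return 1.0 - (1.0 - sample_rate) ** occurrences


# How likely is it that an issue shows up at all in a 1% sample?
for n in (1, 10, 100, 500):
    print(n, round(chance_of_seeing(n), 3))
```

&lt;p&gt;Under these assumptions an issue that affects only a handful of requests will usually be absent from the sampled data altogether.&lt;/p&gt;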

&lt;p&gt;When ingestion and storage are expensive, comprehensive logging becomes a luxury instead of standard practice. But what if the economics actually worked?&lt;/p&gt;




&lt;h2&gt;
  
  
  What Changes When You Can Keep Everything
&lt;/h2&gt;

&lt;p&gt;Bronto's architectural efficiency (90% compression, optimized storage, and a design purpose-built for logs) means you pay 50–90% less to ingest the same data. At that price, keeping everything becomes practical.&lt;/p&gt;
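
&lt;p&gt;As a back-of-the-envelope check, the snippet below applies the 50–90% range to the roughly $30,000/month example quoted earlier. The 30-day month and decimal terabytes are assumptions, and the results are illustrative arithmetic rather than a price quote.&lt;/p&gt;

```python
TB_PER_DAY = 10
DAYS_PER_MONTH = 30          # assumption: 30-day month
GB_PER_TB = 1_000            # assumption: decimal terabytes
MONTHLY_COST_USD = 30_000    # the example ingestion cost quoted earlier

monthly_gb = TB_PER_DAY * DAYS_PER_MONTH * GB_PER_TB
cost_per_gb = MONTHLY_COST_USD / monthly_gb


def cost_after_savings(savings_percent):
    """Monthly cost after a given percentage saving."""
    return MONTHLY_COST_USD * (100 - savings_percent) / 100


print(f"Implied baseline: ${cost_per_gb:.2f} per GB ingested")
for pct in (50, 90):
    print(f"{pct}% savings: ${cost_after_savings(pct):,.0f}/month")
```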

&lt;p&gt;Consider tracking API performance across enterprise customers:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Monitoring performance across multiple API customers&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="k"&gt;avg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resTime&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;apiKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'customer_key_1'&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="n"&gt;apiKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'customer_key_2'&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="n"&gt;apiKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'customer_key_3'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;resTime&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="k"&gt;cache&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s1"&gt;'MISS, MISS'&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;shield&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;cache&lt;/span&gt;
&lt;span class="c1"&gt;-- Time range: 30 days&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When ingestion is affordable, you can actually keep the 30 days of high-volume logs this investigation requires. No sampling, no abbreviations, no compromises.&lt;/p&gt;

&lt;h3&gt;
  
  
  30-Day Performance Archaeology
&lt;/h3&gt;

&lt;p&gt;A customer report of intermittent slowdowns triggered a deep investigation:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resTime&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="k"&gt;avg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resTime&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;apiKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'customer_12345'&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;shield&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;cache&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
&lt;span class="c1"&gt;-- Analyzing response times week-over-week for the past month&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Analyzing max and average response times over 30 days revealed certain endpoints were gradually slowing down — a pattern invisible in a 7-day window but clear with month-long data.&lt;/p&gt;

&lt;p&gt;The team spotted week-over-week degradation that would have caused a critical outage within weeks. With 7-day retention, they never would have seen it coming.&lt;/p&gt;
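
&lt;p&gt;The week-over-week comparison described here can be sketched in a few lines. The example assumes you have already exported one average response time per day; the numbers are invented, the function names are hypothetical, and the 5% threshold is an arbitrary choice for illustration.&lt;/p&gt;

```python
from statistics import mean


def weekly_averages(daily_values, days_per_week=7):
    """Average each complete week of daily values; drop a trailing partial week."""
    weeks = [daily_values[i:i + days_per_week]
             for i in range(0, len(daily_values), days_per_week)]
    return [mean(week) for week in weeks if len(week) == days_per_week]


def week_over_week_change(averages):
    """Fractional change between each pair of consecutive weekly averages."""
    return [(cur - prev) / prev for prev, cur in zip(averages, averages[1:])]


def is_degrading(changes, threshold=0.05):
    """True when every week is slower than the one before by at least threshold."""
    return bool(changes) and all(change >= threshold for change in changes)


# Invented daily average response times in ms, rising about 10% each week.
daily_ms = [200] * 7 + [220] * 7 + [242] * 7 + [266.2] * 7

averages = weekly_averages(daily_ms)
changes = week_over_week_change(averages)
print(averages)
print(is_degrading(changes))
```

&lt;p&gt;With only seven days of data, &lt;code&gt;weekly_averages&lt;/code&gt; yields a single value and there is nothing to compare it against, which is the limitation the paragraph above describes.&lt;/p&gt;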

&lt;h3&gt;
  
  
  Multi-Month Error Forensics
&lt;/h3&gt;

&lt;p&gt;When 404 errors started appearing randomly across the CDN, the team needed to know: new issue or recurring pattern?&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="k"&gt;ILIKE&lt;/span&gt; &lt;span class="s1"&gt;'%/api/v2/content/%'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;response_status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'404'&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shield&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pop&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
&lt;span class="c1"&gt;-- 90-day investigation of 404 errors for content API endpoints&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Filtering 90 days of complete, unsampled logs revealed that what looked like random errors was actually a cache invalidation race condition, one that only occurred under specific geographic and timing conditions. With 7-day windows and aggressive sampling, this would have remained an unsolved mystery.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Intelligence Layer: From Logs to Insights
&lt;/h2&gt;

&lt;p&gt;Long-term retention unlocks intelligence — but only if you can query it. Bronto's analytics engine lets you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run SQL queries across months of data in seconds&lt;/li&gt;
&lt;li&gt;Set up change detection that automatically flags when cache hit rates drop&lt;/li&gt;
&lt;li&gt;Build filterable dashboards to click into any spike or anomaly&lt;/li&gt;
&lt;li&gt;Use natural language with the AI dashboard builder: &lt;em&gt;"Show me API response times by customer and region for the last 90 days"&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Use BrontoScope to interrogate logs with context, correlating events and surfacing root causes that would take hours to piece together manually&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What Becomes Possible
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;API customer journey mapping&lt;/strong&gt; — track the entire API consumer lifecycle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which endpoints do new users test first?&lt;/li&gt;
&lt;li&gt;How does usage scale over months?&lt;/li&gt;
&lt;li&gt;What usage patterns precede account downgrades?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Intelligent cache optimization&lt;/strong&gt; — with a year of data you can finally answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How do cache hit ratios vary by season, not just by day?&lt;/li&gt;
&lt;li&gt;Which content genuinely needs global distribution vs. regional?&lt;/li&gt;
&lt;li&gt;How do different shield configurations perform under various traffic patterns?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Security pattern recognition&lt;/strong&gt; — long-term retention enables sophisticated threat detection:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bot networks that rotate through IP ranges over weeks&lt;/li&gt;
&lt;li&gt;Credential stuffing campaigns that evolve over months&lt;/li&gt;
&lt;li&gt;DDoS reconnaissance that occurs weeks before an attack&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcx7txc6kwq0re3ooo5yi.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcx7txc6kwq0re3ooo5yi.gif" alt="API performance dashboard in Bronto" width="720" height="392"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Before and After Bronto
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Field names abbreviated to save bytes&lt;/td&gt;
&lt;td&gt;Full field names: readable and searchable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aggressive sampling: keeping 1% of logs&lt;/td&gt;
&lt;td&gt;100% coverage: every request, response, error&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Short retention: 7 days max&lt;/td&gt;
&lt;td&gt;365-day retention: full seasonal patterns visible&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Blind spots: log categories dropped to save costs&lt;/td&gt;
&lt;td&gt;Complete visibility: CDN and compute logs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Technical Foundation: Built for Fastly's Scale
&lt;/h2&gt;

&lt;p&gt;Bronto integrates with Fastly through a &lt;a href="https://www.fastly.com/documentation/guides/integrations/logging-endpoints/log-streaming-bronto/" rel="noopener noreferrer"&gt;built-in logging endpoint&lt;/a&gt;. Configure it once in your Fastly service settings and logs start streaming in minutes:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Simple Fastly Integration&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Bronto&lt;/span&gt;
&lt;span class="na"&gt;format&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;JSON&lt;/span&gt;
&lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://logs.bronto.io/fastly&lt;/span&gt;
&lt;span class="na"&gt;compression&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;zstd&lt;/span&gt;  &lt;span class="c1"&gt;# 10:1 automatic compression&lt;/span&gt;
&lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;POST&lt;/span&gt;
&lt;span class="na"&gt;header_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;X-Bronto-Token&lt;/span&gt;
&lt;span class="na"&gt;header_value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${BRONTO_TOKEN}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why it scales:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Elastic ingestion that grows with traffic&lt;/li&gt;
&lt;li&gt;90% data compression&lt;/li&gt;
&lt;li&gt;Time-series optimization for CDN access patterns&lt;/li&gt;
&lt;li&gt;Intelligent tiering keeps recent data hot, historical data accessible&lt;/li&gt;
&lt;li&gt;Column-oriented storage for efficient aggregations&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Bronto Express: From Ingestion to Insights in Minutes
&lt;/h2&gt;

&lt;p&gt;Setting up this level of insight shouldn't require weeks of work. &lt;strong&gt;Bronto Express for Fastly&lt;/strong&gt; uses AI to take raw log streams and dynamically build Fastly-specific queries, monitors, and dashboards personalized for your organization.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp4xpfu8jadu3byixkwxv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp4xpfu8jadu3byixkwxv.png" alt="Bronto Express Fastly preview" width="800" height="452"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It includes pre-built dashboards for real-time performance, geographic traffic, API usage, and security monitoring; a saved searches library for common investigations; and intelligent monitors for performance degradation and error rate thresholds.&lt;/p&gt;

&lt;p&gt;Setup is four steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create your Bronto instance (selecting the Fastly Express pack)&lt;/li&gt;
&lt;li&gt;Add Bronto as a logging endpoint in Fastly (5-minute configuration)&lt;/li&gt;
&lt;li&gt;Watch pre-built dashboards populate with your data&lt;/li&gt;
&lt;li&gt;Customize searches and alerts for your specific needs&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;No agents. No forwarders. No complex pipelines.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Platform Effect: When Every Team Has Access
&lt;/h2&gt;

&lt;p&gt;When CDN logs become affordable and accessible, every team benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Engineering&lt;/strong&gt; — debugs issues with complete historical context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security&lt;/strong&gt; — investigates threats across meaningful timeframes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Product&lt;/strong&gt; — understands real usage patterns, not samples&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sales&lt;/strong&gt; — identifies expansion opportunities from actual usage data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Finance&lt;/strong&gt; — accurately allocates costs to business units&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Support&lt;/strong&gt; — resolves tickets with full forensic capability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This isn't just about saving money on logging (though 50–90% savings helps). It's about removing the artificial constraints that prevent teams from fully understanding their infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.prod.website-files.com%2F67d2e1c8bd118640c72006cb%2F690a0d4891fc665398f7fccf_Bronto%2520CDN%2520Fastly%2520with%2520Maps.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.prod.website-files.com%2F67d2e1c8bd118640c72006cb%2F690a0d4891fc665398f7fccf_Bronto%2520CDN%2520Fastly%2520with%2520Maps.png" alt="Geographic visualization of CDN traffic" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.bronto.io/book-a-demo" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;See Sub-Second Search Across Your Fastly Logs&lt;/a&gt;
&lt;/p&gt;

</description>
      <category>devops</category>
      <category>logging</category>
      <category>cdn</category>
      <category>observability</category>
    </item>
    <item>
      <title>Investigation Reports: When Monitors Get Smarter</title>
      <dc:creator>Patrick Londa</dc:creator>
      <pubDate>Fri, 22 May 2026 18:04:01 +0000</pubDate>
      <link>https://dev.to/bronto_io/investigation-reports-when-monitors-get-smarter-51ei</link>
      <guid>https://dev.to/bronto_io/investigation-reports-when-monitors-get-smarter-51ei</guid>
      <description>&lt;p&gt;&lt;em&gt;Authored by Marco Aquilanti&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When a monitor fires, there's a familiar sequence of checks required to find the root cause. The engineers who set up the monitor usually know these steps by heart — they know the dependencies, the error codes, what to check and where. But for the on-call responder, these steps aren't always obvious. Historically, the solution was to force engineering teams to document the checks in a playbook and hope the responder would read it under pressure.&lt;/p&gt;

&lt;p&gt;Today, we can offload these checks to an LLM, shifting the responder's role from &lt;strong&gt;gathering evidence&lt;/strong&gt; to &lt;strong&gt;reviewing a diagnosis&lt;/strong&gt; — significantly reducing MTTR.&lt;/p&gt;

&lt;p&gt;Our new &lt;strong&gt;Investigation Reports&lt;/strong&gt; feature does exactly this: an LLM completes the investigation and delivers a detailed report before the human even acknowledges the alert.&lt;/p&gt;

&lt;p&gt;Investigation Reports builds on &lt;a href="https://www.bronto.io/blog/error-investigations-with-brontoscope" rel="noopener noreferrer"&gt;BrontoScope&lt;/a&gt;, our first AI-powered investigation feature — and on the positive customer feedback it generated.&lt;/p&gt;




&lt;h2&gt;
  
  
  BrontoScope vs. Investigation Reports
&lt;/h2&gt;

&lt;p&gt;Both BrontoScope and Investigation Reports perform automated investigations and provide reports, but they work differently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;BrontoScope&lt;/strong&gt; starts with a user request to investigate a specific error event in the logs. The investigation follows a defined workflow aimed at establishing when and where an error is occurring. The LLM guides the process and summarizes findings synchronously — the user is waiting for a response and gets it in seconds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Investigation Reports&lt;/strong&gt; is triggered by a system event (a monitor firing), with no user waiting for a synchronous response. This gives the LLM more time — not seconds but minutes — to query data and analyze results. Investigating an alert is also a more generic task than BrontoScope's focused error investigation, making it harder to define a single fixed workflow that succeeds in every scenario.&lt;/p&gt;

&lt;p&gt;For these reasons, Investigation Reports lets the LLM operate more freely — giving it &lt;strong&gt;tools&lt;/strong&gt; and &lt;strong&gt;context&lt;/strong&gt; rather than a coded workflow.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tools Are Easy, Context Is Hard
&lt;/h2&gt;

&lt;p&gt;The tools side is straightforward: the LLM can call Bronto's APIs to perform &lt;a href="https://www.bronto.io/blog/why-is-bronto-so-fast" rel="noopener noreferrer"&gt;lightning-fast log search&lt;/a&gt;, query key-value dictionaries, check monitor history, retrieve precomputed metrics, and more.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context&lt;/strong&gt; is the harder problem.&lt;/p&gt;

&lt;p&gt;LLMs make good logical decisions when provided with relevant, well-explained context. But the context window is limited, and several studies have found that answer quality degrades and hallucination rates rise as the context grows longer. This effect is known as "context rot" (see &lt;a href="https://research.trychroma.com/context-rot" rel="noopener noreferrer"&gt;research from Chroma&lt;/a&gt; and &lt;a href="https://arxiv.org/pdf/2509.21361" rel="noopener noreferrer"&gt;this arXiv paper&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;For an effective investigation, the LLM needs more than just the monitor that fired. It needs historical context and an understanding of the monitored system. But dumping thousands of tokens of documentation into the prompt backfires: it raises the risk of hallucination and degrades report accuracy.&lt;/p&gt;




&lt;h2&gt;
  
  
  User-Defined Investigation Context
&lt;/h2&gt;

&lt;p&gt;The precise knowledge needed for a good investigation is hard for an LLM to infer autonomously — but it can be provided by the engineer who owns the monitor.&lt;/p&gt;

&lt;p&gt;In a dedicated "Investigation Prompt" text area, the user instructs the LLM on what to check and what to do when the monitor fires. Free-form text makes the feature highly flexible, effectively letting users define an ad-hoc workflow for each specific use case.&lt;/p&gt;

&lt;p&gt;Engineers and SREs commonly include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;List of dependencies of the affected service&lt;/li&gt;
&lt;li&gt;Related log datasets and how to correlate/query them&lt;/li&gt;
&lt;li&gt;Relevant keys and metrics to check&lt;/li&gt;
&lt;li&gt;What to include in the report — affected components, customers, or users&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The screenshot below shows an example investigation prompt telling the LLM to check datasets in a collection named "booking system":&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fte4por6ohzd2dlbx3x1f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fte4por6ohzd2dlbx3x1f.png" alt="Investigation prompt configuration" width="800" height="374"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And here's the Investigation Report generated when that monitor fired — the LLM followed the instructions, ran multiple queries, and produced a report with a potential root cause, diagnosis, and timeline:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fysaqs1s1qt78m983721s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fysaqs1s1qt78m983721s.png" alt="Generated investigation report" width="583" height="699"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Investigation Reports Beyond Incident Response
&lt;/h2&gt;

&lt;p&gt;Bronto's own customer support and sales teams found an unexpected use case. They set up monitors to be notified when new organizations are created or contracts are updated in the system — keeping the team up to date on new sign-ups and customer onboarding.&lt;/p&gt;

&lt;p&gt;Investigation Reports automates the task of fetching context about each event. Details like contract type, retention plan, company size, and location are queried across multiple logs and assembled into a report that arrives within a minute of the monitor notification. This lets the team quickly identify relevant events among routine ones.&lt;/p&gt;

&lt;p&gt;Below is the investigation prompt used by the customer support team, and an example of the automatically generated report:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx8bdmis1nhcvfg6xuqmv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx8bdmis1nhcvfg6xuqmv.png" alt="Customer support investigation prompt" width="521" height="250"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2s62bhpw73lk5lqxui35.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2s62bhpw73lk5lqxui35.png" alt="Customer support investigation report" width="486" height="540"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;Investigation Reports is a great illustration of what LLMs are genuinely good at: taking a well-framed task with relevant context and producing a structured, actionable summary faster than any human could. Every monitor notification now comes with relevant information to speed up resolution.&lt;/p&gt;

&lt;p&gt;We'll be building further on this capability in the coming months — using AI alongside Bronto's logging platform to help teams reduce toil, resolve issues faster, and extract more value from their data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.bronto.io/bronto-labs" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;Explore Bronto Labs&lt;/a&gt;
&lt;/p&gt;

</description>
      <category>ai</category>
      <category>observability</category>
      <category>devops</category>
      <category>logging</category>
    </item>
    <item>
      <title>Using Dashboard Filtering to Get Customer Usage in Seconds from TBs of Data</title>
      <dc:creator>Patrick Londa</dc:creator>
      <pubDate>Fri, 22 May 2026 13:53:30 +0000</pubDate>
      <link>https://dev.to/bronto_io/using-dashboard-filtering-to-get-customer-usage-in-seconds-from-tbs-of-data-3b0d</link>
      <guid>https://dev.to/bronto_io/using-dashboard-filtering-to-get-customer-usage-in-seconds-from-tbs-of-data-3b0d</guid>
      <description>&lt;p&gt;&lt;em&gt;Authored by Conall Heffernan&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;As the Customer Success lead at Bronto, I need fast, reliable insights into customer health and product usage — but I don't have time to constantly update indexes, schemas, or individual widgets just to answer new questions. I need to spot patterns, explore trends, and get answers in real time without manual overhead.&lt;/p&gt;

&lt;p&gt;That's why Bronto's dashboards are so critical to my work. We recently added a new query filtering feature that lets me use SQL to look for any pattern across all widgets in a dashboard simultaneously — and update everything at lightning speed.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Bronto Dashboards Provide
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Rich visualisation options&lt;/strong&gt; — time-series charts, geomaps, numeric value widgets (with units like bytes and time), top lists, treemaps, and log event lists for drilling into raw data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI widget builder&lt;/strong&gt; — describe what you want in natural language; an LLM builds the query and creates the widget without you needing to know the query language or which datasets to select (&lt;a href="https://www.bronto.io/blog/building-dashboards-with-ai" rel="noopener noreferrer"&gt;see the full post on this feature&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full screen widget mode&lt;/strong&gt; — compare different timeframes for the same query (e.g. vs. previous day, week, month)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Easy filtering&lt;/strong&gt; — use the query builder with a searchable dropdown of top keys and values, or write SQL filters directly in the filter bar&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F53u81j0o04n96irxymdu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F53u81j0o04n96irxymdu.png" alt="Bronto dashboard overview" width="800" height="419"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Power of Filtering at Scale
&lt;/h2&gt;

&lt;p&gt;In many logging or observability tools, applying a filter means updating every single chart, table, or widget individually. Tedious. Time-consuming. It doesn't scale.&lt;/p&gt;

&lt;p&gt;With Bronto, applying a filter in the main query bar &lt;strong&gt;instantly updates every single widget on the dashboard&lt;/strong&gt; for your given timeframe. With a default retention period of one year, you don't need to worry about missing long-term trends — all your data is fully searchable and visualisable. Concerned about when an issue started? You can analyze trends over months, not days.&lt;/p&gt;

&lt;p&gt;Our widgets use pre-computed log-based metrics (LBMs) for rapid responses, but the new dashboard filtering goes further — running raw log queries to filter the data in your dashboard in real time. Results come back in seconds, and you can drill down across all widgets simultaneously using SQL or by clicking from a dropdown of top keys and values.&lt;/p&gt;

&lt;p&gt;One thing worth noting: &lt;strong&gt;there's no initial configuration of keys required&lt;/strong&gt;. I can use any key I want in the filter, with no setup.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhfox04m9gccw8zexp8vy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhfox04m9gccw8zexp8vy.gif" alt="Dashboard filtering GIF — all widgets update simultaneously" width="720" height="349"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Log-Based Filtering vs. Log-Based Metrics
&lt;/h2&gt;

&lt;p&gt;Log-based metrics (LBMs) deliver millisecond responses — perfect for real-time dashboards, trend analysis, and high-level views of system behavior.&lt;/p&gt;

&lt;p&gt;When something unexpected happens or a new question comes in, we seamlessly switch to log-based filtering to investigate deeper, explore raw data, and uncover answers without being constrained by pre-defined fields or aggregations.&lt;/p&gt;

&lt;p&gt;Log-based filtering excels because it lets you query and visualize raw logs immediately — no upfront configuration like index definitions or field extraction required. Bronto combines this with structured parsing and indexing, using the right approach for the right job.&lt;/p&gt;

&lt;p&gt;The result: both fast, flexible investigation and high-performance queries on known fields, without forcing you to predefine every key or build parsing pipelines before you can search effectively.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm5jyxzhz2yx8fj5wbo5b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm5jyxzhz2yx8fj5wbo5b.png" alt="Log-based filtering in action" width="800" height="423"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  How I Use Dashboard Filtering for Customer Usage
&lt;/h2&gt;

&lt;p&gt;My primary use for this feature is gathering and presenting product usage data to our leadership team. Questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;"How much data did Org ID 54321 send over the last 6 months?"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"How much did company ACME search last month?"&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of building 10 custom dashboards (which doesn't scale as your customer base grows), I use dashboard filtering:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Navigate to our main Usage Dashboard&lt;/li&gt;
&lt;li&gt;Enter the specific &lt;code&gt;org_id&lt;/code&gt; in the main query filter (e.g. &lt;code&gt;org_id: 54321&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Every widget updates instantly to reflect only that organization's data&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's it. A complex, multi-step data lookup becomes a quick and easy process.&lt;/p&gt;

&lt;p&gt;The first time I tried filtering across the dashboard for an &lt;code&gt;org_id&lt;/code&gt;, I thought something wasn't working right, because the results rendered so fast across terabytes of data. It was a genuine "wow" moment. As a Customer Success lead, it's great to see the under-the-hood changes we're building for customers also improving my own day-to-day.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8dt7b0dzmrz9a3x1kaky.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8dt7b0dzmrz9a3x1kaky.png" alt="Dashboard filtered by org_id" width="800" height="430"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  How Bronto Dashboards Compare
&lt;/h2&gt;

&lt;p&gt;Most dashboard tools are optimized for known questions: predefined fields, fixed widgets, metrics decided on ahead of time. That works for stable monitoring but breaks down when you need to explore new questions, investigate unexpected behavior, or quickly slice data in different ways.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Bronto Dashboards&lt;/th&gt;
&lt;th&gt;Traditional Competitors&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Query Scope&lt;/td&gt;
&lt;td&gt;SQL filtering on any log value&lt;/td&gt;
&lt;td&gt;Often restricted to predefined or indexed fields&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Update Speed&lt;/td&gt;
&lt;td&gt;Instant — all widgets update simultaneously&lt;/td&gt;
&lt;td&gt;Manual, per-widget updates required&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Performance&lt;/td&gt;
&lt;td&gt;Terabytes rendered in seconds across all datasets&lt;/td&gt;
&lt;td&gt;Latency issues with large datasets; often minutes to render&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Setup&lt;/td&gt;
&lt;td&gt;Schema-less — no upfront definitions or configuration&lt;/td&gt;
&lt;td&gt;Parsing and indexing pipelines required first&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost&lt;/td&gt;
&lt;td&gt;Filtering included in monthly quota; faster MTTR&lt;/td&gt;
&lt;td&gt;Slow dashboards can inflate costs by consuming more resources&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Modern teams need dashboards that are fast, flexible, and easy to adapt as questions change. Bronto dashboards combine high-performance log-based metrics with instant, dashboard-wide filtering to help you explore usage, investigate issues, answer leadership questions, and quickly re-run reports as requirements evolve.&lt;/p&gt;

&lt;p&gt;Questions like "what does this customer's usage look like over the last six months?" are answered in seconds — without upfront schemas, per-widget reconfiguration, or slow refresh cycles.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.bronto.io/book-a-demo" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;See Dashboard Filtering in Action&lt;/a&gt;
&lt;/p&gt;

</description>
      <category>logging</category>
      <category>devops</category>
      <category>observability</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Your LLM Logs Deserve Better — Send Claude Code Events to Bronto</title>
      <dc:creator>Patrick Londa</dc:creator>
      <pubDate>Thu, 21 May 2026 18:47:19 +0000</pubDate>
      <link>https://dev.to/bronto_io/your-llm-logs-deserve-better-send-claude-code-events-to-bronto-1nmk</link>
      <guid>https://dev.to/bronto_io/your-llm-logs-deserve-better-send-claude-code-events-to-bronto-1nmk</guid>
      <description>&lt;p&gt;&lt;em&gt;Authored by Conall Heffernan&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In this post, I'll walk through how to integrate Anthropic's Claude Code monitoring with Bronto so you can get a clear, unified view of your AI tool usage — all in one place.&lt;/p&gt;

&lt;p&gt;This project came out of our recent company AI sprint — a chance for me, as a technical support manager, to jump into an engineering sprint and brush up on my Python and Bash skills. After years of helping customers set up and troubleshoot their environments, it was exciting to get hands-on and demo my findings to the wider team over pizza and beers.&lt;/p&gt;




&lt;h2&gt;
  
  
  What You Can Send from Claude Code to Bronto
&lt;/h2&gt;

&lt;p&gt;Claude Code has a built-in monitoring feature that provides a detailed view into how your teams use AI-assisted coding — from productivity metrics to cost visibility. When paired with Bronto, this data becomes even more useful.&lt;/p&gt;

&lt;p&gt;Here's what you can do with the exported data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Understand usage patterns&lt;/strong&gt; — analyze how different users and teams engage with Claude Code over time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Measure productivity&lt;/strong&gt; — track lines of code modified and models used&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor costs&lt;/strong&gt; — get a clear picture of spend by team, user, or session&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit prompts&lt;/strong&gt; — see what users are actually asking, building, or automating; identify common use cases, detect misuse, and refine prompts based on real-world usage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.prod.website-files.com%2F67d2e1c8bd118640c72006cb%2F690be8e11777bd82c81936fc_Frame%2520709.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.prod.website-files.com%2F67d2e1c8bd118640c72006cb%2F690be8e11777bd82c81936fc_Frame%2520709.png" alt="Claude Code usage dashboard in Bronto" width="799" height="287"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.prod.website-files.com%2F67d2e1c8bd118640c72006cb%2F690beba7d88bcdeba38bde40_Screenshot%25202025-11-06%2520at%252000.23.59%25201.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.prod.website-files.com%2F67d2e1c8bd118640c72006cb%2F690beba7d88bcdeba38bde40_Screenshot%25202025-11-06%2520at%252000.23.59%25201.png" alt="Claude Code session view" width="799" height="475"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Bronto specifically:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Centralise&lt;/strong&gt; all event logging across your organisation for unified analysis&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Visualise effortlessly&lt;/strong&gt; — create dashboards using the AI widget creator without writing queries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set up monitors&lt;/strong&gt; to alert on sudden cost spikes, unusual token usage, or high session volumes from specific teams&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fast, scalable search&lt;/strong&gt; — explore and investigate your LLM logs quickly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Handle high volume easily&lt;/strong&gt; — Claude Code can generate massive, detailed logs; Bronto's storage and query engine handles this without breaking your budget&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long-term retention&lt;/strong&gt; — 12 months of hot logs by default, which is increasingly important for support tracking, debugging non-deterministic LLM behavior, and &lt;a href="https://artificialintelligenceact.eu/article/19/" rel="noopener noreferrer"&gt;EU AI Act compliance&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Two Ways to Connect Claude Code to Bronto
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Option 1: Direct (Simplest)
&lt;/h3&gt;

&lt;p&gt;Claude Code has built-in OpenTelemetry (OTel) support, so it can send telemetry straight to Bronto. Set the environment variables below, replacing the API key placeholder with your own key and adjusting the collection and dataset names to suit your setup (full details in &lt;a href="https://docs.claude.com/en/docs/claude-code/monitoring-usage" rel="noopener noreferrer"&gt;Claude Code's docs&lt;/a&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;CLAUDE_CODE_ENABLE_TELEMETRY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OTEL_LOGS_EXPORTER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;otlp
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OTEL_EXPORTER_OTLP_PROTOCOL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http/protobuf
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OTEL_EXPORTER_OTLP_ENDPOINT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;https://ingestion.eu.bronto.io
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OTEL_EXPORTER_OTLP_HEADERS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"x-bronto-api-key=&amp;lt;api_key_here&amp;gt;,x-bronto-collection=AISprint,x-bronto-dataset=claude-code"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OTEL_LOGS_EXPORT_INTERVAL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;2000
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OTEL_LOG_USER_PROMPTS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this setup, Claude Code sends structured logs directly to Bronto — no additional infrastructure to run or maintain.&lt;/p&gt;
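
&lt;p&gt;If nothing shows up in Bronto, a reasonable first check is whether the variables are actually set in the shell that launches Claude Code. The sketch below is a minimal POSIX shell example; the &lt;code&gt;missing_otel_vars&lt;/code&gt; function is a hypothetical helper introduced here for illustration, not part of Claude Code or Bronto. It prints the name of each variable from the setup above that is unset or empty.&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helper (not part of Claude Code or Bronto): print the name of
# every variable from the direct setup that is unset or empty in this shell.
# The body runs in a subshell, so the loop variables do not leak out.
missing_otel_vars() (
  for name in CLAUDE_CODE_ENABLE_TELEMETRY OTEL_LOGS_EXPORTER \
              OTEL_EXPORTER_OTLP_PROTOCOL OTEL_EXPORTER_OTLP_ENDPOINT \
              OTEL_EXPORTER_OTLP_HEADERS; do
    # Look up the variable whose name is stored in $name. The names are
    # fixed literals from the list above, so eval is safe here.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      printf '%s\n' "$name"
    fi
  done
)
```

&lt;p&gt;Running &lt;code&gt;missing_otel_vars&lt;/code&gt; in the terminal you use for Claude Code prints nothing when all five variables are set.&lt;/p&gt;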

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Running Claude Code locally or in a single environment&lt;/li&gt;
&lt;li&gt;No existing centralised OTel collector&lt;/li&gt;
&lt;li&gt;Lightweight, zero-maintenance integration&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Option 2: Via an Existing OTel Collector
&lt;/h3&gt;

&lt;p&gt;If your organisation already runs an OTel Collector, route Claude Code's logs through that same pipeline before forwarding to Bronto. This is the approach I used — I spun up a small AWS EC2 instance running the OTel collector, configured Claude Code to send logs to it, and had the collector forward those logs to Bronto.&lt;/p&gt;

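&lt;p&gt;Point Claude Code at your collector. The example uses &lt;code&gt;localhost:4317&lt;/code&gt;; if the collector runs on another machine, such as the EC2 instance described above, substitute that host:&lt;br&gt;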
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;CLAUDE_CODE_ENABLE_TELEMETRY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OTEL_LOGS_EXPORTER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;otlp
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OTEL_EXPORTER_OTLP_PROTOCOL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;grpc
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OTEL_EXPORTER_OTLP_ENDPOINT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:4317
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OTEL_LOG_USER_PROMPTS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Pro tip:&lt;/strong&gt; Persist these environment variables — otherwise starting a new terminal session will clear them and you'll wonder why nothing is flowing. Ask me how I know.&lt;/p&gt;
&lt;/blockquote&gt;
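
&lt;p&gt;One way to persist the variables is to append the exports to your shell profile. The sketch below is a minimal POSIX shell example under stated assumptions: &lt;code&gt;persist_export&lt;/code&gt; and &lt;code&gt;persist_claude_otel_env&lt;/code&gt; are hypothetical helper names introduced for illustration, and the profile path you pass in (for example &lt;code&gt;~/.bashrc&lt;/code&gt; or &lt;code&gt;~/.zshrc&lt;/code&gt;) depends on your shell.&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helper: append "export NAME=VALUE" to a profile file unless an
# identical line is already present, so re-running it does not duplicate lines.
# $1 = profile file, $2 = variable name, $3 = value
persist_export() {
  # -q: quiet, -x: match the whole line, -F: treat the pattern as fixed text
  if ! grep -qxF -- "export $2=$3" "$1" 2>/dev/null; then
    printf 'export %s=%s\n' "$2" "$3" >> "$1"
  fi
}

# Persist the collector-based settings shown above into the given profile file.
persist_claude_otel_env() {
  persist_export "$1" CLAUDE_CODE_ENABLE_TELEMETRY 1
  persist_export "$1" OTEL_LOGS_EXPORTER otlp
  persist_export "$1" OTEL_EXPORTER_OTLP_PROTOCOL grpc
  persist_export "$1" OTEL_EXPORTER_OTLP_ENDPOINT http://localhost:4317
  persist_export "$1" OTEL_LOG_USER_PROMPTS 1
}

# Example (not run automatically): persist_claude_otel_env "$HOME/.bashrc"
```

&lt;p&gt;New terminal sessions pick up the values once the profile is sourced. Passing the file as an argument keeps the helper from touching any profile until you choose one.&lt;/p&gt;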

&lt;p&gt;Collector receiver config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;receivers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;otlp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;protocols&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;grpc&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;endpoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;0.0.0.0:4317&lt;/span&gt;  &lt;span class="c1"&gt;# Default OTLP gRPC port&lt;/span&gt;
      &lt;span class="na"&gt;http&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;endpoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;0.0.0.0:4318&lt;/span&gt;  &lt;span class="c1"&gt;# Default OTLP HTTP port&lt;/span&gt;
        &lt;span class="na"&gt;cors&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;allowed_origins&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Aggregating logs from multiple sources before forwarding&lt;/li&gt;
&lt;li&gt;Applying processors like filters&lt;/li&gt;
&lt;li&gt;Keeping your Bronto exporter config in one place&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Generating Some Log Data
&lt;/h2&gt;

&lt;p&gt;To verify everything was working, I fired up Claude Code and asked it to create a simple Python script, then made a few tweaks — spin up a basic GUI, take an input value, save it locally. While I was doing this, Claude Code was generating log events in the background, flowing through my OTel collector and straight into Bronto.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.prod.website-files.com%2F67d2e1c8bd118640c72006cb%2F690bea21d87d6e7006211527_image%2520700.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.prod.website-files.com%2F67d2e1c8bd118640c72006cb%2F690bea21d87d6e7006211527_image%2520700.png" alt="Log events flowing into Bronto" width="800" height="288"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Searching Claude Logs in Bronto
&lt;/h2&gt;

&lt;p&gt;During testing, Claude was hitting errors because I'd been using an old, revoked API key from a previous project. The UI showed an error, and I could also see the API errors flowing into Bronto. A simple search for &lt;code&gt;error&lt;/code&gt; surfaced the logs directly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.prod.website-files.com%2F67d2e1c8bd118640c72006cb%2F690be95616906ce6b7a55a0c_image%2520699.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.prod.website-files.com%2F67d2e1c8bd118640c72006cb%2F690be95616906ce6b7a55a0c_image%2520699.png" alt="API error logs in Bronto" width="799" height="330"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I also created a tree map to visualize which models were used across the project, making it easy to spot usage patterns and the most heavily used models.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.prod.website-files.com%2F67d2e1c8bd118640c72006cb%2F690bec82e2cb74fc78804c2d_Screenshot%25202025-10-24%2520at%252012.32.40%25E2%2580%25AFp.m.%25201.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.prod.website-files.com%2F67d2e1c8bd118640c72006cb%2F690bec82e2cb74fc78804c2d_Screenshot%25202025-10-24%2520at%252012.32.40%25E2%2580%25AFp.m.%25201.png" alt="Tree map of model usage" width="800" height="334"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Token usage analysis&lt;/strong&gt; — a multi group-by on &lt;code&gt;$input_tokens&lt;/code&gt; and &lt;code&gt;$output_tokens&lt;/code&gt; helps identify outliers and inefficient prompts, control API spend, and uncover latency trends tied to model size or context length.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.prod.website-files.com%2F67d2e1c8bd118640c72006cb%2F690becd8686f49d58caa4d4c_Screenshot%25202025-10-24%2520at%252012.35.21%25E2%2580%25AFp.m.%25201.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.prod.website-files.com%2F67d2e1c8bd118640c72006cb%2F690becd8686f49d58caa4d4c_Screenshot%25202025-10-24%2520at%252012.35.21%25E2%2580%25AFp.m.%25201.png" alt="Token usage by group" width="800" height="396"&gt;&lt;/a&gt;&lt;/p&gt;
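
&lt;p&gt;If you export the same events for offline analysis, the group-by can be approximated in a few lines of Python. This is a minimal sketch, not Bronto's query engine: the sample events and the &lt;code&gt;model&lt;/code&gt; field name are assumptions, and the outlier rule (input size above five times the median) is only one illustrative threshold.&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical sample events. The field names "model", "input_tokens" and
# "output_tokens" are assumptions about how the exported logs are shaped.
EVENTS = [
    {"model": "model-a", "input_tokens": 1200, "output_tokens": 300},
    {"model": "model-a", "input_tokens": 800, "output_tokens": 200},
    {"model": "model-b", "input_tokens": 50000, "output_tokens": 4000},
]


def tokens_by_model(events):
    """Sum input and output tokens per model (a simple group-by)."""
    totals = defaultdict(lambda: {"input_tokens": 0, "output_tokens": 0})
    for event in events:
        bucket = totals[event["model"]]
        bucket["input_tokens"] += event["input_tokens"]
        bucket["output_tokens"] += event["output_tokens"]
    return dict(totals)


def outliers(events, factor=5):
    """Return events whose input size exceeds `factor` times the (upper) median."""
    sizes = sorted(e["input_tokens"] for e in events)
    median = sizes[len(sizes) // 2]
    return [e for e in events if e["input_tokens"] > factor * median]


if __name__ == "__main__":
    print(tokens_by_model(EVENTS))
    print(outliers(EVENTS))
```

&lt;p&gt;Requests flagged this way are the ones worth opening first when looking for oversized context or inefficient prompts.&lt;/p&gt;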

&lt;p&gt;&lt;strong&gt;Session duration&lt;/strong&gt; — tracking how long each session stayed active helps identify inefficient workflows, idle connections, or hanging requests that inflate costs. It's also a signal for user engagement: shorter sessions suggest quick completions, longer ones can point to complex prompts or slow model responses.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.prod.website-files.com%2F67d2e1c8bd118640c72006cb%2F690bedbb76c4988680f8437c_Screenshot%25202025-10-31%2520at%25202.30.09%25E2%2580%25AFp.m.%25201.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.prod.website-files.com%2F67d2e1c8bd118640c72006cb%2F690bedbb76c4988680f8437c_Screenshot%25202025-10-31%2520at%25202.30.09%25E2%2580%25AFp.m.%25201.png" alt="Session duration over time" width="800" height="362"&gt;&lt;/a&gt;&lt;/p&gt;
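
&lt;p&gt;As a rough illustration of what the chart computes, session duration is the gap between the first and last event seen for each session. The sketch below assumes each event carries a &lt;code&gt;session_id&lt;/code&gt; and an ISO 8601 &lt;code&gt;timestamp&lt;/code&gt;; both names are hypothetical and may not match the attributes your events actually use.&lt;/p&gt;

```python
from datetime import datetime

# Hypothetical events; "session_id" and "timestamp" are assumed field names.
EVENTS = [
    {"session_id": "s1", "timestamp": "2025-10-24T12:30:00+00:00"},
    {"session_id": "s1", "timestamp": "2025-10-24T12:32:30+00:00"},
    {"session_id": "s2", "timestamp": "2025-10-24T12:40:00+00:00"},
]


def session_durations(events):
    """Return seconds between the first and last event of each session."""
    first, last = {}, {}
    for event in events:
        ts = datetime.fromisoformat(event["timestamp"])
        sid = event["session_id"]
        first[sid] = min(first.get(sid, ts), ts)
        last[sid] = max(last.get(sid, ts), ts)
    return {sid: (last[sid] - first[sid]).total_seconds() for sid in first}


if __name__ == "__main__":
    for sid, seconds in session_durations(EVENTS).items():
        print(sid, seconds)
```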

&lt;p&gt;&lt;strong&gt;Prompt auditing&lt;/strong&gt; — prompts are redacted by default. To enable them, set &lt;code&gt;OTEL_LOG_USER_PROMPTS=1&lt;/code&gt; (already included in the config above).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.prod.website-files.com%2F67d2e1c8bd118640c72006cb%2F690bed71d7567084d5dc4c63_image%2520701.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.prod.website-files.com%2F67d2e1c8bd118640c72006cb%2F690bed71d7567084d5dc4c63_image%2520701.png" alt="Prompt audit log view" width="800" height="503"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;From Claude Code to Bronto, everything is connected through OpenTelemetry: a few environment variables, an open source agent, and you have logs and insights flowing in. Once it's set up, you get full visibility into how your AI tooling is being used, what it costs, and what's actually happening under the hood.&lt;/p&gt;




&lt;h2&gt;
  
  
  Appendix: Full OTel Collector Config
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Receivers:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;receivers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;otlp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;protocols&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;grpc&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;endpoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;0.0.0.0:4317&lt;/span&gt;
      &lt;span class="na"&gt;http&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;endpoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;0.0.0.0:4318&lt;/span&gt;
        &lt;span class="na"&gt;cors&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;allowed_origins&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Exporters:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;exporters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;otlphttp/brontobytes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;logs_endpoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://ingestion.eu.bronto.io/v1/logs"&lt;/span&gt;
    &lt;span class="na"&gt;compression&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gzip&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;x-bronto-api-key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INGESTION_API_KEY_GOES_HERE"&lt;/span&gt;
    &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;30s&lt;/span&gt;
    &lt;span class="na"&gt;retry_on_failure&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
      &lt;span class="na"&gt;initial_interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5s&lt;/span&gt;
      &lt;span class="na"&gt;max_interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;30s&lt;/span&gt;
      &lt;span class="na"&gt;max_elapsed_time&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;120s&lt;/span&gt;
    &lt;span class="na"&gt;sending_queue&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
      &lt;span class="na"&gt;num_consumers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
      &lt;span class="na"&gt;queue_size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Pipelines:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;pipelines&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;logs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;receivers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;otlp&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;processors&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;memory_limiter&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;resource&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;batch&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;transform&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;exporters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;otlphttp/brontobytes&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
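
&lt;p&gt;One thing to watch when assembling these fragments: the &lt;code&gt;logs&lt;/code&gt; pipeline names four processors, and the collector will refuse to start if a pipeline references a component that is not defined. The following is a hypothetical pre-flight check in Python. The &lt;code&gt;CONFIG&lt;/code&gt; dict mirrors only the three fragments shown in this appendix, and the helper name is made up for illustration.&lt;/p&gt;

```python
# Hypothetical dict mirroring the appendix fragments; in practice this would
# be loaded from the collector's YAML file.
CONFIG = {
    "receivers": {"otlp": {}},
    "exporters": {"otlphttp/brontobytes": {}},
    "service": {
        "pipelines": {
            "logs": {
                "receivers": ["otlp"],
                "processors": ["memory_limiter", "resource", "batch", "transform"],
                "exporters": ["otlphttp/brontobytes"],
            }
        }
    },
}


def undefined_components(config):
    """List (pipeline, section, name) for names a pipeline uses but nothing defines."""
    missing = []
    pipelines = config.get("service", {}).get("pipelines", {})
    for pipeline, spec in pipelines.items():
        for section in ("receivers", "processors", "exporters"):
            defined = config.get(section) or {}
            for name in spec.get(section, []):
                if name not in defined:
                    missing.append((pipeline, section, name))
    return missing


if __name__ == "__main__":
    for pipeline, section, name in undefined_components(CONFIG):
        print(f"pipeline {pipeline}: {section} entry {name} is not defined")
```

&lt;p&gt;Run against only the fragments above, this reports the four processors, so make sure your full config also contains a &lt;code&gt;processors:&lt;/code&gt; block that defines them.&lt;/p&gt;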



&lt;p&gt;&lt;a href="https://app.eu.bronto.io/signup" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;Try Bronto Free — Send Your First Claude Logs&lt;/a&gt;
&lt;/p&gt;

</description>
      <category>ai</category>
      <category>logging</category>
      <category>devops</category>
      <category>claudecode</category>
    </item>
    <item>
      <title>The 4L's of Logging — The New Standard for the AI Era</title>
      <dc:creator>Patrick Londa</dc:creator>
      <pubDate>Thu, 21 May 2026 16:18:59 +0000</pubDate>
      <link>https://dev.to/bronto_io/the-4ls-of-logging-the-new-standard-for-the-ai-era-amc</link>
      <guid>https://dev.to/bronto_io/the-4ls-of-logging-the-new-standard-for-the-ai-era-amc</guid>
      <description>&lt;p&gt;&lt;em&gt;Authored by Noel Ruane &amp;amp; Trevor Parsons&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Traditionally, the world of logging had 3L's — &lt;strong&gt;low cost, low latency, and large scale&lt;/strong&gt; — and the accepted wisdom was that you'd have to compromise by choosing two of the three.&lt;/p&gt;

&lt;p&gt;Until recently, the logging market bore this out. Solutions fell into one of three categories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Expensive and fast&lt;/strong&gt; — Datadog, ELK-type solutions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slow and cheap(er)&lt;/strong&gt; — index-free solutions like Grafana Loki (still relatively expensive when you actually search)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tweeners&lt;/strong&gt; — expensive, relatively slow, not great at scale — think CloudWatch, Google Cloud Logging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Anyone who's ever hired builders knows the iron rule: fast, cheap, or quality — pick two.&lt;/p&gt;

&lt;p&gt;But the world has changed with the arrival of AI. As everyone builds and innovates around LLMs and agents, achieving all three L's has become more essential than ever. And AI has added a fourth requirement to the list: &lt;strong&gt;long-term retention&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  3 Becomes 4: What Long-Term Retention Opens Up
&lt;/h2&gt;

&lt;p&gt;With a short retention window — 3, 15, or 30 days, as most solutions provide — logs are accessible only for outages and system issues severe enough to warrant immediate attention.&lt;/p&gt;

&lt;p&gt;Long-term retention opens up an entirely new set of use cases.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb8wsq74b6biq4g2ba5fi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb8wsq74b6biq4g2ba5fi.png" alt="Use cases enabled by long-term retention" width="799" height="469"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A common example: comparing customer API request latencies or conversion metrics month-on-month, looking at seasonality effects, or preparing data for a customer QBR. You might want to analyze customers in aggregate to spot trends, or drill into a specific customer's history.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm3wxbcxs5m1zgxupc052.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm3wxbcxs5m1zgxupc052.png" alt="Long-term data analysis example" width="800" height="753"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Support issues can be recent or ongoing over months. Being able to look beyond a few days of data is often the difference between resolving a customer issue properly and leaving it half-answered.&lt;/p&gt;

&lt;p&gt;In an AI world where agents can interrogate your data and solve problems at a completely different scale, long-term &lt;strong&gt;hot&lt;/strong&gt; retention is essential — not the cumbersome, costly process of rehydrating logs from archives that most providers offer. The common workaround of dumping logs to S3 and spinning up Athena is another great example of the kind of toil that shouldn't exist in modern logging.&lt;/p&gt;

&lt;p&gt;The importance of long-term hot data is borne out by our own customers: the longer the data retention, the more use cases open up, the longer the time periods searched, and the more value customers extract from their data. This value will only increase as agents become more powerful.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwqh1ehixwoippfxnksta.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwqh1ehixwoippfxnksta.png" alt="Retention vs. value curve" width="430" height="298"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The 4L's Set a New Standard for the AI Era
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://youtu.be/w87UvmMcmW4?si=Ob5CXZ4tEmjew8eI" rel="noopener noreferrer"&gt;Microsoft's Satya Nadella put it well&lt;/a&gt;: AI sits at the intersection of intelligence and data. If you don't have the data, you can't point AI at it. The new requirements for data in the AI era map directly to the 4L's:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Large Scale&lt;/strong&gt; — data that is always and easily accessible&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long-Term Retention&lt;/strong&gt; — data retained so AI can analyze it and take action on it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Low Cost&lt;/strong&gt; — storage and search must be cost-efficient as volumes see exponential growth&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Low Latency&lt;/strong&gt; — fast search and querying&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As Bessemer Venture Partners notes in their &lt;a href="https://www.bvp.com/atlas/roadmap-ai-infrastructure" rel="noopener noreferrer"&gt;Roadmap for AI Infrastructure&lt;/a&gt;, the volume of unstructured data is projected to reach 612 zettabytes by 2030, driven by ML/AI and synthetic data generation. Log data is one specific type where this applies acutely.&lt;/p&gt;

&lt;p&gt;Here's why logging requirements are intensifying in the AI era:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Systems of action are replacing systems of record.&lt;/strong&gt; As AI-native apps don't just store data but act on it, having log data accessible and hot means AI models can actually do something meaningful with it. The era of retaining logs for 30 days and locking them in cold storage is ending.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI-led Root Cause Analysis.&lt;/strong&gt; We're already seeing AIs perform RCA using log data to help teams solve problems faster. &lt;a href="https://www.bronto.io/blog/error-investigations-with-brontoscope" rel="noopener noreferrer"&gt;BrontoScope&lt;/a&gt;, for example, automatically runs an investigation that assesses how widespread an error is — figuring out what queries to run, assessing results, running further queries, and delivering an immediate assessment with recommendations. It can save anywhere from minutes to hours per incident. As MCP becomes standard, log data will be analyzable with additional context from other systems — replicating what engineers do manually today when piecing together evidence.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LLMs are non-deterministic.&lt;/strong&gt; Given the unpredictable nature of LLMs, logs will play an even more important role in understanding what happened — whether from a support, debugging, audit, security, or compliance perspective. The same action can deliver different results, making logs essential for showing exactly what occurred. (See our &lt;a href="https://www.bronto.io/blog/logging-events-from-ollama" rel="noopener noreferrer"&gt;guide to collecting Ollama logs&lt;/a&gt; to get started.)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Regulation.&lt;/strong&gt; &lt;a href="https://artificialintelligenceact.eu/article/19/" rel="noopener noreferrer"&gt;EU AI Act Article 19&lt;/a&gt;, for example, will require logs of certain higher-risk AI systems to be retained for at least six months, with the requirement applying from 2026.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Purpose-Built Adaptive Technology for Log Data
&lt;/h2&gt;

&lt;p&gt;How do you actually achieve all 4L's? The short answer: treat logs as a first-class citizen and build a purpose-built data store for them.&lt;/p&gt;

&lt;p&gt;Until now, vendors have always repurposed existing datastores for logging:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Elasticsearch&lt;/strong&gt; has been at the core of most logging solutions for well over a decade. Super fast because it indexes everything, but super expensive due to data bloat and operational complexity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ClickHouse&lt;/strong&gt; — more recently vendors have turned to ClickHouse. Great for metrics, good for some logs — but only if you know the shape of your data in advance, which is a non-starter for a general-purpose logging platform. (Full write-up: &lt;a href="https://www.bronto.io/blog/why-clickhouse-fails-as-a-general-purpose-logging-solution" rel="noopener noreferrer"&gt;Why ClickHouse fails as a general-purpose logging solution&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Snowflake&lt;/strong&gt; — vendors like Observe have turned to Snowflake to address the cost problem, but customers regularly report search latency issues and burning through CPU credits when searching large data volumes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key learning from two decades in this space: &lt;strong&gt;not all log data should be treated equally&lt;/strong&gt;. Don't just "index everything" (inverted indexes) or at the other extreme "index nothing" (index-free solutions). Instead, take a more thoughtful approach — &lt;strong&gt;let the system adapt and optimize based on the shape of your data and the queries you run&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;These optimizations operate at every layer:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Indexing:&lt;/strong&gt; We combine multiple patent-pending adaptive compact indexes. The platform automatically decides which to apply based on data and query patterns. Techniques include Bloom filters, summary indexes, partition indexing, automatic log-to-metric conversion, and additional patent-pending technologies.&lt;/p&gt;
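
&lt;p&gt;To make one of these techniques concrete, here is a minimal Bloom filter sketch in Python. It is a textbook illustration and not Bronto's implementation: the block names, filter size, and hash count are arbitrary assumptions. The idea is that each block of log data keeps a small filter of the tokens it contains, so a search can skip any block whose filter rules the term out. A Bloom filter never gives false negatives, but it can give occasional false positives, so a "maybe" still needs a real scan.&lt;/p&gt;

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, some false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        # A list of booleans keeps the sketch readable; real filters pack bits.
        self.bits = [False] * size_bits

    def _positions(self, token):
        # Derive several positions by salting a single hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{token}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, token):
        for pos in self._positions(token):
            self.bits[pos] = True

    def might_contain(self, token):
        return all(self.bits[pos] for pos in self._positions(token))


def build_block(lines):
    """Index every whitespace-separated token of a block of log lines."""
    bloom = BloomFilter()
    for line in lines:
        for token in line.lower().split():
            bloom.add(token)
    return bloom


def blocks_to_scan(blocks, term):
    """Return names of blocks whose filter cannot rule out `term`."""
    return [name for name, bloom in blocks.items() if bloom.might_contain(term)]


# Hypothetical blocks of log lines, purely for illustration.
BLOCKS = {
    "block-1": build_block(["user login ok", "cache warm"]),
    "block-2": build_block(["payment timeout error", "retry scheduled"]),
}

if __name__ == "__main__":
    print(blocks_to_scan(BLOCKS, "error"))
```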

&lt;p&gt;&lt;strong&gt;Storage:&lt;/strong&gt; Our storage tier routes data to the most appropriate structure based on its shape and how it will be searched — columnar store, specialized data structures, high compression, partial decoding, and storage tier coordination.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Search:&lt;/strong&gt; Specialized algorithms are applied depending on the query being run, and they are highly parallelized to operate at any scale.&lt;/p&gt;

&lt;p&gt;For deeper reading: &lt;a href="https://www.bronto.io/blog/why-is-bronto-so-fast" rel="noopener noreferrer"&gt;Why Bronto is so fast at searching logs&lt;/a&gt;, or the Log Management Cost Trap series — &lt;a href="https://www.bronto.io/blog/cost-trap-ingestion" rel="noopener noreferrer"&gt;Part 1: Ingestion&lt;/a&gt;, &lt;a href="https://www.bronto.io/blog/cost-trap-storage" rel="noopener noreferrer"&gt;Part 2: Storage&lt;/a&gt;, &lt;a href="https://www.bronto.io/blog/cost-trap-search" rel="noopener noreferrer"&gt;Part 3: Search&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  No Maintenance Required
&lt;/h2&gt;

&lt;p&gt;One important point often overlooked: we see vendors trying to solve the cost problem by deploying within an organization's cloud (BYOC), using the customer's hardware to reduce costs. This requires deploying a logging platform into your environment and having the vendor manage it there. Unless you have tight data regulations, this feels like an overly complex backward step.&lt;/p&gt;

&lt;p&gt;Bronto is built on a modern cloud multi-tenanted architecture. No setup, no maintenance, no management required by the customer. We take advantage of compute/storage separation and independently scaling services.&lt;/p&gt;




&lt;h2&gt;
  
  
  AI = the Intersection of Intelligence and Data
&lt;/h2&gt;

&lt;p&gt;If your current provider is building AI capabilities on top of an old, repurposed, or expensive logging platform — one with 2 L's instead of 4 — you won't be able to take full advantage of the AI revolution.&lt;/p&gt;

&lt;p&gt;The 4L's aren't a nice-to-have. In the AI era, they're the baseline.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.bronto.io/book-a-demo" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;See the 4L's in Action&lt;/a&gt;
&lt;/p&gt;

</description>
      <category>logging</category>
      <category>ai</category>
      <category>devops</category>
      <category>observability</category>
    </item>
    <item>
      <title>BrontoScope: AI-Powered Error Investigations</title>
      <dc:creator>Patrick Londa</dc:creator>
      <pubDate>Thu, 21 May 2026 14:37:22 +0000</pubDate>
      <link>https://dev.to/bronto_io/brontoscope-ai-powered-error-investigations-451c</link>
      <guid>https://dev.to/bronto_io/brontoscope-ai-powered-error-investigations-451c</guid>
      <description>&lt;p&gt;&lt;em&gt;Authored by Marco Aquilanti&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Today we're introducing &lt;strong&gt;BrontoScope&lt;/strong&gt;, one of the &lt;a href="https://www.bronto.io/blog/introducing-bronto-labs" rel="noopener noreferrer"&gt;Bronto AI Labs initiatives&lt;/a&gt; aimed at reducing user toil, increasing team efficiency, and reducing MTTR.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem with AI in Observability
&lt;/h2&gt;

&lt;p&gt;Almost every software company is adding AI features to their products — often with mixed results. As a user, I'm frequently annoyed by the continuous stream of AI features popping up everywhere: messaging apps that want you to chat with an LLM while you're looking for your friends, search engines surfacing LLM answers first and leaving you wondering whether what you're reading is true or a hallucination.&lt;/p&gt;

&lt;p&gt;The observability space is no exception. Many products are being "enriched" with AI features, but most are missing the point. Here's why.&lt;/p&gt;

&lt;p&gt;Observability has always been hard. A production system can easily produce terabytes of logs, millions of traces, and millions of metrics every hour — too much for any human to easily inspect. LLMs should be the next pillar in observability, reducing burden and improving reliability. But only if focused on making the user's life &lt;em&gt;simpler&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Most current AI features in observability actually make the user's life &lt;em&gt;harder&lt;/em&gt; by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Requiring a detailed prompt as input&lt;/strong&gt; — users must invest significant time crafting prompts to get well-structured responses&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Producing long, verbose text responses&lt;/strong&gt; — even when the AI has nailed the request, the answer is often diluted across lines and lines of text&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Taking too long&lt;/strong&gt; — complex multi-step LLM workflows leave users waiting far too long for answers during an incident&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Bronto Approach
&lt;/h2&gt;

&lt;p&gt;At Bronto, we're extending the logging platform with LLM capabilities focused on one goal: &lt;strong&gt;automating recurring work patterns to make the user's life simpler, not harder&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Our &lt;a href="https://www.bronto.io/blog/introducing-bronto-labs" rel="noopener noreferrer"&gt;Bronto Labs initiative&lt;/a&gt; is built around three tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Auto-Parsing&lt;/strong&gt; — using AI to automatically structure logs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI Dashboard Creation&lt;/strong&gt; — generating dashboards from natural language&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BrontoScope&lt;/strong&gt; — AI-powered incident investigation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The philosophy behind all of these: before adding any new feature, we make sure it will be genuinely useful to most users and won't slow down or hinder any of their existing tasks.&lt;/p&gt;




&lt;h2&gt;
  
  
  BrontoScope
&lt;/h2&gt;

&lt;p&gt;Incidents don't wait for business hours. When an alert fires at 3am, a small on-call team needs to move fast, often without access to the domain experts who know the affected system best.&lt;/p&gt;

&lt;p&gt;The first steps of any incident are always the same:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Understand&lt;/strong&gt; the scope of the incident&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Estimate&lt;/strong&gt; the impact on customers and the broader system&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Assign&lt;/strong&gt; a priority and decide how to tackle it&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Staying calm, thinking clearly, and acting quickly are all required, even when you've just been woken up. But too much haste leads to an incorrect diagnosis.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;LLMs can help enormously in these scenarios — they can summarize large amounts of data in seconds and are not affected by panic, confusion, or a 3am wake-up call.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;BrontoScope automates the incident investigation process with &lt;strong&gt;a single click on any error event in your logs&lt;/strong&gt;. The LLM writes and runs tens of queries against your data, analyzes the results, generates a summary report, and delivers it to you in just a few seconds.&lt;/p&gt;

&lt;h3&gt;
  
  
  What the Report Includes
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scope&lt;/strong&gt; — when the errors started appearing, and which users, customers, services, regions, or hosts are affected&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Probable causes&lt;/strong&gt; — resource exhaustion, network issues, software bugs, traffic spikes, etc.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Suggestions&lt;/strong&gt; — how to stop the error occurring or how to continue the investigation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Supporting data&lt;/strong&gt; — the query results and charts that led the LLM to its conclusions, so you can validate that the model isn't hallucinating&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How It Works
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0hphkq14ezp4yff22apx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0hphkq14ezp4yff22apx.png" alt="BrontoScope architecture diagram" width="800" height="612"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The process works in stages: first, the LLM analyzes the error and its surrounding context to guide subsequent data retrieval. The search engine then queries the relevant data and presents all findings to the LLM in a single comprehensive prompt — essentially, an ad-hoc dashboard built around the error and composed of many charts. The final response is streamed to the user via Server-Sent Events, allowing them to read the output as it's generated in real time.&lt;/p&gt;
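
&lt;p&gt;For readers unfamiliar with Server-Sent Events, the wire format is simple: each event is one or more &lt;code&gt;data:&lt;/code&gt; lines followed by a blank line. The Python sketch below shows how generated text chunks could be framed for streaming. It is an illustration of the protocol only, not BrontoScope's code, and the final &lt;code&gt;done&lt;/code&gt; event name is a made-up convention.&lt;/p&gt;

```python
def sse_frame(data, event=None):
    """Encode one SSE frame; multi-line payloads get one data: line each."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


def stream_report(chunks):
    """Yield one frame per generated chunk, then a hypothetical 'done' event."""
    for chunk in chunks:
        yield sse_frame(chunk)
    yield sse_frame("", event="done")


if __name__ == "__main__":
    for frame in stream_report(["Scope: errors began recently", "Cause: unknown"]):
        print(frame, end="")
```

&lt;p&gt;Because each frame is flushed as soon as it is produced, the reader sees the first lines of the report while later sections are still being generated.&lt;/p&gt;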

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnrmzdzs9id4tvm2wb0zk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnrmzdzs9id4tvm2wb0zk.png" alt="BrontoScope investigation report example 1" width="800" height="452"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwbjfdfpclr3dtipr4hni.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwbjfdfpclr3dtipr4hni.png" alt="BrontoScope investigation report example 2" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;BrontoScope is powered by the most advanced AI models available on &lt;a href="https://aws.amazon.com/bedrock/" rel="noopener noreferrer"&gt;Amazon Bedrock&lt;/a&gt;, so all data is processed within the AWS ecosystem: prompts and responses are never stored or shared with model providers or third parties.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why It Actually Makes Life Easier
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No prompt required&lt;/strong&gt;: just click on a log event. The LLM analyzes and understands the error, writes its own filter to find similar occurrences, and scans the data autonomously&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Concise reports&lt;/strong&gt;: each report goes straight to the point, with charts included to maximize information density&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fast&lt;/strong&gt;: in most cases the report is streamed to the user in &lt;strong&gt;under 10 seconds&lt;/strong&gt;, even though tens of queries are run per investigation, thanks to the speed of Bronto's search engine&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Availability
&lt;/h2&gt;

&lt;p&gt;BrontoScope is currently available on request and is being used internally by the Bronto team as well as by a number of design partner customers in real-world situations. Improvements will be made in the coming months.&lt;/p&gt;

&lt;p&gt;This is just one of the AI features being developed at Bronto — stay tuned for future posts, or &lt;a href="https://www.bronto.io/bronto-labs" rel="noopener noreferrer"&gt;join our AI initiative&lt;/a&gt; and help shape what we build next.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.bronto.io/bronto-labs" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;Join Bronto Labs&lt;/a&gt;
&lt;/p&gt;

</description>
      <category>ai</category>
      <category>observability</category>
      <category>devops</category>
      <category>logging</category>
    </item>
    <item>
      <title>Log Parsing with AI at Bronto</title>
      <dc:creator>Patrick Londa</dc:creator>
      <pubDate>Thu, 21 May 2026 01:06:23 +0000</pubDate>
      <link>https://dev.to/bronto_io/log-parsing-with-ai-at-bronto-18d2</link>
      <guid>https://dev.to/bronto_io/log-parsing-with-ai-at-bronto-18d2</guid>
      <description>&lt;p&gt;&lt;em&gt;Authored by Gary Nicholls&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This post follows on from our &lt;a href="https://www.bronto.io/blog/benchmarking-aws-nova-on-log-data-how-it-compares-to-chatgpt-3-5" rel="noopener noreferrer"&gt;AWS Nova log benchmarking&lt;/a&gt; article, where we explored how smaller LLMs perform on log analysis tasks. That earlier post highlighted that LLMs are surprisingly good at parsing logs. While that work focused on &lt;em&gt;understanding&lt;/em&gt; logs, this post tackles an earlier step: &lt;strong&gt;automatically structuring logs using AI&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Logging Origins
&lt;/h2&gt;

&lt;p&gt;Logs are one of the oldest — and still most valuable — forms of observability. Mainframes and early Unix systems were already using logs to record system activity, with tools like syslog dating back to the early 1980s.&lt;/p&gt;

&lt;p&gt;Even as systems have become more distributed and complex, logs remain foundational, especially for investigating issues when things go wrong. Logs are typically written to local files before being shipped to modern observability platforms using agents like OpenTelemetry or Fluent Bit.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why So Many Formats?
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://opentelemetry.io/" rel="noopener noreferrer"&gt;OpenTelemetry (OTel)&lt;/a&gt; project is encouraging the adoption of structured JSON logs — and that's a good thing. Structured logs are easier to search, more human-readable, safer to manipulate, and more cloud-native.&lt;/p&gt;

&lt;p&gt;But the reality isn't that simple. Many systems still generate unstructured or semi-structured logs where key=value pairs are embedded inside free-text messages. And even among structured formats, things vary wildly — timestamps alone appear in dozens of different formats.&lt;/p&gt;

&lt;p&gt;Logs reflect the unique fingerprint of each tech stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;syslog&lt;/strong&gt; — still widely used, with quirks in its timestamp formatting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Apache&lt;/strong&gt; — uses the Common Log Format&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;nginx&lt;/strong&gt; — has its own custom variant&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Java apps&lt;/strong&gt; — typically log through Logback or Log4j, often behind the SLF4J facade&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS services&lt;/strong&gt; — often emit structured JSON or a hybrid of text and JSON&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;With no single standard, Bronto set out to solve the problem in an innovative way — using AI to generate parsers automatically, reducing the toil and complexity that users typically face.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Automated Log Parsing
&lt;/h2&gt;

&lt;p&gt;Parsing logs in real time is a performance-critical operation. When ingesting millions of events per second, every millisecond counts. Regex-based parsing can be complex and hard to maintain, requiring expertise in tools like Grok or Dissect — and can become a bottleneck at scale when applied indiscriminately.&lt;/p&gt;

&lt;p&gt;At Bronto, we use a &lt;strong&gt;multi-layered approach&lt;/strong&gt; that separates offline detection from online parsing. Online parsing happens in real time as part of the ingestion pipeline; offline detection occurs outside the pipeline with a short delay. This hybrid approach ensures speed without sacrificing flexibility, while reducing user toil.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 1: Curated Java Parsers
&lt;/h3&gt;

&lt;p&gt;We maintain a library of high-performance Java-based parsers, optimized for the most common formats seen at high volumes across multiple customers. These are purpose-built for speed and designed to &lt;strong&gt;fail fast&lt;/strong&gt; if they encounter a log that doesn't match their expected format.&lt;/p&gt;

&lt;p&gt;After applying a Java parser, we run additional lightweight processors to normalize key fields:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Timestamp parser&lt;/strong&gt; — auto-detects and normalizes varied timestamp formats&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Log level parser&lt;/strong&gt; — maps diverse severity keywords into five standard levels&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;KVP parser&lt;/strong&gt; — extracts key=value pairs from the message or body, even if only present in some events&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Layer 2: Dissect and Grok Fallback
&lt;/h3&gt;

&lt;p&gt;For less common but still important formats, we fall back to Dissect or Grok:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dissect&lt;/strong&gt; — fast and great for structured, delimiter-based logs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grok&lt;/strong&gt; — more flexible, supports regex-based parsing, but comes at a performance cost&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bronto maintains a large database of both dissect and grok patterns. Due to their runtime cost, we don't attempt to apply every pattern to every event online. Instead:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;We sample log events offline&lt;/li&gt;
&lt;li&gt;We match them against our full pattern library&lt;/li&gt;
&lt;li&gt;If a match is found, we automatically assign a parser hint to the dataset&lt;/li&gt;
&lt;li&gt;Future events in that dataset are parsed using the matched pattern&lt;/li&gt;
&lt;li&gt;We gather metrics on parse quality per dataset and periodically revalidate hints&lt;/li&gt;
&lt;/ol&gt;
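
&lt;p&gt;The offline steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the pattern library, the sample size, the 90% acceptance threshold, the dataset name, and the raw log line are all hypothetical, and plain regular expressions stand in for real dissect and grok patterns.&lt;/p&gt;

```python
import random
import re

# Hypothetical pattern library: plain regexes stand in for dissect/grok patterns.
PATTERN_LIBRARY = {
    "bracketed_app_log": re.compile(r"^\S+ \[[^\]]+\] [A-Z]+: .*$"),
    "key_value_only": re.compile(r"^(\w+=\S+ ?)+$"),
}

MIN_MATCH_RATE = 0.9  # assumed threshold for accepting a hint
SAMPLE_SIZE = 200     # assumed number of events to sample


def detect_parser_hint(events, seed=0):
    """Return (pattern_name, match_rate); the name is None if nothing qualifies."""
    if not events:
        return None, 0.0
    rng = random.Random(seed)
    sample = rng.sample(events, min(SAMPLE_SIZE, len(events)))
    best_name, best_rate = None, 0.0
    for name, pattern in PATTERN_LIBRARY.items():
        hits = sum(1 for line in sample if pattern.match(line))
        rate = hits / len(sample)
        if rate > best_rate:
            best_name, best_rate = name, rate
    if best_rate >= MIN_MATCH_RATE:
        return best_name, best_rate
    return None, best_rate


# The online path would only consult this mapping, never the full library.
dataset_hints = {}
events = ["APP01 [2024-07-03 12:50:59] WARN: Invalid sessionId: sessionId=expired"] * 50
hint, rate = detect_parser_hint(events)
if hint is not None:
    dataset_hints["app01-logs"] = hint
print(dataset_hints)
```

&lt;p&gt;The expensive part, trying every pattern, happens on a small sample outside the ingestion path; the online path pays only for the one pattern named by the hint.&lt;/p&gt;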

&lt;h3&gt;
  
  
  Layer 3: AI-Generated Parsing
&lt;/h3&gt;

&lt;p&gt;When we encounter unknown or proprietary formats, other tools might require users to handcraft regexes through a UI. At Bronto, we let AI do the work.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.prod.website-files.com%2F67d2e1c8bd118640c72006cb%2F688a38563554e574823661f9_Custom%2520Parser%2520%282%29.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.prod.website-files.com%2F67d2e1c8bd118640c72006cb%2F688a38563554e574823661f9_Custom%2520Parser%2520%282%29.png" alt="Bronto LLM-powered log parser" width="800" height="464"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When enabled, we send a sample of the dataset to an internal AI engine that analyzes the log structure and generates a custom dissect pattern. We test the pattern against a wider sample. If it matches a high percentage of events, we present the pattern and sample results to the user — they can tweak field names if desired — and once approved, the parser is saved and applied automatically to all future events in that dataset.&lt;/p&gt;
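
&lt;p&gt;To make the validation step concrete, here is a minimal Python sketch of a dissect-style matcher and a match-rate check. It assumes the common &lt;code&gt;%{field}&lt;/code&gt; dissect syntax; the candidate pattern, the raw log lines, and the 90% threshold are hypothetical, and in the flow described above the pattern would come from the AI engine rather than being hard-coded.&lt;/p&gt;

```python
import re

_FIELD = re.compile(r"%\{(\w+)\}")


def dissect(pattern, line):
    """Split a line on the literal delimiters of a dissect-style pattern.

    Returns a dict of fields, or None as soon as a delimiter is missing.
    """
    parts = _FIELD.split(pattern)  # [literal, field, literal, field, ..., literal]
    literals, names = parts[0::2], parts[1::2]
    if not line.startswith(literals[0]):
        return None
    pos = len(literals[0])
    fields = {}
    for name, delimiter in zip(names, literals[1:]):
        if delimiter == "":
            end = len(line)  # a trailing field takes the rest of the line
        else:
            end = line.find(delimiter, pos)
            if end == -1:
                return None
        fields[name] = line[pos:end]
        pos = end + len(delimiter)
    return fields if pos == len(line) else None


def match_rate(pattern, lines):
    """Fraction of lines that the pattern parses completely."""
    if not lines:
        return 0.0
    return sum(dissect(pattern, line) is not None for line in lines) / len(lines)


# Hypothetical candidate pattern and wider sample.
CANDIDATE = "%{app_name} [%{timestamp}] %{log_level}: %{message}"
wider_sample = [
    "APP01 [2024-07-03 12:50:59] WARN: Invalid sessionId: sessionId=expired",
    "APP01 [2024-07-03 12:49:41] INFO: Authentication token issued",
    "APP01 [2024-07-03 12:48:27] INFO: User logout: userId=14141",
    "    at com.example.Session.validate(Session.java:42)",
]
MIN_MATCH_RATE = 0.9  # assumed acceptance threshold

rate = match_rate(CANDIDATE, wider_sample)
print(dissect(CANDIDATE, wider_sample[0]))
print(rate, "accept" if rate >= MIN_MATCH_RATE else "reject")
```

&lt;p&gt;Because dissect only searches for fixed delimiters, a non-matching line is rejected at the first missing delimiter, which is part of why it is cheaper than regex-based parsing.&lt;/p&gt;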




&lt;h2&gt;
  
  
  A Worked Example
&lt;/h2&gt;

&lt;p&gt;Suppose your application logs look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3q3ap21nrcx9fd220hvy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3q3ap21nrcx9fd220hvy.png" alt="Raw unstructured application logs" width="800" height="143"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After analyzing hundreds of lines, the AI generates a dissect pattern:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwhtxnhc9zxqf5hj1nwz9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwhtxnhc9zxqf5hj1nwz9.png" alt="AI-generated dissect pattern" width="800" height="62"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The parsed result looks like this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;code&gt;app_name&lt;/code&gt;&lt;/th&gt;
&lt;th&gt;&lt;code&gt;timestamp&lt;/code&gt;&lt;/th&gt;
&lt;th&gt;&lt;code&gt;log_level&lt;/code&gt;&lt;/th&gt;
&lt;th&gt;&lt;code&gt;message&lt;/code&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;APP01&lt;/td&gt;
&lt;td&gt;2024-07-03 12:50:59&lt;/td&gt;
&lt;td&gt;WARN&lt;/td&gt;
&lt;td&gt;Invalid sessionId: sessionId=expired&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;APP01&lt;/td&gt;
&lt;td&gt;2024-07-03 12:49:41&lt;/td&gt;
&lt;td&gt;INFO&lt;/td&gt;
&lt;td&gt;Authentication token issued&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;APP01&lt;/td&gt;
&lt;td&gt;2024-07-03 12:48:27&lt;/td&gt;
&lt;td&gt;INFO&lt;/td&gt;
&lt;td&gt;User logout: userId=14141&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;APP01&lt;/td&gt;
&lt;td&gt;2024-07-03 12:47:10&lt;/td&gt;
&lt;td&gt;INFO&lt;/td&gt;
&lt;td&gt;New login attempt&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;APP01&lt;/td&gt;
&lt;td&gt;2024-07-03 12:45:37&lt;/td&gt;
&lt;td&gt;INFO&lt;/td&gt;
&lt;td&gt;Form submitted: formId=contact-us&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The KVP parser then further extracts fields like &lt;code&gt;sessionId&lt;/code&gt;, &lt;code&gt;userId&lt;/code&gt;, and &lt;code&gt;formId&lt;/code&gt; from the &lt;code&gt;message&lt;/code&gt; value.&lt;/p&gt;
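
&lt;p&gt;A minimal Python sketch of that extraction, applied to the &lt;code&gt;message&lt;/code&gt; values from the table above. The regular expression, including the rule that a value ends at whitespace, a comma, or a semicolon, is an assumption for illustration and not Bronto's actual KVP parser.&lt;/p&gt;

```python
import re

# Assumed rule: a key is an identifier, and its value runs up to the next
# whitespace, comma, or semicolon.
_KVP = re.compile(r"([A-Za-z_][\w.]*)=([^\s,;]+)")


def extract_kvps(message):
    """Return every key=value pair found in a free-text message."""
    return dict(_KVP.findall(message))


for message in (
    "Invalid sessionId: sessionId=expired",
    "User logout: userId=14141",
    "Form submitted: formId=contact-us",
    "New login attempt",
):
    print(extract_kvps(message))
```

&lt;p&gt;Messages without any pairs, such as "New login attempt", simply yield an empty result, which matches the case where pairs are present in only some events.&lt;/p&gt;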

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.prod.website-files.com%2F67d2e1c8bd118640c72006cb%2F688a36f5b051b55fb1e13089_Bronto%2520AI%2520Log%2520Parser%2520Diagram%2520%281%29.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.prod.website-files.com%2F67d2e1c8bd118640c72006cb%2F688a36f5b051b55fb1e13089_Bronto%2520AI%2520Log%2520Parser%2520Diagram%2520%281%29.png" alt="Full AI log parser pipeline diagram" width="616" height="355"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Under the Hood: AWS Bedrock
&lt;/h2&gt;

&lt;p&gt;We use AWS Bedrock as a managed service to access LLMs (including Claude). Our infrastructure chooses the most appropriate model for each application and sends structured prompts — for example, instructing the LLM which patterns to avoid and how to handle keys like timestamps. The user doesn't have to worry about models or prompts; they just use the application.&lt;/p&gt;

&lt;p&gt;Bedrock also provides important SaaS-grade guarantees:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Built-in safeguards to detect and filter harmful content&lt;/li&gt;
&lt;li&gt;Never stores or uses our data to train models&lt;/li&gt;
&lt;li&gt;All data remains within the AWS network&lt;/li&gt;
&lt;li&gt;Works seamlessly with Lambda and S3 — no platform rearchitecting required&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Looking Ahead
&lt;/h2&gt;

&lt;p&gt;At Bronto we believe parsing should be fast, accurate, and hands-free. Today we generate dissect patterns using AI. Soon we'll be generating Grok patterns too — bringing AI to even more complex and less structured formats.&lt;/p&gt;

&lt;p&gt;As OTel continues to push for JSON-based structured logging, the hope is that log parsing becomes a less painful problem over time. But until then, automated, adaptive parsing isn't just a convenience — it's a necessity.&lt;/p&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Bronto combines curated Java parsers, flexible Dissect/Grok matching, and AI-powered pattern generation into a unified pipeline for parsing any log format, structured or otherwise.&lt;/p&gt;

&lt;p&gt;If your logs are weird or messy — &lt;strong&gt;we've got you&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.bronto.io/bronto-labs" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;Explore Bronto's AI Features&lt;/a&gt;
&lt;/p&gt;

</description>
      <category>ai</category>
      <category>logging</category>
      <category>devops</category>
      <category>observability</category>
    </item>
    <item>
      <title>Benchmarking AWS Nova on Log Data: How It Compares to ChatGPT-3.5</title>
      <dc:creator>Patrick Londa</dc:creator>
      <pubDate>Wed, 20 May 2026 21:18:45 +0000</pubDate>
      <link>https://dev.to/bronto_io/benchmarking-aws-nova-on-log-data-how-it-compares-to-chatgpt-35-2aa1</link>
      <guid>https://dev.to/bronto_io/benchmarking-aws-nova-on-log-data-how-it-compares-to-chatgpt-35-2aa1</guid>
      <description>&lt;p&gt;&lt;em&gt;Authored by Benoit Gaudin&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This post explores the use of large language models (LLMs) for analyzing log data. To do so, we reproduced part of the &lt;a href="https://arxiv.org/abs/2309.07938" rel="noopener noreferrer"&gt;An Assessment of ChatGPT on Log Data&lt;/a&gt; benchmark, originally conducted in 2023 by Intel researchers Priyanka Mudgal and Rita Wouhaybi.&lt;/p&gt;

&lt;p&gt;While that initial benchmark used ChatGPT (GPT-3.5-turbo), our study evaluates the &lt;a href="https://aws.amazon.com/ai/generative-ai/nova/understanding/" rel="noopener noreferrer"&gt;AWS Nova Micro model&lt;/a&gt;. Our goal is to assess whether more recent, smaller, and cheaper models can match, or even exceed, the performance GPT-3.5-turbo delivered a few years ago. The economics are particularly interesting: Nova Micro's cost per input token is &lt;strong&gt;14 times lower&lt;/strong&gt; than GPT-3.5-turbo's was two years ago.&lt;/p&gt;




&lt;h2&gt;
  
  
  Benchmark Setup
&lt;/h2&gt;

&lt;p&gt;The original benchmark evaluated GPT-3.5-turbo across ten research questions grouped into four categories:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Log Parsing &amp;amp; Analytics&lt;/strong&gt; — Can the model parse logs and identify errors, root causes, security events, and anomalies? Can it identify frequently used APIs?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prediction&lt;/strong&gt; — Can it predict future log events based on past logs?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Summarization&lt;/strong&gt; — Can it summarize single and multiple log messages?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;General Capabilities&lt;/strong&gt; — Can it handle bulk log data, and what message lengths can it process?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Experiments used datasets from the &lt;a href="https://arxiv.org/abs/2008.06448" rel="noopener noreferrer"&gt;Loghub collection&lt;/a&gt;, each providing 2,000 labeled log messages from a different system (Windows, Linux, mobile, distributed, etc.).&lt;/p&gt;

&lt;p&gt;Our experiment reused the same methodology and the same 19 Loghub datasets, with these differences:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We evaluated &lt;strong&gt;AWS Nova Micro&lt;/strong&gt; rather than GPT-3.5-turbo&lt;/li&gt;
&lt;li&gt;We focused on the first three categories (7 questions) — the fourth category covers context window size, which is no longer a meaningful differentiator (GPT-3.5-turbo: 16,385 tokens; Nova Micro: 128,000 tokens)&lt;/li&gt;
&lt;li&gt;Where the original benchmark tested multiple input sizes (e.g. 5, 10, 50 log entries), we used only the maximum (50), to give the model the most context&lt;/li&gt;
&lt;li&gt;We used the same prompts as the original, and a human reviewer evaluated the results manually&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Question&lt;/th&gt;
&lt;th&gt;Prompt&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Log Parsing&lt;/td&gt;
&lt;td&gt;Q1&lt;/td&gt;
&lt;td&gt;Extract the log template and variables from this log message.&lt;/td&gt;
&lt;td&gt;How does the model perform on log parsing?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Log Analytics&lt;/td&gt;
&lt;td&gt;Q2&lt;/td&gt;
&lt;td&gt;Summarize the errors and warnings and identify the root cause.&lt;/td&gt;
&lt;td&gt;Can it extract errors and root causes from raw logs?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Log Analytics&lt;/td&gt;
&lt;td&gt;Q3&lt;/td&gt;
&lt;td&gt;Show the APIs called most with count.&lt;/td&gt;
&lt;td&gt;Can it perform advanced analytics tasks?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Log Analytics&lt;/td&gt;
&lt;td&gt;Q4&lt;/td&gt;
&lt;td&gt;Are there any malicious users, URLs, IPs, and connection status?&lt;/td&gt;
&lt;td&gt;Can it extract security information?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Log Analytics&lt;/td&gt;
&lt;td&gt;Q5&lt;/td&gt;
&lt;td&gt;Detect the anomalies from the following log messages.&lt;/td&gt;
&lt;td&gt;Can it detect anomalies?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prediction&lt;/td&gt;
&lt;td&gt;Q6&lt;/td&gt;
&lt;td&gt;Predict the next 10 log events based on these log messages.&lt;/td&gt;
&lt;td&gt;Can it predict future events?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Log Summarization&lt;/td&gt;
&lt;td&gt;Q7&lt;/td&gt;
&lt;td&gt;Summarize the log message.&lt;/td&gt;
&lt;td&gt;Can it summarize a single log message?&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Results: AWS Nova Micro's Performance
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Prompt&lt;/th&gt;
&lt;th&gt;Correct Answers&lt;/th&gt;
&lt;th&gt;Remarks&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Extract log template and variables&lt;/td&gt;
&lt;td&gt;17/19 (89%)&lt;/td&gt;
&lt;td&gt;Failed on HDFS logs; IDs not always categorized accurately&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Summarize errors and identify root cause&lt;/td&gt;
&lt;td&gt;10/19 (53%)&lt;/td&gt;
&lt;td&gt;Erroneously reports warnings in Hadoop logs; confuses timestamps and error codes in HPC; over-reports issues in HealthApp and Mac logs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Show most-called APIs with count&lt;/td&gt;
&lt;td&gt;4/19 (21%)&lt;/td&gt;
&lt;td&gt;Counting is very challenging; many datasets lack API-related entries; model over-reports results that don't make sense&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Detect malicious users, URLs, IPs&lt;/td&gt;
&lt;td&gt;18/19 (95%)&lt;/td&gt;
&lt;td&gt;High accuracy, but hard to conclude on the general case as no obvious security issues were present in the sampled logs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Detect anomalies&lt;/td&gt;
&lt;td&gt;9/19 (47%)&lt;/td&gt;
&lt;td&gt;Reports anomalies based on irrelevant criteria (e.g. entries that "occur towards the end of the sample" or are "repetitive")&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Predict next 10 log events&lt;/td&gt;
&lt;td&gt;0/19 (0%)&lt;/td&gt;
&lt;td&gt;Even for extremely repetitive logs, IDs and timestamps are not predicted correctly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Summarize a single log message&lt;/td&gt;
&lt;td&gt;16/19 (84%)&lt;/td&gt;
&lt;td&gt;Good results overall; challenging for unfamiliar log formats without named fields&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In summary, our evaluation confirms the findings of the original benchmark: like GPT-3.5-turbo, &lt;strong&gt;Nova Micro performs well at parsing and summarizing log data&lt;/strong&gt;. Other types of analysis, such as counting, anomaly detection, and prediction, remain challenging for LLMs.&lt;/p&gt;

&lt;p&gt;The malicious content detection result (95%) looks strong, but needs a caveat: the sampled datasets didn't contain clearly malicious entries. The model didn't produce false positives here, which is valuable in itself — especially compared to anomaly detection, where false positives were common.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This benchmark demonstrates that it is now possible to parse and summarize log data far more cost-effectively.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Reflection on Datasets
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://arxiv.org/abs/2008.06448" rel="noopener noreferrer"&gt;Loghub collection&lt;/a&gt; is invaluable for reproducible benchmarking — without it, meaningful cross-benchmark comparisons would be impossible. That said, the datasets have some limitations worth noting.&lt;/p&gt;

&lt;p&gt;At Bronto, we work frequently with log types common in real-world production environments: CDN logs, web access logs, AWS CloudTrail audit logs, application logs. LLMs tend to have a strong understanding of these formats because they're widely documented and structured.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Structured logs change the picture significantly.&lt;/strong&gt; When we ran Q2 and Q3 prompts against synthetic structured CDN log data (based on real examples), the model performed substantially better:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For Q2 (error identification), the model perfectly identified HTTP errors by associating status codes ≥ 400 with errors — even though the field name never used the word "error". It correctly categorized 400 (Client-Side), 404 (Not Found), 500 (Internal Server Error), and 503 (Service Unavailable).&lt;/li&gt;
&lt;li&gt;For Q3 (most-called APIs), the model correctly identified the &lt;code&gt;reqPath&lt;/code&gt; field as representing API endpoints and extracted the top results accurately.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Counting remains a consistent weakness&lt;/strong&gt; across all dataset types. When Q3 requires providing a count of the most common API calls, the model's counts are frequently inaccurate regardless of dataset.&lt;/p&gt;
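
&lt;p&gt;For comparison, once the relevant field is known, the exact counts that Q3 asks for are cheap to compute with ordinary code. The Python sketch below shows this on hypothetical structured CDN records; it is not part of the benchmark. &lt;code&gt;reqPath&lt;/code&gt; is the field mentioned above, while &lt;code&gt;statusCode&lt;/code&gt; and the sample values are assumptions.&lt;/p&gt;

```python
from collections import Counter

# Hypothetical structured CDN records; statusCode is an assumed field name.
records = [
    {"reqPath": "/api/v1/cart", "statusCode": 200},
    {"reqPath": "/api/v1/cart", "statusCode": 503},
    {"reqPath": "/api/v1/login", "statusCode": 200},
    {"reqPath": "/api/v1/cart", "statusCode": 404},
    {"reqPath": "/api/v1/login", "statusCode": 500},
    {"reqPath": "/api/v1/search", "statusCode": 200},
]

# Most-called endpoints with exact counts.
top_paths = Counter(r["reqPath"] for r in records).most_common(2)

# Errors identified the same way the model did: status codes of 400 and above.
errors = Counter(r["statusCode"] for r in records if r["statusCode"] >= 400)

print(top_paths)
print(sorted(errors.items()))
```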

&lt;p&gt;One additional observation: several Loghub datasets (HPC, HealthApp, BGL, Proxifier) appear to be uncommon enough that Nova Micro doesn't have a solid prior understanding of them. When asked to generate sample logs for these systems, the output doesn't resemble the actual Loghub data — suggesting the model is less reliable when operating outside familiar territory.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This benchmark reproduces the 2023 ChatGPT log analysis study using AWS Nova Micro. The results are strikingly similar, with one major difference: &lt;strong&gt;the cost per input token is 14x lower&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Given that log data is notoriously voluminous, this cost difference matters enormously for any production use of LLMs in log analysis pipelines.&lt;/p&gt;

&lt;p&gt;The Loghub datasets are also not fully representative of what most production logging systems generate. Real-world logs — web access, CDN, application, audit — tend to be more structured and more familiar to LLMs, which leads to better performance than the benchmark scores suggest.&lt;/p&gt;

&lt;p&gt;We believe LLMs have genuine potential to improve production logging systems, particularly for analyzing the common, structured log formats that make up the majority of real-world observability data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.bronto.io/bronto-labs" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;Explore Bronto's AI Features&lt;/a&gt;
&lt;/p&gt;

</description>
      <category>ai</category>
      <category>logging</category>
      <category>machinelearning</category>
      <category>devops</category>
    </item>
    <item>
      <title>Why ClickHouse Fails as a General-Purpose Logging Solution</title>
      <dc:creator>Patrick Londa</dc:creator>
      <pubDate>Wed, 20 May 2026 21:18:18 +0000</pubDate>
      <link>https://dev.to/bronto_io/why-clickhouse-fails-as-a-general-purpose-logging-solution-kh4</link>
      <guid>https://dev.to/bronto_io/why-clickhouse-fails-as-a-general-purpose-logging-solution-kh4</guid>
      <description>&lt;p&gt;&lt;em&gt;Authored by Mike Neville-O'Neill&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Let's talk about ClickHouse. It's fast, it's efficient, it's open source. But it's designed to be a general-purpose analytical database, which makes it challenging — in terms of time and resources — to use for logging at scale.&lt;/p&gt;

&lt;p&gt;I've spent over a decade working on logging systems and watched countless organizations struggle with the same pattern: a team frustrated with expensive legacy log management turns to ClickHouse, excited by early performance wins on well-structured data. They rebuild critical logging infrastructure around it, roll it out, and everything seems great... until it doesn't.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Teams Try ClickHouse for Logs
&lt;/h2&gt;

&lt;p&gt;The appeal is undeniable. When you first point ClickHouse at a clean, structured dataset, the performance is genuinely impressive:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Speed&lt;/strong&gt; — Built for analytical queries, can scan billions of rows in seconds on modest hardware&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost&lt;/strong&gt; — Open source, so licensing costs are zero&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Familiarity&lt;/strong&gt; — If your team works with SQL, the learning curve is minimal&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flexibility&lt;/strong&gt; — Complete control over your data, schema, and retention policies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Early benchmarks usually reinforce this perception. Tested on controlled, clean data with predictable patterns, ClickHouse shines. You might start with a specific subset of logs, such as API access logs or well-structured application events, and see impressive query speeds.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The problem isn't that ClickHouse doesn't work for logs. The problem is that it doesn't work for &lt;em&gt;all&lt;/em&gt; logs, all the time, at scale, without significant engineering investment. And that's where organizations get trapped.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Architectural Mismatch
&lt;/h2&gt;

&lt;p&gt;Logs in the wild are messy, inconsistent, and unpredictable — the direct opposite of what ClickHouse was designed to handle.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Columnar Conundrum
&lt;/h3&gt;

&lt;p&gt;ClickHouse's columnar storage engine is optimized for queries that scan large portions of specific columns — brilliant for analytics workloads with well-defined dimensions. But logs are fundamentally different:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Unpredictable schemas&lt;/strong&gt; — Log formats change; new fields appear and disappear without warning&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Nested structures&lt;/strong&gt; — Modern logs, especially from cloud environments, contain deeply nested JSON with inconsistent structures&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High cardinality&lt;/strong&gt; — Fields like session IDs and request IDs have enormous numbers of unique values, which &lt;a href="https://www.atatus.com/blog/clickhouse-high-cardinality-optimization/" rel="noopener noreferrer"&gt;presents challenges for ClickHouse&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mixed query patterns&lt;/strong&gt; — Log analysis involves aggregation, filtering, full-text search, and pattern matching — not just the analytical queries ClickHouse excels at&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Forcing logs into ClickHouse is like racing a Formula 1 car on an off-road trail — impressive engineering, wrong application.&lt;/p&gt;

&lt;h3&gt;
  
  
  Schema Rigidity vs. Log Reality
&lt;/h3&gt;

&lt;p&gt;In the real world, logs evolve constantly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Developers add new fields to debug specific issues&lt;/li&gt;
&lt;li&gt;Third-party systems change their log formats without notice&lt;/li&gt;
&lt;li&gt;Cloud providers modify their event structures&lt;/li&gt;
&lt;li&gt;Microservices introduce different logging conventions across teams&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With ClickHouse, schema changes require &lt;a href="https://clickhouse.com/blog/getting-data-into-clickhouse-part-2-json" rel="noopener noreferrer"&gt;careful planning&lt;/a&gt; to avoid additional overhead. A purpose-built logging system accommodates this reality by adapting to your logs — not the other way around.&lt;/p&gt;

&lt;h3&gt;
  
  
  Indexing Limitations
&lt;/h3&gt;

&lt;p&gt;ClickHouse's primary indexing mechanism, the &lt;a href="https://clickhouse.com/docs/guides/best-practices/sparse-primary-indexes" rel="noopener noreferrer"&gt;sparse primary index&lt;/a&gt;, works by storing one index mark per block of N rows (8,192 by default). This excels for batch analytical queries but creates significant compromises for log search:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Limited selectivity&lt;/strong&gt; — Works best when scanning large portions of the dataset, not when looking for specific events&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Suboptimal for high-cardinality fields&lt;/strong&gt; — User IDs, session IDs, trace IDs all &lt;a href="https://www.atatus.com/blog/clickhouse-high-cardinality-optimization/" rel="noopener noreferrer"&gt;struggle with this approach&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Text search limitations&lt;/strong&gt; — Full-text search is functional but not optimized for the complex pattern matching common in log analysis&lt;/li&gt;
&lt;/ul&gt;
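
&lt;p&gt;To see why a sparse index limits selectivity, consider this simplified Python model. It is a sketch of the idea only, not ClickHouse's implementation: it assumes unique, sorted keys and a granularity of 4 rows so the output stays readable.&lt;/p&gt;

```python
import bisect

GRANULARITY = 4  # rows per index mark in this toy model

# Rows sorted by the primary key (here a timestamp), as within a sorted data part.
rows = [(ts, "event-%d" % ts) for ts in range(100, 120)]

# The sparse index keeps only the key of the first row in each block (granule).
marks = [rows[i][0] for i in range(0, len(rows), GRANULARITY)]


def granule_for_key(key):
    """Return the (start, end) row range of the one granule that may hold key."""
    g = max(bisect.bisect_right(marks, key) - 1, 0)
    start = g * GRANULARITY
    return start, min(start + GRANULARITY, len(rows))


def lookup(key):
    """Find rows with the given key; also report how many rows were read."""
    start, end = granule_for_key(key)
    scanned = rows[start:end]  # every row in the granule is read, not just the hit
    return [r for r in scanned if r[0] == key], len(scanned)


print(marks)
print(lookup(110))
```

&lt;p&gt;Even a single-row lookup on the key reads a whole granule. A filter on a column that is not part of the sort key, such as a session ID, gets no help from the marks in this model and would have to read every granule.&lt;/p&gt;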




&lt;h2&gt;
  
  
  Real-World Scaling Shatters the Illusion
&lt;/h2&gt;

&lt;p&gt;The true challenges emerge as your ClickHouse logging solution scales. What worked smoothly in development with gigabytes becomes increasingly problematic at terabyte or petabyte scale.&lt;/p&gt;

&lt;h3&gt;
  
  
  Query Performance Degrades
&lt;/h3&gt;

&lt;p&gt;Initially, queries return in seconds or less. As data grows, the same searches can take minutes. Without careful tuning and constant optimization, query performance becomes increasingly unpredictable — your team ends up spending more time optimizing queries than using them to solve problems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Operational Overhead Becomes Crushing
&lt;/h3&gt;

&lt;p&gt;Managing a ClickHouse cluster at scale is a specialized skill. You'll need to handle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://engineering.contentsquare.com/2022/scaling-out-clickhouse-cluster/" rel="noopener noreferrer"&gt;&lt;strong&gt;Rebalancing&lt;/strong&gt;&lt;/a&gt; — Manually triggering operations as data distribution becomes uneven&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://altinity.com/blog/clickhouse-mergetree-on-s3-administrative-best-practices" rel="noopener noreferrer"&gt;&lt;strong&gt;Storage management&lt;/strong&gt;&lt;/a&gt; — Careful tuning of merge settings, partitioning schemes, and TTL rules&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://edgedelta.com/company/blog/how-we-increased-the-clickhouse-query-performance-by-20x-with-an-application-layer-cache" rel="noopener noreferrer"&gt;&lt;strong&gt;Resource allocation&lt;/strong&gt;&lt;/a&gt; — Preventing resource-intensive queries from impacting overall system performance&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://posthog.com/handbook/engineering/clickhouse/data-storage" rel="noopener noreferrer"&gt;&lt;strong&gt;Schema evolution&lt;/strong&gt;&lt;/a&gt; — Adding, removing, or modifying columns requires careful planning&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.highlight.io/blog/how-we-built-logging-with-clickhouse" rel="noopener noreferrer"&gt;&lt;strong&gt;Multi-tenancy&lt;/strong&gt;&lt;/a&gt; — ClickHouse has limited built-in multi-tenancy, creating additional complexity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One large e-commerce team shared with us that they had six full-time engineers dedicated just to keeping their ClickHouse logging infrastructure operational — an enormous hidden cost that rarely factors into initial calculations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reliability Issues Emerge
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ingestion pipeline failures&lt;/strong&gt; — As log volumes spike, &lt;a href="https://www.mux.com/blog/latency-and-throughput-tradeoffs-of-clickhouse-kafka-table-engine#the-problem-with-only-measuring-throughput" rel="noopener noreferrer"&gt;ingestion often falls behind&lt;/a&gt; because ingestion and search aren't separated&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complex failure modes&lt;/strong&gt; — Troubleshooting ClickHouse problems requires deep database expertise most DevOps teams don't have&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recovery complexity&lt;/strong&gt; — Restoring from failures or data corruption is time-consuming and error-prone&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The DIY Trap
&lt;/h2&gt;

&lt;p&gt;When engineers suggest "just use ClickHouse for logging," they're often underestimating what that actually means. ClickHouse isn't a logging platform — it's a powerful but complex analytical database that requires substantial expertise to turn into a production-ready observability solution.&lt;/p&gt;

&lt;h3&gt;
  
  
  The ClickHouse Learning Curve
&lt;/h3&gt;

&lt;p&gt;ClickHouse has unique concepts that take months to develop expertise in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MergeTree engines&lt;/strong&gt; — Understanding when to use &lt;code&gt;ReplicatedMergeTree&lt;/code&gt; vs. &lt;code&gt;ReplacingMergeTree&lt;/code&gt; vs. dozens of other variants&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Partitioning and primary key design&lt;/strong&gt; — Critical decisions that affect query performance and can't easily be changed later&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Merge behavior tuning&lt;/strong&gt; — Configuring background processes that determine how your data gets organized and compressed&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Building the Missing Pieces
&lt;/h3&gt;

&lt;p&gt;Even with ClickHouse expertise, you still need to build an entire logging platform around it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ingestion pipelines that handle backpressure, parse multiple log formats, and deal with schema evolution&lt;/li&gt;
&lt;li&gt;Query interfaces — APIs and UIs that let users actually search and analyze their logs&lt;/li&gt;
&lt;li&gt;Access control — multi-tenancy and permissions&lt;/li&gt;
&lt;li&gt;Alerting and monitoring systems&lt;/li&gt;
&lt;li&gt;Operational tooling — backup/recovery, cluster health monitoring, schema migrations&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Hidden Costs
&lt;/h3&gt;

&lt;p&gt;What starts as "let's save money by using open source" often becomes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Months of engineering time just to get basic functionality working&lt;/li&gt;
&lt;li&gt;Ongoing operational overhead that scales with data volume&lt;/li&gt;
&lt;li&gt;Opportunity cost of engineers focusing on infrastructure instead of product features&lt;/li&gt;
&lt;li&gt;Risk of outages due to misconfiguration or lack of operational expertise&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For most teams, the total cost of ownership of a DIY ClickHouse solution exceeds that of a purpose-built logging platform.&lt;/p&gt;




&lt;h2&gt;
  
  
  Built for Logs, Not Analytics
&lt;/h2&gt;

&lt;p&gt;Instead of adapting an analytics database for logging, the right approach is to build a system specifically for the messy, unpredictable nature of log data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Separation of compute and storage&lt;/strong&gt; — Decoupling storage from compute so each scales independently, keeping searches fast regardless of data volume&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Indexless architecture with Bloom filters&lt;/strong&gt; — Sub-second search performance without the overhead and complexity of traditional indexing, particularly effective for high-cardinality fields&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Schema-agnostic ingestion&lt;/strong&gt; — New fields, changing structures, and varying formats handled automatically without schema migrations or pipeline adjustments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimized for search patterns&lt;/strong&gt; — Full-text search, pattern matching, filtering, and aggregation all handled natively&lt;/li&gt;
&lt;/ul&gt;
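
&lt;p&gt;To make the Bloom filter idea concrete, here is a minimal Python sketch of how a per-block filter lets a search skip blocks that cannot contain a term. It is an illustration under stated assumptions, not a description of any vendor's implementation: the &lt;code&gt;BloomFilter&lt;/code&gt; class, the &lt;code&gt;build_block&lt;/code&gt; and &lt;code&gt;search&lt;/code&gt; helpers, the filter size, and the sample log lines are all hypothetical.&lt;/p&gt;

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, value):
        # Derive num_hashes bit positions by salting a SHA-256 digest.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value):
        return all(self.bits[pos] for pos in self._positions(value))


def build_block(lines):
    """Pair a block of raw log lines with a filter over its tokens (hypothetical layout)."""
    bloom = BloomFilter()
    for line in lines:
        for token in line.split():
            bloom.add(token)
    return {"lines": lines, "bloom": bloom}


def search(blocks, term):
    """Scan only the blocks whose filter says the term might be present."""
    hits, scanned = [], 0
    for block in blocks:
        if not block["bloom"].might_contain(term):
            continue  # definitely absent: skip the block without reading it
        scanned += 1
        hits.extend(line for line in block["lines"] if term in line.split())
    return hits, scanned


blocks = [
    build_block(["GET /cart 200 user=a81f", "GET /home 200 user=77c2"]),
    build_block(["POST /pay 500 user=e09d", "GET /home 200 user=a81f"]),
    build_block(["GET /health 200 user=probe"]),
]
hits, scanned = search(blocks, "user=e09d")
print(hits, "blocks scanned:", scanned)
```

&lt;p&gt;Because the filter stores only a fixed number of bits per block regardless of how many distinct values a field takes, this style of pruning suits high-cardinality fields such as user or request IDs. A false positive costs one unnecessary block scan; it never hides a matching line.&lt;/p&gt;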

&lt;p&gt;The result:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sub-second search on terabytes of data&lt;/li&gt;
&lt;li&gt;Seconds on petabytes — no rehydration from cold storage&lt;/li&gt;
&lt;li&gt;12-month retention by default&lt;/li&gt;
&lt;li&gt;Zero operational overhead&lt;/li&gt;
&lt;li&gt;Transparent, predictable pricing&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;ClickHouse is an impressive technology for what it was designed to do: structured data analytics at scale. But turning it into a general-purpose logging solution requires enormous engineering investment and still results in fundamental compromises.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;When you can instantly search across years of data, logs shift from a compliance checkbox or last-resort troubleshooting tool to an active part of your operational toolkit.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Instead of spending years rebuilding what already exists, consider whether your team's time is better spent solving your core business problems. The logging layer should be a foundation that just works — not a perpetual engineering project.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.bronto.io/book-a-demo" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;See Bronto in Action&lt;/a&gt;
&lt;/p&gt;

</description>
      <category>logging</category>
      <category>devops</category>
      <category>observability</category>
      <category>database</category>
    </item>
    <item>
      <title>Breaking Logging's Flywheel of Compromises</title>
      <dc:creator>Patrick Londa</dc:creator>
      <pubDate>Tue, 19 May 2026 18:26:09 +0000</pubDate>
      <link>https://dev.to/bronto_io/breaking-loggings-flywheel-of-compromises-5gmb</link>
      <guid>https://dev.to/bronto_io/breaking-loggings-flywheel-of-compromises-5gmb</guid>
      <description>&lt;p&gt;&lt;em&gt;Authored by Mike Neville-O'Neill&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Let's face it — logging is broken. Not just a little broken, but fundamentally misaligned with the needs of modern engineering teams. At a recent AWS Summit talk in London, Benoit Gaudin (our Head of Infrastructure) and I shared Bronto's vision for fixing this mess once and for all.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem We're All Living In
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcayts215d8de7o18qt84.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcayts215d8de7o18qt84.png" alt="The 3C flywheel of compromises" width="800" height="303"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you're running any significant infrastructure today, you're probably stuck in what we call the &lt;strong&gt;"3C flywheel of compromises"&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cost&lt;/strong&gt; — Logging at scale has become ridiculously expensive&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Coverage&lt;/strong&gt; — So you cut corners, dropping those infra logs and long-tail workflows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complexity&lt;/strong&gt; — And end up with a Frankenstein's monster of 5–8 different systems duct-taped together&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This isn't just inefficient — it's actively harmful. Engineers end up building parallel solutions just to get basic visibility because the main tool is too limited, too slow, or too expensive.&lt;/p&gt;




&lt;h2&gt;
  
  
  Logs Matter More Than Ever
&lt;/h2&gt;

&lt;p&gt;Logs aren't just a compliance checkbox anymore. They're your &lt;strong&gt;operational ground truth in the AI era&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;They feed your LLMs. They power your agents. They're your audit trail, your RAG source, your behavioral training set. And one log message from an LLM-based system might contain 50–100 nested events in a single payload.&lt;/p&gt;

&lt;p&gt;Try scaling that with a solution built before the separation of compute and storage was even a thing.&lt;/p&gt;




&lt;h2&gt;
  
  
  How We're Breaking the Cycle
&lt;/h2&gt;

&lt;p&gt;Bronto was built to tackle this head-on with three non-negotiable capabilities:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Subsecond search on all logs&lt;/strong&gt; — whether they're two seconds or two years old&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Petabyte-scale retention&lt;/strong&gt; — no infrastructure for you to manage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Completely different pricing&lt;/strong&gt; — think cents per GB, not dollars&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The platform is built natively on AWS (S3, Lambda, DynamoDB), but engineered so you don't have to deal with pipelines, pre-processing, or glue code.&lt;/p&gt;




&lt;h2&gt;
  
  
  Bronto's Architectural Advantage
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7kjhzflbss411kbl3w3z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7kjhzflbss411kbl3w3z.png" alt="Bronto architecture diagram" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The ingestion layer accepts data from standard sources (OpenTelemetry Collector, Fluentd, Fluent Bit) through HTTP endpoints, with AWS Elastic Load Balancing doing the heavy lifting. Data is buffered through Kafka (Amazon MSK), but then things diverge from the standard playbook.&lt;/p&gt;

&lt;p&gt;Rather than following the traditional approach, the platform consumes data from Kafka and writes it to S3 in a proprietary format that borrows techniques from data analytics: &lt;strong&gt;data partitioning, Bloom filters, predicate pushdown, compression, and columnar storage&lt;/strong&gt;. Metadata lives in DynamoDB for speed.&lt;/p&gt;

&lt;p&gt;The real magic happens at search time. When you query through the UI or API, Lambda functions launch in parallel and process data directly from S3. No overprovisioning for big queries — horizontal scaling on demand, paying only while functions run.&lt;/p&gt;
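
&lt;p&gt;The fan-out pattern described here can be sketched in a few lines of Python. This is an illustration only, with loud assumptions: an in-memory dictionary (&lt;code&gt;FAKE_OBJECT_STORE&lt;/code&gt;) stands in for S3, a thread pool stands in for parallel Lambda invocations, and the key names and the &lt;code&gt;scan_partition&lt;/code&gt; and &lt;code&gt;parallel_search&lt;/code&gt; helpers are hypothetical rather than part of Bronto's API.&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for objects in S3: each key maps to the log lines stored there.
FAKE_OBJECT_STORE = {
    "logs/2026-05-01/part-0": ["GET /home 200", "GET /cart 500"],
    "logs/2026-05-01/part-1": ["POST /pay 200", "GET /home 200"],
    "logs/2026-05-02/part-0": ["GET /cart 500", "GET /health 200"],
}


def scan_partition(key, term):
    """Stand-in for one short-lived function scanning one stored object."""
    return [line for line in FAKE_OBJECT_STORE[key] if term in line]


def parallel_search(term, prefix):
    """Fan out one worker per matching partition, then merge the results."""
    keys = sorted(k for k in FAKE_OBJECT_STORE if k.startswith(prefix))
    if not keys:
        return []
    with ThreadPoolExecutor(max_workers=len(keys)) as pool:
        # map() preserves input order, so the merged output is deterministic.
        per_partition = pool.map(lambda key: scan_partition(key, term), keys)
        return [line for lines in per_partition for line in lines]


print(parallel_search("500", "logs/"))
```

&lt;p&gt;The number of workers tracks the number of partitions a query touches, so a narrow query uses few resources and a wide one scales out, with nothing left running once the results are merged.&lt;/p&gt;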

&lt;p&gt;This architecture is what enables both the performance (subsecond on terabytes, seconds on petabytes) and the pricing model. No expensive clusters running 24/7 — just cloud resources used exactly when and where they're needed.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real Teams, Real Results
&lt;/h2&gt;

&lt;h3&gt;
  
  
  API-First Content Platform
&lt;/h3&gt;

&lt;p&gt;This team runs a massive content delivery platform, serving APIs behind a global CDN for websites, mobile apps, and e-commerce systems. Every request hits their API with a unique key, so they need to trace errors, group by status codes, and export logs to their own customers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before Bronto&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;40TB monthly ingestion cap&lt;/li&gt;
&lt;li&gt;30+ minute query times (when they worked at all)&lt;/li&gt;
&lt;li&gt;Dashboards that routinely failed&lt;/li&gt;
&lt;li&gt;Constant budget pressure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;After Bronto&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Boosted ingestion to 60TB monthly&lt;/li&gt;
&lt;li&gt;Cut their logging bill in half&lt;/li&gt;
&lt;li&gt;Complex multi-day queries now return in under a second&lt;/li&gt;
&lt;li&gt;Built reliable log exports for their own customers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Their exact words? &lt;em&gt;"Bronto changed our lives."&lt;/em&gt; A logging tool. Actually improving engineers' lives.&lt;/p&gt;
&lt;h3&gt;
  
  
  Global SaaS Project Management Platform
&lt;/h3&gt;

&lt;p&gt;This company runs a suite of SaaS tools across distributed cloud services and product lines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before Bronto&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Graylog for live logs&lt;/li&gt;
&lt;li&gt;S3 for long-term storage&lt;/li&gt;
&lt;li&gt;HAProxy logs dumped into S3 with gnarly Athena queries&lt;/li&gt;
&lt;li&gt;A mix of Athena, Superset, and QuickSight for analytics&lt;/li&gt;
&lt;li&gt;Just 1–2 days of retention across most systems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;After Bronto&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Everything centralized: HAProxy, Kubernetes, application logs, audit trails&lt;/li&gt;
&lt;li&gt;Extended to 90-day hot retention&lt;/li&gt;
&lt;li&gt;Real dashboards tracking error spikes, traffic anomalies, and app version drift&lt;/li&gt;
&lt;li&gt;Engineers focused on product, not maintaining logging infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They went from managing logs to actually using them.&lt;/p&gt;


&lt;h2&gt;
  
  
  Logs as Your Secret Weapon
&lt;/h2&gt;

&lt;p&gt;Your log data is massively undervalued — not because it lacks signal, but because current tooling hides that signal behind cost barriers, friction, and compromises.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Logs used to be a liability. With the right approach, they can be your secret weapon.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We're building Bronto to be for logging what Dyson was for vacuum cleaners, what the iPhone was for smartphones, and what Tesla was for electric cars: a complete reimagining of what's possible when you refuse to accept the status quo.&lt;/p&gt;

&lt;p&gt;After all, when was the last time your logging tool made your life better instead of worse?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.bronto.io/book-a-demo" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;See Bronto in Action&lt;/a&gt;
&lt;/p&gt;

</description>
      <category>logging</category>
      <category>devops</category>
      <category>observability</category>
      <category>architecture</category>
    </item>
    <item>
      <title>The CDN Logging Crisis</title>
      <dc:creator>Patrick Londa</dc:creator>
      <pubDate>Tue, 19 May 2026 17:39:56 +0000</pubDate>
      <link>https://dev.to/bronto_io/the-cdn-logging-crisis-3d1g</link>
      <guid>https://dev.to/bronto_io/the-cdn-logging-crisis-3d1g</guid>
      <description>&lt;p&gt;&lt;em&gt;Authored by Benoit Gaudin&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Every second, your CDN is generating thousands of logs that tell a critical story about your application's performance, security, and user experience. For large enterprises, this can mean terabytes of log data every day — data that contains invaluable insights about your business.&lt;/p&gt;

&lt;p&gt;But here's the uncomfortable truth: most organizations capture only a small fraction of their CDN logs, and retain that limited data for just days or weeks. This isn't because engineering teams don't understand the value. It's because the economics of traditional logging solutions make comprehensive CDN logging prohibitively expensive.&lt;/p&gt;

&lt;p&gt;The result? Critical blind spots that can be extremely costly during outages, security breaches, or major events.&lt;/p&gt;

&lt;p&gt;Welcome to the &lt;strong&gt;flywheel of compromises&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Cost&lt;/strong&gt; — Traditional logging vendors charge egregious per-GB rates that make comprehensive CDN logging unaffordable&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Coverage&lt;/strong&gt; — Companies respond by severely limiting what logs they collect and how long they retain them&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complexity&lt;/strong&gt; — To compensate for coverage gaps, teams cobble together 5–8 different logging solutions, creating a management nightmare&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  The Current State of CDN Logging
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.prod.website-files.com%2F67d2e1c8bd118640c72006cb%2F6835cf3bc14ffe346a1da51c_image%2520%282%29.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.prod.website-files.com%2F67d2e1c8bd118640c72006cb%2F6835cf3bc14ffe346a1da51c_image%2520%282%29.png" alt="CDN logging landscape" width="800" height="282"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The observability sector today resembles markets on the eve of transformative innovation: vacuum cleaners before Dyson, mobile phones before the iPhone, electric cars before Tesla. Existing solutions were designed for a completely different era: before the separation of compute and storage, before the explosion of log data volumes, and certainly before the demands of the AI era.&lt;/p&gt;

&lt;p&gt;Consider how most logging vendors operate today:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Datadog&lt;/strong&gt; charges around $2–5 per GB for log ingestion with 15-day retention. A company generating 10TB of CDN logs daily (roughly 300TB per month) could pay upwards of $600,000 per month&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Splunk&lt;/strong&gt; forces customers into complex licensing schemes that effectively limit how much data they can realistically log&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;New Relic&lt;/strong&gt; and other vendors offer marginally better pricing but still force unacceptable trade-offs between cost and coverage&lt;/li&gt;
&lt;/ul&gt;
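
&lt;p&gt;The Datadog figure above is simple arithmetic, and it is worth making explicit. The short Python sketch below reproduces it under two assumptions that are not in the vendor's price list: a 30-day billing month and decimal units (1TB = 1,000GB). The function name &lt;code&gt;monthly_ingest_cost&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
GB_PER_TB = 1000  # assumption: decimal units
DAYS_PER_MONTH = 30  # assumption: 30-day billing month


def monthly_ingest_cost(tb_per_day, usd_per_gb):
    """Monthly ingestion bill in USD for a steady daily volume."""
    return tb_per_day * GB_PER_TB * DAYS_PER_MONTH * usd_per_gb


low = monthly_ingest_cost(10, 2.0)   # low end of the quoted per-GB range
high = monthly_ingest_cost(10, 5.0)  # high end of the quoted per-GB range
print(f"low: ${low:,.0f}  high: ${high:,.0f}")
```

&lt;p&gt;At 10TB per day, the quoted $2–5 per GB range works out to between $600,000 and $1,500,000 per month, which is why $600,000 is the floor rather than the ceiling.&lt;/p&gt;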

&lt;p&gt;What's most frustrating is that these pricing models persist despite dramatic changes in the underlying technology. The separation of compute and storage has revolutionized data economics across virtually every other category of software, yet logging vendors continue to operate on business models created 15 years ago.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Hypothetical (But Entirely Plausible) Scenario
&lt;/h2&gt;

&lt;p&gt;To illustrate the real-world impact of incomplete CDN logging, consider this:&lt;/p&gt;

&lt;p&gt;A week before a major live streaming event, a provider's engineering team makes a routine CDN configuration change. Under normal traffic loads, the misconfiguration goes unnoticed — cache hit ratios remain stable and performance appears normal.&lt;/p&gt;

&lt;p&gt;After a week, any trace of the configuration change disappears from their logs due to their 7-day retention policy. Capacity planning teams review infrastructure and assume current backend capacity can handle the anticipated load — after all, it worked fine during the last similar event. Unfortunately, the now-invisible change makes that assumption dangerously wrong.&lt;/p&gt;

&lt;p&gt;During the live event, CDN cache efficiency plummets under heavy load. Backend servers get hit much harder than expected. Users experience buffering and connection problems, but the operations team struggles to diagnose the root cause.&lt;/p&gt;

&lt;p&gt;By the time they identify the issue — tracing it back to the forgotten configuration change — the damage is done. Over a million viewers have abandoned the stream, social media is flooded with complaints, and the company's stock takes a hit.&lt;/p&gt;

&lt;p&gt;With complete CDN logging and longer retention, they could have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Identified when the degradation trend first appeared, correlating it to the configuration change&lt;/li&gt;
&lt;li&gt;Maintained visibility throughout the planning period&lt;/li&gt;
&lt;li&gt;Quickly correlated the performance issues with the earlier change during the incident&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Limited logging coverage transformed a minor configuration error into a major business incident. The cost of their logging "savings"? Potentially millions in lost ad revenue and subscription cancellations.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Three Horsemen of the Logging Apocalypse
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Cost Explosion
&lt;/h3&gt;

&lt;p&gt;Traditional logging vendors price their products based on data volume, charging premium rates for both ingestion and storage. This pricing model was created when storage was genuinely expensive. In 2025, with cloud storage costs continuing to plummet, this model serves primarily to protect vendor margins.&lt;/p&gt;

&lt;p&gt;For CDN logs — which are high-volume by nature — this creates an impossible equation. When faced with estimates of $500,000+ monthly for complete CDN logging, even the most data-driven organizations are forced to compromise.&lt;/p&gt;

&lt;h3&gt;
  
  
  Coverage Sacrifice
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmf8zwflo6l28735jf7c4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmf8zwflo6l28735jf7c4.png" alt="Coverage gaps diagram" width="800" height="408"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The inevitable result of cost pressure is reduced coverage. Organizations typically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ingest only a sample of the data&lt;/li&gt;
&lt;li&gt;Limit retention to days instead of months&lt;/li&gt;
&lt;li&gt;Exclude high-volume CDNs or regions entirely&lt;/li&gt;
&lt;li&gt;Drop detailed fields that would aid troubleshooting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These compromises create dangerous blind spots. Intermittent issues, security threats that develop over time, and regional performance problems remain invisible. When an incident occurs, teams often discover they're missing exactly the data they need.&lt;/p&gt;

&lt;h3&gt;
  
  
  Complexity Creep
&lt;/h3&gt;

&lt;p&gt;To compensate for coverage limitations, organizations implement a patchwork of supplementary solutions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Self-hosted ELK stacks for longer-term storage (with all the maintenance overhead)&lt;/li&gt;
&lt;li&gt;Cloud provider-specific logging solutions (AWS CloudWatch, GCP Logging)&lt;/li&gt;
&lt;li&gt;Custom scripts to archive logs to object storage with rehydration workflows&lt;/li&gt;
&lt;li&gt;Open-source tools for log analysis and visualization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result is a Frankenstein's monster of logging infrastructure that no one fully understands, requires constant maintenance, and still fails to provide comprehensive visibility.&lt;/p&gt;




&lt;h2&gt;
  
  
  CDN Logging for the AI Era
&lt;/h2&gt;

&lt;p&gt;These challenges are escalating as we enter the AI era:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Exploding volumes&lt;/strong&gt; — Microservices, containers, and edge computing are all contributing to the data deluge&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI-powered analysis&lt;/strong&gt; — ML systems require comprehensive, long-term data to identify patterns and anomalies effectively&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agentic applications&lt;/strong&gt; — Autonomous applications require complete historical data to make intelligent decisions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Legacy logging business models simply cannot accommodate these realities. They weren't designed for terabytes of daily log ingestion, years of retention, or a world where AI agents might need to analyze months of historical CDN patterns.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Different Approach
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx3935r3z66gcmnxtqh6a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx3935r3z66gcmnxtqh6a.png" alt="Bronto CDN logging dashboard" width="800" height="265"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Solving the CDN logging crisis requires rebuilding the logging stack from the ground up — not incremental improvements on broken foundations. Three core principles drive the right approach:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Economics Aligned with Modern Infrastructure
&lt;/h3&gt;

&lt;p&gt;Leveraging the separation of compute and storage to deliver CDN logging at a fraction of traditional costs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;90% cost reduction&lt;/strong&gt; compared to Datadog and similar vendors&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;12-month retention&lt;/strong&gt; by default&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No charges for search&lt;/strong&gt; or compute resources&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Lightning-Fast Search Across Petabytes
&lt;/h3&gt;

&lt;p&gt;"Tracey's Law": the faster you make log search, the more valuable logging becomes to an organization.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sub-second search across terabytes of CDN logs&lt;/li&gt;
&lt;li&gt;Seconds-long queries across petabytes&lt;/li&gt;
&lt;li&gt;No rehydration from cold storage, ever&lt;/li&gt;
&lt;li&gt;Fast dashboards even across months of data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When queries return in seconds instead of taking minutes (or timing out entirely), teams use logging data proactively rather than as a last resort.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. A Single Unified Logging Layer
&lt;/h3&gt;

&lt;p&gt;Eliminating the patchwork by providing one comprehensive logging layer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All CDN providers in one place&lt;/li&gt;
&lt;li&gt;Drop-in replacement for existing solutions&lt;/li&gt;
&lt;li&gt;Two-line configuration change for implementation&lt;/li&gt;
&lt;li&gt;Automatic parsing and PII removal&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Breaking Free from the Flywheel
&lt;/h2&gt;

&lt;p&gt;The CDN logging crisis isn't just a technical problem — it's a business problem with real implications for reliability, security, and user experience. For too long, organizations have accepted a dysfunctional status quo because there seemed to be no alternative.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Every single word about the logging crisis resonates. We were spending over $400,000 monthly on CDN logging with Datadog, and still only capturing about 20% of our logs. With Bronto, we now have 100% coverage, 12-month retention, and our bill is under $40,000."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This isn't an incremental improvement — it's a fundamental reinvention of how logging works. Just as Apple reinvented the smartphone, Dyson reinvented the vacuum cleaner, and Tesla reinvented the electric car, the logging industry is overdue for the same transformation.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Bronto is reinventing logging from the ground up for the AI era. The team brings 150+ years of collective logging domain expertise, with previous experience building and scaling logging platforms at IBM, Rapid7, and Logentries.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.bronto.io/book-a-demo" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;See What 100% CDN Log Coverage Looks Like&lt;/a&gt;
&lt;/p&gt;

</description>
      <category>logging</category>
      <category>devops</category>
      <category>observability</category>
      <category>cdn</category>
    </item>
  </channel>
</rss>
