Stop Streaming Blindly: Architectural Patterns for Cost-Optimized AI Logging

#python #learning #security #opensource

Hey everyone! 👋

A lot of modern AI tutorials teach you how to connect an API to a data source and call it a day. But in a production environment, especially in cybersecurity, that approach can break your budget. If a bot script attacks your open server port trying hundreds of passwords a minute, making an individual cloud LLM call for every single log line is a recipe for an astronomical API bill.

I recently designed an open-source AI-SOC Command Center specifically to solve this problem using a Micro-Batching Ingestion Buffer pattern. Here is the architectural breakdown of how I kept cloud costs low while maintaining real-time intelligence.

The Core Architecture

The system is built as a modular pipeline to ensure that local compute handles the noise, leaving the cloud LLM to handle the high-level analysis.

1. The Non-Blocking Stream Pointer

To watch logs continuously without freezing the user interface, the system uses a lightweight Python generator to follow the log file. It checks for new lines on each pass and, if the file is idle, immediately hands control back to the caller so the batching timer and the UI keep running.
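Here is a minimal sketch of that idea, not the exact code from the repo. The function name `follow` and the choice to signal "idle" by yielding `None` are my assumptions for illustration; the point is that the generator never sleeps or blocks on its own.

```python
import os
from typing import Iterator, Optional


def follow(path: str) -> Iterator[Optional[str]]:
    """Yield each new complete line appended to `path`, or None when idle."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        handle.seek(0, os.SEEK_END)  # skip history: only new lines matter
        pending = ""  # holds a line that was only partially written so far
        while True:
            chunk = handle.readline()
            if chunk.endswith("\n"):
                yield (pending + chunk).rstrip("\n")
                pending = ""
            else:
                # Empty string (nothing new) or a partial line: report idle
                # and let the caller decide what to do with the spare time.
                pending += chunk
                yield None
```

The caller pulls with `next(stream)` and gets either a log line or `None`. On `None` it can check the batch timer, refresh the UI, or sleep for a fraction of a second before asking again.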

2. Volatile State Aggregation

Instead of calling the cloud API immediately, incoming logs are captured in an intermediate in-memory buffer during a 15-second sliding window. Once the window closes, Pandas compiles the raw rows into a structured format, collapsing thousands of repetitive hits down to a single summary record per unique IP:

# Aggregating user targets to group metrics locally before API dispatch
if ip not in aggregated_data:
    aggregated_data[ip] = {
        "ip_address": ip,
        "total_failed_attempts": 0,
        "targeted_usernames": set() # Deduplicates usernames targeted during the burst
    }
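To show how that fragment fits into a full window cycle, here is a self-contained sketch. Several things in it are my assumptions rather than the repo's code: the OpenSSH-style log format matched by `FAILED_LOGIN`, the `WindowBuffer` class name, and the injectable `clock` parameter (there only to make the timing testable). It also uses plain dictionaries where the project uses Pandas, to keep the example dependency-free.

```python
import re
import time
from typing import Callable, Dict, List, Optional

# Assumed log shape: "Failed password for [invalid user] NAME from IP ..."
FAILED_LOGIN = re.compile(r"Failed password for (?:invalid user )?(\S+) from (\S+)")


def aggregate(lines: List[str]) -> Dict[str, dict]:
    """Collapse raw log lines into one summary record per source IP."""
    aggregated_data: Dict[str, dict] = {}
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if match is None:
            continue  # not a failed login: ignore it locally
        username, ip = match.groups()
        if ip not in aggregated_data:
            aggregated_data[ip] = {
                "ip_address": ip,
                "total_failed_attempts": 0,
                "targeted_usernames": set(),
            }
        aggregated_data[ip]["total_failed_attempts"] += 1
        aggregated_data[ip]["targeted_usernames"].add(username)
    return aggregated_data


class WindowBuffer:
    """Holds raw lines in memory and releases one aggregated batch per window."""

    def __init__(self, window_seconds: float = 15.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window_seconds = window_seconds
        self._clock = clock
        self._lines: List[str] = []
        self._opened_at = clock()

    def add(self, line: str) -> None:
        self._lines.append(line)

    def flush_if_due(self) -> Optional[Dict[str, dict]]:
        """Return the aggregated batch once the window has elapsed, else None."""
        if self._clock() - self._opened_at < self.window_seconds:
            return None
        batch = aggregate(self._lines)
        self._lines = []  # volatile state: nothing survives the window
        self._opened_at = self._clock()
        return batch
```

Call `add()` for every line the stream produces and `flush_if_due()` whenever the stream reports idle. Most calls return `None`; once per window you get a dictionary keyed by IP.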

3. Conditional API Dispatch (The Guardrail)

The system evaluates the aggregated statistics against a configurable threshold. If an IP logs only a single casual failure, it is dismissed locally and never reaches the API. If it crosses the threshold, the entire micro-batch summary is compiled into a single unified payload and sent to the Gemini API.

Result: 100 automated login attempts in a single window become exactly 1 API request.
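The guardrail can be sketched roughly like this. The function names, the default threshold of 5, and the JSON layout are all assumptions for illustration. I also made the simplifying assumption that only the offending IPs go into the payload. The actual Gemini call is deliberately left out: `send` is a placeholder for whatever function wraps your API client.

```python
import json
from typing import Callable, Dict, Optional


def build_payload(aggregated_data: Dict[str, dict], threshold: int = 5) -> Optional[str]:
    """Return one JSON payload for all IPs at or above the threshold, else None."""
    offenders = [
        {
            "ip_address": summary["ip_address"],
            "total_failed_attempts": summary["total_failed_attempts"],
            # Sets are not JSON-serializable, so convert to a sorted list.
            "targeted_usernames": sorted(summary["targeted_usernames"]),
        }
        for summary in aggregated_data.values()
        if summary["total_failed_attempts"] >= threshold
    ]
    if not offenders:
        return None  # nothing suspicious: the batch is dismissed locally
    offenders.sort(key=lambda item: item["total_failed_attempts"], reverse=True)
    return json.dumps({"suspicious_ips": offenders}, indent=2)


def dispatch(aggregated_data: Dict[str, dict],
             send: Callable[[str], object],
             threshold: int = 5) -> bool:
    """Call `send` at most once per batch; return True if a request was made."""
    payload = build_payload(aggregated_data, threshold)
    if payload is None:
        return False
    send(payload)  # hypothetical hook: wrap your LLM client call here
    return True
```

Because `dispatch` calls `send` at most once per window, the number of API requests is bounded by the number of windows, no matter how many log lines arrive.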

Building a Command Center Web UI

To make this data scannable at a glance, I paired the backend modules with a high-contrast dark theme frontend powered by Streamlit and Plotly Express.

Instead of basic rendering, the dashboard features:

Interactive Donut Charts: Tracking server user account distribution dynamically.
Horizontal Bar Charts: Visualizing active attack vectors and highlighting severe IPs with automated red heat-mapping scale shifts.
Asynchronous UI Refreshes: Utilizing st.rerun() loops to fetch log adjustments without interrupting the analytics viewports.
Collapsible Security Accordions: Organizing markdown forensic intelligence generated by the AI model cleanly.
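As a rough idea of how the horizontal bar chart could be wired up, here is a sketch that assumes the per-IP summary dictionaries from the aggregation step. The helper names `chart_columns` and `render_attack_chart` are hypothetical, and the repo's actual chart code may look different. Plotly and Streamlit are imported inside the render function so the data-shaping helper stays usable without them.

```python
from typing import Dict, List


def chart_columns(aggregated_data: Dict[str, dict], top_n: int = 10) -> Dict[str, List]:
    """Shape the per-IP summaries into column lists, busiest IPs first."""
    ranked = sorted(
        aggregated_data.values(),
        key=lambda summary: summary["total_failed_attempts"],
        reverse=True,
    )[:top_n]
    return {
        "ip_address": [summary["ip_address"] for summary in ranked],
        "total_failed_attempts": [summary["total_failed_attempts"] for summary in ranked],
    }


def render_attack_chart(aggregated_data: Dict[str, dict]) -> None:
    """Draw a horizontal bar chart, shading heavier attackers in deeper red."""
    import plotly.express as px
    import streamlit as st

    fig = px.bar(
        chart_columns(aggregated_data),
        x="total_failed_attempts",
        y="ip_address",
        orientation="h",
        color="total_failed_attempts",  # bar color follows the attempt count
        color_continuous_scale="Reds",
        template="plotly_dark",
    )
    st.plotly_chart(fig)
```

Keeping the data shaping separate from the rendering means the ranking logic can be checked without starting a Streamlit session.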

Clean, Extensible Code

I structured this repository following professional design standards: keeping modules completely isolated, using an __init__.py packaging structure, and safeguarding local API keys using strict local configuration rules.
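One common way to keep a key out of the codebase is to read it from an environment variable and fail loudly when it is missing. This is a generic sketch, not necessarily how the repo does it: the variable name `GEMINI_API_KEY`, the `load_api_key` helper, and the exception class are all assumptions.

```python
import os


class MissingAPIKeyError(RuntimeError):
    """Raised when the expected API key is not configured locally."""


def load_api_key(env_var: str = "GEMINI_API_KEY") -> str:
    """Read the API key from the environment instead of hard-coding it."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise MissingAPIKeyError(
            f"Set the {env_var} environment variable before starting the app."
        )
    return key
```

Failing at startup is friendlier than discovering a missing key halfway through the first threshold breach.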

If you are looking at ways to optimize your data pipelines or integrate LLMs into high-frequency environments without breaking the bank, feel free to dive into the codebase!

👉 GitHub Repository: https://github.com/kulajakithsahan36/ai-soc-analyst.git

Let me know your thoughts on this micro-batching pattern or how you optimize your own AI pipeline thresholds in the comments!