Stop Streaming Blindly: Architectural Patterns for Cost-Optimized AI Logging

Kulaja Kithsahan — Sun, 31 May 2026 19:09:40 +0000

Hey everyone! 👋

A lot of modern AI tutorials teach you how to connect an API to a data source and call it a day. But in a production environment, especially in cybersecurity, that approach can break your budget. If a bot script attacks your open server port trying hundreds of passwords a minute, making an individual cloud LLM call for every single log line is a recipe for an astronomical API bill.

I recently designed an open-source AI-SOC Command Center specifically to solve this problem using a Micro-Batching Ingestion Buffer pattern. Here is the architectural breakdown of how I kept cloud costs low while maintaining real-time intelligence.

The Core Architecture

The system is built as a modular pipeline to ensure that local compute handles the noise, leaving the cloud LLM to handle the high-level analysis.

1. The Non-Blocking Stream Pointer

To watch logs continuously without freezing the user interface, the system uses a lightweight Python generator to stream the log file. It checks for new lines on each pass and immediately releases control back to the system's timing loop if the file is idle.
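Here is a minimal sketch of what such a generator could look like. It is not the code from the repository: the function name `follow` and the choice to yield `None` on an idle file are assumptions made for illustration.

```python
from typing import Iterator, Optional, TextIO


def follow(handle: TextIO) -> Iterator[Optional[str]]:
    """Yield each new complete line, or None when the file is idle.

    Hypothetical helper: the caller decides what to do on None
    (refresh the UI, check the batch timer, sleep briefly, ...).
    """
    pending = ""  # holds a partial line until its newline arrives
    while True:
        chunk = handle.readline()
        if not chunk:
            # Nothing new in the file: hand control back right away.
            yield None
            continue
        pending += chunk
        if pending.endswith("\n"):
            line, pending = pending.rstrip("\n"), ""
            yield line
```

Because the generator never sleeps or blocks on its own, the caller can pull one item per UI cycle with `next(stream)` and treat `None` as "nothing to do right now".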

2. Volatile State Aggregation

Instead of calling the API immediately, incoming logs are captured in an intermediate state bucket during a 15-second sliding window. Once the window closes, Pandas compiles the raw rows into a structured format, collapsing thousands of repetitive hits into one summary record per unique IP:

# Aggregating user targets to group metrics locally before API dispatch
if ip not in aggregated_data:
    aggregated_data[ip] = {
        "ip_address": ip,
        "total_failed_attempts": 0,
        "targeted_usernames": set() # Deduplicates usernames targeted during the burst
    }
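To show where that snippet might sit, here is a small self-contained sketch of the whole aggregation step. It uses a plain dictionary rather than Pandas to stay dependency-free, and it assumes each log line has already been parsed into an `(ip, username)` pair. The function name `aggregate_window` and that input shape are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def aggregate_window(events: Iterable[Tuple[str, str]]) -> Dict[str, dict]:
    """Collapse one window of failed logins into one record per IP.

    `events` is assumed to be (ip, username) pairs that were already
    parsed from the raw log lines collected during the window.
    """
    aggregated_data: Dict[str, dict] = {}
    for ip, username in events:
        if ip not in aggregated_data:
            aggregated_data[ip] = {
                "ip_address": ip,
                "total_failed_attempts": 0,
                "targeted_usernames": set(),  # deduplicates usernames
            }
        aggregated_data[ip]["total_failed_attempts"] += 1
        aggregated_data[ip]["targeted_usernames"].add(username)
    return aggregated_data
```

Three hits from the same IP against `root`, `admin`, and `root` again end up as a single record with a count of 3 and two distinct usernames.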

3. Conditional API Dispatch (The Guardrail)

The system evaluates the aggregated statistics against a customizable threshold controller. If an IP logs only a single casual failure, it is dismissed locally. If it exceeds the threshold, the entire micro-batch summary is compiled into a single unified payload and sent to the Gemini API.

Result: 100 automated attacks become exactly 1 API request.
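A rough sketch of this guardrail follows, under a few stated assumptions: the threshold value of 5 is made up, the comparison uses "at least" rather than "more than", only the IPs over the threshold go into the payload, and `send` is a hypothetical callable standing in for the real Gemini request.

```python
import json
from typing import Callable, Dict, Optional

FAILED_ATTEMPT_THRESHOLD = 5  # hypothetical default, not from the repo


def dispatch_if_suspicious(
    aggregated_data: Dict[str, dict],
    send: Callable[[str], str],
    threshold: int = FAILED_ATTEMPT_THRESHOLD,
) -> Optional[str]:
    """Send one combined payload only if some IP reaches the threshold."""
    offenders = [
        {
            "ip_address": stats["ip_address"],
            "total_failed_attempts": stats["total_failed_attempts"],
            # Sets are not JSON-serializable, so convert to a sorted list.
            "targeted_usernames": sorted(stats["targeted_usernames"]),
        }
        for stats in aggregated_data.values()
        if stats["total_failed_attempts"] >= threshold
    ]
    if not offenders:
        return None  # quiet window: no API call at all
    payload = json.dumps({"suspicious_ips": offenders})
    return send(payload)  # exactly one call per window
```

Passing the sender in as a parameter keeps the guardrail testable offline: a fake `send` that just records its argument is enough to check that a noisy window produces one call and a quiet window produces none.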

Building a Command Center Web UI

To make this data scannable at a glance, I paired the backend modules with a high-contrast dark theme frontend powered by Streamlit and Plotly Express.

Instead of basic rendering, the dashboard features:

Interactive Donut Charts: Tracking server user account distribution dynamically.
Horizontal Bar Charts: Visualizing active attack vectors and highlighting severe IPs with automated red heat-mapping scale shifts.
Asynchronous UI Refreshes: Utilizing st.rerun() loops to fetch log adjustments without interrupting the analytics viewports.
Collapsible Security Accordions: Organizing markdown forensic intelligence generated by the AI model cleanly.

Clean, Extensible Code

I structured this repository following professional design standards: keeping modules completely isolated, using an __init__.py packaging structure, and safeguarding local API keys with strict local configuration rules.

If you are looking at ways to optimize your data pipelines or integrate LLMs into high-frequency environments without breaking the bank, feel free to dive into the codebase!

👉 GitHub Repository: https://github.com/kulajakithsahan36/ai-soc-analyst.git

Let me know your thoughts on this micro-batching pattern or how you optimize your own AI pipeline thresholds in the comments!

How I Built a Local AI-Powered Combined Maths Solver for My A/L Preparation

Kulaja Kithsahan — Fri, 29 May 2026 16:56:26 +0000

Hey everyone! 👋

I am a student from Sri Lanka currently preparing to step into the highly intense G.C.E. Advanced Level (A/L) Combined Mathematics stream. Balancing school studies with a passion for programming can be tricky, so I decided to bridge the two worlds.

Instead of just solving math problems on paper, I built a local web-based Combined Mathematics AI Problem Solver tailored to our local syllabus.

Because this is my very first technical article, I want to pull back the curtain on how I designed the architecture, how the components talk to each other, and why I built it locally on my PC.

The Vision: Aligning AI with a Local Syllabus

Standard AI models are great at general math, but local examinations like the Sri Lankan A/Ls require steps to be structured in a very specific way to match local marking schemes.

My goal was to create a tool where a student could input a problem, and the application would return a step-by-step breakdown that feels familiar to an A/L student. To make it highly responsive and optimized, I didn't want it relying entirely on slow, repetitive cloud API calls every single time.

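Here is the high-level map of how data moves through the system: the frontend sends a question to the FastAPI backend, which checks the SQLite cache first and calls Gemini only when the answer is not already stored.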

The Tech Stack Architecture

Instead of writing one massive, tangled file, I broke the project down into three distinct, manageable layers:

1. The Core Backend: FastAPI (Python)

I chose FastAPI to build the engine of this platform. It is lightweight, incredibly fast, and handles asynchronous requests brilliantly. FastAPI acts as the traffic cop: it accepts the question from the frontend, packages it securely with specific system instructions, and sends it out.

2. The Brain: Gemini Pro via Google AI Studio

For the heavy mathematical reasoning, I hooked the backend up to cloud-based Large Language Models using the Google GenAI SDK. To ensure the output matches the local syllabus requirements, the backend injects context like syllabus guidelines and past paper structures directly into the prompt before it hits the AI.
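The context injection step can be pictured with a small sketch like the one below. The function name `build_prompt`, its parameters, and the wording of the instructions are all hypothetical, and the actual call to the model is deliberately left out.

```python
def build_prompt(question: str, guidelines: str, past_paper_notes: str) -> str:
    """Wrap a student's question in syllabus context before it is sent.

    All text passed in here is placeholder content; the real project's
    instructions and context format are not shown in this article.
    """
    sections = [
        "You are solving a G.C.E. A/L Combined Mathematics problem.",
        "Syllabus guidelines:\n" + guidelines.strip(),
        "Past paper structure:\n" + past_paper_notes.strip(),
        "Question:\n" + question.strip(),
        "Answer step by step, following the structure above.",
    ]
    # Blank lines between sections keep the prompt easy to read and debug.
    return "\n\n".join(sections)
```

Putting the context before the question means the model reads the marking conventions first, and keeping this in one plain function makes it easy to inspect exactly what is being sent.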

3. The Memory: SQLite Caching

This is my favorite part of the build. When you are studying, you often re-run or review the exact same math problems. Making an external API call to a cloud model every single time takes seconds and wastes network bandwidth.

To solve this, I wired up an SQLite database to handle response caching:

When a question comes in, FastAPI checks SQLite first.
Cache Hit: If the exact problem has been solved before, it pulls the answer instantly from the local database.
Cache Miss: If it's a completely new question, it calls Gemini, saves the fresh solution into SQLite for next time, and returns the answer.
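The hit-or-miss flow above can be sketched with Python's built-in sqlite3 module. This is an illustration under assumptions, not the project's real schema: the table name `solutions`, the SHA-256 key over the whitespace-trimmed question, and the `solver` callable standing in for the Gemini call are all hypothetical.

```python
import hashlib
import sqlite3
from typing import Callable


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the cache database and its single table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS solutions ("
        "question_hash TEXT PRIMARY KEY, answer TEXT NOT NULL)"
    )
    return conn


def solve_with_cache(
    conn: sqlite3.Connection,
    question: str,
    solver: Callable[[str], str],
) -> str:
    """Return a cached answer if one exists, else call `solver` and store it."""
    key = hashlib.sha256(question.strip().encode("utf-8")).hexdigest()
    row = conn.execute(
        "SELECT answer FROM solutions WHERE question_hash = ?", (key,)
    ).fetchone()
    if row is not None:
        return row[0]  # cache hit: no external call
    answer = solver(question)  # cache miss: ask the model once
    conn.execute(
        "INSERT INTO solutions (question_hash, answer) VALUES (?, ?)",
        (key, answer),
    )
    conn.commit()
    return answer
```

Hashing the question gives a fixed-length primary key, so long problems do not bloat the index. Note that only exact matches (after trimming whitespace) count as hits; a reworded version of the same problem would still go to the model.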

Challenges I Faced (and What I Learned)

As a student developer, building this was a massive learning curve.

Decoupling Logic: At first, trying to make the web framework, the database queries, and the API requests handle asynchronous paths together felt overwhelming.
The "Mysterious Code" Phenomenon: Looking back at my scripts a few weeks later, I realized how quickly code can become complex! Even when your own syntax starts looking a bit like a foreign language to you, understanding the structural design (how data goes in and comes out of your modules) is what keeps the project alive.

The Journey is the Project

Building this tool taught me that you don't need a massive team or a cloud budget to create something highly functional. By combining a fast Python backend like FastAPI with local SQLite caching, a single developer can build responsive, AI-powered applications right on their own local machine.

This project is just the beginning for me. While my immediate focus is handling my upcoming academic streams, I plan to keep this local platform as my private playground to test new optimization techniques, learn more about database management, and sharpen my Python skills.

If you are a student developer building tools to solve your own everyday problems, or if you have any tips on backend optimization, I’d love to hear your thoughts in the comments below! 🚀

DEV Community: Kulaja Kithsahan