Day 64: Curing an AI's "Amnesia" with Serverless Context Truncation 🧠💸

#aws #python #ai #serverless

Building a Serverless Financial Agent on AWS has been an incredible journey, but today I encountered a classic trap in Generative AI: The Blind Prompt.

My Amazon Bedrock agent confidently told me I had spent 0€ this month, despite my DynamoDB table holding over 2,000€ in recorded expenses.

The Root Cause: Over-Aggressive FinOps

To keep my cloud bill near zero, my AWS Lambda function was only injecting today's transactions into the AI prompt. The logic was: fewer tokens = lower cost.
However, if I asked the AI about my monthly spending on a day when I hadn't bought anything yet, the transaction list in the prompt was empty ([]). The AI wasn't hallucinating; it was making a perfectly logical deduction from the restricted context I gave it.
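
To make the failure mode concrete, here is a minimal sketch of a prompt builder that only injects today's transactions. The function name `build_prompt`, the field names, the sample data, and the prompt wording are all assumptions for illustration, not the code from the actual Lambda.

```python
import json
from datetime import date


def build_prompt(transactions, today, question):
    """Hypothetical prompt builder that only injects today's transactions."""
    todays = [t for t in transactions if t.get("date") == today.isoformat()]
    return f"Transactions: {json.dumps(todays)}\nQuestion: {question}"


# Made-up sample data: the month has expenses, but none dated today.
txs = [
    {"date": "2024-05-02", "description": "Rent", "amount": -900.0},
    {"date": "2024-05-10", "description": "Groceries", "amount": -84.5},
]
prompt = build_prompt(txs, date(2024, 5, 15), "How much did I spend this month?")
print(prompt)
```

The model only ever sees an empty transaction list here, so answering "0€" is consistent with its input.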

The Fix: Modularization & Context Truncation

I couldn't just dump the entire database history into the prompt; that would ruin my FinOps strategy and spike my token usage.

First, I had to pay down technical debt. I decoupled my massive lambda_function.py "God File" into a proper Router/Service pattern with 6 distinct modules (ai_engine.py, plaid_client.py, scoring.py, etc.).
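
A rough sketch of what a Router/Service split can look like is shown below. It assumes the Lambda sits behind API Gateway; the route paths and the `handle_chat` / `handle_score` functions are hypothetical, and the services are stubbed inline so the example runs on its own rather than importing real modules.

```python
import json


# In a real project these would live in separate modules
# (for example ai_engine.py and scoring.py). They are stubbed here,
# and their names and return shapes are hypothetical.
def handle_chat(body):
    return {"reply": f"(stub) answer to: {body.get('query', '')}"}


def handle_score(body):
    return {"score": 0}  # placeholder value


ROUTES = {
    "/chat": handle_chat,
    "/score": handle_score,
}


def lambda_handler(event, context):
    """Thin router: parse the request, dispatch to a service, wrap the response."""
    # HTTP API (payload v2) events use "rawPath"; REST API events use "path".
    path = event.get("rawPath") or event.get("path", "")
    handler = ROUTES.get(path)
    if handler is None:
        return {"statusCode": 404, "body": json.dumps({"error": "unknown route"})}
    body = json.loads(event.get("body") or "{}")
    return {"statusCode": 200, "body": json.dumps(handler(body))}
```

The handler stays small because it only routes; each service can then be tested without building a full Lambda event.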

Then, inside my new ai_engine.py, I implemented a Context Truncation pattern for chat queries:

```python
# CHAT MODE: context truncation fix
if user_query:
    monthly_expense_summary = []

    if monthly_txs:
        for t in monthly_txs[:60]:  # Read a safe buffer
            amt = float(t.get('amount', 0))
            desc = t.get('description', '')
            # Keep expenses only; income lines are skipped
            if not is_income_tx(desc, amt):
                monthly_expense_summary.append(f"- {desc}: {abs(amt):.2f} EUR")

        # The FinOps magic: truncate to protect the AWS bill
        monthly_expense_summary = monthly_expense_summary[:30]
```

Why this architecture works:

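Accuracy: Because the prompt now draws on monthly_txs instead of only today's transactions, the AI has the context it needs to answer questions like "What was my biggest expense?", provided the relevant transaction falls inside the truncated window.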

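Cost Control: By explicitly slicing the list in Python ([:30]), I cap the number of expense lines sent to Bedrock, which keeps input tokens roughly bounded (the exact count still depends on how long each description is). The AI gets the first 30 expenses in the order monthly_txs arrives, which is plenty for general financial advice, without massive payload bloat.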
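
One caveat: a plain slice keeps the first items in list order, which are not necessarily the largest ones, and a truncated list cannot yield a correct monthly total. A possible refinement, sketched below under stated assumptions, is to compute the total in Python and sort by amount before slicing. The `summarize_expenses` function is hypothetical, and the `is_income_tx` stand-in simply treats positive amounts as income, which may not match the real helper's sign convention.

```python
def is_income_tx(desc, amt):
    """Stand-in for the real helper: treats positive amounts as income (assumption)."""
    return amt > 0


def summarize_expenses(monthly_txs, max_items=30):
    """Return (total, count, lines), with lines listing the largest expenses first."""
    expenses = []
    for t in monthly_txs:
        amt = float(t.get("amount", 0))
        desc = t.get("description", "")
        if not is_income_tx(desc, amt):
            expenses.append((abs(amt), desc))

    total = sum(a for a, _ in expenses)               # computed over ALL expenses
    expenses.sort(key=lambda e: e[0], reverse=True)   # largest first
    lines = [f"- {d}: {a:.2f} EUR" for a, d in expenses[:max_items]]
    return total, len(expenses), lines


# Made-up sample data for illustration.
txs = [
    {"description": "Salary", "amount": 2500},
    {"description": "Coffee", "amount": -3.2},
    {"description": "Rent", "amount": -900},
    {"description": "Groceries", "amount": -84.5},
]
total, count, lines = summarize_expenses(txs, max_items=2)
header = f"Total spent this month: {total:.2f} EUR across {count} expenses"
print("\n".join([header] + lines))
```

The header costs one extra line of tokens, yet it lets the model answer "how much did I spend?" without seeing every transaction.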

Clean code doesn't just make your app maintainable; it makes your AI cheaper and smarter. 🚀