From Billing Alerts to Autonomous Action: Building SpendPilot with Qwen Cloud

Mikhail Ikpoma — Sun, 14 Jun 2026 12:24:55 +0000

Introduction
Cloud waste is a silent budget killer. Industry reports estimate that companies waste up to 30% of their cloud spend on idle resources, misconfigurations, and forgotten instances. For FinOps and SRE teams, this means drowning in a sea of billing alerts, manually cross-referencing CloudMonitor metrics, and writing fragile bash scripts to clean up the mess.
For the recent Qwen Cloud Hackathon, my team and I asked a simple question: What if an AI agent could handle this entire workflow end-to-end, safely and autonomously?
The answer is SpendPilot, an autonomous FinOps agent that detects billing anomalies, diagnoses root causes, and executes cost-saving optimizations. Here is the story of how we built it, the challenges we faced, and why Qwen Cloud was the secret sauce that made it possible.

🏗️ The Architecture: More Than Just a Chatbot
We knew early on that a simple linear prompt chain wouldn’t cut it. Real-world cloud operations require state management, tool calling, and the ability to loop until a problem is solved.
We chose LangGraph for orchestration, paired with a FastAPI backend deployed on Alibaba Cloud ECS. The reasoning itself runs on Qwen models, accessed through the DashScope API.
Our architecture follows a clear, safe loop:
Ingest: A billing alert webhook triggers the agent.
Reason: Qwen-Max analyzes the alert and decides which Alibaba Cloud SDK tools to call (e.g., query_billing_anomaly, get_resource_metrics).
Act: If the agent identifies a severely underutilized ECS instance, it proposes an optimization (like stopping the instance).
Guardrail: If the action is destructive, the agent pauses for Human-in-the-Loop (HITL) approval via Slack.
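The loop above can be sketched in plain Python. This is an illustrative stand-in, not the LangGraph graph from the project: `AgentState`, `fake_reason`, `run_agent`, the canned metrics, and the 5% CPU threshold are all hypothetical, and the Qwen-Max call is replaced by a hard-coded rule so the control flow stays visible.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical tool stubs; a real agent would call Alibaba Cloud SDKs here.
def query_billing_anomaly(alert: dict) -> dict:
    return {"resource_id": alert["resource_id"], "overspend_usd": alert["overspend_usd"]}

def get_resource_metrics(resource_id: str) -> dict:
    # Canned metrics for illustration only.
    return {"resource_id": resource_id, "avg_cpu_percent": 1.5}

@dataclass
class AgentState:
    alert: dict
    observations: list = field(default_factory=list)
    proposed_action: Optional[str] = None
    status: str = "running"

def fake_reason(state: AgentState) -> str:
    """Stand-in for the model call: choose the next step from the state."""
    if len(state.observations) == 0:
        return "query_billing_anomaly"
    if len(state.observations) == 1:
        return "get_resource_metrics"
    return "propose"

def run_agent(alert: dict, approve: Callable[[str], bool], max_steps: int = 10) -> AgentState:
    state = AgentState(alert=alert)  # Ingest
    for _ in range(max_steps):
        step = fake_reason(state)  # Reason
        if step == "query_billing_anomaly":
            state.observations.append(query_billing_anomaly(state.alert))
        elif step == "get_resource_metrics":
            resource_id = state.observations[0]["resource_id"]
            state.observations.append(get_resource_metrics(resource_id))
        else:
            metrics = state.observations[-1]
            if metrics["avg_cpu_percent"] >= 5.0:  # hypothetical idle threshold
                state.status = "no_action"
                return state
            state.proposed_action = f"stop_instance:{metrics['resource_id']}"  # Act
            # Guardrail: a destructive action waits for a human decision.
            state.status = "executed" if approve(state.proposed_action) else "rejected"
            return state
    state.status = "step_limit_reached"
    return state
```

Passing `approve` as a callback keeps the approval channel (Slack in our case) out of the loop logic, and `max_steps` bounds the loop so a confused model cannot spin forever.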

🧠 The "Aha!" Moment: Giving the Agent Persistent Memory
Early in development, we hit a major wall: Context Amnesia.
If the agent successfully diagnosed and fixed an idle RDS database on Tuesday, it would completely forget that solution when the same alert fired on Thursday. It was wasting tokens and time re-learning the same problem.
This is where Qwen’s text-embedding-v3 model saved the day.
We implemented a "MemoryAgent" crossover architecture. Every time SpendPilot resolves an incident, it summarizes the event and stores it in a local FAISS vector database using Qwen embeddings. The next time an alert fires, the agent performs a similarity search before making any tool calls.
Here is a snippet of how we integrated Qwen Embeddings into our memory module:

import os

from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

# Initialize Qwen embeddings via the DashScope OpenAI-compatible endpoint.
embeddings = OpenAIEmbeddings(
    model="text-embedding-v3",
    openai_api_key=os.getenv("DASHSCOPE_API_KEY"),
    openai_api_base="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

def get_past_incidents(query: str) -> str:
    """Return the two most similar past incident summaries, one per line."""
    # Loading a FAISS index unpickles data, so only load index files you created yourself.
    db = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)
    docs = db.similarity_search(query, k=2)
    return "\n".join(doc.page_content for doc in docs)

Watching the agent instantly retrieve a past fix and skip the diagnostic phase entirely was a massive breakthrough. It transformed SpendPilot from a reactive tool into a learning system.
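The retrieve-before-diagnose idea can be illustrated without FAISS or a real embedding model. The sketch below is a stand-in built on several assumptions: `toy_embed` is a bag-of-words counter rather than text-embedding-v3, `IncidentMemory` and the 0.5 similarity threshold are hypothetical, and a brute-force cosine search replaces the FAISS index.

```python
import math
import re
from collections import Counter
from typing import Optional

def toy_embed(text: str) -> Counter:
    """Stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class IncidentMemory:
    def __init__(self, min_score: float = 0.5):
        self.min_score = min_score  # hypothetical cutoff for "close enough"
        self._items: list = []      # (vector, summary) pairs

    def remember(self, summary: str) -> None:
        """Store the summary of a resolved incident."""
        self._items.append((toy_embed(summary), summary))

    def recall(self, alert_text: str) -> Optional[str]:
        """Return the most similar past summary, or None if nothing is close enough."""
        query = toy_embed(alert_text)
        best_score, best_summary = 0.0, None
        for vector, summary in self._items:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_summary = score, summary
        return best_summary if best_score >= self.min_score else None
```

A caller would try `recall` first and fall back to the full diagnostic tool calls only when it returns `None`. The threshold matters: without one, a top-k search always returns something, even for an unrelated alert.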

🛡️ The Safety Challenge: Human-in-the-Loop (HITL)
Building an agent that can execute alibabacloud_ecs SDK commands to stop production servers is inherently risky. We quickly realized that full autonomy without guardrails is a non-starter for enterprise adoption.
We solved this by teaching Qwen-Max to recognize intent. We added a strict rule to our system prompt: "If you need human approval to execute a destructive action, output exactly 'APPROVAL_REQUIRED: [action]'."
When the LangGraph state machine detects this string, it halts execution and routes the request to a Slack approval workflow. This hybrid approach gives us the speed of AI with the safety of human oversight.
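Detecting the sentinel can be as simple as a strict pattern match on the model's reply. Here is a minimal sketch, assuming the marker appears on a line of its own; `route_model_output` and the route names are hypothetical, and in the project this check lives inside the LangGraph state machine rather than in a standalone function.

```python
import re
from typing import Optional, Tuple

# Matches a line of the form "APPROVAL_REQUIRED: <action>".
APPROVAL_PATTERN = re.compile(r"^APPROVAL_REQUIRED:[ \t]*(.+?)[ \t]*$", re.MULTILINE)

def route_model_output(reply: str) -> Tuple[str, Optional[str]]:
    """Return ("await_approval", action) if the sentinel is present, else ("continue", None)."""
    match = APPROVAL_PATTERN.search(reply)
    if match:
        return "await_approval", match.group(1)
    return "continue", None
```

Because a prompt rule depends on the model complying, it is worth pairing a check like this with a code-level gate on the destructive tools themselves, so a missing sentinel cannot bypass approval.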

🚀 Why Qwen Cloud?
We evaluated several LLM providers, but Qwen Cloud stood out for three reasons:
Exceptional Tool-Calling Accuracy: Qwen-Max consistently formatted our JSON tool calls for the Alibaba Cloud SDKs without hallucinating parameters.
were crucial for building the memory feature without blowing up our token budget.
Seamless Alibaba Cloud Ecosystem: Using DashScope alongside native Alibaba Cloud SDKs (BSS, CloudMonitor, ECS) created a frictionless, high-performance pipeline.

🔮 What’s Next for SpendPilot?
While our hackathon prototype is fully functional, our roadmap is just getting started:
Migrating our local FAISS memory to Alibaba Cloud AnalyticDB for PostgreSQL for scalable, distributed memory.
Expanding our toolset to include automated lifecycle management for OSS and RDS.
Deepening integrations with enterprise ITSM platforms like Jira and PagerDuty.

🙌 Final Thoughts
Building SpendPilot was a crash course in stateful AI orchestration, cloud infrastructure, and the importance of safety guardrails. Qwen Cloud provided the robust reasoning and embedding capabilities we needed to turn a complex FinOps nightmare into a streamlined, autonomous workflow.

Want to see it in action or contribute?
🔗 Check out the source code on GitHub: [https://github.com/validivar/spendPilot]
🔗 View our Devpost Submission: [https://devpost.com/software/spendpilot?ref_content=user-portfolio&ref_feature=in_progress]

Built with ❤️ for the Qwen Cloud Hackathon.

DEV Community: Mikhail Ikpoma
