The story of how a small prompt idea snowballed into a real agentic LangGraph runtime, even though I didn't speak a word of Python three months ago.
This is the first post in the series: the "why/how did you even end up building this" part. The next posts will be technical deep dives into specific topics (project architecture, RAG with Drupal, contracts/guardrails, ambiguity detection, etc.).
TL;DR
- I attended AW-MLEA (Machine Learning Engineering on AWS), which ignited my professional enthusiasm.
- I needed a real project to learn from. Luckily, the course gave me an easy-looking idea: an "ML success criteria framework". I named it SageCompass.
- What I thought would be "a small practice thing" quickly turned into something that taught me how to think with these technologies.
- After a few months I found myself rebuilding parts of the runtime again and again, mostly because the code kept exposing what was unclear or brittle.
- This series is me documenting the path and the architecture lessons as they formed (no prior training or experience in Python, ML, or agentic systems, so things weren't obvious from day one).
- Current implementation is here: https://github.com/cleverhoods/sagecompass
It was late October 2025.
I was waiting excitedly for my long, 4-day weekend after a long, 6-day working week (... yeah, in Hungary you have to "work off" the extra holiday). I was so looking forward to it that I completely forgot I'd signed up for the AW-MLEA (Machine Learning Engineering on AWS) training. The training started on Wednesday. My long weekend started on Thursday. The training was 3 days long.
... great.
Little did I know that my previous weekend was my last free one for the forthcoming months.
The training kicked off, and just a few hours in, I was hooked.
Finally! High-grade, educational information about Machine Learning! How it learns, what the differences between ML training approaches are, what challenges ML solutions face, what a Model is, what types of Models there are and how they differ, which dimensions responsible AI encompasses, what kinds of biases exist, what an ML success criteria framework looks like, and so on and on.
... and we hadn't even arrived at lunch, yet we'd already had 2 coffee breaks.
This went on for 2 (and a half) more days. With similar intensity and information density. It was only manageable thanks to the great learning material and the great guidance (a shout-out to Thomas Fruin, the exceptional presenter of this course).
... and just like that, it was over. The course was completed and I was left with a burning enthusiasm to utilize the freshly learned concepts.
I could've played around with sandbox environments (very limited instances of SageMaker Studio with Jupyter notebooks), but that wasn't what I was looking for.
I'm adamant about how I learn new concepts: the easiest way to learn is by doing. Implement a real project. Don't just poke at sandboxes.
All I needed was an example project, and I already had my eyes on something I was introduced to on the very first day: an ML success criteria framework.

It got the name SageCompass because it's supposed to help me determine whether the input is actually an ML problem.
How SageCompass evolved
The original idea was simple: take the ML success criteria framework, turn it into a guided intake flow, and provide a structured, detailed answer to the core question:
"Is this even an ML problem?"
And if it is, make sure the answer includes measurable business values:
- Business Goals
- KPIs
- Data baselines and readiness
- Risk assessment
- Minimal pilot implementation plan with kill criteria in case things go in the wrong direction.
At the time I thought: "I'll build a small example project, practice some concepts on the business values and move on."
Instead it turned into a loop: build -> surprise -> restructure -> repeat. Over and over again.
v1-3: The Prompt era
Originally I did not plan to build an agentic platform. Not even a system. Just one well-structured prompt, with deterministic rules and a structured output.
The original prompt quickly turned into several specific prompts as I kept hitting ChatGPT's upper prompt length limit.
Behavioral expectations, process and task rules, policies, limitations, etc., all went into dedicated markdown files.
A single prompt turned into a small prompt system.
Eventually I ended up with 9 different prompt segment files + the main instructions:

At this stage the project was only "prompt engineering with ambition", albeit with a wow factor. While the prompts and the outputs reflected what I had just learned about models and how they process information, maintaining and extending them was a nightmare. Every time I changed something, I had to make sure the change was carried through everywhere else. The solution was rough and definitely not production-useful.
I needed some help validating the direction, so I asked our AI Director (Sebastiaan den Boer) for a sanity check, and his first reaction was: "This sounds like a LangChain project."
My first reaction to that was: "Sounds like a what now?"
So I set out to learn what LangChain is and how to implement my project with it.
v4: My first ever Python project
LangChain offers 2 implementations: JavaScript and Python. I had always wanted to learn Python, and during the training we relied heavily on Jupyter notebooks, so I felt that NOW was the right time to get this off my bucket list.
For this phase I set one simple goal: a running Python environment with at least one Agent.
My expectations for the project were the same as with any other project:
- it should be encapsulated
- it should provide its own runtime
In the first iterations, I tried to implement the local Python development environment with DDEV (a great, project-level Docker environment manager/orchestrator that I've been using for years on Drupal projects).
When you have a hammer, everything is a nail.
It worked well enough to create a super basic LangChain implementation that would read and validate LLM provider (OpenAI and Perplexity) configurations, log what's happening in the system, read my humongous prompt, and communicate with the Gradio UI.
Wow! That was easy!
I was able to quickly create a base Python class for all future Agents and to build the very first agent: Problem Framing. It was the first phase in the SageCompass Process Model (v3) from my reasoning-flow.md. Its singular job was to reframe the user input into a rich business format, so I could set expectations for the input data before I ran the whole shebang.
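To give a feel for it, here's a rough sketch of what that base class plus the first agent looked like in spirit (illustrative names, not the actual SageCompass code):

```python
from abc import ABC, abstractmethod

from langchain_core.language_models import BaseChatModel
from langchain_core.prompts import ChatPromptTemplate


class BaseAgent(ABC):
    """Shared plumbing for every agent: one prompt, one LLM, one job."""

    def __init__(self, llm: BaseChatModel, system_prompt: str):
        self.llm = llm
        self.prompt = ChatPromptTemplate.from_messages(
            [("system", system_prompt), ("human", "{input}")]
        )

    @abstractmethod
    def run(self, user_input: str) -> dict:
        """Each agent implements exactly one responsibility."""


class ProblemFramingAgent(BaseAgent):
    """Reframes raw user input into a structured business brief."""

    def run(self, user_input: str) -> dict:
        chain = self.prompt | self.llm
        response = chain.invoke({"input": user_input})
        return {"brief": response.content}
```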
All dandy. I hit the phase goals, so I gave myself permission to ignore the mysteriously slow response time in the UI when I ran a query.
At least until I add more agents and this thing starts fighting back.
v5: Journey to a multi-agent workflow
At the end of v4 I had exactly what I wanted: a working Python runtime and my first Agent.
I also had something else: a UI that was oddly slow.
So I did the obvious thing: ignored the problem and started drafting more Agents anyway.
I took the rest of the SageCompass Process Model (v3) from my reasoning-flow.md and split it into Agents.
| Agent | Responsibility | Output |
|---|---|---|
| Problem Framing | Reframe input into a structured brief | brief |
| Business Goal | Produce SMART business goal(s) | goals[] |
| KPI | Define KPI set with thresholds | kpis[] |
| Eligibility | Decide AI/ML eligibility (yes/no + why) | eligibility{decision, reasons} |
| Solution Design | Propose candidate approaches (high level) | approaches[] |
| Cost Estimation | Estimate cost/effort envelope | cost_envelope |
| Decision | Issue summary with final recommendation | recommendation |
My approach was simple:
- every Agent should be responsible for one thing
- every Agent should have the same folder structure
- every Agent should be built similarly
This was enabled by the base Python class implementation from v4. Each Agent was expected to extend that class.
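To show how that composed, here's a rough sketch of the v5 pipeline shape (illustrative names, not the actual code): each agent reads what it needs from the accumulated state and contributes the output key listed in the table above.

```python
import json


def run_workflow(user_input: str, agents: dict) -> dict:
    """Illustrative v5-style pipeline: every agent adds exactly one output key."""
    state: dict = {"input": user_input}

    state.update(agents["problem_framing"].run(user_input))         # -> brief
    state.update(agents["business_goal"].run(state["brief"]))       # -> goals
    state.update(agents["kpi"].run(state["brief"]))                 # -> kpis
    state.update(agents["eligibility"].run(state["brief"]))         # -> eligibility

    # Only propose a design and estimate cost if the idea is actually AI/ML eligible.
    if state["eligibility"]["decision"] == "yes":
        state.update(agents["solution_design"].run(state["brief"]))   # -> approaches
        state.update(agents["cost_estimation"].run(state["brief"]))   # -> cost_envelope

    state.update(agents["decision"].run(json.dumps(state, default=str)))  # -> recommendation
    return state
```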
Learned some Python, yay
Put it all together and this is what the workflow looked like.

Looks neat, doesn't it? Let's run it!
drum rolls
more drum rolls
even more drum rolls
... wow. That took 7 minutes 32 seconds to run. Feature-first tunnel vision sure bites hard.
So I stopped pretending this was fine and started cleaning house. First move: I switched the project's packaging and workflow to uv and ditched the earlier setup. Then I found the real culprit: I was running another virtual Python environment inside Docker.
After fixing that, runtime dropped to ~1 minute per run.
Lesson learned: don't stack Python environments.
And while building the workflow, a sudden realization hit me:
If I'm building an augmented decision system that checks whether an idea needs ML at all... wouldn't that naturally translate to whether an idea needs AI at all?
If that's true, then the original "ML success criteria" flow doesn't disappear.
It becomes a reusable component I would have to build anyway.
The difference is: I'd end up with something way more useful.
So I took the longer road.
And that's where the prompt stopped being "the thing" and everything around it started becoming the thing.
v6: In for a penny, in for a pound - creating an agentic platform
I liked the idea.
What would it roughly look like? Gonna need RAG for sure.

It would also be nice if the data came from a curated source, like a Drupal CMS for example.
And that's where SageCompass started forcing architectural decisions.
Drupal needed its own place. LangGraph needed its own place. And if I wanted to do this properly, the UI had to be separated from LangGraph (Gradio UI was still living under the same Python project).
So I reorganized the repo.
And that was when a core issue immediately surfaced:
There were no rules.
For example: there was no project-wide rule for referencing file locations, because in the early prototype I could take shortcuts and everything still worked.
... Shortcuts are always the longer road in the long run.
And it wasn't just paths.
There were no solid architectural rules for Agents, Nodes, middleware, schemas.
No rules for state management. Prompt management. Tool management.
It was all still ... duct tape and hope.
It gave me back that late v3 feeling, where every change was a nightmare. Touch one thing, five other things break, and suddenly I'm debugging the project instead of building it.
So I decided to slow down and take back control.
The goal of v6 became crystal clear:
- make the existing implementation smaller
- lock down the architecture so I can add LangGraph components without surprises
- add quality assurances so the system behaves deterministically, even though it's an LLM-based augmented decision system
SageCompass v6 is where the project currently stands. It only has one of the v5 Agents (Problem Framing), but it has the stuff that makes a platform a platform:
| Rule/Contract | Description |
|---|---|
| Enforced architectural contracts | Platform contract map + tests that fail if you break it |
| Typed models and state | Pydantic everywhere, so the runtime has something solid to hold onto |
| Reusable subgraphs and nodes | Clear phase boundaries instead of “giant pipeline code” |
| Guardrails and deterministic tool policies | Allowlists, bounded behavior, no surprises |
| Ambiguity detection + bounded clarification loops | Scoped, retry-limited, no infinite “can you clarify?” spirals |
| Namespaced artifacts and context | Retrieval doesn’t just bloat the system prompt |
| Bigger test surface | Contract tests + integration coverage |
| Long-term memory | Context can persist without turning prompts into novels |
v6 is where platform behavior became enforceable: if you violate a boundary, tests fail; if a tool is not allowlisted, it cannot run; if ambiguity persists, clarification retries are capped.
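To make that concrete, here's the flavor of those rules boiled down to a sketch (names, keys, and thresholds are illustrative, not the actual SageCompass contracts): typed state via Pydantic, an explicit tool allowlist, and a hard cap on clarification rounds.

```python
from pydantic import BaseModel, Field

# Illustrative only: the flavor of the v6 rules, not the actual SageCompass contracts.
TOOL_ALLOWLIST = {"drupal_retrieval", "kpi_catalog_lookup"}
MAX_CLARIFICATION_ROUNDS = 2


class IntakeState(BaseModel):
    """Typed state: the runtime always knows exactly what it is holding."""
    user_input: str
    brief: str | None = None
    ambiguity_score: float = Field(default=0.0, ge=0.0, le=1.0)
    clarification_rounds: int = 0


def ensure_tool_allowed(tool_name: str) -> None:
    """Deterministic tool policy: anything not on the allowlist simply cannot run."""
    if tool_name not in TOOL_ALLOWLIST:
        raise PermissionError(f"Tool '{tool_name}' is not allowlisted")


def needs_clarification(state: IntakeState, threshold: float = 0.6) -> bool:
    """Bounded clarification loop: ambiguous input triggers at most
    MAX_CLARIFICATION_ROUNDS follow-up questions, then the flow moves on."""
    return (
        state.ambiguity_score >= threshold
        and state.clarification_rounds < MAX_CLARIFICATION_ROUNDS
    )
```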

What exists in the repository today
SageCompass repo: https://github.com/cleverhoods/sagecompass
Right now it contains:
- ambiguity detection subgraph
- problem framing subgraph
- main supervisor graph
- a RAG writer graph that takes information from Drupal
Other pieces exist as roadmap items, but they are yet to be implemented. I'd rather be explicit about that.
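For orientation, the supervisor-plus-subgraphs shape looks roughly like this in LangGraph (a sketch with placeholder node bodies standing in for the compiled subgraphs, not the actual repo code):

```python
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class SupervisorState(TypedDict, total=False):
    user_input: str
    brief: str
    needs_clarification: bool


# Placeholder node functions; in the real repo these are compiled subgraphs.
def ambiguity_detection(state: SupervisorState) -> SupervisorState:
    return {"needs_clarification": False}


def problem_framing(state: SupervisorState) -> SupervisorState:
    return {"brief": f"Structured brief for: {state['user_input']}"}


builder = StateGraph(SupervisorState)
builder.add_node("ambiguity_detection", ambiguity_detection)
builder.add_node("problem_framing", problem_framing)

builder.add_edge(START, "ambiguity_detection")
builder.add_conditional_edges(
    "ambiguity_detection",
    lambda state: "clarify" if state.get("needs_clarification") else "frame",
    {"clarify": END, "frame": "problem_framing"},  # clarification loop omitted here
)
builder.add_edge("problem_framing", END)

graph = builder.compile()
```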
What to expect in this series
Each of the next posts will take one architectural idea and tell its story from problem → messy attempts → final approach → lessons.
The next four posts are:
- Project architecture — repo shape, boundaries, and why "where things live" becomes a runtime feature
- RAG with Drupal — curated context without turning prompts into novels
- Contracts & guardrails — making safety and behavior testable instead of wishful
- Ambiguity detection — when "unclear input" becomes an orchestration problem, not a UX problem
Final words
If you're building agentic systems that need to behave reliably under ambiguity, or systems that integrate with real content sources like a CMS, follow along - next post is Project architecture (repo boundaries as a runtime feature).
Special thanks to Liudmyla Ravliuk and Erik Poolman for proofreading and mental sparring.
If you have scars (or fun stories) from similar problems, find me on LinkedIn.
I'm always happy to compare notes.
Sources / credits
- Header comic: https://gunshowcomic.com/513
- SpaceX Raptor image: https://x.com/SpaceX/status/1819772716339339664/photo/1
