
Chris Cowart

Posted on • Originally published at walsenburgtech.com

From Custom Orchestration to LangGraph: Why the Framework Didn't Change My Architecture

Eighth post in a series on building business process automation at scale. This time: what happens when you rewrite a working system in a framework — and nothing changes.


The short version: Blueprint's KYB verification engine has a 4-layer cascade that finds careers pages on company websites. I implemented it twice — once as 1,497 lines of async Python, once as a LangGraph StateGraph. You toggle between them with VERIFIER_USE_LANGGRAPH=true. Same inputs. Same outputs. The framework formalized patterns I was already using. It didn't change the architecture.


Two Implementations, One System

I didn't plan to build this twice. The custom version came first — discovery.py, 1,497 lines of async Python. Playwright browsers, if/else routing, try/except error handling, state passed around as dicts. It grew organically over weeks of running against real company websites, handling every weird edge case the internet threw at it. Parked domains. JavaScript-only navigation. Sites that redirect /careers to a login page. It works. It's ugly in places. It processes thousands of companies.

Then I started looking at LangGraph. Not because the custom version was broken, but because I was curious whether the framework would actually give me anything I didn't already have, or if it was just another abstraction layer to learn.

So I rebuilt the whole cascade as a StateGraph. 9 nodes, conditional edges, typed state. Same Playwright browsers underneath. Same scoring functions. Same output format.

At runtime, you pick which one runs:

```python
if os.getenv("VERIFIER_USE_LANGGRAPH", "").lower() in ("1", "true", "yes"):
    cfg.use_langgraph = True
```

Both are feature-complete. Both pass the same tests. That's the interesting part — and it's what convinced me the framework wasn't the thing that mattered.
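The toggle pattern itself is worth a sketch. This is my own minimal version, not Blueprint's code — `run_custom` and `run_langgraph` are hypothetical stand-ins for the two entry points, which share a signature so callers never know which engine ran:

```python
import os


def run_custom(company: dict) -> dict:
    # Stand-in for the 1,497-line async implementation.
    return {"engine": "custom", "company_id": company["company_id"]}


def run_langgraph(company: dict) -> dict:
    # Stand-in for invoking the compiled StateGraph.
    return {"engine": "langgraph", "company_id": company["company_id"]}


def discover(company: dict) -> dict:
    """Pick the orchestration layer at runtime; inputs and outputs stay identical."""
    use_langgraph = os.getenv("VERIFIER_USE_LANGGRAPH", "").lower() in ("1", "true", "yes")
    runner = run_langgraph if use_langgraph else run_custom
    return runner(company)
```

The point of the shared signature is that the test suite runs against `discover` and passes either way — which is what "both pass the same tests" requires.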

What the Graph Looks Like

If you've used LangGraph you can skip to the flowchart. If not, the short version: you define a TypedDict for your state, write node functions that take state and return partial updates, and wire them together with edges and routing functions. LangGraph handles the execution.
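A node in this model is just a function from state to a partial update — the framework merges the returned dict back into the state. A toy sketch with invented names (and it runs without LangGraph installed, because nodes are plain Python):

```python
from typing import Any, TypedDict


class ToyState(TypedDict, total=False):
    url: str
    base_url: str
    nav_failed: bool


def navigate(state: ToyState) -> dict[str, Any]:
    # A node returns only the keys it changes; the framework merges them in.
    url = state.get("url", "")
    if not url.startswith("http"):
        return {"nav_failed": True}
    return {"base_url": url.rstrip("/"), "nav_failed": False}
```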

Here's the state. Every field that flows through the cascade:

```python
from typing import Any, TypedDict


class DiscoveryState(TypedDict, total=False):
    # Inputs
    company_id: str
    company_name: str
    url: str
    is_parked: bool

    # After homepage navigation
    base_url: str
    base_domain: str
    nav_failed: bool

    # Extracted page data
    elements: list[dict[str, Any]]
    page_data: dict[str, Any]

    # Cascade picks
    best_careers_el: dict[str, Any] | None
    careers_source: str  # "dom" | "llm" | "vision" | "probe" | "none"

    # Output signals
    careers: dict[str, Any]   # {careers_url, ats_platform, ats_url}
    contact: dict[str, Any]   # {contact_email, contact_phone, contact_page_url}

    # Vector store
    similar_companies: list[dict[str, Any]]
```

And the graph — 9 nodes, conditional edges at each escalation point:

```python
from langgraph.graph import END, StateGraph


def build_discovery_graph() -> StateGraph:
    graph = StateGraph(DiscoveryState)

    graph.add_node("navigate_homepage", navigate_homepage)
    graph.add_node("entity_match", entity_match)
    graph.add_node("score_dom", score_dom)
    graph.add_node("llm_classify", llm_classify)
    graph.add_node("vision_analyze", vision_analyze)
    graph.add_node("probe_fallback", probe_fallback)
    graph.add_node("navigate_careers", navigate_careers)
    graph.add_node("extract_contact", extract_contact)
    graph.add_node("facebook_fallback", facebook_fallback)

    graph.set_entry_point("navigate_homepage")

    graph.add_conditional_edges("navigate_homepage", route_after_navigate, {
        "facebook_fallback": "facebook_fallback",
        "entity_match": "entity_match",
        "__end__": END,
    })
    graph.add_edge("entity_match", "score_dom")
    graph.add_conditional_edges("score_dom", route_after_dom, {
        "navigate_careers": "navigate_careers",
        "llm_classify": "llm_classify",
        "probe_fallback": "probe_fallback",
    })
    graph.add_conditional_edges("llm_classify", route_after_llm, {
        "navigate_careers": "navigate_careers",
        "vision_analyze": "vision_analyze",
        "probe_fallback": "probe_fallback",
    })
    graph.add_conditional_edges("vision_analyze", route_after_vision, {
        "navigate_careers": "navigate_careers",
        "probe_fallback": "probe_fallback",
    })

    graph.add_edge("navigate_careers", "extract_contact")
    graph.add_edge("probe_fallback", "extract_contact")
    graph.add_edge("extract_contact", END)
    graph.add_edge("facebook_fallback", END)

    return graph.compile()
```

The routing functions are where the cascade logic lives. Each one looks at the state and decides where to go next:

```python
def route_after_dom(
    state: DiscoveryState,
) -> Literal["navigate_careers", "llm_classify", "probe_fallback"]:
    """After DOM scoring: found -> navigate, else -> LLM."""
    if state.get("best_careers_el"):
        return "navigate_careers"
    return "llm_classify"
```

DOM scoring didn't find a careers element? Route to LLM classification. LLM didn't find one? Vision analysis. Vision didn't find one? Brute-force probe. At any point, if a layer succeeds, skip straight to career page navigation. The cascade short-circuits as early as possible — the expensive layers only run when the cheap ones fail.

What I Got Wrong the First Time

My first attempt at the LangGraph version had vision analysis running before LLM classification. The logic seemed right — vision should be better than text, right? It's looking at the actual page.

In practice it was a terrible idea. Vision analysis takes a screenshot, overlays numbered badges on every candidate element, sends the image to a model, and parses the response. It's 3-4x slower than sending text to an LLM. And for the majority of sites where the careers link has a reasonable label, the LLM finds it just fine from the text alone. Running vision first burned compute for no improvement on 80% of sites.

The fix was obvious once I looked at the data: order the layers by cost, not by theoretical capability. Deterministic scoring first (milliseconds, no model at all), then LLM text (a few seconds), then vision (screenshot + model call), then brute-force probe (multiple HTTP requests). Each layer is more expensive and slower. You only pay for what you need.
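Stripped of the browser and model calls, the cost-ordered cascade reduces to a loop: try layers cheapest-first, short-circuit on the first hit. A sketch with stub layers (the real ones call Playwright and models; these just pattern-match a string):

```python
from typing import Callable, Optional


def dom_score(page: str) -> Optional[str]:
    # Milliseconds: deterministic keyword scoring, no model at all.
    return "/careers" if "careers" in page.lower() else None


def llm_classify(page: str) -> Optional[str]:
    # Seconds: page text to an LLM. Stubbed here.
    return "/jobs" if "join us" in page.lower() else None


def vision_analyze(page: str) -> Optional[str]:
    # Screenshot + model call, the expensive layer. Stubbed here.
    return None


def probe(page: str) -> Optional[str]:
    # Brute force: last resort, always produces a candidate path to try.
    return "/careers"


# Ordered by cost, cheapest first — the whole point of the fix.
LAYERS: list[tuple[str, Callable[[str], Optional[str]]]] = [
    ("dom", dom_score),
    ("llm", llm_classify),
    ("vision", vision_analyze),
    ("probe", probe),
]


def find_careers(page: str) -> tuple[str, str]:
    """Escalate cheapest-first; stop at the first layer that succeeds."""
    for name, layer in LAYERS:
        hit = layer(page)
        if hit:
            return name, hit
    return "none", ""
```

Reordering the cascade is a one-line change to `LAYERS` — which is exactly why getting the order wrong the first time was cheap to fix.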

This is the kind of thing no framework tutorial teaches you. The cascade ordering is an architecture decision, and I got it wrong by thinking about it abstractly instead of looking at where the time actually went.

What LangGraph Gives You

Typed state. In the custom version, state is a dict. Typo in a key name? Runtime error, probably at 2am when you're watching batch logs. In the LangGraph version, DiscoveryState is a TypedDict. Your editor catches it before you run anything. I found three key-name typos when I did the migration. Three. In code that had been running in production.
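The failure mode is easy to reproduce. At runtime a TypedDict behaves exactly like a dict — it's a static-only contract — so the typo'd key still silently returns None through `.get()`. The difference is that a checker like mypy flags the misspelled key before anything runs:

```python
from typing import TypedDict


class State(TypedDict, total=False):
    company_name: str


state: State = {"company_name": "Acme"}

# The typo ("compnay") slips through silently at runtime, dict or TypedDict.
name = state.get("compnay_name")  # None, no error, no log line
# But a static checker flags it on the TypedDict version:
# mypy: TypedDict "State" has no key "compnay_name"
```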

Declarative routing. The custom version has nested if/else blocks spread across hundreds of lines. The LangGraph version has named routing functions with type-annotated return values. Same logic, but you can read the graph construction and understand the flow without digging through the full file.

Visualization. graph.get_graph().draw_mermaid() gives you a flowchart. I used this more than I expected — mostly to explain the system to myself when debugging weird state transitions at the edges.

Checkpointing. If a run crashes at the vision analysis step, you can resume from there instead of starting over. I'm not using this yet, but building it myself would mean writing a state serialization layer, and I've got better things to do.
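For a sense of what "writing a state serialization layer" means, here is a DIY sketch — entirely my invention, not Blueprint's code: snapshot the state and the list of completed steps to JSON after each node, and on restart skip what already ran:

```python
import json
from pathlib import Path
from typing import Any, Callable


def run_with_checkpoints(
    steps: list[tuple[str, Callable[[dict], dict]]],
    state: dict[str, Any],
    ckpt: Path,
) -> dict[str, Any]:
    """Run named steps in order, snapshotting after each; on restart, resume."""
    done: list[str] = []
    if ckpt.exists():
        saved = json.loads(ckpt.read_text())
        state, done = saved["state"], saved["done"]
    for name, step in steps:
        if name in done:
            continue  # completed in a previous run; skip it
        state.update(step(state))
        done.append(name)
        ckpt.write_text(json.dumps({"state": state, "done": done}))
    return state
```

Even this toy version has to decide what's JSON-serializable, where snapshots live, and when to invalidate them — which is the work the framework's checkpointer saves you.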

What You Give Up

Stack traces. When something breaks in the custom version, the traceback points straight at my code. In the LangGraph version, there are framework frames between my node function and the error. Not a dealbreaker, but it slows down debugging.

Weird edge cases get structural. Parked domains skip the entire cascade and go straight to a Facebook fallback. In the custom version, that's an if statement at the top of the function. In LangGraph, it's a conditional edge from the first node. It works, but now the graph has to represent every edge case in its topology. Clean paths are cleaner. Messy paths are the same amount of messy.

It's another thing to know. The custom version is just Python. The LangGraph version requires understanding StateGraph, nodes, edges, RunnableConfig. Not a steep curve, but it's not zero.

This Isn't New

The thing that keeps nagging at me about the whole LangGraph conversation — and the broader "agentic AI" conversation — is that none of this is new. The space moves fast, new frameworks every quarter, but the underlying architecture hasn't changed.

The 4-layer cascade is confidence-based escalation. Start cheap. If you're confident, stop. If not, escalate. If nothing works, fall back to brute force. I built this same pattern at Jabil in 2015. The PFEP system classified manufacturing parts — tens of thousands of components mapped to packaging specifications. The ML model handled the easy cases. When confidence dropped below threshold, it routed to manufacturing engineers who knew the difference between two nearly identical part numbers.

That was human-in-the-loop with confidence-based routing. We didn't call it "agentic." We called it "the system" and got on with it.
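The whole pattern fits in a dozen lines. The threshold below is an assumed value for illustration, not the PFEP system's actual number:

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    label: str
    confidence: float


CONFIDENCE_THRESHOLD = 0.85  # assumed value; tuned per system in practice


def route(pred: Prediction) -> str:
    """Confidence-based escalation: auto-apply when sure, hand to a human when not."""
    if pred.confidence >= CONFIDENCE_THRESHOLD:
        return "auto_apply"
    return "human_review"
```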

Blueprint's cascade is the same idea applied to web scraping. DOM scoring is fast and deterministic — usually right. LLM classification catches what the scorer missed. Vision analysis handles the truly weird layouts. Probe fallback is brute force for when everything else fails.

The routing between layers is the architecture. LangGraph is one way to express it. Custom Python is another. Next year there'll be a third. The routing logic won't change.

The Actual Takeaway

LangGraph is good. I'd use it again. The typed state alone justified the port — those three key-name typos were real bugs that just hadn't bitten hard enough yet.

But if I'd started with LangGraph from day one, the system would look the same. Same nodes. Same routing. Same escalation thresholds. The framework didn't teach me when to escalate or how to handle a parked domain or what confidence threshold makes DOM scoring worth trusting. I learned those things by running the custom version against thousands of real websites and watching where it broke.

VERIFIER_USE_LANGGRAPH=true switches the orchestration layer. The inputs don't change. The outputs don't change. The cascade still escalates the same way. The framework is the wiring. The architecture is the circuit.

If you're picking between LangGraph and CrewAI and AutoGen and whatever ships next quarter — stop. Draw the graph on paper first. Figure out your escalation points, your failure modes, your confidence thresholds. Then pick the framework that gets out of your way. Not the one with the best demo. The one that lets you focus on the decisions that actually matter.

The frameworks will change again next year. The pattern won't.


The full source for both implementations is on GitHub: avatar296/blueprint.
