Building Hierarchical Destination URLs Without Killing Your Database

#programming #webdev #python #django

One of the more challenging engineering problems we encountered while building our travel portal, was figuring out a decent URL structure for destination hubs.
On the face of it, this one looks relatively easy to crack. You've got a geographically based hierarchy: Global Region, Subregion, Country, Local Region, Local Subregion & City - & it would be nice to have URLs that make sense with that structure. Something that's clean, readable, semantically meaningful, and also SEO-friendly sounds pretty great. The problem is, turning that into a system that doesn't grind your application to a halt.

/destinations/{global-region}/{global-subregion}/{country}/{local-region}/{local-subregion}/{city}/

The problem we're talking about isn't tied to any particular framework. Whether you're working with Django, FastAPI, Flask, Rails, Laravel, or something you cobbled together yourself, if your content has a recursive tree structure & your URLs need to mirror that tree, you'll run into the same roadblock.

The Naive Approach and Exactly Why It Falls Flat

The super obvious path to take is recursive parent traversal whenever you need it. Each node in your hierarchy has a parent node it can latch onto . To build a full URL for any old node, you just have to walk up the tree:

def get_full_path(node, db):
    parts = []
    current = node
    while current is not None:
        parts.append(current.slug)
        current = db.fetch_by_id(current.parent_id)  # one DB query per level
    parts.reverse()
    return "/" + "/".join(parts) + "/"

This works great in development when you only have 20 nodes to worry about. But in production with 400+ nodes and actual traffic coming in, it's basically executing one database query per page render - one for every level in the hierarchy. So a six-level deep spot like Uvero Alto just unleashes six sequential round-trips to the database before you've even started throwing page content on the screen. And then - to make matters worse - multiply that by all the concurrent users trying to get in on the action and connection pool exhaustion becomes a real possibility pretty darn quick.

The second thing that usually pops into people's heads is eager loading - grab the node and all its ancestors in a single swoop with a JOIN. That works just fine if your ORM has got the ability to handle recursive CTEs or if you somehow manage to figure out the max depth ahead of time. But most do not handle arbitrary-depth recursive relationships with any kind of finesse. And, a lot of the time, you just can't predict the depth in advance.

The Solution: Pre-Computing at Write Time

The core idea here is that the URL for a destination node only changes when the node or one of its parents is updated - so you don't need to recalculate it every time you go to read it. Just do the calculation once, stash the result, and then just read it straight out.

When you decide to keep a node, take a moment to calculate and store three related pieces of info:

def compute_node_metadata(node, fetch_parent_fn):
    """
    fetch_parent_fn: callable that takes a parent_id and returns the parent node
    Works with any storage backend — SQL, NoSQL, graph DB, etc.
    """
    # Build full path
    parts = []
    current = node
    while current is not None:
        parts.append(current.slug)
        parent_id = current.parent_id
        current = fetch_parent_fn(parent_id) if parent_id else None
    parts.reverse()
    full_path = "/" + "/".join(parts) + "/"

    # Compute depth
    depth = 0
    current = fetch_parent_fn(node.parent_id) if node.parent_id else None
    while current is not None:
        depth += 1
        current = fetch_parent_fn(current.parent_id) if current.parent_id else None

    # Build breadcrumbs
    crumbs = []
    current = node
    while current is not None:
        crumbs.append({"name": current.name, "path": current.full_path})
        current = fetch_parent_fn(current.parent_id) if current.parent_id else None
    crumbs.reverse()

    return {
        "full_path": full_path,
        "hierarchy_depth": depth,
        "breadcrumbs_data": crumbs
    }

Stick the full_path, hierarchy_depth and _breadcrumbs_data _right into the node record itself - so either add new columns to a database table or add new fields to a document store etc. Don't forget to index the _full_path _if you can.
At request time, the lookup becomes:

def handle_destination_request(full_path, db):
    node = db.fetch_by_indexed_field("full_path", full_path)
    # All derived data is pre-computed
    # Zero recursive queries. Zero parent traversal.
    return render(node)

When a request comes in looking for something, the lookup is now a super quick O(1) regardless of how deep down in the hierarchy the destination is. Whether that endpoint is buried two levels down or six, doesn't matter a bit.

The Cascading Update Problem

Precomputing slugs introduces a new headache : what happens when a parent node's slug gets changed? The result is that all of the descendants' _full_path _gets left behind.

The solution is to sort this out by making the system recompute downwards from the node whenever it gets updated. In a format that's easy to understand (but not code):

def on_node_saved(node, db, fetch_parent_fn):
    metadata = compute_node_metadata(node, fetch_parent_fn)
    db.update(node.id, metadata)

    # Propagate to all direct children
    children = db.fetch_children(node.id)
    for child in children:
        on_node_saved(child, db, fetch_parent_fn)
        # Recursion handles arbitrarily deep subtrees

In a framework with event hooks and all that jazz (post-save signals, lifecycle callbacks, database triggers), make sure this happens automatically as part of your persistence layer. If you're in a framework that doesn't have all that built in, just call it in your write path explicitly.

To be honest, changes to parent node slugs aren't all that common - hierarchies like Mexico or destinations begin to stabilise pretty quickly once you've set them up initially. So the cascading update is essentially a one-time cost every time you make an edit, not some huge overheard that's eating up resources with every request.

Legacy URL Handling

If you're adding hierarchical URLs to a system that used to just use flat slugs, you're going to need some kind of fallback. Existing pages that've been indexed under the old URL structure still need to be found:

def resolve_destination(path, db):
    # Try canonical hierarchical path first
    node = db.fetch_by_field("full_path", path)
    if node:
        return node

    # Fall back to legacy flat slug
    node = db.fetch_by_field("slug", path)
    if node and node.full_path != path:
        # Issue a permanent redirect to canonical URL
        return redirect_permanent(node.full_path)

    return not_found()

Legacy URLs redirect neatly to the actual hierarchical path. This means link equity is preserved, old indexed pages resolve correctly and you don't end up with duplicate content.

What This Gets You

By using the pre-computed approach you can take the complexity of your URLs off of your request-time performance. The full path for any destination, no matter how deep you go (or how big the hierarchy is) is just a single field read. Plus you can get breadcrumbs and depth information with zero extra queries. It doesn't matter if your backend is Django, Rails, Express or just some serverless function hitting a database - it all works the same way.

The catch is that there's a bit more complexity on the write side and you do need to deal with the cascading update thing. But if your content hierarchy isn't changing all that often, then the trade makes sense. Read performance is what scales as your traffic goes up, and a single field lookup is going to scale in a way that a recursive parent lookup just can't.

_
_