Aniket Hingane
Layered Context Routing for Campus Operations: A Facilities Intake PoC


How I Stacked Policy, Place, and Urgency Signals to Route Maintenance Requests

Layered context concept

TL;DR

This write-up describes a personal experiment where I treat campus facilities intake as a context engineering problem rather than a single prompt. I combine TF-IDF retrieval over a small policy corpus with building metadata and lightweight urgency hints parsed from free text, then route tickets with explicit rules so every decision stays inspectable. The code lives in a public repository I published for learning purposes, and nothing here should be read as production guidance for a real university or as anything connected to an employer. From my perspective, the lesson worth sharing is that when operational language is messy, stacking context in named layers makes debugging and iteration far easier than stuffing everything into one opaque blob.

Introduction

I have spent a fair amount of time thinking about how large language models behave when the input is short, ambiguous, and emotionally loaded. Facilities tickets are a good toy domain for that reason. A message might mention a fume hood, a basketball practice schedule, and a broken card reader in adjacent sentences. If you send that straight into a generic completion call, you can get fluent text that is wrong in subtle ways. In my experience, the failure mode is rarely "the model cannot write sentences." It is usually "the model does not know which institutional rule actually applies," or "the model over-trusts the most recent sentence."

In my experiments, I wanted a system that still fits on a laptop, does not require a proprietary dataset, and makes the reasoning chain visible in ordinary logs. I chose a campus operations framing because it forces a blend of safety language, building-specific nuance, and time-of-day common sense without touching regulated domains I am intentionally avoiding in this series. The repository is a solo sketch, not a deployed service, and I refer to it throughout as a proof of concept.

There is another motivation I should state plainly. I am interested in practices that survive contact with maintenance engineers, students, and staff who do not care about the underlying ML buzzwords. People submit tickets under stress. They shorten building names, omit room numbers, and reference "that hallway near the lab" without GPS coordinates. Any system that pretends the text is already structured is going to fail in ways that look merely embarrassing in a demo but prove painful in real life. I did not solve that fully here; I only created a place to talk about it honestly while still writing code.

I also want readers to know the scope boundary I used while writing. This article discusses a synthetic dataset and illustrative SLAs. It does not describe any real institution’s priorities, staffing model, or vendor contracts. If a phrase resembles language you have seen in the wild, that is because operational writing converges on similar vocabulary, not because I copied private material.

What's This Article About?

The article walks through the design of CampusContextRouter-AI, a Python project that ingests synthetic maintenance-style requests, retrieves relevant policy snippets, attaches place context from a JSON registry, derives urgency signals from the wording, and emits a route bucket with a priority band and a notional SLA window. I wrote it this way because I wanted to mirror how a human dispatcher glances at policy, then place, then severity, before choosing a queue.

You will see how I separate retrieval from routing, why I kept the router deterministic in this iteration, and how I generate both a Rich table for the terminal and a simple matplotlib chart so a batch run has a visual artifact. I also discuss limitations honestly: tiny corpora, linear scoring, and heuristic SLAs are not the same as a live work-order system.

If you are wondering what “context engineering” means in concrete terms here, my working definition is simple: decide what information belongs together, decide what must never be mixed, and serialize the result in a predictable shape. Retrieval produces evidence. Place metadata grounds the evidence. Session signals modulate urgency. Routing consumes all three without collapsing them into an undifferentiated string. That definition may differ from how other authors use the phrase, and that is fine; the implementation is the ground truth for this PoC.
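That definition is easy to make concrete. Here is a minimal sketch of the layered shape; the field names mirror the dataclass described later in this article, while `as_prompt` is my own illustrative addition, not necessarily an API the repository exposes:

```python
from dataclasses import dataclass, field


@dataclass
class LayeredContext:
    # Each layer stays a named block; nothing is pre-concatenated.
    policy_block: str
    place_block: str
    signal_block: str
    retrieved: list = field(default_factory=list)

    def as_prompt(self) -> str:
        # Serialization happens in exactly one place, in a fixed order,
        # so logs and downstream consumers always see the same shape.
        return "\n\n".join([self.policy_block, self.place_block, self.signal_block])
```

The point of the sketch is the boundary: evidence, place, and signals travel as separate fields until the last possible moment.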

You should also expect commentary on failure modes. A demo that only shows happy paths is a brochure, not engineering writing. I call out retrieval sparsity, policy conflict, and the limits of regex urgency. I discuss what I would measure next if this stayed a hobby project for more than a few weekends.

Tech Stack

The implementation is intentionally boring in a good way. I rely on Python 3.10 or newer, NumPy, scikit-learn for TF-IDF and cosine similarity, matplotlib for a bar chart, and Rich for readable terminal output. There is no hosted vector database and no cloud requirement; the entire index fits in memory.

From where I stand, that stack is enough to demonstrate the idea that “context engineering” can be practiced with classic IR tooling when your corpus is small and your goal is structured assembly rather than open-ended generation. If I later swap TF-IDF for embeddings, the layered interfaces remain stable, which was a design goal while I sketched the modules.
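The claim that "the layered interfaces remain stable" can be expressed as a small structural interface. This is my own illustration of the idea rather than code from the repository; the names `Retriever` and `DummyRetriever` are hypothetical:

```python
from typing import Protocol


class Retriever(Protocol):
    """Anything that can rank policy chunks for a query."""

    def top_k(self, query: str, k: int = 3) -> list: ...


def assemble(query: str, retriever: Retriever, k: int = 3) -> list:
    # Assembly depends only on the interface, so swapping TF-IDF for
    # embeddings changes the implementation, not the callers.
    return retriever.top_k(query, k=k)


class DummyRetriever:
    # Structurally satisfies Retriever; useful in tests.
    def top_k(self, query: str, k: int = 3) -> list:
        return [("policy-1", 0.9)][:k]
```

Any class with a matching `top_k` signature slots in without touching the assembly or routing code.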

System architecture

Why Read It?

If you are evaluating how to structure prompts or pre-model logic for operational chatbots, this article offers a concrete pattern: treat context as composable blocks with clear boundaries. If you are learning scikit-learn’s text pipelines, the retrieval module is short and testable. If you care about reproducibility, the deterministic router gives you a baseline against which any future learned model can be compared.

I think the read is most useful for practitioners who want a middle ground between “pure LLM” and “pure rules,” because the code shows exactly where those worlds meet in my PoC.

There is also a pedagogical angle I care about. Many tutorials jump straight to embeddings and vector databases without establishing why lexical baselines still matter. I am not anti-embedding; I use them elsewhere. But I believe beginners should see cosine similarity on explicit vectors at least once, because it demystifies what “nearest neighbor” means in code rather than in marketing language.

Finally, if you maintain open-source examples, you know the burden of dependencies. I kept the stack small so a reader in a constrained environment can still run the demo. That constraint shaped decisions as much as any architectural principle.

Let's Design

Framing the problem without overfitting the story

Before touching code, I spent time writing short synthetic tickets on paper. I noticed recurring patterns: some messages emphasize harm or hazard words early, others bury the actionable detail in the second half, and a few mix multiple issues that would normally be split in a mature work-order system. I did not try to solve splitting in this repository. Instead, I focused on a single-text input so the context layers stay easy to reason about. That choice trades realism for clarity, and I am comfortable stating that upfront.

Why layers instead of one concatenated prompt

The design starts from a simple observation I kept returning to while prototyping: policy text is not interchangeable with place metadata. Policies answer what must happen in general. Place metadata answers where the work lives and what constraints repeat for that site. Session signals answer how hot the ticket sounds and whether the clock matters. When I mixed those prematurely, I got tangled prompts. When I separated them, I could log each layer independently.

The retrieval layer reads data/policies.json. Each record is a chunk with an identifier, a topic tag, and prose. The TF-IDF vectorizer uses English stop words and unigrams plus bigrams to catch phrases like “fume hood” that unigrams alone might dilute. For each ticket, I take the top few chunks by cosine similarity and format them as a bullet list with scores.

The place layer reads data/buildings.json. Each building has a code, human-readable name, zone label, hours profile, and short risk notes. I do not attempt geospatial reasoning in this PoC; the point is to show how a second JSON source can be merged without contaminating the policy text.

The signal layer currently combines a local hour and weekday flag with an urgency score derived from regular-expression keyword groups on the ticket text. The score is deliberately primitive. In a later experiment I might replace it with a lightweight classifier, but I wanted something explainable first.

Routing maps the assembled layers to an enumerated bucket such as laboratory safety, classroom AV, grounds, HVAC, or a general bucket. Priorities and SLA hours are assigned with transparent rules that look at both the keyword path and the urgency score. That logic lives entirely in Python so I can unit test it without GPU dependencies.

Architecturally, the flow is linear: load JSON, fit the TF-IDF index once, iterate demo tickets, assemble layers per ticket, call the router, collect rows, render a Rich table, and plot bucket counts. The diagrams in the repository restate the same story visually.

Runtime sequence

Retrieval choices and what I rejected

I considered a few alternatives before settling on TF-IDF for the first public cut. A dense embedding model would likely rank semantically related chunks more robustly, but it would also introduce versioning questions, dependency weight, and reproducibility concerns for readers who just want to clone and run. I decided that demonstrating clean interfaces mattered more than squeezing extra retrieval quality from a miniature corpus. In my opinion, that is a trade only the author can judge; for teaching purposes, I wanted the smallest artifact that still supports cosine similarity and top-k inspection.

I also thought about BM25. It is a strong baseline for lexical tasks and behaves well on short documents. I stayed with TF-IDF largely because the scikit-learn pipeline is familiar to many readers and the difference between BM25 and TF-IDF on eight short policies is unlikely to change the story materially. If I expand the corpus by an order of magnitude, BM25 or a hybrid approach becomes more compelling.

Urgency scoring as a deliberately imperfect heuristic

The urgency score is built from weighted regular expressions. That looks naive, and it is naive. I still found value in it because it forces me to name the cues I care about: leaks, odors, elevators, HVAC loss, outdoor lighting, and a handful of AV terms. Each cue adds a partial weight capped at one. The cap matters; without it, a long message with many benign keywords could look hotter than a short emergency note.

When I tested early versions, I saw false positives where “water” appeared in a benign sentence. I tightened patterns to word boundaries and preferred compound cues. This is not a claim that regex is sufficient in production. It is a claim that explainable baselines are useful when you compare future models against something you can read in a single screen.
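The compound-cue fix is easy to demonstrate in isolation. The patterns below are simplified stand-ins for the repository's actual list, but they show the behavior I was after:

```python
import re

# A bare single-word cue versus a compound cue.
single = re.compile(r"\bwater\b")
compound = re.compile(r"\b(water leak|flooding|burst pipe)\b")

benign = "Please refill the water cooler when convenient."
urgent = "There is a water leak under the sink in room 204."

# The single-word cue fires on both messages; the compound cue
# only fires on the genuinely urgent one.
print(bool(single.search(benign)), bool(compound.search(benign)))  # True False
print(bool(single.search(urgent)), bool(compound.search(urgent)))  # True True
```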

Observability as a first-class requirement

I log the layered context for the first ticket in every run not because the first ticket is special, but because it proves the pipeline without drowning the reader in repetition. In a longer study I would probably log structured JSON for every ticket and ship it to a file, but the PoC keeps stdout readable.

The matplotlib chart is part of the same philosophy. A batch table tells you what happened row by row; a distribution tells you whether the demo batch skewed toward one bucket. In my experiments, skew often revealed mistakes in keyword priorities rather than retrieval mistakes, which surprised me at first.

End-to-end workflow

Let's Get Cooking

The entry point is main.py. It keeps the demo batch in one helper so the narrative stays obvious when someone reads top to bottom.

def _demo_tickets() -> list[tuple[str, str, str, int, bool]]:
    """(id, building_code, text, hour_local, weekday)"""
    return [
        (
            "T-1001",
            "SCI-E",
            "Strong chemical odor near fume hood B2; two students reported eye irritation.",
            14,
            True,
        ),
        # ... additional synthetic tickets ...
    ]

What this does: It defines the synthetic workload as tuples that include a ticket identifier, a building code that keys into buildings.json, the free-text body, and a synthetic clock. I structured it this way because separating clock information from the text lets me test urgency scoring and time-aware policies independently without fabricating timestamps inside the prose.

Why I wrote it this way: Early on, I inlined timestamps as strings inside the ticket text and immediately regretted it. Parsing times from natural language is a separate project. For this PoC, explicit fields keep runs reproducible.

The layered assembly happens through assemble_layers in context_layers.py. The function pulls retrieval results, formats three blocks, and returns a LayeredContext dataclass.

def assemble_layers(
    *,
    query_text: str,
    building_code: str | None,
    session: SessionSignals,
    index: PolicyIndex,
    buildings: dict[str, BuildingContext],
    top_k: int = 3,
) -> LayeredContext:
    retrieved = index.top_k(query_text, k=top_k)
    if retrieved:
        policy_block = _format_policy_block(retrieved)
    else:
        policy_block = "POLICY SNIPPETS: (no retrieval match; use baseline routing rules)"

    b = buildings.get(building_code) if building_code else None
    place_block = _format_place_block(b)
    signal_block = _format_signal_block(session)

    return LayeredContext(
        policy_block=policy_block,
        place_block=place_block,
        signal_block=signal_block,
        retrieved=retrieved,
    )

What this does: It centralizes formatting so the router always sees the same headings for each layer. Empty retrieval is handled with an explicit fallback string rather than silent failure.

Why I structured it this way: In my opinion, the hardest part of small retrieval systems is debugging silent degradation. If nothing matches, I want that fact visible in the console output.

Retrieval itself is a thin wrapper around scikit-learn.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


class PolicyIndex:
    """In-memory TF-IDF index over policy text."""

    def __init__(self, chunks: list[PolicyChunk]) -> None:
        self._chunks = chunks
        corpus = [c.text for c in chunks]
        self._vectorizer = TfidfVectorizer(
            lowercase=True,
            stop_words="english",
            ngram_range=(1, 2),
            min_df=1,
        )
        self._matrix = self._vectorizer.fit_transform(corpus)

    def top_k(self, query: str, k: int = 3) -> list[RetrievedChunk]:
        q = self._vectorizer.transform([query])
        sims = cosine_similarity(q, self._matrix).flatten()
        order = np.argsort(-sims)
        out: list[RetrievedChunk] = []
        for idx in order[:k]:
            score = float(sims[idx])
            if score <= 0:
                continue
            out.append(RetrievedChunk(chunk=self._chunks[int(idx)], score=score))
        return out

What this does: It builds a matrix once and scores incoming ticket text as another vector. Cosine similarity ranks chunks, and I discard zero scores to avoid clutter.

What I learned: On a toy corpus, bigrams matter. Without them, “fume hood” sometimes loses to generic maintenance words. I kept the corpus tiny on purpose to force myself to think about chunk wording.

The router combines keyword cues, the top retrieved topic, and session urgency.

def decide_route(
    *,
    free_text: str,
    layered: LayeredContext,
    building: BuildingContext | None,
    session: SessionSignals,
) -> RoutingDecision:
    topic = _topic_from_retrieval(layered)
    keyword_bucket = _adjust_bucket_from_keywords(free_text)
    rationale_parts: list[str] = []

    bucket: RouteBucket
    if keyword_bucket is not None:
        bucket = keyword_bucket
        rationale_parts.append("keyword override")
    elif topic == "laboratory":
        bucket = RouteBucket.SAFETY_EHS
        rationale_parts.append("retrieval topic: laboratory")
    elif topic == "classroom_av":
        bucket = RouteBucket.AV
        rationale_parts.append("retrieval topic: classroom_av")
    # ... additional topic mappings ...
    else:
        bucket = RouteBucket.GENERAL
        rationale_parts.append("no strong signal; general bucket")

    urgency_score = session.urgency_hint_score
    priority = "P2"
    sla = 48.0

    if bucket in (RouteBucket.SAFETY_EHS, RouteBucket.PLUMBING) and urgency_score > 0.3:
        priority = "P0"
        sla = 4.0
        rationale_parts.append(f"urgency {urgency_score:.2f} tightened SLA")
    elif bucket == RouteBucket.ACCESS:
        priority = "P1"
        sla = 8.0
    # ... additional escalations ...

    return RoutingDecision(
        bucket=bucket,
        priority=priority,
        sla_hours=sla,
        rationale="; ".join(rationale_parts),
    )

What this does: It makes the decision path explicit. Keyword overrides fire first because certain phrases imply a channel regardless of retrieval noise. Topic labels from the best chunk act as a secondary signal. SLA tightening uses both bucket membership and the urgency score.

Why I put it this way: I needed a single function I could read during demos without opening a notebook. The rationale string is there so future me remembers why a ticket landed where it did.

Finally, plotting is one function that turns bucket names into counts.

from collections import Counter
from pathlib import Path

import matplotlib.pyplot as plt


def plot_bucket_distribution(
    buckets: list[str],
    out_path: Path,
) -> None:
    counts = Counter(buckets)
    labels = sorted(counts.keys())
    values = [counts[k] for k in labels]

    fig, ax = plt.subplots(figsize=(8, 4.5))
    ax.bar(labels, values, color="#2c5282")
    ax.set_title("Synthetic routing batch: bucket counts")
    ax.set_ylabel("Tickets")
    ax.tick_params(axis="x", rotation=35)
    fig.tight_layout()
    out_path.parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(out_path, dpi=120)
    plt.close(fig)

What this does: It produces a basic bar chart so the batch run is not only textual. For the animated cover asset, I used that chart as the UI half of the GIF after the terminal sequence.

Repository link: The full project, including diagrams and the terminal animation asset, is available at https://github.com/aniket-work/CampusContextRouter-AI

Reporting code stays intentionally thin: tables for people, files for artifacts.

from rich.console import Console
from rich.table import Table


def print_routing_table(
    console: Console,
    rows: list[tuple[str, str, str, str, str]],
) -> None:
    table = Table(title="Campus facilities intake (batch routing)")
    table.add_column("Ticket", style="cyan", no_wrap=True)
    table.add_column("Building", style="magenta")
    table.add_column("Route", style="green")
    table.add_column("Pri", justify="center")
    table.add_column("SLA h", justify="right")

    for ticket_id, building, route, pri, sla in rows:
        table.add_row(ticket_id, building, route, pri, sla)

    console.print(table)

What this does: It renders aligned columns with consistent headers so a batch run looks like a dispatch screen rather than a raw log dump.

Why I structured it this way: In my opinion, presentation quality changes how seriously I take my own outputs during development. If the table looks sloppy, I assume the logic is sloppy.

The urgency helper is small but central to how priorities tighten.

import re

_URGENCY_WORDS = [
    (r"\b(leak|flooding|flood|spill|water)\b", 0.35),
    (r"\b(smoke|fire|odor|fume|chemical)\b", 0.45),
    (r"\b(elevator|stuck|door won't open|door wont open)\b", 0.25),
    (r"\b(no heat|no ac|freezing|overheat)\b", 0.25),
    (r"\b(projector|microphone|av|audio|display)\b", 0.15),
    (r"\b(light|outage|dark|walkway)\b", 0.12),
]


def _urgency_hint(text: str) -> float:
    t = text.lower()
    score = 0.0
    for rx, w in _URGENCY_WORDS:
        if re.search(rx, t):
            score += w
    return min(1.0, score)

What this does: It scans for cues that should raise urgency regardless of which policy chunk wins retrieval.

What I learned: Weight tuning is subjective. I chose weights that made the science-lab odor scenario land in a high band without pushing every AV ticket into emergency territory.

How this differs from “just prompt better”

It is tempting to believe a single system message can replace structured preprocessing. Sometimes that works for short tasks. For operational intake, my experience has been that models benefit from retrieval that is inspectable outside the model. I am not arguing against LLMs; I am arguing that the PoC should show where the boundaries belong. If the retrieval list is wrong, I can fix the corpus or the vectorizer without touching the router. If the router rules are wrong, I can adjust routing without touching retrieval. That separation of concerns saved me time during debugging.

What a language model could do in a later iteration

If I add a model, I would keep it as a rewriter or validator, not as the sole authority. A plausible pattern is: assemble layers exactly as today, ask the model to propose a bucket and rationale, then compare against deterministic rules. Disagreements become training data or prompts for refinement. I have not implemented that here because I wanted the repository to remain runnable without API keys, but the layering is compatible with that roadmap.
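That comparison loop can be sketched without any real model call. Here `propose_with_model` is a stand-in for whatever LLM call I might add later; nothing in this sketch exists in the repository:

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    bucket: str
    rationale: str


def propose_with_model(ticket_text: str) -> Proposal:
    # Stand-in for a future LLM call; here it just returns a fixed guess.
    return Proposal(bucket="AV", rationale="mentions display hardware")


def deterministic_bucket(ticket_text: str) -> str:
    # Toy version of the rule-based router used as the authority.
    return "SAFETY_EHS" if "odor" in ticket_text.lower() else "GENERAL"


def review(ticket_text: str) -> dict:
    model = propose_with_model(ticket_text)
    rules = deterministic_bucket(ticket_text)
    # Disagreements are the interesting artifact: they become review
    # queue items or future training examples, not silent overrides.
    return {"model": model.bucket, "rules": rules, "agree": model.bucket == rules}
```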

Performance characteristics I measured informally

This is not a benchmark article, but I did sanity-check runtime on a laptop. Fitting the TF-IDF matrix on eight policies is effectively instantaneous. Routing five tickets is trivial. Matplotlib dominates wall time relative to retrieval, which reinforces that the PoC is not CPU-bound. If I scaled to thousands of policies, I would need a more serious index and probably batch vectorization, but that is not the bottleneck today.

Let's Setup

Step-by-step details can be found in the repository README. At a high level, the setup I used while iterating locally follows a predictable pattern.

  1. Create an isolated virtual environment in the project directory so dependencies never leak across unrelated experiments.
  2. Install requirements from requirements.txt exactly as pinned there to avoid surprise upgrades to scikit-learn behavior.
  3. Run python main.py with the bundled demo batch to confirm retrieval, routing, and chart generation all succeed on your machine.

If you clone the repository, you will notice there is no .env requirement for the baseline demo. I kept secrets out of the PoC on purpose so CI or readers can execute it without API keys.

Let's Run

When I run python main.py, the program prints the full layered context for the first ticket, then prints the batch table, then writes output/routing_bucket_distribution.png. That order is intentional: the first block proves the retrieval and formatting pipeline, the table proves routing consistency, and the chart proves that visualization hooks stay wired.

In my observation, the most interesting console output is the policy snippet list with scores. Even with eight policies, you can watch how sensitive the ranking is to verbs like “odor” versus “noise.” That sensitivity informed how I wrote the synthetic tickets.

Edge Cases I Thought About

No PoC is complete without acknowledging where it would break. These points are worth spelling out because they shaped what I did not attempt.

  1. Sparse retrieval: If the ticket uses slang that never appears in the policy corpus, TF-IDF may return low scores across the board. The router still runs, but the topic signal becomes weak. A mitigation I considered is hybrid retrieval with a keyword inverted index, which would be a natural extension.

  2. Conflicting policies: Real campuses can have overlapping rules. I store independent chunks and do not yet model precedence. In a future iteration, explicit precedence edges between chunk IDs would be cleaner than hoping retrieval ranks resolve conflicts.

  3. Time semantics: I pass hour and weekday as integers rather than parsing from text. That avoids accidental contradictions between embedded timestamps and structured fields.

  4. Equity and access: Routing touches accessibility topics. I include them because ignoring them would be unrealistic, but the SLA numbers are illustrative only. A production system would need institutional review, not a hobby script.

  5. Duplicate submissions: Real users open multiple tickets for the same incident. I do not deduplicate or thread conversations in this repository. A deduplication layer would likely sit upstream of retrieval, comparing embeddings of entire messages and linking to an incident ID.

  6. Seasonality: A field house ticket in January is not the same as one in August. My building metadata includes a seasonal hours profile label, but I do not dynamically adjust SLAs by season. Extending the signal layer with calendar metadata would be straightforward, but it would also require more realistic data than I wanted to maintain for a hobby repo.

  7. Language and tone: The PoC assumes English prose of moderate formality. Multilingual campuses would need tokenization choices and policy corpora per language. I did not attempt multilingual retrieval because verifying quality would require skills and resources outside the scope of a solo weekend project.

  8. Malicious input: Free-text fields can be abused. I do not implement content filtering here. If this were more than a local script, I would add rate limits, length limits, and basic abuse detection before any retrieval occurs.
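For the sparse-retrieval case above, the keyword fallback I considered could look roughly like this. It is a sketch under assumed data shapes, not repository code:

```python
from collections import defaultdict


def build_inverted_index(chunks: dict[str, str]) -> dict[str, set[str]]:
    # Map each lowercase token to the set of chunk IDs containing it.
    index: dict[str, set[str]] = defaultdict(set)
    for chunk_id, text in chunks.items():
        for token in text.lower().split():
            index[token].add(chunk_id)
    return index


def keyword_fallback(query: str, index: dict[str, set[str]]) -> list[str]:
    # Rank chunks by how many query tokens they contain; intended to run
    # only when TF-IDF cosine scores all come back near zero.
    hits: dict[str, int] = {}
    for token in query.lower().split():
        for chunk_id in index.get(token, ()):
            hits[chunk_id] = hits.get(chunk_id, 0) + 1
    return sorted(hits, key=hits.get, reverse=True)
```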

Ethics and Responsibility

Even though this repository is synthetic, the language of safety and access deserves care. I wrote the tickets and policies to resemble realistic operational phrases without copying any private incident text. From my perspective, that distinction matters: public demos should never repurpose confidential work orders.

I also want to be explicit that automated routing for physical risk scenarios should never be the only line of defense. The PoC emits labels; it does not dispatch tradespeople, text students, or close tickets. Treat it as a learning scaffold.

Future Roadmap (Personal Experiments Only)

If I revisit this repository, several extensions seem worthwhile, still on my own time and still labeled experimental.

  1. Replace TF-IDF with embeddings while keeping the same layer boundaries, then measure how often the router disagrees with the baseline.

  2. Add a second corpus for local procedures that updates more frequently than policy, mimicking how some campuses separate “policy” from “playbook.”

  3. Introduce evaluation harnesses with labeled tickets, even if synthetically expanded, so precision and recall become measurable instead of eyeballed.

  4. Wrap the router output as JSON for a tiny local web UI if I want a friendlier demo than the terminal alone.

  5. Add property tests that generate random word order permutations for tickets to see whether retrieval remains stable enough for my tolerance thresholds.

  6. Explore calibration for urgency scoring so numeric outputs map to observed human labels in a small user study. That is far beyond this PoC, but worth naming as a scientific next step rather than an engineering tweak.

None of those items are promises; they are directions I might explore when curiosity and spare time align.
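Roadmap item 4 is nearly free with dataclasses. A sketch, with `bucket` simplified to a string here (the repository models it as an enum):

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RoutingDecision:
    bucket: str
    priority: str
    sla_hours: float
    rationale: str


def decision_to_json(decision: RoutingDecision) -> str:
    # Sorted keys give a stable schema for a web UI or any downstream consumer.
    return json.dumps(asdict(decision), sort_keys=True)
```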

Documentation habits that helped me

While building this, I noticed that my velocity correlated with how aggressively I documented assumptions in the README. Not tutorial prose for beginners, but crisp statements of non-goals. When I wrote “illustrative SLA,” I stopped myself from secretly believing the numbers meant more than they did. When I listed repository layout, I caught a path mistake before publishing.

I also kept commit messages boring on purpose. Small repositories deserve readable history too. If I ever return to this code months later, I want future me to recognize intent without decoding clever commit titles.

Reproducibility notes

Reproducibility is part engineering and part discipline. Pinning dependencies avoids the subtle drift that happens when scikit-learn changes defaults across versions. Keeping random seeds matters when you add stochastic components; this PoC has none, which is a feature for now. Recording the Python version in the README is a small touch that prevents “works on my machine” surprises.

If you fork the repository, consider writing down your own environmental constraints. I develop on macOS, but the code should run anywhere Python runs. If matplotlib backend issues appear on a headless server, switching to a non-interactive backend is a known fix; I did not need it for local runs.

A longer note on evaluation and what I did not measure

Evaluation is where hobby projects either mature or remain toys. In this PoC, I relied on manual inspection: reading retrieved snippets, checking whether the science lab odor ticket escalated appropriately, and scanning the distribution chart for obvious skew. That approach is acceptable for a first public version because the goal was architectural clarity, not leaderboard scores.

If I were to formalize evaluation without access to private tickets, I would start by synthesizing a larger labeled set from templates. I would vary lexical overlap, negation, and multi-issue messages. I would measure top-k recall for policy topics and confusion matrices for route buckets. I would also track stability: if I paraphrase a ticket without changing meaning, does retrieval remain broadly consistent? That kind of robustness test often reveals brittleness faster than aggregate accuracy.
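The stability measurement could be as simple as comparing top-k result sets between a ticket and its paraphrase. A sketch of that metric, independent of any particular retriever:

```python
def topk_overlap(ids_a: list[str], ids_b: list[str]) -> float:
    # Jaccard overlap of two top-k result sets: 1.0 means the paraphrase
    # retrieved exactly the same chunks, 0.0 means none in common.
    a, b = set(ids_a), set(ids_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```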

I did not build those harnesses here because they expand repository scope quickly. Test data management is a project of its own. Still, naming the gap matters. Readers should not mistake a crisp table output for validated operational performance.

Visual assets and why the GIF exists

The repository includes Mermaid diagrams rendered to PNG via mermaid.ink because I wanted graphics that match the tone of technical documentation rather than stock photography. The animated GIF pairs a terminal sequence with a matplotlib chart to mirror how I actually work: run a script, scan the table, glance at a plot. Creating the GIF took extra time, but from my perspective it communicates intent faster than static screenshots alone.

I followed a strict palette conversion pipeline for the GIF to reduce flicker on platforms that are picky about animated assets. The details are mundane image processing, but the outcome matters when you publish where rendering quirks exist.

Personal lessons I did not expect

While wiring the urgency heuristic, I expected retrieval to dominate mistakes. Instead, I often found myself adjusting keyword lists because the language of urgency in synthetic tickets did not match the language of policy chunks. That mismatch reminded me that retrieval and rules interact; you cannot tune one in isolation forever.

Another surprise was how quickly the Rich table made the PoC feel “real.” Presentation is not substance, but human perception matters when you judge your own progress. I kept the table formatting minimal on purpose. Dense color in terminal output ages poorly and distracts from the routing story.

Finally, I was reminded how much I enjoy small JSON corpora. They are easy to diff in code review, easy to version, and easy to explain to someone unfamiliar with machine learning. If I had started with a database, I would have spent more time on migrations than on routing logic.

If you made it this far, thank you for reading carefully. I wrote this piece to document not only what the code does, but why I accepted certain limitations while rejecting shortcuts that would have made the demo flashier yet less honest.

Closing Thoughts

What I take away from this experiment is that “context engineering” is not only a prompt-design exercise. It is also an exercise in deciding what information deserves its own channel, how much structure to add before model calls, and how to leave an audit trail. The campus framing helped me keep those questions grounded.

If you try the code, I hope you modify the policy JSON and watch how retrieval shifts. In my experience, that kind of hands-on perturbation teaches more than reading another listicle about embeddings.

I also keep returning to a humbling point: good operations depend on people who answer phones, visit sites, and coordinate trades. Software can sort and summarize, but it cannot replace the embodied knowledge of how a specific building behaves in winter. This PoC stays modest because that human layer matters more than any script I would ship on a weekend.

As a final note, this article is an experimental write-up based on a hobby repository. It is not production guidance, not campus policy, and not affiliated with any organization I work with.

Disclaimer

The views and opinions expressed here are solely my own and do not represent the views, positions, or opinions of my employer or any organization I am affiliated with. The content is based on my personal experience and experimentation and may be incomplete or incorrect. Any errors or misinterpretations are unintentional, and I apologize in advance if any statements are misunderstood or misrepresented.

Tags: python, context, machinelearning, campus
