Jan Tschada

Posted on Jun 22

Building a Local AI Agent for OSM: 21 Days of Iteration

#geospatial #ai #python #datascience

How I turned a vague idea into a working RAG pipeline for dynamic OSM filter generation

I spent 21 days building a local AI agent that translates natural language requests into osmfilter JSON. This is the technical deep‑dive: the architecture, the failures, the fixes, and the lessons learned. If you're working with local LLMs, embeddings, or OSM, there's something here for you.

The Spark

It started with a simple observation: translating "Find only restricted areas" into an OSM filter should be automatable, but not with a static keyword map. There is too much nuance: seamark:type=restricted_area is correct for maritime zones, but access=exclusion_zone might be better for land. The LLM needs context.

I also wanted everything to run locally. No API calls, no rate limits, no privacy concerns. Just a consumer GPU and a local model.

What follows is the unvarnished story of building that system over three weeks, working mostly in the evenings.

Week 1: Foundation (Days 1–7)

Day 1–2: The LLM Wrapper

I started with the simplest possible thing: a class that wraps llama-cpp-python and can call the model with a prompt. The goal was to get a JSON response—no fluff.

# author: Jan Tschada
# SPDX-License-Identifier: Apache-2.0

from llama_cpp import Llama

class LocalLLMFunctionCaller:

    def __init__(self, model_path: str, n_ctx: int = 2048, temperature: float = 0.0):
        # Llama() takes model and context settings. Temperature is a sampling
        # parameter, so store it here and pass it with every completion call.
        self.llm = Llama(model_path=model_path, n_ctx=n_ctx, verbose=False)
        self.temperature = temperature

    def call_llm(self, prompt: str, max_tokens: int = 200) -> str:
        response = self.llm(prompt, max_tokens=max_tokens, temperature=self.temperature, stop=["\n\n"], echo=False)
        return response["choices"][0]["text"].strip()

Simple. But the stop=["\n\n"] would come back to haunt me later.

I also built the executor.py module, which extracts JSON from arbitrary LLM output using a simple brace‑matching algorithm. No regex—just counting depth.

# author: Jan Tschada
# SPDX-License-Identifier: Apache-2.0

def extract_json(text: str) -> str:
    start = text.find("{")
    if start == -1:
        raise ValueError("No JSON object found")
    depth = 0
    for i, ch in enumerate(text[start:], start=start):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[start:i+1]
    raise ValueError("Unbalanced JSON braces")

This is one of those pieces of code that just works, and I haven't touched it since.
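
One caveat with plain brace counting: a `{` or `}` inside a JSON string value is counted as structure, so a value such as "a } b" would end the match early. The following is a minimal sketch of a string-aware variant, assuming the output contains standard double-quoted JSON strings. The name extract_json_string_aware is hypothetical and not part of the original executor.py.

```python
import json


def extract_json_string_aware(text: str) -> str:
    """Return the first balanced {...} object, ignoring braces inside strings."""
    start = text.find("{")
    if start == -1:
        raise ValueError("No JSON object found")
    depth = 0
    in_string = False
    escaped = False
    for i in range(start, len(text)):
        ch = text[i]
        if in_string:
            # Inside a string only the closing quote matters; a backslash
            # escapes the character that follows it.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[start:i + 1]
    raise ValueError("Unbalanced JSON braces")


# Noisy model output with a closing brace inside a string value.
raw = 'Sure! {"key": "name", "value": "a } b"} Hope that helps.'
parsed = json.loads(extract_json_string_aware(raw))
```

Passing the extracted text through json.loads, as in the last line, also catches anything that is balanced but not valid JSON.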

Day 3–4: The Prompt Templates

I wrote prompts.py with several builders:

build_prompt(): basic function‑calling
build_mcp_prompt(): MCP‑style tool calls
build_osmfilter_prompt(): zero‑shot OSM filter generation
build_osmfilter_prompt_with_examples(): few‑shot with provided examples

The last one became the core. It takes a user query and a string of pre‑formatted examples, and outputs a JSON filter structure:

# author: Jan Tschada
# SPDX-License-Identifier: Apache-2.0

def build_osmfilter_prompt_with_examples(user_query: str, examples: str) -> str:
    return f"""
You are an assistant that generates OSM filter expressions...
Examples:
{examples}
User: "{user_query}"
Assistant:
"""

This is where the real work would happen in later weeks.
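
The examples string has to come from somewhere. Here is a rough sketch of a formatter, assuming each retrieved example is a (score, natural_language, json_ast, tags) tuple and that examples are rendered as User:/Assistant: pairs to mirror the prompt. The function body and the JSON shape of the sample filter are illustrative assumptions, not the project's actual code.

```python
import json
from typing import List, Tuple

# Assumed row shape: (score, natural_language, json_ast_str, tags_str)
Scored = Tuple[float, str, str, str]


def format_examples(results: List[Scored]) -> str:
    """Render retrieved examples as User:/Assistant: pairs for the few-shot prompt."""
    blocks = []
    for _score, natural_language, json_ast_str, _tags in results:
        # Re-serialise compactly so each example uses fewer prompt tokens.
        ast = json.dumps(json.loads(json_ast_str), separators=(",", ":"))
        blocks.append(f'User: "{natural_language}"\nAssistant: {ast}')
    return "\n\n".join(blocks)


results = [
    (0.91, "Find only restricted areas",
     '{"key": "seamark:type", "value": "restricted_area"}', "seamark:type"),
]
formatted = format_examples(results)
print(formatted)
```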

Day 5–7: The Embedding Model

I needed a way to find relevant examples. Enter bge-small-en-v1.5, a 384‑dimension embedding model that fits in 33 MB. I built LocalLLMEmbedder to handle embeddings and store them in SQLite.

# author: Jan Tschada
# SPDX-License-Identifier: Apache-2.0

import numpy as np
from llama_cpp import Llama

class LocalLLMEmbedder:

    def __init__(self, model_path: str):
        self.embed_model = Llama(model_path=model_path, n_gpu_layers=-1, embedding=True, verbose=False)

    def create_embedding(self, text: str) -> np.ndarray:
        result = self.embed_model.create_embedding(text)
        return np.array(result["data"][0]["embedding"], dtype=np.float32)

I then ingested the taginfo-wiki.db dataset, embedding each OSM tag description as a JSON object:

# author: Jan Tschada
# SPDX-License-Identifier: Apache-2.0

text_obj = {
    "tgroup": tgroup,
    "key": key,
    "value": value or "",
    "description": desc,
    "implies": implies or "",
    "combination": combo or "",
    "linked": linked or "",
    "status": status or "",
    "approval": approval or ""
}
text = json.dumps(text_obj)
embedding_blob = self.create_embedding_blob(text)

At this point, I had a database with embeddings for every documented OSM tag, and a way to search them.
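
For readers wondering how a vector ends up in SQLite, here is a minimal sketch of one way to do the round trip: serialise the float32 array to raw bytes, store it as a BLOB, and read it back with np.frombuffer. The helper names and the tag_embeddings table are assumptions for illustration, and the vector is a stand-in for a real embedding.

```python
import sqlite3

import numpy as np


def to_blob(vec: np.ndarray) -> bytes:
    """Serialise a vector as raw float32 bytes (4 bytes per dimension)."""
    return np.asarray(vec, dtype=np.float32).tobytes()


def from_blob(blob: bytes) -> np.ndarray:
    """Rebuild the float32 vector from the stored bytes."""
    return np.frombuffer(blob, dtype=np.float32)


conn = sqlite3.connect(":memory:")
# Hypothetical table layout, not the schema of taginfo-wiki.db.
conn.execute("CREATE TABLE tag_embeddings (key TEXT, value TEXT, embedding BLOB)")

vec = np.linspace(0.0, 1.0, 384, dtype=np.float32)  # stand-in for a 384-dim embedding
conn.execute(
    "INSERT INTO tag_embeddings VALUES (?, ?, ?)",
    ("highway", "primary", to_blob(vec)),
)

blob = conn.execute("SELECT embedding FROM tag_embeddings").fetchone()[0]
restored = from_blob(blob)
conn.close()
```

The dtype has to match on both sides: bytes written as float32 and read back as float64 would silently produce a vector of half the length with meaningless values.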

Week 2: RAG & CLI (Days 8–14)

Day 8–9: Building the Example Database

Embedding OSM tags is one thing. Embedding filter examples is another. I created a table called filter_examples with columns for the natural‑language query, the JSON AST, the extracted tags, and the embedding of the natural‑language query.

I also wrote a parser to extract examples from a plain text file:

# author: Jan Tschada
# SPDX-License-Identifier: Apache-2.0

def read_osmfilter_samples(self, file_path: str) -> list:
    """Read a file of User: / Assistant: blocks, parse the JSON, and return a list of examples."""
    ...  # implementation omitted

The parser is a state machine that detects User: lines, Assistant: lines, and JSON braces. It's not the most elegant parser, but it works for the 200 or so examples I have.
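
To make the state machine idea concrete, here is a small sketch of a parser of that kind. The sample file layout and the JSON shape are assumptions, and this is not the original read_osmfilter_samples(). It tracks the current query, whether it is inside an Assistant: answer, and the running brace depth.

```python
import json
from typing import Any, Iterable, List, Tuple


def parse_samples(lines: Iterable[str]) -> List[Tuple[str, Any]]:
    """Parse User:/Assistant: blocks into (query, parsed_json) pairs."""
    examples = []
    query = ""
    buffer: List[str] = []
    depth = 0
    collecting = False
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("User:"):
            query = stripped[len("User:"):].strip().strip('"')
            continue
        if stripped.startswith("Assistant:"):
            collecting, buffer, depth = True, [], 0
            stripped = stripped[len("Assistant:"):].strip()
        if collecting and stripped:
            buffer.append(stripped)
            # Naive depth count: braces inside string values would confuse it.
            depth += stripped.count("{") - stripped.count("}")
            if depth == 0:
                examples.append((query, json.loads(" ".join(buffer))))
                collecting = False
    return examples


sample = '''User: "Find only restricted areas"
Assistant:
{
  "key": "seamark:type",
  "value": "restricted_area"
}

User: "Find parking"
Assistant: {"key": "amenity", "value": "parking"}
'''
examples = parse_samples(sample.splitlines())
```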

Day 10–11: Search Functions

I implemented search_filter_examples(), which embeds the user query and computes cosine similarity against all stored examples:

# author: Jan Tschada
# SPDX-License-Identifier: Apache-2.0

def search_filter_examples(self, query: str, db_path: str = "taginfo-wiki.db", min_score=0.65, k=10):
    q_emb = self.create_embedding(query)
    q_vec = np.frombuffer(q_emb, dtype=np.float32)
    # Query all stored embeddings, compute cosine similarity, filter by min_score
    scored.sort(reverse=True)
    return scored[:k]

The cosine similarity function is straightforward:

# author: Jan Tschada
# SPDX-License-Identifier: Apache-2.0

import numpy as np

def cosine(self, a, b):
    a = np.frombuffer(a, dtype=np.float32)
    b = np.frombuffer(b, dtype=np.float32)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
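
For a couple of hundred examples, scoring row by row is perfectly adequate. If the table grows, one option is to stack the stored vectors into a matrix and score them all at once. The sketch below is an alternative under that assumption, not the code used in the project. The function name cosine_top_k is hypothetical, and rows with zero norm get a score of 0 instead of causing a division by zero.

```python
import numpy as np


def cosine_top_k(query, matrix, k=10, min_score=0.65):
    """Return (row_index, score) pairs above min_score, best first."""
    query = np.asarray(query, dtype=np.float64)
    matrix = np.asarray(matrix, dtype=np.float64)
    # Denominator of the cosine formula for every row at once.
    denom = np.linalg.norm(matrix, axis=1) * np.linalg.norm(query)
    scores = np.divide(matrix @ query, denom,
                       out=np.zeros(len(matrix)), where=denom > 0)
    order = np.argsort(-scores)[:k]  # indices sorted by descending score
    return [(int(i), float(scores[i])) for i in order if scores[i] >= min_score]


stored = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hits = cosine_top_k([1.0, 0.0], stored)
```

In this toy example the first row points in the same direction as the query, the third row is at 45 degrees (score about 0.707), and the orthogonal second row is filtered out by min_score.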

Day 12–14: The CLI Entry Point

I built func_cli.py to tie everything together. It accepts a --request and a --model path, retrieves examples, builds the prompt, calls the LLM, and prints the result.

uv run osm-functions --request "Find only restricted areas" --model /path/to/model.gguf

This was the moment of truth. And the first runs—failed.

Week 3: Validation & Polish (Days 15–21)

Day 15–16: The Stop Token Disaster

I kept getting empty responses. The LLM would generate nothing. I added --verbose to see the raw prompt and response, and realized the model was generating a blank line before the JSON. That blank line triggered stop=["\n\n"], cutting off the output before it started.

The fix was simple: remove the stop token entirely.

# author: Jan Tschada
# SPDX-License-Identifier: Apache-2.0

# Before
response = self.llm(prompt, max_tokens=max_tokens, stop=["\n\n"], echo=False)

# After
response = self.llm(prompt, max_tokens=max_tokens, echo=False)

I also added a fallback: if the response is empty, return "{}" and let the caller handle it.
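
A minimal sketch of that fallback, assuming the model is any callable that returns the completion dictionary shape used earlier (response["choices"][0]["text"]). The function call_with_fallback and the two fake models are hypothetical stand-ins so the snippet runs without a GGUF file.

```python
from typing import Any, Callable, Dict


def call_with_fallback(llm: Callable[..., Dict[str, Any]], prompt: str,
                       max_tokens: int = 200) -> str:
    """Call the model without stop tokens and return "{}" for an empty answer."""
    response = llm(prompt, max_tokens=max_tokens, echo=False)
    text = response["choices"][0]["text"].strip()
    return text if text else "{}"


def fake_llm_blank(prompt: str, max_tokens: int, echo: bool) -> Dict[str, Any]:
    # Simulates a model that produces only whitespace.
    return {"choices": [{"text": "\n\n"}]}


def fake_llm_json(prompt: str, max_tokens: int, echo: bool) -> Dict[str, Any]:
    # Simulates a model that emits a blank line before the JSON.
    return {"choices": [{"text": '\n\n{"key": "amenity"}'}]}


empty_result = call_with_fallback(fake_llm_blank, "Assistant:")
json_result = call_with_fallback(fake_llm_json, "Assistant:")
```

Returning "{}" keeps the return type stable, so the caller can parse every response the same way and treat an empty object as "no filter generated".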

Day 17–18: Candidate Validation

I added build_osmtags_validate_prompt() to filter candidate OSM tags by relevance. The LLM receives a list of candidates (key, value, description) and outputs a JSON array of relevant IDs.

This was crucial for domains like maritime vs. land: seamark:type=restricted_area shouldn't show up for a land‑based query.
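
The validation step can be sketched roughly as follows. The prompt wording, the function names, the candidate tuple layout (id, key, value, description), and the sample descriptions are all assumptions made for illustration and do not reproduce build_osmtags_validate_prompt(). The parser only keeps IDs that were actually offered, so a hallucinated ID is dropped.

```python
import json
from typing import List, Tuple

Candidate = Tuple[int, str, str, str]  # assumed layout: (id, key, value, description)


def build_validate_prompt(query: str, candidates: List[Candidate]) -> str:
    """List the candidates and ask for a JSON array of relevant IDs."""
    listing = "\n".join(
        f"{cid}: {key}={value} ({desc})" for cid, key, value, desc in candidates
    )
    return (
        "Select the OSM tags that are relevant to the request.\n"
        f'Request: "{query}"\n'
        f"Candidates:\n{listing}\n"
        "Answer with a JSON array of candidate IDs, for example [1, 3].\n"
    )


def parse_relevant_ids(text: str, candidates: List[Candidate]) -> List[int]:
    """Extract the ID array from the model output, keeping only known IDs."""
    start, end = text.find("["), text.rfind("]")
    if start == -1 or end < start:
        return []
    try:
        ids = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return []
    known = {cid for cid, *_ in candidates}
    return [i for i in ids if isinstance(i, int) and i in known]


candidates = [
    (1, "seamark:type", "restricted_area", "illustrative maritime description"),
    (2, "access", "private", "illustrative land access description"),
]
prompt = build_validate_prompt("Find restricted areas on land", candidates)
relevant = parse_relevant_ids("Relevant: [2, 7]", candidates)
```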

Day 19–20: Synthesis Instructions

I realised the LLM was sometimes copying examples blindly, even when a combination of tags was more appropriate. I rewrote the prompt to emphasise synthesis:

"If the examples show different tags that could apply to the user request, combine them into a single filter. Do not copy blindly—adapt."

I also added a confidence threshold: if the top‑k similarity score is below 0.4, the LLM is instructed to ask for clarification.
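
One way to express that check in code is sketched below, assuming the threshold applies to the best score among the retrieved examples and that an empty result also counts as low confidence. The function name needs_clarification is hypothetical. Note that such a check can only fire if the search is run with a min_score below the threshold; with a min_score of 0.65, no returned example could score under 0.4.

```python
from typing import List, Tuple

# Assumed row shape: (score, natural_language, json_ast_str, tags_str)
Scored = Tuple[float, str, str, str]


def needs_clarification(scored: List[Scored], threshold: float = 0.4) -> bool:
    """True when no retrieved example is similar enough to trust."""
    if not scored:
        return True
    best = max(score for score, *_ in scored)
    return best < threshold


weak = [(0.35, "Find lighthouses", "{}", "")]
strong = [(0.72, "Find restricted areas", "{}", ""), (0.35, "Find lighthouses", "{}", "")]

ask_for_weak = needs_clarification(weak)
ask_for_strong = needs_clarification(strong)
```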

Day 21: The Agent Loop

The final piece was a simple loop that:

Calls the filter generator
Executes the filter (via execute_osmfilter())
If the feature count is zero, broadens the request and tries again
Logs every decision

# author: Jan Tschada
# SPDX-License-Identifier: Apache-2.0

attempt = 0
while attempt < max_attempts:
    filter_json = generate_filter(request)
    result = execute_osmfilter(filter_json)
    if result.count > 0:
        break

    request = broaden_request(request)
    log(f"Empty result, broadening to: {request}")
    attempt += 1

This loop transforms a static generator into an adaptive explorer.
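
To show the loop end to end, here is a self-contained sketch with the three steps passed in as callables and an audit trail collected per attempt. The run_agent signature, the trail format, and the fake_* stand-ins are assumptions for illustration. As a simplification, the executor returns a plain feature count rather than a result object.

```python
from typing import Callable, Dict, List, Tuple


def run_agent(request: str,
              generate_filter: Callable[[str], str],
              execute_filter: Callable[[str], int],
              broaden_request: Callable[[str], str],
              max_attempts: int = 3) -> Tuple[str, int, List[Dict]]:
    """Generate, execute, and broaden until features are found or attempts run out."""
    trail: List[Dict] = []
    filter_json, count = "{}", 0
    for attempt in range(1, max_attempts + 1):
        filter_json = generate_filter(request)
        count = execute_filter(filter_json)
        # Record every decision so a failed run can be reconstructed later.
        trail.append({"attempt": attempt, "request": request,
                      "filter": filter_json, "count": count})
        if count > 0:
            break
        request = broaden_request(request)
    return filter_json, count, trail


# Stand-ins for the real generator, executor, and broadening step.
def fake_generate(request: str) -> str:
    return '{"request": "%s"}' % request


def fake_execute(filter_json: str) -> int:
    return 0 if "only" in filter_json else 12


def fake_broaden(request: str) -> str:
    return request.replace("only ", "")


final_filter, feature_count, trail = run_agent(
    "Find only restricted areas", fake_generate, fake_execute, fake_broaden
)
```

Here the first attempt returns no features, the request is broadened to "Find restricted areas", and the second attempt succeeds, leaving two entries in the trail.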

What I Learned (The Hard Way)

Stop tokens are not your friend. Unless you're 100% certain the model will never output the stop token inside valid content, avoid them. Use max_tokens and parse the output instead.

Embedding quality matters. Embedding the full OSM tag description as a JSON string works, but natural‑language sentences like "key=highway, value=primary, description: A major road" gave better results.

The example database is everything. My initial set was heavily biased toward maritime tags. After adding land‑based examples (parking, access, boundaries), the LLM's choices became more context‑aware.

Verbose output is your best debugging tool. Without --verbose, I would have wasted days chasing the stop token bug.

The agent loop is simple but powerful. In fewer than 50 lines of code, it turns a one‑shot generator into a dynamic, adaptive system. Every attempt is logged: request, retrieved examples, generated filter, feature count. That audit trail is invaluable.
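
As a small illustration of the embedding lesson above, this sketch turns a tag record into the sentence form quoted there instead of a JSON string. The function name tag_to_sentence and the choice to skip empty fields are assumptions.

```python
def tag_to_sentence(key: str, value: str, description: str) -> str:
    """Build the text that gets embedded for one OSM tag, skipping empty fields."""
    parts = [f"key={key}"]
    if value:
        parts.append(f"value={value}")
    if description:
        parts.append(f"description: {description}")
    return ", ".join(parts)


sentence = tag_to_sentence("highway", "primary", "A major road")
print(sentence)  # key=highway, value=primary, description: A major road
```

Whichever format is chosen, stored tags and incoming queries should go through the same embedding model, and changing the text format means re-embedding the stored rows.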

The Current State

The system works, for a definition of "works" that includes "still needs more examples, better prompt engineering, and a few edge cases fixed."

The CLI outputs valid JSON filters for most simple queries. The agent loop can handle empty results by broadening the request. And the whole thing runs on a single GPU with less than 8 GB VRAM.

What still needs work:

Synthesis: Combining multiple tags (e.g., maxspeed + surface) still isn't perfect.
Negation: The LLM struggles with "not highways" style queries.
Example diversity: I need more examples, especially complex ones with multiple tags.
Confidence handling: The 0.4 threshold is a guess; I need to fine‑tune it with actual data.

Code Snippets You Might Actually Use

The RAG search function

# author: Jan Tschada
# SPDX-License-Identifier: Apache-2.0

import sqlite3

import numpy as np

def search_filter_examples(self, query: str, db_path: str = "taginfo-wiki.db", min_score=0.65, k=10):
    q_emb = self.create_embedding(query)
    q_vec = np.frombuffer(q_emb, dtype=np.float32)
    # Load every stored example, then release the connection before scoring.
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("""
            SELECT natural_language, json_ast, tags, embedding
            FROM filter_examples
            WHERE embedding IS NOT NULL
        """).fetchall()
    finally:
        conn.close()
    scored = []
    for natural_language, json_ast_str, tags_str, emb in rows:
        emb_vec = np.frombuffer(emb, dtype=np.float32)
        score = self.cosine(q_vec, emb_vec)
        if score >= min_score:
            scored.append((score, natural_language, json_ast_str, tags_str))

    scored.sort(reverse=True)
    return scored[:k]

The JSON extraction function

# author: Jan Tschada
# SPDX-License-Identifier: Apache-2.0

def extract_json(text: str) -> str:
    start = text.find("{")
    if start == -1:
        raise ValueError("No JSON object found")
    depth = 0
    for i, ch in enumerate(text[start:], start=start):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[start:i+1]
    raise ValueError("Unbalanced JSON braces")

The CLI entry point (simplified)

# author: Jan Tschada
# SPDX-License-Identifier: Apache-2.0

import argparse

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", required=True)
    parser.add_argument("--request", required=True)
    parser.add_argument("--verbose", action="store_true")
    parser.add_argument("--top-k", type=int, default=5)
    args = parser.parse_args()

    caller = LocalLLMFunctionCaller(args.model, n_ctx=8192, temperature=0.0)
    embedder = LocalLLMEmbedder("data/bge-small-en-v1.5-q4_k_m.gguf")

    results = embedder.search_filter_examples(args.request, k=args.top_k)
    examples = format_examples(results)
    response = caller.call_llm_with_osmfilter_examples(args.request, examples)

    if args.verbose:
        print(response["prompt"])
        print("\n" + response["response"])
    print(response["result"])

Where I'm Taking This Next

Spatial operators: "within 5 km of a hospital"
Wikidata integration: enrich OSM features with facts
Better synthesis: combine tags more intelligently
Open‑source: once the edge cases are fixed (currently a technical spike)

Questions for You

How do you handle tag ambiguity in your geospatial agentic workflows?
What's your threshold for asking an LLM to clarify vs. taking a best guess?
Have you tried building a RAG pipeline for geospatial data?

I'd genuinely love to hear your experiences.

Links:

Let the AI Ask for Data: Dynamic OSM Extraction for Agents

DEV Community