## 1. Introduction
My GPU was occupied by LLM experiments throughout the 2025-2026 winter break, but it's finally free now!
I investigated the 2D spatial recognition ability of a local LLM (gpt-oss:20b) using mazes as the subject.
- Motivation: Wanted to understand how LLMs perceive space for autonomous navigation
- Method: Ask "which direction next?" for each cell with structured output
- Prompts: Tried multiple strategies since I didn't know what works best
- Source & Results: Published on GitHub
Result: The prompt I initially thought of turned out to be the worst.
### Key Findings
- A local LLM can handle 2D spatial recognition - gpt-oss:20b achieved sufficient accuracy
- Prompt strategy makes a big difference - response time varies severalfold between strategies
## 2. Experiment Setup

### Environment
- OS: Windows 11 / WSL2 (Ubuntu)
- CPU: AMD Ryzen 7 7700
- GPU: GeForce RTX 4070 (12GB VRAM)
- LLM Runtime: Ollama
- Experiment Code: Node.js + TypeScript + @langchain/ollama 1.1.0
### Model
Used gpt-oss:20b. The recommended VRAM is 16GB, but it runs on 12GB with CPU offloading (24% CPU / 76% GPU).
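The full harness is on GitHub, but the basic call pattern with @langchain/ollama looks roughly like the sketch below. The direction schema and prompt wording here are my assumptions for illustration, not the actual experiment code.

```typescript
import { ChatOllama } from "@langchain/ollama";
import { z } from "zod";

// Assumed output schema: the model must answer with exactly one direction.
const DirectionSchema = z.object({
  direction: z.enum(["up", "down", "left", "right"]),
});

const model = new ChatOllama({ model: "gpt-oss:20b", temperature: 0 });

// withStructuredOutput constrains the reply to the schema above.
const structured = model.withStructuredOutput(DirectionSchema);

const answer = await structured.invoke(
  `You are at (1,1) in a maze. The goal is at (3,1).\n` +
  `Walkable cells: ["(1,1)","(3,1)","(1,2)","(3,2)","(1,3)","(2,3)","(3,3)"]\n` +
  `Which direction should you move next?`
);

console.log(answer.direction); // e.g., "down"
```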
### Prompt Strategies
Compared 4 strategies. The examples below all encode the same 5x5 maze; a sketch of how each format can be generated follows the examples. (Full prompt examples are in the repo.)

simple - ASCII visualization of the maze

```
#####
#S#G#
# # #
#   #
#####
```

matrix - Binary matrix of walls (1) and paths (0)

```
[[1,1,1,1,1],[1,0,1,0,1],[1,0,1,0,1],[1,0,0,0,1],[1,1,1,1,1]]
```

list - List of walkable coordinates

```
["(1,1)","(3,1)","(1,2)","(3,2)","(1,3)","(2,3)","(3,3)"]
```

graph - Adjacency list format

```
{"1,1":["1,2"],"1,2":["1,1","1,3"],"1,3":["1,2","2,3"],...}
```
### Mazes
Used 4 sizes (5x5, 7x7, 11x11, 15x15) x 2 categories (the full maze list is in the repo):

- corridor - walled passages: straight, branch, dead-end, loop, spiral
- open - open spaces with obstacles: empty, pass, detour

(In the maze images: black = wall, white = path, green = start, red = goal.)
### History Option
History refers to the path taken to reach the current cell (e.g., (1,1) -> (1,2) -> (2,2)).
- With: Include history in prompt
- Without: Exclude history from prompt
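As an illustration, splicing the history into the prompt could look something like this minimal sketch (the actual prompt wording in the experiment may differ, and `buildPrompt` is a made-up helper):

```typescript
type Cell = [number, number];

// Hypothetical prompt assembly with an optional history section.
function buildPrompt(
  mazeText: string,
  current: Cell,
  history: Cell[],
  withHistory: boolean
): string {
  let prompt = `Maze: ${mazeText}\nCurrent position: (${current[0]},${current[1]})\n`;
  if (withHistory && history.length > 0) {
    // e.g., "(1,1) -> (1,2) -> (2,2)"
    prompt += `Path so far: ${history.map(([x, y]) => `(${x},${y})`).join(" -> ")}\n`;
  }
  return prompt + "Which direction should you move next?";
}
```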
### Evaluation Method
For each cell in the maze, ask "which direction should I go next?" and record success/failure and response time.
A correct answer is any direction that moves closer to the goal; it doesn't need to follow the shortest route.
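Concretely, "getting closer to the goal" can be checked with BFS distances: a move is correct if the next cell's distance to the goal is strictly smaller than the current cell's. Below is a minimal sketch of that check; it's my reconstruction, not necessarily the repo's implementation.

```typescript
type Cell = [number, number];

// BFS from the goal gives each walkable cell's distance to the goal.
function distancesToGoal(grid: number[][], goal: Cell): Map<string, number> {
  const dist = new Map<string, number>();
  const queue: Cell[] = [goal];
  dist.set(`${goal[0]},${goal[1]}`, 0);
  while (queue.length > 0) {
    const [x, y] = queue.shift()!;
    const d = dist.get(`${x},${y}`)!;
    for (const [nx, ny] of [[x, y - 1], [x, y + 1], [x - 1, y], [x + 1, y]]) {
      if (grid[ny]?.[nx] === 0 && !dist.has(`${nx},${ny}`)) {
        dist.set(`${nx},${ny}`, d + 1);
        queue.push([nx, ny]);
      }
    }
  }
  return dist;
}

// A move is "correct" if the next cell is strictly closer to the goal.
function isCorrectMove(dist: Map<string, number>, from: Cell, to: Cell): boolean {
  const dFrom = dist.get(`${from[0]},${from[1]}`);
  const dTo = dist.get(`${to[0]},${to[1]}`);
  return dFrom !== undefined && dTo !== undefined && dTo < dFrom;
}
```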
## 3. Results
Each combination was run only once, so treat the numbers as reference data for observing trends.

### Scale Verification
Results from testing all sizes x all strategies on representative mazes (corridor_straight / open_empty), with history enabled.
#### Accuracy (%)
| Size | simple | matrix | list | graph |
|---|---|---|---|---|
| 5x5 | 100 | 100 | 100 | 100 |
| 7x7 | 97 | 100 | 100 | 100 |
| 11x11 | 82 | 96 | 98 | 95 |
| 15x15 | - | - | 95 | 89 |
#### Response Time (sec/cell)
| Size | simple | matrix | list | graph |
|---|---|---|---|---|
| 5x5 | 29 | 19 | 12 | 12 |
| 7x7 | 77 | 31 | 16 | 17 |
| 11x11 | 313 | 75 | 31 | 64 |
| 15x15 | - | - | 41 | 190 |
The 15x15 runs for simple and matrix were abandoned due to time constraints.
list is the fastest and most accurate, and the gap widens as size increases. simple degraded to 313 sec/cell (over 5 minutes) at 11x11.
### Effect of History
Comparing history on/off with the list strategy at 11x11 (category averages).

#### Accuracy (%)
| Category | No History | With History |
|---|---|---|
| corridor | 82 | 86 |
| open | 99 | 100 |
#### Response Time (sec/cell)
| Category | No History | With History |
|---|---|---|
| corridor | 230 | 110 |
| open | 29 | 26 |
For corridor types, enabling history roughly halves the response time (230 -> 110 sec/cell). Open types show little difference.
## 4. Conclusion

### gpt-oss:20b's 2D Spatial Recognition Ability
gpt-oss:20b has sufficient 2D spatial recognition ability to navigate mazes.
At 80%+ per-cell accuracy, it can reach the goal within about 1.5x the length of the shortest route.
Response time with the list strategy is around 30 sec/cell at 11x11.
That's not suitable for real-time processing, but practical for casual use with a local LLM.
For comparison, I briefly tested two other models:
| Model | Impression |
|---|---|
| gemma3:12b | ~50% accuracy, not practical |
| deepseek-r1:14b | Not as good as gpt-oss:20b, but promising |
I believe the Reasoning capability common to gpt-oss:20b and deepseek-r1:14b plays a significant role.
Interestingly, in my environment deepseek-r1:14b runs 100% on GPU, yet gpt-oss:20b at 76% GPU is both faster and more accurate.
### list + history = best
Prompt strategy significantly affects both accuracy and response time.
My first strategy was simple - I thought it would be intuitive for humans, but it was the worst.
Next I tried graph, a structured format for pathfinding.
For small mazes it was faster than simple, which gave me hope, but response time degraded as size increased.
I suspect this is because the amount of adjacency information grows with maze size.
matrix is a structured version of simple, but it didn't produce good results either.
I never expected list - a coordinate list that humans can't even interpret as a maze - to be the best.
Regarding history, I think the information about "how I got here" simply helps in deciding the next direction.
It's especially helpful for corridor types.
However, history also means more tokens to process; if only the last few steps actually matter, there may be room for optimization, such as truncating the history as sketched below.
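A hypothetical version of that optimization (untested; the window size `keep` is an assumed tuning knob, not something from the experiment):

```typescript
type Cell = [number, number];

// Hypothetical: include only the last `keep` steps of history in the prompt.
// Whether a short window preserves the accuracy/speed benefit is untested.
function formatHistory(history: Cell[], keep = 3): string {
  return history
    .slice(-keep)
    .map(([x, y]) => `(${x},${y})`)
    .join(" -> ");
}

// formatHistory([[1, 1], [1, 2], [2, 2], [2, 3]]) => "(1,2) -> (2,2) -> (2,3)"
```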
This article and code were created in collaboration with Claude Code.
I also asked Claude Code to create the mazes for the experiment, but it struggled to create them accurately in simple format, so I had to manually adjust them quite a bit.
If even Claude Code struggles with simple format, perhaps it's no surprise that local LLMs do too.
Source code and experiment data are available on GitHub. Feel free to try it out if you're interested.