
toydev

2D Spatial Recognition with Local LLM: Comparing Prompt Strategies

1. Introduction

My GPU was occupied by LLM experiments throughout the 2025-2026 winter break, but it's finally free now!

I investigated the 2D spatial recognition ability of a local LLM (gpt-oss:20b) using mazes as the subject.

  • Motivation: Wanted to understand how LLMs perceive space for autonomous navigation
  • Method: Ask "which direction next?" for each cell with structured output
  • Prompts: Tried multiple strategies since I didn't know what works best
  • Source & Results: Published on GitHub

Result: The prompt I initially thought of turned out to be the worst.

Key Findings

  • Local LLM can handle 2D spatial recognition - gpt-oss:20b achieved sufficient accuracy
  • Prompt strategy makes a big difference - response time varies severalfold between strategies

2. Experiment Setup

Environment

  • OS: Windows 11 / WSL2 (Ubuntu)
  • CPU: AMD Ryzen 7 7700
  • GPU: GeForce RTX 4070 (12GB VRAM)
  • LLM Runtime: Ollama
  • Experiment Code: Node.js + TypeScript + @langchain/ollama 1.1.0
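
As a rough sketch of how a per-cell query with structured output might look with @langchain/ollama (the zod schema and prompt wording here are my assumptions, not the repo's exact code):

```typescript
import { ChatOllama } from "@langchain/ollama";
import { z } from "zod";

// Constrain the answer to one of four directions (hypothetical schema).
const Direction = z.object({
  direction: z.enum(["up", "down", "left", "right"]),
});

const model = new ChatOllama({ model: "gpt-oss:20b", temperature: 0 });

// Ask "which direction next?" for a single cell of the maze.
async function nextDirection(mazePrompt: string): Promise<string> {
  const structured = model.withStructuredOutput(Direction);
  const answer = await structured.invoke(
    `${mazePrompt}\nWhich direction should I go next?`
  );
  return answer.direction;
}
```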

Model

Used gpt-oss:20b. Recommended VRAM is 16GB, but it runs on 12GB with CPU offloading (24% CPU / 76% GPU).

Prompt Strategies

Compared 4 strategies (full prompt output examples are in the GitHub repo):

simple - ASCII visualization of maze

#####
#S#G#
# # #
#   #
#####

matrix - Binary matrix for walls/paths

[[1,1,1,1,1],[1,0,1,0,1],[1,0,1,0,1],[1,0,0,0,1],[1,1,1,1,1]]

list - List of walkable coordinates

["(1,1)","(3,1)","(1,2)","(3,2)","(1,3)","(2,3)","(3,3)"]

graph - Adjacency list format

{"1,1":["1,2"],"1,2":["1,1","1,3"],"1,3":["1,2","2,3"],...}
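
For reference, here is a minimal sketch of how the four encodings could be generated from a 0/1 grid (1 = wall, 0 = path); the function names are mine, and the start/goal markers are omitted for brevity:

```typescript
type Grid = number[][]; // grid[y][x], 1 = wall, 0 = path

// simple: ASCII rendering (# = wall, space = path).
const toSimple = (g: Grid): string =>
  g.map((row) => row.map((c) => (c === 1 ? "#" : " ")).join("")).join("\n");

// matrix: the raw 0/1 grid as JSON.
const toMatrix = (g: Grid): string => JSON.stringify(g);

// list: "(x,y)" strings for every walkable cell.
const toList = (g: Grid): string =>
  JSON.stringify(
    g.flatMap((row, y) => row.flatMap((c, x) => (c === 0 ? [`(${x},${y})`] : [])))
  );

// graph: adjacency list mapping each walkable cell to its walkable neighbors.
const toGraph = (g: Grid): string => {
  const adj: Record<string, string[]> = {};
  g.forEach((row, y) =>
    row.forEach((c, x) => {
      if (c !== 0) return;
      const neighbors: [number, number][] = [[x, y - 1], [x, y + 1], [x - 1, y], [x + 1, y]];
      adj[`${x},${y}`] = neighbors
        .filter(([nx, ny]) => g[ny]?.[nx] === 0)
        .map(([nx, ny]) => `${nx},${ny}`);
    })
  );
  return JSON.stringify(adj);
};
```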

Mazes

Used 4 sizes (5x5 to 15x15) x 2 categories (the full maze list is in the GitHub repo):

corridor - Walled passages

Patterns: straight, branch, dead-end, loop, spiral

open - Open spaces with obstacles

Patterns: empty, pass, detour

Black=wall, White=path, Green=Start, Red=Goal

History Option

History refers to the path taken to reach the current cell (e.g., (1,1) -> (1,2) -> (2,2)).

  • With: Include history in prompt
  • Without: Exclude history from prompt
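
As a concrete illustration (my own formatting, not necessarily the repo's), the history can be rendered into a single prompt line:

```typescript
type Cell = [number, number];

// e.g. historyLine([[1, 1], [1, 2], [2, 2]]) => "History: (1,1) -> (1,2) -> (2,2)"
const historyLine = (path: Cell[]): string =>
  "History: " + path.map(([x, y]) => `(${x},${y})`).join(" -> ");
```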

Evaluation Method

For each cell in the maze, ask "which direction should I go next?" and record success/failure and response time.

A correct answer is defined as any direction that gets closer to the goal. It doesn't need to be the shortest route.
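
One way to implement this criterion is to precompute each cell's BFS distance to the goal and check that the chosen move strictly decreases it; a sketch under that assumption:

```typescript
type Grid = number[][]; // grid[y][x], 1 = wall, 0 = path
type Cell = [number, number];

// BFS from the goal: distance of every walkable cell to the goal.
function distancesToGoal(g: Grid, goal: Cell): Map<string, number> {
  const key = (c: Cell) => `${c[0]},${c[1]}`;
  const dist = new Map<string, number>([[key(goal), 0]]);
  const queue: Cell[] = [goal];
  while (queue.length > 0) {
    const [x, y] = queue.shift()!;
    const d = dist.get(key([x, y]))!;
    const neighbors: Cell[] = [[x, y - 1], [x, y + 1], [x - 1, y], [x + 1, y]];
    for (const n of neighbors) {
      if (g[n[1]]?.[n[0]] === 0 && !dist.has(key(n))) {
        dist.set(key(n), d + 1);
        queue.push(n);
      }
    }
  }
  return dist;
}

// A move is correct if it gets strictly closer to the goal.
function isCorrect(g: Grid, from: Cell, to: Cell, goal: Cell): boolean {
  const dist = distancesToGoal(g, goal);
  const d = (c: Cell) => dist.get(`${c[0]},${c[1]}`) ?? Infinity;
  return d(to) < d(from);
}
```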


3. Results

Each combination was run once. Consider this as reference data for observing trends.

Scale Verification

Results from testing all sizes x all strategies on representative mazes (corridor_straight / open_empty), with history enabled.

Accuracy (%)

| Size | simple | matrix | list | graph |
| --- | --- | --- | --- | --- |
| 5x5 | 100 | 100 | 100 | 100 |
| 7x7 | 97 | 100 | 100 | 100 |
| 11x11 | 82 | 96 | 98 | 95 |
| 15x15 | - | - | 95 | 89 |

Response Time (sec/cell)

| Size | simple | matrix | list | graph |
| --- | --- | --- | --- | --- |
| 5x5 | 29 | 19 | 12 | 12 |
| 7x7 | 77 | 31 | 16 | 17 |
| 11x11 | 313 | 75 | 31 | 64 |
| 15x15 | - | - | 41 | 190 |

The 15x15 runs for simple and matrix were abandoned due to time constraints.

list is the fastest and most accurate, and the gap widens as size increases. simple degraded to 313 sec/cell (over 5 minutes) at 11x11.

Effect of History

Comparing history on/off with list strategy at 11x11 (category averages).

Accuracy (%)

| Category | No History | With History |
| --- | --- | --- |
| corridor | 82 | 86 |
| open | 99 | 100 |

Response Time (sec/cell)

| Category | No History | With History |
| --- | --- | --- |
| corridor | 230 | 110 |
| open | 29 | 26 |

For corridor types, enabling history roughly halves the response time. Open types show little difference.


4. Conclusion

gpt-oss:20b's 2D Spatial Recognition Ability

gpt-oss:20b has sufficient 2D spatial recognition ability to navigate mazes.
With 80%+ accuracy, it can reach the goal within about 1.5x the shortest route.

Response time with list strategy is around 30 sec/cell at 11x11.
Not suitable for real-time processing, but practical for casual use with local LLM.

For comparison, I briefly tested two other models:

| Model | Impression |
| --- | --- |
| gemma3:12b | ~50% accuracy, not practical |
| deepseek-r1:14b | Not as good as gpt-oss:20b, but promising |

I believe the reasoning capability shared by gpt-oss:20b and deepseek-r1:14b plays a significant role.

Interestingly, in my environment deepseek-r1:14b runs at 100% GPU, but gpt-oss:20b at 76% GPU is faster with better accuracy.

list + history = best

Prompt strategy significantly affects both accuracy and response time.

My first strategy was simple - I thought it would be intuitive for humans, but it was the worst.

Next I tried graph, a structured format for pathfinding.
It was faster than simple for small mazes, giving me hope, but response time degraded as size increased.
I believe this is due to the increase in adjacency information.

matrix is a structured version of simple, but didn't produce good results either.

I never expected list - a coordinate list that humans can't even interpret as a maze - to be the best.

Regarding history, I think the information about "how I got here" simply helps in deciding the next direction.
It's especially helpful for corridor types.
However, history also means more tokens to process; if only the last few steps are actually useful, there might be room for optimization.
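
If only recent steps matter, a truncated history would keep most of the benefit at a fraction of the tokens; a trivial (hypothetical) sketch:

```typescript
type Cell = [number, number];

// Hypothetical optimization: include only the last k steps in the prompt.
const truncatedHistory = (path: Cell[], k = 3): Cell[] => path.slice(-k);
```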


This article and code were created in collaboration with Claude Code.
I also asked Claude Code to create the mazes for the experiment, but it struggled to create them accurately in simple format, so I had to manually adjust them quite a bit.
If even Claude Code struggles with simple format, perhaps it's no surprise that local LLMs do too.

Source code and experiment data are available on GitHub. Feel free to try it out if you're interested.
