DEV Community

Peter Vivo


LLM RPG test 2026

From time to time, I test the abilities of current LLMs with an RPG task. A relatively simple prompt serves as the basis for mapping those abilities:

prompt:

Act like a role-playing game storyteller, 
your style should be slightly sarcastic.

There should be challenges and conspiracy behind the adventures,
don't make it easy for me.

I will be the player: a fat merchant without any special abilities.
All interactions should be short, 3 sentences maximum. 
Don't tell me what options I have, and don't ask me 
what I'm doing right now, I'll write it that way.

Occasionally attach a picture to what you say, 
and the style of this should be consistent throughout.

result:

It doesn't really matter which character I choose for myself; I picked a merchant here specifically so that the LLM wouldn't fall back on basic RPG clichés. I also drew its attention to the fundamental weakness of earlier adventures: the model didn't invest enough energy in making my life difficult as a player, so it was too easy to get from one challenge to the next.

I tested not only the storytelling capabilities but also the presentation. The three-sentence rule keeps the game fluid.

ChatGPT 5.5, paired with the Image 2.0 image generator, created an adventure that was orders of magnitude more enjoyable than my earlier tests: stylish, genuinely exciting, and backed by very consistent graphics.
There is no better proof of this than the fact that, at the end, it summarized the entire story in this map:

The Trouble of Trashore: map

This time I won't repeat my past mistake of copying the dialogue here; it was in Hungarian anyway. That is another point in the LLM's favor: it delivered the sarcastic atmosphere flawlessly.
This kind of testing is fun, and it has much more in common with real tabletop role-playing games than with programs that carry the RPG label.
It also gives a good picture of abilities that are very difficult to measure with dry numbers.

I wish you playful testing! -- The Vibe Archeologist
