The Hidden Cost of Using JSON in LLM Prompts

#discuss #ai #programming #machinelearning

I was asked to build a chatbot that helps users pick salon services based on their requirements, surfaces any available offers, and fetches time slots for the chosen service.

Initially, we gave the model a full JSON dump of the services, with categories, prices, durations, add-ons, staff, and so on all linked together in one flat list. It worked well at first, and the recommendations were often on point.

But then hallucinations started to happen.

Within just a few runs, the bot started presenting facts that were never in the service descriptions.
It recommended services that did not match the user's preferences.
The second phase was even worse: random dates, no proper questions about the customer's budget, incorrect duration info, and so on.

I tried to fix this by moving at least the price and duration lookups into function calls. Phase 2 got much better, though time slots were still an issue. The slot lists were usually large, with anywhere from 20 to 50 slots depending on the number of staff and how big the service was. Large flat lists like these also tend to strain the model's ability to keep track of everything in context, making it more likely to mix up constraints.
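To make the function-call idea concrete, here is a minimal sketch of what such a lookup could look like. Everything in it is an assumption for illustration: the `SERVICES` catalog, the `get_service_details` name, and the tool description are hypothetical, and the exact wrapper around the tool schema differs between LLM providers.

```python
# Hypothetical catalog; in a real system this would come from a database or API.
SERVICES = {
    "hair_spa": {"name": "Hair Spa", "price": 1200, "duration_min": 45},
    "classic_facial": {"name": "Classic Facial", "price": 900, "duration_min": 30},
}

# Tool description in a JSON-Schema-like shape (assumed; adapt to your provider).
GET_SERVICE_DETAILS_TOOL = {
    "name": "get_service_details",
    "description": "Return the exact price and duration for one salon service.",
    "parameters": {
        "type": "object",
        "properties": {
            "service_id": {"type": "string", "enum": sorted(SERVICES)},
        },
        "required": ["service_id"],
    },
}


def get_service_details(service_id: str) -> dict:
    """Look up price and duration from the catalog instead of the prompt."""
    service = SERVICES.get(service_id)
    if service is None:
        return {"error": f"unknown service_id: {service_id}"}
    return dict(service)  # return a copy so callers cannot mutate the catalog


def handle_tool_call(name: str, arguments: dict) -> dict:
    """Dispatch a tool call requested by the model to the matching function."""
    if name == "get_service_details":
        return get_service_details(arguments.get("service_id", ""))
    return {"error": f"unknown tool: {name}"}


if __name__ == "__main__":
    print(handle_tool_call("get_service_details", {"service_id": "hair_spa"}))
```

The point of the design is that prices and durations never have to be recalled from a long prompt: the model only picks a `service_id`, and the numbers come from code.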

To fix this, I looked at several options. Since our list was flat, I thought compacting it with the TOON (Token-Oriented Object Notation) format might work. It did not improve things much, though the token reduction was nice: fewer tokens helped with cost but did not significantly improve reasoning quality.

I wanted something that would need fewer tokens (at scale, every token counts) and would also be easy for the LLM to understand. Deeply nested JSON structures often inflate the token count through repeated keys, braces, and commas, while also increasing the chance of bracket or key-matching errors.

I was deploying an AWS Lambda and writing a YAML config for it when it struck me: why not use YAML? It has far less syntax, JSON converts to it easily, and it supports nested data. For deeply nested data, YAML typically comes out around 10–25% smaller in tokens than the equivalent JSON, though the exact figure depends on the tokenizer and the data.

For those who don't know it, YAML (YAML Ain't Markup Language, yes, that's its full form 😆) is a human-friendly way to write structured data using indentation instead of brackets and braces. It's easier to read and edit than JSON, especially for configs and prompts. An indentation-based structure also reduces the bracket-matching mistakes that LLMs can make in long contexts. LLMs also seem to "understand" YAML better, plausibly because it keeps related fields grouped together and looks closer to natural language.
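As a rough illustration of the JSON-to-YAML conversion, here is a minimal sketch that uses only the Python standard library. The `to_yaml` and `scalar` helpers and the sample salon data are hypothetical, and this is not a full YAML serializer (a library such as PyYAML would be the usual choice in practice). It compares character counts only, which is a crude proxy: real token counts depend on the model's tokenizer.

```python
import json
import re

_PLAIN = re.compile(r"[A-Za-z][A-Za-z0-9 _-]*")
_RESERVED = {"true", "false", "null", "yes", "no", "on", "off"}


def scalar(value):
    """Render one scalar (or an empty container) as YAML text."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, dict):
        return "{}"  # only reached for empty dicts
    if isinstance(value, list):
        return "[]"  # only reached for empty lists
    text = str(value)
    is_plain = _PLAIN.fullmatch(text) and text == text.strip()
    if is_plain and text.lower() not in _RESERVED:
        return text
    return json.dumps(text)  # JSON string quoting is valid YAML double-quoting


def to_yaml(value, indent=0):
    """Render nested dicts and lists as indented YAML (minimal sketch)."""
    pad = "  " * indent
    lines = []
    if isinstance(value, dict):
        for key, item in value.items():
            if isinstance(item, (dict, list)) and item:
                lines.append(f"{pad}{scalar(key)}:")
                lines.append(to_yaml(item, indent + 1))
            else:
                lines.append(f"{pad}{scalar(key)}: {scalar(item)}")
    elif isinstance(value, list):
        for item in value:
            if isinstance(item, (dict, list)) and item:
                # Put the first nested line on the dash; later lines stay aligned.
                lines.append(f"{pad}- {to_yaml(item, indent + 1).lstrip()}")
            else:
                lines.append(f"{pad}- {scalar(item)}")
    else:
        lines.append(f"{pad}{scalar(value)}")
    return "\n".join(lines)


# Hypothetical sample data, shaped like the salon catalog described above.
data = {
    "services": [
        {"name": "Hair Spa", "category": "Hair", "price": 1200,
         "duration_min": 45, "add_ons": ["Head Massage", "Scalp Scrub"]},
        {"name": "Classic Facial", "category": "Skin", "price": 900,
         "duration_min": 30, "add_ons": []},
    ]
}

pretty_json = json.dumps(data, indent=2)
yaml_text = to_yaml(data)

if __name__ == "__main__":
    print(yaml_text)
    print(f"pretty JSON chars: {len(pretty_json)}, YAML chars: {len(yaml_text)}")
```

To measure the actual saving for your own data, run both strings through the tokenizer of the model you use rather than relying on character counts.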

Next time you work on a project where there are many keys or the data is nested deeply, I highly recommend trying YAML. Maybe it will give you the perfect boost you wanted!

DEV Community
