How Data Formatting (Line Breaks and Indentation) Affects LLM Response Accuracy in RAG

#programming #ai #rag #llm

(This is an English translation of my original Japanese article; the Japanese version is available here.)

In slack-explorer-mcp, the response does not include a permalink URL for each message; instead, the AI agent on the client side constructs them. Including a permalink for every message would consume a significant number of tokens, and since permalinks can be reconstructed from other fields already in the response, I omit them to save tokens. However, the results have been inconsistent: sometimes the agent builds the URLs correctly, and sometimes it doesn't.

Currently, slack-explorer-mcp returns large responses as one-line JSON. I wondered whether switching to formatted JSON with line breaks and indentation, which is easier for humans to read, would also improve accuracy for LLMs.

So this time, I evaluated how data formatting (the presence or absence of line breaks and indentation) affects LLM response accuracy. Note that both the number of test cases and the data size are small (100 list items, 16 test cases), so treat these results as a rough reference rather than a definitive benchmark.

Experiment location: shibayu36/playground/extract-slack-url-eval

Conclusion

From this evaluation, I found three things:

Formatting with line breaks and indentation doesn't particularly improve accuracy
The stronger the LLM model, the less impact formatting has on accuracy
For RAG, it's best to choose the most token-efficient format

Incidentally, I recently came across discussions reporting that response accuracy varies with data format, but that these differences shrink as LLM models become stronger. Those discussions compare data formats such as Markdown, CSV, and JSONL rather than the presence of line breaks and indentation, but they reach a similar conclusion.

Evaluation Method

I used a tool called promptfoo for the evaluation.

First, I prepared dummy data mimicking the message search responses returned by slack-explorer-mcp as RAG test data. I created three format variations for this data:

oneline JSON: JSON format without line breaks or indentation
pretty JSON: JSON format with line breaks and indentation
CSV: comma-separated values. Included to check how a token-efficient format that still uses line breaks performs
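To make the three variants concrete, here is a minimal Python sketch that serializes the same messages each way and compares their lengths. The CSV layout (a leading `workspace_url:` line, then a header row and one row per message with flattened `channel_id`/`channel_name` columns and an empty `thread_ts` cell for top-level messages) is a hypothetical choice; the actual files in the experiment repository may be laid out differently. Character count is only a rough stand-in for token count, which depends on each model's tokenizer.

```python
import csv
import io
import json

# Two messages shaped like the messages-pretty.json excerpt (text abbreviated).
data = {
    "workspace_url": "https://example.slack.com",
    "messages": {
        "matches": [
            {
                "user": "U012ABC3DEF",
                "text": "Sharing detailed investigation results on BigQuery materialized view deduplication",
                "ts": "1756096495.765749",
                "channel": {"id": "C01ABC2DE", "name": "general"},
            },
            {
                "user": "U023DEF4GHI",
                "text": "Got it!",
                "ts": "1756096501.234567",
                "channel": {"id": "C01ABC2DE", "name": "general"},
                "thread_ts": "1756096495.765749",
            },
        ]
    },
}


def to_oneline_json(d: dict) -> str:
    # No indent argument: json.dumps emits everything on a single line.
    # ensure_ascii=False keeps non-ASCII text (e.g. Japanese) as-is instead of \uXXXX escapes.
    return json.dumps(d, ensure_ascii=False)


def to_pretty_json(d: dict) -> str:
    # Line breaks plus 4-space indentation, as in the messages-pretty.json excerpt.
    return json.dumps(d, ensure_ascii=False, indent=4)


def to_csv(d: dict) -> str:
    # Hypothetical layout (assumption): workspace_url on its own first line,
    # then a header row and one row per message with nested channel fields flattened.
    buf = io.StringIO()
    buf.write(f"workspace_url: {d['workspace_url']}\n")
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["user", "text", "ts", "channel_id", "channel_name", "thread_ts"])
    for m in d["messages"]["matches"]:
        writer.writerow([
            m["user"],
            m["text"],  # csv quotes this cell automatically if it contains commas, quotes, or newlines
            m["ts"],
            m["channel"]["id"],
            m["channel"]["name"],
            m.get("thread_ts", ""),  # empty cell for top-level messages
        ])
    return buf.getvalue()


formats = {
    "oneline JSON": to_oneline_json(data),
    "pretty JSON": to_pretty_json(data),
    "CSV": to_csv(data),
}

for name, text in formats.items():
    # Characters are only a rough proxy for tokens; real counts depend on the tokenizer.
    print(f"{name}: {len(text)} chars, {len(text.splitlines())} lines")
```

Even on two messages, the pattern is visible: pretty JSON adds a newline and indentation before every key, while CSV states each field name only once in the header row.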

Here's an excerpt of the dummy data (pretty JSON version). See messages-pretty.json for the full file.

{
    "workspace_url": "https://example.slack.com",
    "messages": {
        "matches": [
            {
                "user": "U012ABC3DEF",
                "text": "Sharing detailed investigation results on BigQuery materialized view deduplication\\n\\nFrom the investigation, we found the following:",
                "ts": "1756096495.765749",
                "channel": {
                    "id": "C01ABC2DE",
                    "name": "general"
                }
            },
            {
                "user": "U023DEF4GHI",
                "text": "Got it!",
                "ts": "1756096501.234567",
                "channel": {
                    "id": "C01ABC2DE",
                    "name": "general"
                },
                "thread_ts": "1756096495.765749"
            },
            ...

I designed the prompt to explain how to construct Slack permalinks and to instruct the model to output only the permalink URL:

Your role is to construct the permalink URL for a single Slack message specified by the user from Slack message data. Use the information below to construct the permalink URL.

# How to construct the permalink
Response includes workspace_url, channel.id, and ts (timestamp) which can be used to construct Slack permalinks:
- Regular message (no thread_ts field): {workspace_url}/archives/{channel.id}/p{ts without dot}
- Thread reply (has thread_ts field): Same URL with ?thread_ts={thread_ts}&cid={channel.id}

# Output format
Output only the permalink URL.

# Slack message information
{{RAG data}}

# Let's begin
Please construct the Slack permalink URL according to the instructions below.

{{input}}
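Because the permalink rules in the prompt are deterministic, they can also be written as ordinary code, which is handy for producing the expected answer for each test case. The following Python sketch applies the two rules literally; `build_permalink` is a hypothetical helper for illustration, not part of slack-explorer-mcp.

```python
def build_permalink(workspace_url: str, message: dict) -> str:
    """Build a Slack permalink from one message, following the prompt's rules literally."""
    channel_id = message["channel"]["id"]
    # Regular message: {workspace_url}/archives/{channel.id}/p{ts without dot}
    url = f"{workspace_url}/archives/{channel_id}/p{message['ts'].replace('.', '')}"
    # Thread reply (has a thread_ts field): append thread_ts and cid as query parameters.
    if "thread_ts" in message:
        url += f"?thread_ts={message['thread_ts']}&cid={channel_id}"
    return url


workspace_url = "https://example.slack.com"
reply = {
    "user": "U023DEF4GHI",
    "text": "Got it!",
    "ts": "1756096501.234567",
    "channel": {"id": "C01ABC2DE", "name": "general"},
    "thread_ts": "1756096495.765749",
}
print(build_permalink(workspace_url, reply))
# https://example.slack.com/archives/C01ABC2DE/p1756096501234567?thread_ts=1756096495.765749&cid=C01ABC2DE
```

The `reply` values are taken from the dummy data excerpt. This is exactly the transformation the model has to perform in its head: strip the dot from `ts`, and add the query string only when `thread_ts` is present.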

I prepared 16 test cases. Here are two example inputs; see here for the full set.

I want the URL for the 3rd message
URL of the message discussing BigQuery

For the evaluation models, I chose both the latest models and slightly older ones to observe how model capability affects the results:

gpt-5-2025-08-07 (latest)
gpt-4.1-mini-2025-04-14 (slightly older small model)
claude-sonnet-4-5-20250929 (latest)
claude-3-7-sonnet-20250219 (slightly older)

Results

The accuracy rates across the 16 test cases were as follows:

| Model | oneline JSON | pretty JSON | CSV |
|---|---|---|---|
| gpt-5 | 100% (16/16) | 100% (16/16) | 100% (16/16) |
| gpt-4.1-mini | 62.5% (10/16) | 43.75% (7/16) | 68.75% (11/16) |
| claude-sonnet-4-5 | 62.5% (10/16) | 68.75% (11/16) | 75% (12/16) |
| claude-3-7-sonnet | 56.25% (9/16) | 56.25% (9/16) | 43.75% (7/16) |
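The article does not spell out how an answer was judged correct. Assuming exact string match against the expected URL after trimming surrounding whitespace, the scoring and the rates in the table can be reproduced with a sketch like this (`grade` and `format_rate` are hypothetical names, and the matching rule is an assumption):

```python
def grade(outputs: list[str], expected: list[str]) -> tuple[int, int]:
    """Count outputs that exactly match the expected URL after trimming whitespace (assumed rule)."""
    if len(outputs) != len(expected):
        raise ValueError("outputs and expected must have the same length")
    correct = sum(out.strip() == exp for out, exp in zip(outputs, expected))
    return correct, len(expected)


def format_rate(correct: int, total: int) -> str:
    """Render an accuracy rate such as '62.5% (10/16)'; ':g' drops trailing zeros."""
    pct = correct * 100 / total
    return f"{pct:g}% ({correct}/{total})"


expected = ["https://example.slack.com/archives/C01ABC2DE/p1756096495765749"]
outputs = ["https://example.slack.com/archives/C01ABC2DE/p1756096495765749\n"]
print(grade(outputs, expected))  # (1, 1)
print(format_rate(10, 16))  # 62.5% (10/16)
```

With only 16 cases, each test case moves the rate by 6.25 percentage points, which is worth keeping in mind when comparing cells in the table.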

gpt-5 achieved 100% accuracy across all formats, showing no dependence on format. For the other models, no format was consistently best: CSV scored highest for gpt-4.1-mini and claude-sonnet-4-5 but lowest for claude-3-7-sonnet.

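From these results, adding line breaks and indentation does not appear to improve accuracy: compared with oneline JSON, pretty JSON was three cases worse for gpt-4.1-mini, one case better for claude-sonnet-4-5, and tied for claude-3-7-sonnet. The results also suggest that as models become stronger, the differences due to formatting disappear.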

Summary

From this evaluation, I found that when passing list data to LLMs in RAG, formatting for readability with line breaks and indentation doesn't particularly improve accuracy. Additionally, as LLM models become stronger, differences due to formatting disappear. Since gpt-5 already achieved 100% accuracy, this trend will likely continue.

Put the other way around, these results suggest that for RAG, it's best to choose the most token-efficient format. In the RAG data used this time: