DEV Community

noear

Snack4 JSON Streaming Parser & Auto-Repair Deep Guide

Snack4 is a high-performance Java JSON library. Its core component, JsonReader, is built as a non-recursive state machine and natively supports Streaming Parse and Auto-Repair, which makes it well suited to unstable LLM output and large NDJSON files.

1. Streaming Parse: From Bulk to Incremental

Traditional parsers require the JSON document to be complete and fully loaded into memory before parsing starts. Snack4 can instead hand back each complete value as a node (ONode) while the data is still arriving.

1.1 Core Scenarios

  • LLM Real-time Rendering: Parse while generating, update UI without waiting for the conversation to end.
  • NDJSON/JSONL: Process multi-line JSON objects from log streams or database exports.
  • Low Memory Processing: When processing GB-level files, memory usage stays at KB-level buffer size.
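To make the incremental idea concrete, here is a minimal sketch in plain JDK Java. It is not Snack4's implementation: the JsonChunkSplitter class and its split method are hypothetical names introduced only for illustration. The sketch reads one character at a time, tracks nesting depth and string state, and passes each complete top-level object or array to a callback, so only the value currently being read is held in memory.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

// Hypothetical helper for illustration only; not part of Snack4.
final class JsonChunkSplitter {

    /** Emits each complete top-level object or array found in the stream. */
    static void split(Reader in, Consumer<String> sink) {
        StringBuilder current = new StringBuilder(); // holds only the value in progress
        int depth = 0;
        boolean inString = false;
        boolean escaped = false;
        try {
            int c;
            while ((c = in.read()) != -1) {
                char ch = (char) c;
                if (depth == 0 && ch != '{' && ch != '[') {
                    continue; // between values: skip newlines and other separators
                }
                current.append(ch);
                if (inString) {
                    if (escaped) {
                        escaped = false;          // character after a backslash is literal
                    } else if (ch == '\\') {
                        escaped = true;
                    } else if (ch == '"') {
                        inString = false;
                    }
                } else if (ch == '"') {
                    inString = true;
                } else if (ch == '{' || ch == '[') {
                    depth++;
                } else if (ch == '}' || ch == ']') {
                    depth--;
                    if (depth == 0) {             // a top-level value just closed
                        sink.accept(current.toString());
                        current.setLength(0);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String ndjson = "{\"id\":1}\n{\"id\":2,\"tags\":[\"a\",\"b\"]}\n";
        split(new StringReader(ndjson), chunk -> System.out.println("chunk: " + chunk));
    }
}
```

The sketch only splits the stream; it does not validate that bracket types match, and it ignores top-level scalars. A value that is cut off at the end of the stream is never emitted here, which is exactly the gap a repair step has to fill. For real files, wrapping the source in a BufferedReader avoids one system call per character.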

1.2 Key APIs

JsonReader reader = new JsonReader(jsonStream);

// Method A: Iterator pattern (recommended, most elegant syntax)
for (ONode node : reader.iterableNext()) {
    process(node); 
}

// Method B: Precisely get the last complete node (commonly used for LLM final state)
ONode finalState = reader.readLast();

// Method C: Manual chunked reading control
ONode next;
while ((next = reader.readNext()) != null) {
    // business logic
}

2. Auto-Repair: Fault Tolerance at Its Best

When Feature.Read_AutoRepair is enabled, JsonReader transforms from a "strict validator" into a "smart completer". By maintaining an internal type stack, it automatically infers missing symbols when encountering unexpected endings.
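The type-stack idea can be sketched in a few lines of plain JDK Java. This is an assumption about the general technique, not Snack4's actual code, and ContainerCloser with its closeContainers method is a hypothetical name. Every opening { or [ pushes the closer it expects; when the input ends, whatever is left on the stack is appended, innermost first.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper for illustration only; not part of Snack4.
final class ContainerCloser {

    /** Appends the closing quote and brackets that a truncated JSON text is missing. */
    static String closeContainers(String truncated) {
        Deque<Character> stack = new ArrayDeque<>(); // expected closers, innermost on top
        boolean inString = false;
        boolean escaped = false;
        for (int i = 0; i < truncated.length(); i++) {
            char ch = truncated.charAt(i);
            if (inString) {
                if (escaped) {
                    escaped = false;
                } else if (ch == '\\') {
                    escaped = true;
                } else if (ch == '"') {
                    inString = false;
                }
            } else if (ch == '"') {
                inString = true;
            } else if (ch == '{') {
                stack.push('}');
            } else if (ch == '[') {
                stack.push(']');
            } else if ((ch == '}' || ch == ']') && !stack.isEmpty()) {
                stack.pop(); // container closed normally
            }
        }
        StringBuilder out = new StringBuilder(truncated);
        if (inString) {
            if (escaped) {
                out.setLength(out.length() - 1); // drop a dangling backslash
            }
            out.append('"'); // close a string that was cut off
        }
        while (!stack.isEmpty()) {
            out.append(stack.pop());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(closeContainers("{\"user\":{\"tags\":[\"java\",\"ai\""));
    }
}
```

This sketch handles only unclosed strings and containers. Trailing commas, dangling colons, and truncated literals need their own handling before the closers are appended.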

2.1 Repair Logic Reference

| Damage Type | Original Input (Broken) | Repaired Output | Repair Logic |
|---|---|---|---|
| Keyword Truncation | {"status": tru | {"status": true} | The incomplete literal tru is completed to true |
| Unclosed Container | {"a":{"b":1 | {"a":{"b":1}} | Missing } characters are appended based on the parsing stack |
| Trailing Comma | [1, 2, | [1, 2] | The trailing comma is ignored |
| Missing Value | {"key": | {"key": null} | Input ends right after the colon, so the value is completed as null |
| Invalid Character Interference | {"a":1} #comment | {"a":1} | Works with Read_AllowComment to filter out non-JSON content |
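Three of the rows above (truncated keyword, trailing comma, missing value) only concern the last token of the input. As a rough sketch of how such tail repairs could work, and not a description of Snack4's internals, the hypothetical TailRepair class below fixes the tail of a truncated text in plain JDK Java. It deliberately leaves containers unclosed, so its results still end without the final } or ].

```java
// Hypothetical helper for illustration only; not part of Snack4.
final class TailRepair {

    private static final String[] LITERALS = {"true", "false", "null"};

    /** Repairs a dangling comma, dangling colon, or truncated literal at the end of the text. */
    static String repairTail(String truncated) {
        if (endsInsideString(truncated)) {
            return truncated; // the tail is string content, not a token to repair
        }
        int end = truncated.length();
        while (end > 0 && Character.isWhitespace(truncated.charAt(end - 1))) {
            end--;
        }
        String text = truncated.substring(0, end);
        if (text.isEmpty()) {
            return text;
        }
        char last = text.charAt(text.length() - 1);
        if (last == ',') {
            return text.substring(0, text.length() - 1); // ignore trailing comma
        }
        if (last == ':') {
            return text + "null"; // key without a value
        }
        int start = text.length();
        while (start > 0 && Character.isLetter(text.charAt(start - 1))) {
            start--;
        }
        String word = text.substring(start);
        if (!word.isEmpty()) {
            for (String literal : LITERALS) {
                if (literal.startsWith(word)) {
                    return text.substring(0, start) + literal; // complete tru -> true
                }
            }
        }
        return text;
    }

    /** Returns true when the text stops in the middle of a double-quoted string. */
    private static boolean endsInsideString(String text) {
        boolean inString = false;
        boolean escaped = false;
        for (char ch : text.toCharArray()) {
            if (escaped) {
                escaped = false;
            } else if (inString && ch == '\\') {
                escaped = true;
            } else if (ch == '"') {
                inString = !inString;
            }
        }
        return inString;
    }

    public static void main(String[] args) {
        System.out.println(repairTail("{\"status\": tru"));
        System.out.println(repairTail("[1, 2,"));
        System.out.println(repairTail("{\"key\":"));
    }
}
```

Completing by prefix is unambiguous here because true, false, and null start with different letters. A production parser would make these decisions inside its state machine rather than by inspecting the end of the text.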

2.2 Code Demo

// Auto-repair also handles deeply nested, severely truncated input
String brokenJson = "{\"user\":{\"tags\":[\"java\",\"ai\""; 

Options opts = Options.of(Feature.Read_AutoRepair);
ONode node = JsonReader.read(brokenJson, opts);

System.out.println(node.toJson());
// Output: {"user":{"tags":["java","ai"]}}

3. Advanced: Best Practices for Handling LLM Mixed Output

LLMs sometimes emit prose followed by a JSON value, or several separate JSON blocks in one reply. Combining Snack4's streaming parse with auto-repair handles both cases.

3.1 Mixed Stream Parse Template

public void handleLlmStream(String rawOutput) {
    // Enable auto-repair + allow single quotes (increased robustness)
    Options opts = Options.of(Feature.Read_AutoRepair, Feature.Read_AllowSingleQuotes);
    JsonReader reader = new JsonReader(rawOutput, opts);

    // Non-JSON parts of the stream (such as leading text) are filtered out automatically
    for (ONode node : reader.iterableNext()) {
        if (node.isObject() || node.isArray()) {
            // Only process structured data
            dispatchToBusiness(node);
        }
    }
}

4. Summary

| Dimension | Streaming & Auto-Repair | Normal Mode |
|---|---|---|
| Parse Method | The read pointer slides along the stream, parsing JSON chunks continuously | Parses a single JSON chunk and throws an exception if extra content follows |
| Exception Handling | Repairs the input automatically, no exceptions thrown | Throws ParseException on malformed input such as a missing colon |
| LLM Adaptation | Natively supports truncated input and keyword repair | Requires developers to write regex or other preprocessing (non-compliant input throws ParseException) |

By combining Streaming Parse with Auto-Repair, Snack4 gives developers a robust line of defense: even when upstream data is truncated or malformed, downstream business logic receives well-formed structured objects.
