DEV Community

synthaicode

When Documentation Fails: Brute-Force Specification Discovery with AI

The Problem: Undocumented Behavior

I needed to port a LINQ-based query library from ksqlDB to Apache Flink SQL.

The challenge wasn't the code. The challenge was that neither platform fully documents what works, what doesn't, and what the alternatives are when something fails.

  • ksqlDB has LEN(s). Flink doesn't—it uses CHAR_LENGTH(s).
  • ksqlDB has DATEADD. Flink uses TIMESTAMPADD or interval arithmetic.
  • ksqlDB's JSON_EXTRACT_STRING becomes Flink's JSON_VALUE.
  • Some functions exist in both but behave differently.

Documentation covers the happy path. Production needs the complete map.

The Human Approach: Unsustainable

Manual testing would look like this:

  1. Pick a function
  2. Write a test query
  3. Run against Flink
  4. Record success/failure
  5. If failed, search for alternatives
  6. Repeat for every function × data type × clause combination

For 50+ functions, 10+ data types, and 5 clause contexts (SELECT, WHERE, GROUP BY, HAVING, JOIN), you're looking at thousands of combinations.
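The size of that search space is easy to sketch. The figures below reuse the illustrative 50 × 10 × 5 counts from above (the function and type names are placeholders, not the real inventory):

```python
from itertools import product

# Illustrative axes; the real inventory came from the ksqlDB function list.
functions = [f"fn_{i}" for i in range(50)]  # 50+ scalar functions
data_types = ["INT", "BIGINT", "DOUBLE", "DECIMAL", "VARCHAR",
              "BOOLEAN", "DATE", "TIMESTAMP", "ARRAY", "MAP"]  # 10+ types
clauses = ["SELECT", "WHERE", "GROUP BY", "HAVING", "JOIN"]  # 5 contexts

combinations = list(product(functions, data_types, clauses))
print(len(combinations))  # 2500 cases, before counting argument variants
```

And 2,500 is the floor: each function also has argument-count and argument-type variants.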

Time required: weeks.
Patience required: superhuman.

The AI Approach: Brute-Force Discovery

I gave AI a simple directive:

> Investigate which queries are usable in Flink SQL.
> Test combinations of data types and functions.
> Cover SELECT, WHERE, GROUP BY, HAVING, and JOIN.

That's it. No detailed test plan. No enumeration of cases.

AI generated the combinations, executed them against a Dockerized Flink environment, recorded results, and when something failed, explored alternatives.
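Conceptually, the loop is simple. A minimal sketch of it (only two clause templates shown, and `run_sql` is a hypothetical stand-in for whatever submits statements to the Dockerized Flink environment):

```python
from itertools import product

# Query templates keyed by clause context; {fn} is the expression under test.
TEMPLATES = {
    "SELECT": "SELECT {fn} FROM src",
    "WHERE": "SELECT id FROM src WHERE {fn} IS NOT NULL",
}

def run_sql(query):
    """Hypothetical executor: submit to Flink, return (ok, error_message)."""
    raise NotImplementedError  # e.g. a call into the Flink SQL gateway

def probe(fn_exprs, clauses, executor=run_sql):
    """Try every expression in every clause context; never stop on failure."""
    results = {}
    for fn_expr, clause in product(fn_exprs, clauses):
        query = TEMPLATES[clause].format(fn=fn_expr)
        try:
            ok, err = executor(query)
        except Exception as exc:  # record the failure and keep going
            ok, err = False, str(exc)
        results[(fn_expr, clause)] = (ok, err)
    return results
```

The key design point is in the `except`: a failed query is a data point, not a stop condition, which is what lets the run continue unattended for thousands of cases.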

The Result: A Dialect Mapping

After systematic probing, a comprehensive mapping emerged:

| Function | ksqlDB | Flink | Status |
| --- | --- | --- | --- |
| String length | `LEN(s)` | `CHAR_LENGTH(s)` | ksqlDB form NG |
| String split | `SPLIT(s, d)` | `SPLIT_INDEX(s, d, i)` | ksqlDB form NG |
| Date add | `DATEADD(unit, n, ts)` | `TIMESTAMPADD(UNIT, n, ts)` | ksqlDB form NG |
| JSON extract | `JSON_EXTRACT_STRING` | `JSON_VALUE` | ksqlDB form NG |
| Regex match | `REGEXP_LIKE` | `SIMILAR TO` | ksqlDB form NG |
| Padding | `LPAD`/`RPAD` | `LPAD`/`RPAD` | OK |
| Null handling | `COALESCE`/`NULLIF` | `COALESCE`/`NULLIF` | OK |
| Safe cast | N/A | `TRY_CAST` | Flink-only |
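A mapping like this translates directly into data. Here is a minimal sketch of a rename table for the simple cases (pure renames only; a call like `SPLIT` → `SPLIT_INDEX` changes the argument list and needs real expression rewriting, not string substitution):

```python
import re

# ksqlDB -> Flink function renames discovered by probing (subset).
KSQLDB_TO_FLINK = {
    "LEN": "CHAR_LENGTH",
    "DATEADD": "TIMESTAMPADD",
    "JSON_EXTRACT_STRING": "JSON_VALUE",
}

def rename_functions(sql: str) -> str:
    """Rewrite known ksqlDB function names to their Flink equivalents."""
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, KSQLDB_TO_FLINK)) + r")\s*\(")
    return pattern.sub(lambda m: KSQLDB_TO_FLINK[m.group(1)] + "(", sql)

print(rename_functions("SELECT LEN(name) FROM users"))
# SELECT CHAR_LENGTH(name) FROM users
```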

And edge cases no documentation mentions:

  • JSON_QUERY returns NULL for array element access in certain environments
  • ESCAPE '\\' in LIKE clauses fails; use ESCAPE '^' instead
  • Array indexing is 1-based; arr[0] throws an error
  • SESSION windows work in streaming mode but fail in batch
  • Reserved words as aliases (AS Values) cause parse errors
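Edge-case findings like these pay off when encoded as checks. A small, illustrative pre-flight lint (the patterns below cover three of the findings above; they are examples, not a complete rule set):

```python
import re

# Known-bad patterns discovered during probing, each paired with a hint.
KNOWN_PITFALLS = [
    (re.compile(r"ESCAPE\s+'\\\\'", re.I), "Flink rejects ESCAPE '\\\\'; use ESCAPE '^'"),
    (re.compile(r"\[\s*0\s*\]"), "Flink arrays are 1-based; arr[0] throws"),
    (re.compile(r"\bAS\s+Values\b", re.I), "reserved word used as an alias"),
]

def lint(sql: str) -> list[str]:
    """Return hints for patterns known to fail on Flink SQL."""
    return [hint for pat, hint in KNOWN_PITFALLS if pat.search(sql)]
```

Running such a lint before query submission turns one-time discovery results into a durable guardrail.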

Speed: The Decisive Advantage

This wasn't just about coverage. It was about speed.

| Approach | Time | Coverage |
| --- | --- | --- |
| Manual | Weeks | Partial, fatigue-limited |
| AI-driven | Hours | Exhaustive |

AI doesn't get tired. AI doesn't skip edge cases because it's Friday afternoon. AI runs the 47th variation of a JSON function test with the same diligence as the first.

The speed advantage isn't incremental—it's categorical.

What This Changes

Traditional E2E testing assumes you know the specification and are verifying implementation.

This inverts the model: E2E as specification discovery.

When external systems have incomplete documentation, AI-powered brute-force testing becomes the fastest path to ground truth.

The Human Role

AI handled the combinatorial explosion. My role was:

  1. Define the axes: data types, functions, clause contexts
  2. Provide the environment: Dockerized Flink for execution
  3. Judge the results: OK/NG/alternative mappings
  4. Make design decisions: which patterns to support, which to fail-fast

The findings directly informed the dialect abstraction layer in my library—knowing exactly where ksqlDB and Flink diverge enables clean separation at compile time rather than runtime surprises.
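As a hypothetical sketch of what that compile-time separation can look like (the names here are illustrative, not the library's actual API): each divergent construct becomes one method on a dialect interface, so an unsupported construct fails when the dialect is implemented, not when a query runs.

```python
from abc import ABC, abstractmethod

class SqlDialect(ABC):
    """Per-dialect emission of constructs that diverge between engines."""

    @abstractmethod
    def char_length(self, expr: str) -> str: ...

class KsqlDbDialect(SqlDialect):
    def char_length(self, expr: str) -> str:
        return f"LEN({expr})"

class FlinkDialect(SqlDialect):
    def char_length(self, expr: str) -> str:
        return f"CHAR_LENGTH({expr})"

def build_query(dialect: SqlDialect, column: str) -> str:
    """Query construction is dialect-agnostic; divergence lives in the dialect."""
    return f"SELECT {dialect.char_length(column)} FROM users"

print(build_query(FlinkDialect(), "name"))
# SELECT CHAR_LENGTH(name) FROM users
```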

Conclusion

When documentation fails, brute-force wins.

AI transforms E2E testing from a verification activity into a discovery activity. The combinatorial explosion that makes exhaustive human testing impossible becomes AI's natural operating mode.

Don't test what you know. Discover what you don't.


This testing approach was developed during the creation of Kafka.Context.Streaming, a query abstraction layer supporting multiple SQL dialects.
