DEV Community

synthaicode

When Documentation Fails: Brute-Force Specification Discovery with AI

The Problem: Undocumented Behavior

I needed to port a LINQ-based query library from ksqlDB to Apache Flink SQL.

The challenge wasn't the code. The challenge was that neither platform fully documents what works, what doesn't, and what the alternatives are when something fails.

  • ksqlDB has LEN(s). Flink doesn't—it uses CHAR_LENGTH(s).
  • ksqlDB has DATEADD. Flink uses TIMESTAMPADD or interval arithmetic.
  • ksqlDB's JSON_EXTRACT_STRING becomes Flink's JSON_VALUE.
  • Some functions exist in both but behave differently.

Documentation covers the happy path. Production needs the complete map.

The Human Approach: Unsustainable

Manual testing would look like this:

  1. Pick a function
  2. Write a test query
  3. Run against Flink
  4. Record success/failure
  5. If failed, search for alternatives
  6. Repeat for every function × data type × clause combination

For 50+ functions, 10+ data types, and 5 clause contexts (SELECT, WHERE, GROUP BY, HAVING, JOIN), you're looking at thousands of combinations.
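The size of that search space is easy to sketch. The figures below reuse the illustrative 50 × 10 × 5 counts from above (the function and type names are placeholders, not the real inventory):

```python
from itertools import product

# Illustrative axes; the real inventory came from the ksqlDB function list.
functions = [f"fn_{i}" for i in range(50)]  # 50+ scalar functions
data_types = ["INT", "BIGINT", "DOUBLE", "DECIMAL", "VARCHAR",
              "BOOLEAN", "DATE", "TIMESTAMP", "ARRAY", "MAP"]  # 10+ types
clauses = ["SELECT", "WHERE", "GROUP BY", "HAVING", "JOIN"]  # 5 contexts

combinations = list(product(functions, data_types, clauses))
print(len(combinations))  # 2500 cases, before counting argument variants
```

And 2,500 is the floor: each function also has argument-count and argument-type variants.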

Time required: weeks.
Patience required: superhuman.

The AI Approach: Brute-Force Discovery

I gave AI a simple directive:

> Investigate which queries are usable in Flink SQL.
> Test combinations of data types and functions.
> Cover SELECT, WHERE, GROUP BY, HAVING, and JOIN.

That's it. No detailed test plan. No enumeration of cases.

AI generated the combinations, executed them against a Dockerized Flink environment, recorded results, and when something failed, explored alternatives.
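Conceptually, the loop is simple. A minimal sketch of it (only two clause templates shown, and `run_sql` is a hypothetical stand-in for whatever submits statements to the Dockerized Flink environment):

```python
from itertools import product

# Query templates keyed by clause context; {fn} is the expression under test.
TEMPLATES = {
    "SELECT": "SELECT {fn} FROM src",
    "WHERE": "SELECT id FROM src WHERE {fn} IS NOT NULL",
}

def run_sql(query):
    """Hypothetical executor: submit to Flink, return (ok, error_message)."""
    raise NotImplementedError  # e.g. a call into the Flink SQL gateway

def probe(fn_exprs, clauses, executor=run_sql):
    """Try every expression in every clause context; never stop on failure."""
    results = {}
    for fn_expr, clause in product(fn_exprs, clauses):
        query = TEMPLATES[clause].format(fn=fn_expr)
        try:
            ok, err = executor(query)
        except Exception as exc:  # record the failure and keep going
            ok, err = False, str(exc)
        results[(fn_expr, clause)] = (ok, err)
    return results
```

The key design point is in the `except`: a failed query is a data point, not a stop condition, which is what lets the run continue unattended for thousands of cases.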

The Result: A Dialect Mapping

After systematic probing, a comprehensive mapping emerged:

| Function | ksqlDB | Flink | Status |
| --- | --- | --- | --- |
| String length | `LEN(s)` | `CHAR_LENGTH(s)` | ksqlDB form NG |
| String split | `SPLIT(s, d)` | `SPLIT_INDEX(s, d, i)` | ksqlDB form NG |
| Date add | `DATEADD(unit, n, ts)` | `TIMESTAMPADD(UNIT, n, ts)` | ksqlDB form NG |
| JSON extract | `JSON_EXTRACT_STRING` | `JSON_VALUE` | ksqlDB form NG |
| Regex match | `REGEXP_LIKE` | `SIMILAR TO` | ksqlDB form NG |
| Padding | `LPAD`/`RPAD` | `LPAD`/`RPAD` | OK |
| Null handling | `COALESCE`/`NULLIF` | `COALESCE`/`NULLIF` | OK |
| Safe cast | N/A | `TRY_CAST` | Flink-only |
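A mapping like this translates directly into data. Here is a minimal sketch of a rename table for the simple cases (pure renames only; a call like `SPLIT` → `SPLIT_INDEX` changes the argument list and needs real expression rewriting, not string substitution):

```python
import re

# ksqlDB -> Flink function renames discovered by probing (subset).
KSQLDB_TO_FLINK = {
    "LEN": "CHAR_LENGTH",
    "DATEADD": "TIMESTAMPADD",
    "JSON_EXTRACT_STRING": "JSON_VALUE",
}

def rename_functions(sql: str) -> str:
    """Rewrite known ksqlDB function names to their Flink equivalents."""
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, KSQLDB_TO_FLINK)) + r")\s*\(")
    return pattern.sub(lambda m: KSQLDB_TO_FLINK[m.group(1)] + "(", sql)

print(rename_functions("SELECT LEN(name) FROM users"))
# SELECT CHAR_LENGTH(name) FROM users
```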

And edge cases no documentation mentions:

  • JSON_QUERY returns NULL for array element access in certain environments
  • ESCAPE '\\' in LIKE clauses fails; use ESCAPE '^' instead
  • Array indexing is 1-based; arr[0] throws an error
  • SESSION windows work in streaming mode but fail in batch
  • Reserved words as aliases (AS Values) cause parse errors
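Edge-case findings like these pay off when encoded as checks. A small, illustrative pre-flight lint (the patterns below cover three of the findings above; they are examples, not a complete rule set):

```python
import re

# Known-bad patterns discovered during probing, each paired with a hint.
KNOWN_PITFALLS = [
    (re.compile(r"ESCAPE\s+'\\\\'", re.I), "Flink rejects ESCAPE '\\\\'; use ESCAPE '^'"),
    (re.compile(r"\[\s*0\s*\]"), "Flink arrays are 1-based; arr[0] throws"),
    (re.compile(r"\bAS\s+Values\b", re.I), "reserved word used as an alias"),
]

def lint(sql: str) -> list[str]:
    """Return hints for patterns known to fail on Flink SQL."""
    return [hint for pat, hint in KNOWN_PITFALLS if pat.search(sql)]
```

Running such a lint before query submission turns one-time discovery results into a durable guardrail.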

Speed: The Decisive Advantage

This wasn't just about coverage. It was about speed.

| Approach | Time | Coverage |
| --- | --- | --- |
| Manual | Weeks | Partial, fatigue-limited |
| AI-driven | Hours | Exhaustive |

AI doesn't get tired. AI doesn't skip edge cases because it's Friday afternoon. AI runs the 47th variation of a JSON function test with the same diligence as the first.

The speed advantage isn't incremental—it's categorical.

What This Changes

Traditional E2E testing assumes you know the specification and are verifying implementation.

This inverts the model: E2E as specification discovery.

When external systems have incomplete documentation, AI-powered brute-force testing becomes the fastest path to ground truth.

The Human Role

AI handled the combinatorial explosion. My role was:

  1. Define the axes: data types, functions, clause contexts
  2. Provide the environment: Dockerized Flink for execution
  3. Judge the results: OK/NG/alternative mappings
  4. Make design decisions: which patterns to support, which to fail-fast

The findings directly informed the dialect abstraction layer in my library—knowing exactly where ksqlDB and Flink diverge enables clean separation at compile time rather than runtime surprises.
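As a hypothetical sketch of what that compile-time separation can look like (the names here are illustrative, not the library's actual API): each divergent construct becomes one method on a dialect interface, so an unsupported construct fails when the dialect is implemented, not when a query runs.

```python
from abc import ABC, abstractmethod

class SqlDialect(ABC):
    """Per-dialect emission of constructs that diverge between engines."""

    @abstractmethod
    def char_length(self, expr: str) -> str: ...

class KsqlDbDialect(SqlDialect):
    def char_length(self, expr: str) -> str:
        return f"LEN({expr})"

class FlinkDialect(SqlDialect):
    def char_length(self, expr: str) -> str:
        return f"CHAR_LENGTH({expr})"

def build_query(dialect: SqlDialect, column: str) -> str:
    """Query construction is dialect-agnostic; divergence lives in the dialect."""
    return f"SELECT {dialect.char_length(column)} FROM users"

print(build_query(FlinkDialect(), "name"))
# SELECT CHAR_LENGTH(name) FROM users
```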

Conclusion

When documentation fails, brute-force wins.

AI transforms E2E testing from a verification activity into a discovery activity. The combinatorial explosion that makes exhaustive human testing impossible becomes AI's natural operating mode.

Don't test what you know. Discover what you don't.


This testing approach was developed during the creation of Kafka.Context.Streaming, a query abstraction layer supporting multiple SQL dialects.
