## The Problem: Undocumented Behavior
I needed to port a LINQ-based query library from ksqlDB to Apache Flink SQL.
The challenge wasn't the code. The challenge was that neither platform fully documents what works, what doesn't, and what the alternatives are when something fails.
- ksqlDB has `LEN(s)`. Flink doesn't—it uses `CHAR_LENGTH(s)`.
- ksqlDB has `DATEADD`. Flink uses `TIMESTAMPADD` or interval arithmetic.
- ksqlDB's `JSON_EXTRACT_STRING` becomes Flink's `JSON_VALUE`.
- Some functions exist in both but behave differently.
Documentation covers the happy path. Production needs the complete map.
## The Human Approach: Unsustainable
Manual testing would look like this:
- Pick a function
- Write a test query
- Run against Flink
- Record success/failure
- If failed, search for alternatives
- Repeat for every function × data type × clause combination
For 50+ functions, 10+ data types, and 5 clause contexts (SELECT, WHERE, GROUP BY, HAVING, JOIN), you're looking at thousands of combinations.
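The scale is easy to underestimate; a quick sketch (the function and type counts are illustrative placeholders matching the figures above) shows why manual enumeration breaks down:

```python
from itertools import product

# Illustrative axis sizes: 50+ functions, 10+ data types,
# 5 clause contexts (SELECT, WHERE, GROUP BY, HAVING, JOIN).
functions = [f"fn_{i}" for i in range(50)]
data_types = ["VARCHAR", "INT", "BIGINT", "DOUBLE", "DECIMAL",
              "BOOLEAN", "DATE", "TIME", "TIMESTAMP", "ARRAY"]
clauses = ["SELECT", "WHERE", "GROUP BY", "HAVING", "JOIN"]

# Every (function, type, clause) triple is a distinct test case.
combinations = list(product(functions, data_types, clauses))
print(len(combinations))  # 50 * 10 * 5 = 2500 test cases
```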
Time required: weeks.
Patience required: superhuman.
## The AI Approach: Brute-Force Discovery
I gave AI a simple directive:

> Investigate which queries are usable in Flink SQL.
> Test combinations of data types and functions.
> Cover SELECT, WHERE, GROUP BY, HAVING, and JOIN.
That's it. No detailed test plan. No enumeration of cases.
AI generated the combinations, executed them against a Dockerized Flink environment, recorded results, and when something failed, explored alternatives.
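In outline, that discovery loop is simple. A minimal sketch, where the executor callable is a hypothetical stand-in for the Dockerized Flink harness and all names are illustrative:

```python
def probe(candidates: dict[str, list[str]], run_query) -> dict[str, str]:
    """Try each candidate SQL form per feature against run_query (a
    callable returning True when the engine accepts the query); record
    the first form that works, or "NG" if every form fails."""
    results: dict[str, str] = {}
    for feature, forms in candidates.items():
        results[feature] = "NG"
        for sql in forms:
            if run_query(sql):
                results[feature] = sql  # first working alternative wins
                break
    return results

# Fake executor standing in for Flink: only CHAR_LENGTH queries "succeed".
fake = lambda sql: "CHAR_LENGTH" in sql
mapping = probe({"length": ["SELECT LEN(s) FROM t",
                            "SELECT CHAR_LENGTH(s) FROM t"]}, fake)
print(mapping)  # {'length': 'SELECT CHAR_LENGTH(s) FROM t'}
```

The "explore alternatives on failure" step is just the inner loop: each feature carries a ranked list of candidate forms, and the first accepted one becomes the mapping.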
## The Result: A Dialect Mapping
After systematic probing, a comprehensive mapping emerged:
| Function | ksqlDB | Flink | Status |
|---|---|---|---|
| String length | `LEN(s)` | `CHAR_LENGTH(s)` | ksqlDB form NG |
| String split | `SPLIT(s, d)` | `SPLIT_INDEX(s, d, i)` | ksqlDB form NG |
| Date add | `DATEADD(unit, n, ts)` | `TIMESTAMPADD(UNIT, n, ts)` | ksqlDB form NG |
| JSON extract | `JSON_EXTRACT_STRING` | `JSON_VALUE` | ksqlDB form NG |
| Regex match | `REGEXP_LIKE` | `SIMILAR TO` | ksqlDB form NG |
| Padding | `LPAD`/`RPAD` | `LPAD`/`RPAD` | OK |
| Null handling | `COALESCE`/`NULLIF` | `COALESCE`/`NULLIF` | OK |
| Safe cast | N/A | `TRY_CAST` | Flink-only |
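A table like this translates directly into data. A minimal sketch of such a mapping (names illustrative, not the library's actual API):

```python
# ksqlDB function -> Flink SQL equivalent, distilled from probe results.
KSQL_TO_FLINK = {
    "LEN": "CHAR_LENGTH",
    "SPLIT": "SPLIT_INDEX",        # note: also takes an index argument
    "DATEADD": "TIMESTAMPADD",
    "JSON_EXTRACT_STRING": "JSON_VALUE",
    "REGEXP_LIKE": None,           # no function form; needs SIMILAR TO syntax
    "LPAD": "LPAD",                # identical in both dialects
    "RPAD": "RPAD",
    "COALESCE": "COALESCE",
    "NULLIF": "NULLIF",
}

def translate_fn(name: str) -> str:
    """Resolve a ksqlDB function name to its Flink equivalent, or fail
    fast when there is no direct one-to-one replacement."""
    target = KSQL_TO_FLINK.get(name)
    if target is None:
        raise ValueError(f"No direct Flink equivalent for {name}")
    return target

print(translate_fn("LEN"))  # CHAR_LENGTH
```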
And edge cases no documentation mentions:
- `JSON_QUERY` returns NULL for array element access in certain environments
- `ESCAPE '\\'` in LIKE clauses fails; use `ESCAPE '^'` instead
- Array indexing is 1-based; `arr[0]` throws an error
- `SESSION` windows work in streaming mode but fail in batch
- Reserved words as aliases (`AS Values`) cause parse errors
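Findings like these can be turned into fail-fast guards so violations surface at query-build time rather than as runtime surprises. A hypothetical sketch (the rule set and names are illustrative, not the library's actual checks):

```python
import re

# Partial, illustrative list of words that break when used as aliases.
FLINK_RESERVED_ALIASES = {"VALUES", "TIMESTAMP", "ORDER"}

def validate(sql: str) -> list[str]:
    """Return descriptions of any known Flink SQL gotchas in `sql`."""
    problems = []
    if re.search(r"\w+\[0\]", sql):
        problems.append("array indexing is 1-based; arr[0] throws")
    if "ESCAPE '\\\\'" in sql:  # literal ESCAPE '\\' in the SQL text
        problems.append("backslash ESCAPE fails; use ESCAPE '^'")
    for alias in re.findall(r"\bAS\s+(\w+)", sql, flags=re.IGNORECASE):
        if alias.upper() in FLINK_RESERVED_ALIASES:
            problems.append(f"reserved word used as alias: {alias}")
    return problems

print(validate("SELECT arr[0] AS Values FROM t"))
```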
## Speed: The Decisive Advantage
This wasn't just about coverage. It was about speed.
| Approach | Time | Coverage |
|---|---|---|
| Manual | Weeks | Partial, fatigue-limited |
| AI-driven | Hours | Exhaustive |
AI doesn't get tired. AI doesn't skip edge cases because it's Friday afternoon. AI runs the 47th variation of a JSON function test with the same diligence as the first.
The speed advantage isn't incremental—it's categorical.
## What This Changes
Traditional E2E testing assumes you know the specification and are verifying implementation.
This inverts the model: E2E as specification discovery.
When external systems have incomplete documentation, AI-powered brute-force testing becomes the fastest path to ground truth.
## The Human Role
AI handled the combinatorial explosion. My role was:
- Define the axes: data types, functions, clause contexts
- Provide the environment: Dockerized Flink for execution
- Judge the results: OK/NG/alternative mappings
- Make design decisions: which patterns to support, which to fail-fast
The findings directly informed the dialect abstraction layer in my library—knowing exactly where ksqlDB and Flink diverge enables clean separation at compile time rather than runtime surprises.
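That compile-time separation can be sketched as an interface with one implementation per engine (names illustrative, not the library's actual API):

```python
from abc import ABC, abstractmethod

class SqlDialect(ABC):
    """Each target engine renders the same logical operation its own way."""
    @abstractmethod
    def string_length(self, expr: str) -> str: ...

class KsqlDbDialect(SqlDialect):
    def string_length(self, expr: str) -> str:
        return f"LEN({expr})"

class FlinkDialect(SqlDialect):
    def string_length(self, expr: str) -> str:
        return f"CHAR_LENGTH({expr})"

# The query builder asks the dialect object, so divergences are resolved
# before any SQL ever reaches the engine.
print(FlinkDialect().string_length("name"))  # CHAR_LENGTH(name)
```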
## Conclusion
When documentation fails, brute-force wins.
AI transforms E2E testing from a verification activity into a discovery activity. The combinatorial explosion that makes exhaustive human testing impossible becomes AI's natural operating mode.
Don't test what you know. Discover what you don't.
This testing approach was developed during the creation of Kafka.Context.Streaming, a query abstraction layer supporting multiple SQL dialects.