Srinidhi Anand
AI is writing our code... but who is auditing the AI?

Hey Techies!

We all follow certain steps when structuring and developing code to make sure it meets the business requirements. We choose patterns according to the project's demands, whether it's an e-commerce application or a WordPress site. Code patterns come in many forms, but writing code against well-defined patterns and asserting its behaviour across edge cases, negative cases, and positive cases makes it far more reliable. Managing millions of modules in a microservice architecture is challenging, and AI can support developers by producing reliable code with good error management.

We all know the drill: scaling microservices with millions of modules is a nightmare for reliability. We need edge cases, negative tests, and perfect assertions, but who has the time?

This is where LLM orchestration comes in: an orchestrated LLM can generate the assertions for our code, helping us keep error rates low and efficiency high. But I didn't want to build just another AI wrapper. The tool that assists us here is ts-genai-test, which uses AST-based code analysis to automatically generate optimized Jest unit tests. It doesn't just call an AI; it calculates code complexity to route each request to the most efficient LLM provider (Gemini, OpenAI, or Groq), saving costs while maximizing test accuracy.

So, AI writes the test cases with strictly engineered prompts, but who is auditing the auditor?

We need a way to audit which LLM is best suited to generate tests for a given function. Given a file path, the file itself, and a function inside it, an AST-based analyser measures the complexity of the code so that the right function is routed to the right LLM, keeping AI resource usage efficient. The library watches staged changes to detect modified files and functions, then runs the prompt to generate ready-to-use test code, with the correct import syntax derived from how the function is exported (named or default).
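To make the export-driven import generation concrete, here is a minimal sketch of the idea. This is not the actual ts-genai-test implementation (which walks the TypeScript AST); it uses a simple text heuristic purely to illustrate how the export style of a function determines the import line a generated test file needs.

```typescript
// Illustrative sketch only — not the actual ts-genai-test implementation.
// A real tool would inspect AST nodes; a regex heuristic is enough to show
// the idea of deriving import syntax from the export style.

type ExportKind = "named" | "default" | "none";

// Detect how `fnName` is exported from the given source text.
function detectExportKind(source: string, fnName: string): ExportKind {
  if (new RegExp(`export\\s+default\\s+(async\\s+)?function\\s+${fnName}\\b`).test(source)) {
    return "default";
  }
  if (new RegExp(`export\\s+(async\\s+)?(function|const)\\s+${fnName}\\b`).test(source)) {
    return "named";
  }
  return "none";
}

// Build the import line a generated test file would need.
function buildImport(fnName: string, modulePath: string, kind: ExportKind): string {
  if (kind === "default") return `import ${fnName} from "${modulePath}";`;
  if (kind === "named") return `import { ${fnName} } from "${modulePath}";`;
  throw new Error(`${fnName} is not exported from ${modulePath}`);
}

const src = `export function add(a: number, b: number) { return a + b; }`;
console.log(buildImport("add", "./math", detectExportKind(src, "add")));
// → import { add } from "./math";
```

Because the import line is computed deterministically from the source rather than guessed by the model, a whole class of hallucinated-import failures disappears before the LLM is even called.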

Complexity computation acts not just as a tool but as a system of accountability for AI resource usage. An AI mode policy, set via a configuration option named AI_OVERRIDE_POLICY, accepts suggest | auto | never and controls whether the model selector may override the user-defined model choice. The model selector acts as a digital supervisor: it combines the user's preference (low-cost, high-accuracy, or balanced) with the complexity score of the function under test to decide which LLM to use.

The complexity score falls into three bands: above 60, 25-60, and below 25. The model is chosen automatically if the user config doesn't specify one, or if the override policy is set to auto; in that case the test generator picks the recommended model. Otherwise, the user is shown a suggested model based on the complexity score.

  • Low Complexity (< 25): Routes to gemini-1.5-flash (Low-cost).
  • Medium Complexity (25-60): Routes to gpt-4o-mini (Balanced).
  • High Complexity (> 60): Routes to gpt-4o (High-accuracy).
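The routing and override logic described above can be sketched in a few lines. Note this is a hypothetical illustration of the policy semantics, not the library's actual selector code; field names like `userModel` are my own, and only the score thresholds and model names come from the table above.

```typescript
// Hypothetical sketch of the routing idea — the real ts-genai-test selector
// and config shape may differ. Only the thresholds and model names below
// are taken from the article's routing table.

type OverridePolicy = "suggest" | "auto" | "never";

interface SelectorInput {
  complexityScore: number;        // produced by the AST-based analyser
  userModel?: string;             // model pinned in the user's config, if any
  overridePolicy: OverridePolicy; // e.g. from AI_OVERRIDE_POLICY
}

// Map the complexity score onto the routing table.
function recommendModel(score: number): string {
  if (score > 60) return "gpt-4o";        // high accuracy
  if (score >= 25) return "gpt-4o-mini";  // balanced
  return "gemini-1.5-flash";              // low cost
}

function selectModel(input: SelectorInput): { model: string; note?: string } {
  const recommended = recommendModel(input.complexityScore);

  // No user-configured model, or policy "auto": the selector decides.
  if (!input.userModel || input.overridePolicy === "auto") {
    return { model: recommended };
  }
  // Policy "suggest": keep the user's model but surface the recommendation.
  if (input.overridePolicy === "suggest") {
    return {
      model: input.userModel,
      note: `consider ${recommended} for complexity score ${input.complexityScore}`,
    };
  }
  // Policy "never": the user's choice always wins.
  return { model: input.userModel };
}

console.log(selectModel({ complexityScore: 72, overridePolicy: "auto" }).model);
// → gpt-4o
```

The key design point is that "auto" hands accountability to the complexity score, while "suggest" and "never" keep the human in the loop to different degrees.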

The Hallucination Problem: Hardening the Output

Let’s be honest: AI hallucinates. It guesses imports, misses function names, and sometimes generates code that looks right but doesn't run. In the ts-genai-test tool, we don’t just hope for the best. We use Syntactic Hardening:

  1. Deterministic Imports: We use AST analysis to tell the AI exactly how to import your functions. It doesn’t have to guess.
  2. Strict Persona: The AI is locked into an Expert QA mode with hard constraints, no markdown, no conversational fluff, just raw TypeScript.
  3. The Score of Truth: If a test doesn't pass the Jest suite, it’s flagged in our metrics. We close the loop between AI Generation and Real-world Execution.

With the internal MetricsRunner, every run is audited. Users get a success rate, pass/fail counts, and actual coverage percentages. It’s not just AI-generated; it’s AI-accountable. The output JSON file is written at the root level of the user's project (refer to the image below).
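As a rough illustration of the kind of data such a report carries, here is a hypothetical shape. The real file written by ts-genai-test may use entirely different field names; this sketch only reflects the metrics the article mentions (success rate, pass/fail counts, coverage).

```typescript
// Purely illustrative — the real report format may differ. The fields mirror
// the metrics named in the article: success rate, pass/fail counts, coverage.

interface MetricsReport {
  totalTests: number;
  passed: number;
  failed: number;
  successRate: number;     // passed / totalTests, as a percentage
  coveragePercent: number; // from the Jest coverage run
}

function summarize(passed: number, failed: number, coveragePercent: number): MetricsReport {
  const totalTests = passed + failed;
  return {
    totalTests,
    passed,
    failed,
    successRate: totalTests === 0 ? 0 : Math.round((passed / totalTests) * 100),
    coveragePercent,
  };
}

console.log(JSON.stringify(summarize(18, 2, 87.5), null, 2));
```

Closing the loop this way is what turns "the AI wrote some tests" into a number you can track across runs.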

Folder structure with json file

The resulting output looks like this:

sample output json file

Check it out on npm: ts-genai-test

If you find it helpful or have suggestions for improvement, I’d love your feedback. Thanks for reading, and happy coding! ✨
