Njabulo Majozi

AI-powered PostgreSQL Test Data Generation Tool (Cloudflare AI Challenge)

This is a submission for the Cloudflare AI Challenge.

What I Built

I built a serverless application using Cloudflare Workers and Hono, a lightweight web framework for Workers. The application generates step-by-step instructions for inserting random data into a PostgreSQL database and then converts those steps into SQL queries. It does this by chaining Cloudflare's Workers AI models: one model produces natural language instructions from the schema, and another translates them into SQL commands.

Demo

(demo screenshot)

My Code

REPO

The core functionality lies in the generate-data endpoint (a sketch of how this might be wired up follows the list):

  1. Extracting Schema: It retrieves the user-provided schema definition from the request body.
  2. Initializing AI Models: It creates instances of two AI models:
    • @hf/thebloke/deepseek-coder-6.7b-base-awq: This model understands natural language instructions and generates the steps in human-readable format.
    • @cf/defog/sqlcoder-7b-2: This model takes the steps and schema definition, translating them into corresponding SQL code.
  3. Prompting the Models:
    • The first model receives a prompt explaining the desired outcome and the provided schema.
    • The second model receives the generated steps and the schema definition, combining the information for SQL generation.
  4. Returning Data: The function returns a JSON response containing the generated steps and the corresponding SQL code.
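
To make the flow concrete, here is a minimal sketch of how such an endpoint could be wired up with Hono and the Workers AI binding. The binding name, prompt wording, and response handling below are my own illustrative assumptions, not the repo's actual code.

```typescript
import { Hono } from 'hono';

// Assumes an `AI` binding configured for the Worker (e.g. in wrangler.toml).
type Bindings = { AI: Ai };

const app = new Hono<{ Bindings: Bindings }>();

app.post('/generate-data', async (c) => {
  // 1. Extract the user-provided schema definition from the request body.
  const { schema } = await c.req.json<{ schema: string }>();

  // 2./3. Ask the first model for human-readable insertion steps.
  const steps = await c.env.AI.run('@hf/thebloke/deepseek-coder-6.7b-base-awq', {
    prompt: `Given this PostgreSQL schema, list the steps needed to insert realistic random test data:\n${schema}`,
  });

  // Feed the generated steps plus the schema definition to the SQL-focused model.
  const sql = await c.env.AI.run('@cf/defog/sqlcoder-7b-2', {
    prompt: `Schema:\n${schema}\n\nSteps:\n${steps.response}\n\nWrite INSERT statements that carry out these steps.`,
  });

  // 4. Return both the natural-language steps and the generated SQL.
  return c.json({ steps: steps.response, sql: sql.response });
});

export default app;
```

The actual project may structure prompts and parsing differently; this only shows how the two models can be chained inside a single request handler.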

Journey

Building this application involved several steps, from understanding the requirements to implementing the solution.

  • Understanding Cloudflare Workers: I started by researching how to use Cloudflare Workers and Hono for serverless applications.
  • Exploring AI Models: I explored Cloudflare's AI models to find one that could generate natural language instructions based on a given schema.
  • Integration and Orchestration: I implemented the logic to process the generated instructions and convert them into SQL queries.

Multiple Models and/or Triple Task Types

  • Multiple Models

    The application demonstrates the use of multiple AI models from Cloudflare's Workers AI platform. The first model, @hf/thebloke/deepseek-coder-6.7b-base-awq, generates natural language steps for data insertion. The second model, @cf/defog/sqlcoder-7b-2, converts these steps into SQL queries. This showcases the flexibility and power of Cloudflare's AI platform in chaining models to generate complex content from simple prompts.

  • Triple Task Types

The application performs three main tasks:

  1. Data Generation: It generates natural language steps for inserting data into a PostgreSQL database based on a given schema.
  2. SQL Query Generation: It converts the generated steps into SQL queries.
  3. API Endpoint: It exposes an API endpoint (/generate-data) that accepts a schema and returns the generated steps and SQL queries (a sample call is sketched after this list).
  • Challenges:
    • Coordinating communication between the two LLMs.
    • Prompt design.
    • Ensuring the generated SQL scripts are functional and adhere to the DDL and data constraints.
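
As a quick illustration of the API endpoint task, this is how a client might call the Worker. The URL, payload shape, and response fields (steps and sql) mirror the sketch above and are assumptions rather than documented behaviour.

```typescript
// Hypothetical client call; the Worker URL and JSON shape are assumptions.
const res = await fetch('https://<your-worker>.workers.dev/generate-data', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    schema: `CREATE TABLE users (
      id SERIAL PRIMARY KEY,
      email TEXT NOT NULL UNIQUE,
      created_at TIMESTAMPTZ DEFAULT now()
    );`,
  }),
});

const { steps, sql } = await res.json();
console.log(steps); // natural-language insertion steps from the first model
console.log(sql);   // INSERT statements produced by the second model
```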

What I'm Proud Of

The ability to combine multiple LLMs to achieve a complex task like test data generation for databases.

What's Next

  • Experiment with different LLM combinations for improved performance.
  • Integrate user feedback to refine the generated test data scripts.
  • Local LLM integration.
  • Use a LangChain SQL agent to retrieve database DDLs.

Top comments (3)

Fyodor

That’s an interesting tech project, I love it 👏 this challenge lacks such dev-oriented projects, kudos for being one of them 👍

Njabulo Majozi

Thanks! It can be a helpful tool for quickly generating test data, as it is a pain point for devs. Are there any specific features that would be beneficial?

Fyodor

Nothing specific, I rarely work with SQL these days. But I think the best advantage of such tools is narrow focus on doing something in a dedicated and deep way, so I honestly doubt you need to expand the feature scope too much 👍