The Search for a Native Engine
As a developer who values performance and systems integration, I felt limited by the common "Chat with DB" approach. That method usually relies on slow, external wrappers that pull data out of Postgres just to convert natural language into SQL. I wanted to know: why can't the database itself understand me?
My goal was a bit bold: to integrate AI directly into the Postgres kernel, making the database self-aware. This led me to a new domain, inspired by the ClickHouse open take-home challenge.
The Crucible: C/C++ Interoperability
The first major shock was realizing that the true language of PostgreSQL extensions is C. This immediately created a conflict, because the powerful ClickHouse AI SDK I was required to use is written in C++.
The Conflict: PostgreSQL ships only C headers, and C++ name mangling means you cannot simply include and link against them from C++ code.
The Magic: After some searching, I discovered the solution: the C++ linkage specification extern "C". Wrapping the PostgreSQL headers (and my exported functions) in extern "C" blocks let my C++ code include and link against them with C linkage. This foundational lesson also clarified the real purpose of header files in modular software development.
The package manager problem
With modern languages like TypeScript, Python, and Go, we take package managers for granted. Suddenly, I was back in the C++ world, where I learned that third-party library integration must be done manually.
To solve this, I created a CMakeLists.txt file. That configuration file declared how to find, build, and link my third-party libraries, playing roughly the role that package.json plays in the Node.js ecosystem. It streamlined the build process and made it possible to package the extension properly. A simplified version is sketched below.
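This is a hedged, simplified sketch of what such a CMakeLists.txt can look like; the real file in the repository differs, and the source path and library names here are placeholders:

# Simplified, hypothetical CMakeLists.txt; paths and library names are placeholders.
cmake_minimum_required(VERSION 3.16)
project(pg_ask CXX)

# Locate the PostgreSQL server headers and install directories via pg_config.
execute_process(COMMAND pg_config --includedir-server
                OUTPUT_VARIABLE PG_INCLUDEDIR_SERVER
                OUTPUT_STRIP_TRAILING_WHITESPACE)
execute_process(COMMAND pg_config --pkglibdir
                OUTPUT_VARIABLE PG_PKGLIBDIR
                OUTPUT_STRIP_TRAILING_WHITESPACE)

# Build the extension as a shared module named pg_ask.so (no "lib" prefix).
add_library(pg_ask MODULE src/pg_ask.cpp)
set_target_properties(pg_ask PROPERTIES PREFIX "" CXX_STANDARD 17)
target_include_directories(pg_ask PRIVATE ${PG_INCLUDEDIR_SERVER})

# Link whatever third-party AI/HTTP client library the extension depends on.
# target_link_libraries(pg_ask PRIVATE <ai_sdk_target>)

install(TARGETS pg_ask LIBRARY DESTINATION ${PG_PKGLIBDIR})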
How to connect to the system tables
The next major hurdle was performance. Initial thoughts (and even AI suggestions) pointed toward the Server Programming Interface (SPI_connect() and friends) to run SQL queries against the catalog for schema information.
The Problem: Issuing full SQL queries from inside the extension adds parse, plan, and execution overhead on every call.
The Solution: Digging into the PostgreSQL documentation and headers showed that the schema names, table names, and columns can be read directly by opening the internal system catalogs. This was a massive performance win: reading the catalogs directly gave the AI the necessary "ground truth" for prompting without adding unnecessary query latency. A sketch of the idea follows.
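As a rough illustration (not the exact pg_ask code), this is what a direct catalog scan over pg_class can look like with recent PostgreSQL internal APIs; list_tables is a made-up name, error handling is omitted, and a real implementation would also read pg_namespace and pg_attribute for schemas and columns:

#include <string>
#include <vector>

extern "C" {
#include "postgres.h"
#include "access/heapam.h"
#include "access/htup_details.h"
#include "access/table.h"
#include "access/tableam.h"
#include "catalog/pg_class.h"
#include "storage/lockdefs.h"
#include "utils/rel.h"
}

// Scan the pg_class catalog directly and collect the names of ordinary tables.
static std::vector<std::string> list_tables()
{
    std::vector<std::string> names;

    Relation rel = table_open(RelationRelationId, AccessShareLock);
    TableScanDesc scan = table_beginscan_catalog(rel, 0, NULL);

    HeapTuple tup;
    while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
    {
        Form_pg_class form = (Form_pg_class) GETSTRUCT(tup);
        if (form->relkind == RELKIND_RELATION)        // skip indexes, views, toast tables, ...
            names.push_back(NameStr(form->relname));  // in practice, also filter out system schemas
    }

    table_endscan(scan);
    table_close(rel, AccessShareLock);
    return names;
}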
AI Engine: Controlling the output
To ensure the system was reliable, I needed the AI output to be stable and predictable.
Structured Output: I moved the core logic to structured JSON output, so the SQL is returned in a single, predictable field. This strips out the model's habit of wrapping answers in quotes or explanations and makes the application robust (see the sketch after this list).
Performance and Stability: I set the temperature to 0 in the request so the output stays grounded in the provided system context, reducing the chance of irrelevant or creative output.
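As a minimal sketch of the parsing side, assuming the model is asked to reply with a JSON object such as {"sql": "..."} and that the response is parsed with nlohmann/json (the actual SDK calls and helper names in pg_ask may differ):

#include <nlohmann/json.hpp>
#include <optional>
#include <string>

// Pull the generated SQL out of a structured response like {"sql": "SELECT ..."}.
std::optional<std::string> extract_sql(const std::string &raw_response)
{
    auto parsed = nlohmann::json::parse(raw_response, /*cb=*/nullptr,
                                        /*allow_exceptions=*/false);
    if (parsed.is_discarded() || !parsed.contains("sql") || !parsed["sql"].is_string())
        return std::nullopt;   // malformed output: caller can retry or raise an error
    return parsed["sql"].get<std::string>();
}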
Distribution to users: A fight with the Dockerfile
I created a Docker image to test the extension safely, but the image size ballooned to 1.8GB.
The Problem: The initial build contained the entire development toolchain and libraries needed to compile the C++ extension.
The Solution: I created a multi-stage build. The first stage compiled the extension, and the second, final stage copied only the necessary runtime files (pg_ask.so, the .control file, and the .sql scripts) onto a fresh Postgres base image. This brought the final image down to a lean 500MB. The sketch below shows the idea.
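For illustration, a simplified multi-stage Dockerfile might look like this; the base image tags, package names, and install paths here are assumptions, and the real Dockerfile in the repository will differ:

# Simplified, hypothetical multi-stage Dockerfile; tags, packages, and paths are placeholders.

# Stage 1: build the extension with the full C++ toolchain.
FROM postgres:16 AS builder
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential cmake postgresql-server-dev-16
COPY . /src
WORKDIR /src
RUN cmake -B build && cmake --build build

# Stage 2: copy only the runtime artifacts onto a clean Postgres image.
FROM postgres:16
COPY --from=builder /src/build/pg_ask.so /usr/lib/postgresql/16/lib/
COPY pg_ask.control pg_ask--*.sql /usr/share/postgresql/16/extension/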
To try the extension
The final product, pg_ask, is a powerful extension that you can start using right away with a couple of Docker commands.
# Pull the image
docker pull ghcr.io/abiji-2020/pg_ask:latest
# Run with your API key
docker run -d \
--name pg_ask \
-e POSTGRES_PASSWORD=mysecretpassword \
-e PG_ASK_AI_KEY=your_api_key_here \
-p 5432:5432 \
ghcr.io/abiji-2020/pg_ask:latest
Once the container is running, you can connect and query the database:
# Connect to your database
psql -h localhost -U postgres -d postgres
-- Create the extension
CREATE EXTENSION IF NOT EXISTS pg_ask;
-- Set up a sample database to query
-- Example query (shows the power of the extension)
SELECT pg_gen_query('count all users created in the last 7 days');
Conclusion
This project taught me that "Full Stack" means understanding the system at every layer. Going low-level with C++, managing performance with internal Postgres APIs, and solving deployment issues with multi-stage Docker builds were the core lessons. If you want to build truly performant, integrated AI tools, sometimes you have to stop building wrappers and start building extensions.
You can also check out the source of the extension at https://github.com/Abiji-2020/pg_ask