DEV Community

prakashmehta@97
prakashmehta@97

Posted on

🦎 Project Chameleon: The Self-Describing Data Engine powered by Gemma 4 🧠

Gemma 4 Challenge: Build With Gemma 4 Submission

This is a submission for the Gemma 4 Challenge: Build with Gemma 4
What I Built
I built Auto-Dictate AI, an intelligent Autonomous Data Dictionary Agent.

The goal of this project was to solve the problem of "Data Swamps"β€”large, complex datasets that lack documentation, making them inaccessible to analytics teams. By leveraging the new Apache 2.0 licensed Gemma 4, this agent automates the discovery of data types, semantic meanings, and cross-table relationships. It runs entirely on private infrastructure, ensuring that sensitive metadata and schema structures never leave the secure environment while providing frontier-level reasoning on complex data architectures.

Demo
[Insert Link to your Demo Video or GIF here]

Code
The full source code, including the implementation details for the Gemma 4 integration and the schema crawling logic, is available on GitHub:

πŸ‘‰ GitHub Repository: [Insert your GitHub Link Here]

How I Used Gemma 4
For this project, I chose the Gemma 4 31B Dense model.

Why this model?

Reasoning Power: I utilized the 31B Dense model’s massive leap in reasoning (scoring ~89% on AIME 2026) to handle the complex task of Semantic Inference. It doesn't just see a column named tx_ref; it reasons that it represents a "Transaction Reference ID" based on surrounding table context.

Context Window: The 256K context window was a game-changer. It allowed the agent to ingest the entire DDL (Data Definition Language) of a multi-table database in a single prompt. This enabled the model to identify "Hidden Relationships" (Foreign Keys) that weren't explicitly defined in the code.

Deployment: I served the model using vLLM, taking advantage of the Shared KV Cache to maintain high throughput while the agent crawled through hundreds of table schemas.

Key Feature: I implemented the <|think|> token protocol. This allows the agent to "deliberate" on the potential business logic of a column before finalizing the description, significantly reducing "hallucinated" definitions and improving technical accuracy.

Technical Implementation & UX
Creativity: Instead of a static PDF, Auto-Dictate generates a dynamic Knowledge Graph where users can click on nodes to see Gemma-generated summaries.

Usability: I built a minimalist CLI and Web Dashboard that allows a Data Engineer to point the agent at a database URL and receive a full Markdown dictionary in minutes.

Top comments (0)