Sarav AK

Posted on Jun 20

Query YAML Like a Database — Why I Built YamlQL (And How It Works)

#ai #python #kubernetes #sql

Have you ever tried to grep through a large set of Kubernetes YAML files just to figure out which pods are missing CPU limits?

I did.
And I hated it.

So I built YamlQL — a tool that lets you query YAML files using SQL.

YamlQL has three modes of operation:

• Discover the schema of your YAML file
• Run manual SQL queries over YAML
• Use AI to generate SQL (schema-aware, no data sent)

😵‍💫 YAML is beautiful until it is not

YAML is beautiful for humans to write — but a nightmare to audit or analyze at scale. You’ve likely seen it everywhere:

• Docker Compose
• Kubernetes manifests
• Helm values
• GitHub Actions
• CircleCI, ArgoCD, and more

But try to ask simple questions like:

• “Which containers expose port 80?”
• “Where did we forget resources.limits.memory?”
• “Are any services still using HTTP?”

You’re stuck with grep, yq, or writing ad-hoc scripts that break when a field is missing or nested differently.

YAML as a language has various problems

The following article is a great summary of the problems with YAML:

https://ruudvanasseldonk.com/2023/01/11/the-yaml-document-from-hell

🛠️ What I Built: YamlQL

YamlQL is a CLI + Python tool that converts YAML into DuckDB tables, so you can query it like a database.
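
To make the idea concrete, here is a minimal sketch of the YAML-to-table step. It is not YamlQL's actual code. It uses Python's built-in sqlite3 as a stand-in for DuckDB so it runs without extra packages, it starts from a dict shaped like what a YAML loader would return, and the table name `containers_ports` and the helper `explode_ports` are made up for illustration.

```python
import sqlite3

# Assumption: this dict is what a YAML loader would return for a small
# slice of a Deployment manifest. Only the fields used below are included.
manifest = {
    "spec": {"template": {"spec": {"containers": [
        {"name": "main-app", "ports": [{"containerPort": 8080}]},
        {"name": "sidecar-logger", "ports": [{"containerPort": 24224}]},
        {"name": "metrics-exporter", "ports": [{"containerPort": 9090}]},
    ]}}}
}


def explode_ports(doc):
    """Yield one (container_name, port) row per entry in each ports list."""
    containers = doc["spec"]["template"]["spec"].get("containers", [])
    for container in containers:
        # .get() keeps the walk from failing when a container has no ports.
        for port in container.get("ports", []):
            yield container["name"], port.get("containerPort")


# sqlite3 stands in for DuckDB here (hypothetical table name).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE containers_ports (container TEXT, port INTEGER)")
conn.executemany(
    "INSERT INTO containers_ports VALUES (?, ?)", explode_ports(manifest)
)

# Once the nested list is a table, the question becomes a one-line query.
rows = conn.execute(
    "SELECT container FROM containers_ports WHERE port > 9000 ORDER BY container"
).fetchall()
print(rows)
```

The point of the design is that nested lists become rows, so a question like "which containers expose a given port?" turns into a WHERE clause instead of a custom script.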

✅ Key Features
• discover — See the schema of your YAML file
• sql — Run manual SQL queries over YAML
• ai — Use AI to generate SQL (schema-aware, no data sent)
• Supports nested structures, lists, dicts
• Works locally, offline, and fast

Let's see how YamlQL works with an example

The Sample Deployment file

Let's consider the following Kubernetes Deployment manifest as an example:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: complex-app
  labels:
    app: complex-app
    tier: backend
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: complex-app
  template:
    metadata:
      labels:
        app: complex-app
        version: v1
    spec:
      volumes:
        - name: config-volume
          configMap:
            name: app-config
        - name: secret-volume
          secret:
            secretName: app-secrets
        - name: shared-logs
          emptyDir: {}
      containers:
        - name: main-app
          image: myorg/main-app:latest
          ports:
            - containerPort: 8080
          env:
            - name: APP_ENV
              value: production
            - name: CONFIG_PATH
              value: /etc/config
            - name: SECRET_TOKEN
              valueFrom:
                secretKeyRef:
                  name: app-secrets
                  key: token
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config
            - name: shared-logs
              mountPath: /var/log/app
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
          resources:
            requests:
              cpu: "500m"
              memory: "256Mi"
            limits:
              cpu: "1"
              memory: "512Mi"
        - name: sidecar-logger
          image: fluent/fluentd:latest
          ports:
            - containerPort: 24224
          volumeMounts:
            - name: shared-logs
              mountPath: /fluentd/log
          env:
            - name: FLUENTD_CONF
              value: fluentd.conf
        - name: metrics-exporter
          image: prom/node-exporter
          ports:
            - containerPort: 9090
          resources:
            limits:
              cpu: "100m"
              memory: "128Mi"

Discover

Before writing a query, you need to know how this YAML file is converted into tables and what their schema looks like.

So first, we use the discover mode:

yamlql discover deployment.yaml
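
The exact output depends on YamlQL itself, so it is not reproduced here. As a rough illustration of what schema discovery over nested YAML involves, the sketch below walks an already-parsed document and lists every leaf path with its Python type. The function `discover_schema` and the `[]` path notation for list items are assumptions made for this example, not YamlQL's real format.

```python
def discover_schema(node, prefix=""):
    """Return a sorted list of (path, type_name) pairs for a nested structure."""
    found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            found |= set(discover_schema(value, path))
    elif isinstance(node, list):
        # All items of a list share one path, marked with "[]".
        for item in node:
            found |= set(discover_schema(item, f"{prefix}[]"))
    else:
        found.add((prefix, type(node).__name__))
    return sorted(found)


# Assumption: a small slice of the manifest, as a YAML loader would return it.
doc = {
    "kind": "Deployment",
    "spec": {
        "replicas": 3,
        "template": {"spec": {"containers": [
            {"name": "main-app", "ports": [{"containerPort": 8080}]},
            {"name": "metrics-exporter"},
        ]}},
    },
}

schema = discover_schema(doc)
for path, type_name in schema:
    print(f"{path}: {type_name}")
```

Note that the second container has no ports, yet the path still shows up because the first one does. Empty dicts and lists contribute no paths in this sketch.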

🧠 Write SQL queries Manually

Now that we know the table names, the field names, and the schema of this file, let us put them to use.

Let's write some SQL queries to get data out of the YAML.
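
As an example of the kind of query this enables, here is a sketch that answers "where did we forget resources.limits.memory?" for the three containers in the sample manifest. The table name `containers` and the flattened column names are hypothetical (the real names come from the discover step), and sqlite3 is used as a stand-in for DuckDB so the snippet runs on its own.

```python
import sqlite3

# Assumption: one row per container, with resources.limits flattened into
# columns. None stands for a field that is absent from the YAML.
containers = [
    ("main-app", "1", "512Mi"),
    ("sidecar-logger", None, None),
    ("metrics-exporter", "100m", "128Mi"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE containers (name TEXT, limits_cpu TEXT, limits_memory TEXT)"
)
conn.executemany("INSERT INTO containers VALUES (?, ?, ?)", containers)

# A missing field becomes NULL, so the audit is a plain IS NULL filter.
missing = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM containers WHERE limits_memory IS NULL ORDER BY name"
    )
]
print(missing)
```

A missing field shows up as NULL rather than as an exception, which is exactly where ad-hoc scripts tend to break.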

👨‍💻 Write SQL queries with AI - without sharing your actual Data

English has become the new programming language in the era of Software 3.0.

So let us do some vibe coding: write a natural-language question, which is sent to the AI along with the schema, without sharing the actual data.

The LLM is used here only to convert natural language to SQL, with the schema as input.
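
To show what "schema only, no data" can look like in practice, here is a small sketch of a prompt builder. Everything in it is hypothetical: the function `build_prompt`, the prompt wording, and the `containers` schema are illustrations, not YamlQL's actual prompt or API. The only inputs are table names, column names, and types, so no values from the YAML file can end up in the text sent to the model.

```python
def build_prompt(schema, question):
    """Build an LLM prompt from table/column names and types only."""
    lines = ["You translate questions into SQL.", "Tables:"]
    for table, columns in schema.items():
        cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
        lines.append(f"- {table}({cols})")
    lines.append(f"Question: {question}")
    lines.append("Answer with a single SQL query.")
    return "\n".join(lines)


# Hypothetical schema; real table and column names come from the discover step.
schema = {
    "containers": [
        ("name", "TEXT"),
        ("image", "TEXT"),
        ("limits_memory", "TEXT"),
    ]
}

prompt = build_prompt(schema, "Which containers have no memory limit?")
print(prompt)
```

The SQL that comes back would then be run locally against the tables, so the data itself never has to leave your machine.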

🎥 Video Demo

https://www.youtube.com/watch?v=6MRYTz027Fc

🤖 YAML in RAG and AI Workflows

This started as a tool for my RAG pipelines.

I needed to:
• Ingest YAML-based metadata (Helm, K8s, config files)
• Normalize it
• Extract relevant structured data before embedding

YamlQL made it clean, SQL-native, and easy to scale.

✨ It’s Open Source

Find the source code here, and feel free to contribute and improve it:

https://github.com/AKSarav/YamlQL

Install with pip

pip install yamlql

Here are some example commands you can use

yamlql discover yourfile.yaml
yamlql sql yourfile.yaml --query "SELECT * FROM metadata"

I’d Love Your Feedback and Contributions

What would make this more useful in your workflow?
What’s missing before you’d use this in CI/CD?
What would you want to see in YamlQL?

Leave a comment, open an issue, or just ping me.

I’m building this in the open, hoping it will help someone. With your feedback and contributions, it can go further.

Thanks
Sarav

Find me on LinkedIn
https://www.linkedin.com/in/aksarav/
