Have you ever tried to grep through a large set of Kubernetes YAML files just to figure out which pods are missing CPU limits?
I did.
And I hated it.
So I built YamlQL — a tool that lets you query YAML files using SQL.
YamlQL has three modes of operation:
- Discover the schema of your YAML file
- Run manual SQL queries over YAML
- Use AI to generate SQL (schema-aware, no data sent)
😵‍💫 YAML is beautiful until it is not
YAML is beautiful for humans to write — but a nightmare to audit or analyze at scale. You’ve likely seen it everywhere:
• Docker Compose
• Kubernetes manifests
• Helm values
• GitHub Actions
• CircleCI, ArgoCD, and more
But try to ask simple questions like:
• “Which containers expose port 80?”
• “Where did we forget resources.limits.memory?”
• “Are any services still using HTTP?”
You’re stuck with grep, yq, or writing ad-hoc scripts that break when a field is missing or nested differently.
YAML as a language has various problems
The following article is a great summary of the problems with YAML:
https://ruudvanasseldonk.com/2023/01/11/the-yaml-document-from-hell
🛠️ What I Built: YamlQL
YamlQL is a CLI + Python tool that converts YAML into DuckDB tables, so you can query it like a database.
✅ Key Features
• discover — See the schema of your YAML file
• sql — Run manual SQL queries over YAML
• ai — Use AI to generate SQL (schema-aware, no data sent)
• Supports nested structures, lists, dicts
• Works locally, offline, and fast
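To make the "YAML becomes tables" idea concrete, here is a minimal sketch of the general technique. It is not YamlQL's actual implementation. Assumptions: a hand-written Python dict stands in for the parsed YAML (what a YAML loader would return), the standard-library sqlite3 module stands in for DuckDB so the snippet runs without extra installs, and the table name `containers`, its columns, and the helper `containers_to_rows` are all hypothetical.

```python
import sqlite3

# Stand-in for a parsed Deployment manifest (what a YAML loader would return).
manifest = {
    "kind": "Deployment",
    "spec": {"template": {"spec": {"containers": [
        {"name": "main-app", "image": "myorg/main-app:latest",
         "resources": {"limits": {"cpu": "1", "memory": "512Mi"}}},
        {"name": "sidecar-logger", "image": "fluent/fluentd:latest"},
    ]}}},
}


def containers_to_rows(doc):
    """Flatten each container dict into a (name, image, cpu_limit) tuple.

    Missing keys become None (SQL NULL) instead of raising KeyError.
    """
    pod_spec = doc.get("spec", {}).get("template", {}).get("spec", {})
    rows = []
    for container in pod_spec.get("containers", []):
        limits = container.get("resources", {}).get("limits", {})
        rows.append((container.get("name"), container.get("image"), limits.get("cpu")))
    return rows


# sqlite3 is used here only as a stand-in for DuckDB (hypothetical table layout).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE containers (name TEXT, image TEXT, cpu_limit TEXT)")
conn.executemany("INSERT INTO containers VALUES (?, ?, ?)", containers_to_rows(manifest))

# The nested YAML is now an ordinary table; absent fields show up as NULL.
rows = conn.execute("SELECT name, cpu_limit FROM containers ORDER BY name").fetchall()
print(rows)
```

The point of the sketch is the shape of the transformation: lists of mappings become rows, nested keys become columns, and a missing field becomes a NULL you can filter on rather than an exception in a script.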
Let's see how YamlQL works with an example
The Sample Deployment file
Let's consider the following Kubernetes Deployment manifest as an example:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: complex-app
  labels:
    app: complex-app
    tier: backend
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: complex-app
  template:
    metadata:
      labels:
        app: complex-app
        version: v1
    spec:
      volumes:
        - name: config-volume
          configMap:
            name: app-config
        - name: secret-volume
          secret:
            secretName: app-secrets
        - name: shared-logs
          emptyDir: {}
      containers:
        - name: main-app
          image: myorg/main-app:latest
          ports:
            - containerPort: 8080
          env:
            - name: APP_ENV
              value: production
            - name: CONFIG_PATH
              value: /etc/config
            - name: SECRET_TOKEN
              valueFrom:
                secretKeyRef:
                  name: app-secrets
                  key: token
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config
            - name: shared-logs
              mountPath: /var/log/app
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
          resources:
            requests:
              cpu: "500m"
              memory: "256Mi"
            limits:
              cpu: "1"
              memory: "512Mi"
        - name: sidecar-logger
          image: fluent/fluentd:latest
          ports:
            - containerPort: 24224
          volumeMounts:
            - name: shared-logs
              mountPath: /fluentd/log
          env:
            - name: FLUENTD_CONF
              value: fluentd.conf
        - name: metrics-exporter
          image: prom/node-exporter
          ports:
            - containerPort: 9090
          resources:
            limits:
              cpu: "100m"
              memory: "128Mi"
Discover
Before writing a query, you need to know how this YAML file is converted into tables and what their schema looks like.
So first we use the discover mode:
yamlql discover deployment.yaml
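If you are curious what "discovering a schema" from nested YAML can mean, the sketch below walks a parsed document and lists every leaf path with its type. This is only an illustration of the idea: the function `discover_schema` and the dotted-path notation are hypothetical, the dict is a hand-written stand-in for part of the parsed manifest, and the actual output of `yamlql discover` may look quite different.

```python
def discover_schema(node, prefix=""):
    """Return sorted (path, type_name) pairs for every leaf in a nested structure.

    Dict keys are joined with '.', and all items of a list share one '[]' segment.
    """
    if isinstance(node, dict):
        found = set()
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            found.update(discover_schema(value, path))
        return sorted(found)
    if isinstance(node, list):
        found = set()
        for item in node:
            found.update(discover_schema(item, f"{prefix}[]"))
        return sorted(found)
    # Scalar leaf: record its path and Python type name.
    return [(prefix, type(node).__name__)]


# Hand-written stand-in for part of the parsed sample manifest.
doc = {
    "kind": "Deployment",
    "spec": {
        "replicas": 3,
        "template": {"spec": {"containers": [
            {"name": "main-app", "ports": [{"containerPort": 8080}]},
            {"name": "sidecar-logger", "ports": [{"containerPort": 24224}]},
        ]}},
    },
}

schema = discover_schema(doc)
for path, type_name in schema:
    print(f"{path}: {type_name}")
```

Because list items are merged under one `[]` segment, two containers with the same fields produce one schema entry per field, which is roughly what you want before deciding on table columns.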
🧠 Write SQL queries Manually
Now that we know the tables, field names, and schema of this file, let's put them to use.
Let's write some SQL queries to get data out of the YAML.
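As an illustration of the kind of questions SQL makes easy here, the following sketch loads the three containers from the sample manifest into two tables and queries them. Treat everything about the table layout as an assumption: the names `containers` and `ports` and their columns are hypothetical (the real names come from the discover step), the rows are copied by hand from the manifest, and sqlite3 is used as a stand-in for DuckDB so the snippet runs with the standard library alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE containers (name TEXT, image TEXT, cpu_limit TEXT, memory_limit TEXT);
CREATE TABLE ports (container TEXT, container_port INTEGER);
""")

# Rows copied by hand from the sample Deployment manifest (hypothetical layout).
conn.executemany("INSERT INTO containers VALUES (?, ?, ?, ?)", [
    ("main-app", "myorg/main-app:latest", "1", "512Mi"),
    ("sidecar-logger", "fluent/fluentd:latest", None, None),
    ("metrics-exporter", "prom/node-exporter", "100m", "128Mi"),
])
conn.executemany("INSERT INTO ports VALUES (?, ?)", [
    ("main-app", 8080),
    ("sidecar-logger", 24224),
    ("metrics-exporter", 9090),
])

# Which containers are missing a CPU limit?
no_cpu_limit = [row[0] for row in conn.execute(
    "SELECT name FROM containers WHERE cpu_limit IS NULL ORDER BY name")]

# Which container exposes port 9090, and from which image?
on_9090 = conn.execute("""
    SELECT c.name, c.image
    FROM containers AS c
    JOIN ports AS p ON p.container = c.name
    WHERE p.container_port = 9090
""").fetchall()

print(no_cpu_limit)
print(on_9090)
```

The same questions asked with grep would need you to track which `name:` line each `limits:` block belongs to; in SQL the relationship is a WHERE clause or a JOIN.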
👨‍💻 Write SQL queries with AI, without sharing your actual data
English has become the new programming language in the era of Software 3.0.
So let us do some vibe coding: write a natural-language question, which is sent to the AI along with the schema, without sharing the actual data.
The LLM is used here only to convert the natural-language question into SQL, with the schema as its input.
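The "schema in, SQL out" idea can be sketched as a prompt builder that only ever sees table and column names. This is an assumption about the general approach, not YamlQL's actual prompt or code: `build_prompt`, the prompt wording, and the example schema are all hypothetical, and no LLM is called here.

```python
def build_prompt(schema, question):
    """Build an LLM prompt from table/column names and types only.

    `schema` maps table name -> {column name: SQL type}. The function takes
    no row values as input, so none can end up in the prompt.
    """
    lines = ["You translate questions into SQL. Use only these tables:"]
    for table, columns in schema.items():
        cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
        lines.append(f"- {table}({cols})")
    lines.append(f"Question: {question}")
    lines.append("Answer with a single SQL query.")
    return "\n".join(lines)


# Hypothetical schema; the real table names would come from the discover step.
schema = {
    "containers": {"name": "VARCHAR", "image": "VARCHAR", "cpu_limit": "VARCHAR"},
}
prompt = build_prompt(schema, "Which containers have no CPU limit?")
print(prompt)
```

The SQL that comes back would then be run locally against the tables, so values such as image names or secret references stay on your machine.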
🎥 Video Demo
https://www.youtube.com/watch?v=6MRYTz027Fc
🤖 YAML in RAG and AI Workflows
This started as a tool for my RAG pipelines.
I needed to:
• Ingest YAML-based metadata (Helm, K8s, config files)
• Normalize it
• Extract relevant structured data before embedding
YamlQL made it clean, SQL-native, and easy to scale.
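For the "extract structured data before embedding" step, one possible shape is to turn each query result row into a short, self-describing text chunk. The sketch below is only an illustration under assumptions: `rows_to_chunks`, the chunk format, and the example rows are hypothetical, and the embedding call itself is left out.

```python
def rows_to_chunks(source, rows):
    """Render each structured row as one short, self-describing text chunk.

    Fields with a None value are skipped so chunks contain only real data.
    """
    chunks = []
    for row in rows:
        fields = "; ".join(
            f"{key}={value}" for key, value in sorted(row.items()) if value is not None
        )
        chunks.append(f"[{source}] {fields}")
    return chunks


# Hypothetical rows, shaped like the result of a SQL query over a manifest.
rows = [
    {"name": "main-app", "image": "myorg/main-app:latest", "cpu_limit": "1"},
    {"name": "sidecar-logger", "image": "fluent/fluentd:latest", "cpu_limit": None},
]
chunks = rows_to_chunks("deployment.yaml", rows)
for chunk in chunks:
    print(chunk)
```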
✨ It’s Open Source
Find the source code here, and feel free to contribute and improve it:
https://github.com/AKSarav/YamlQL
Install with pip
pip install yamlql
Here are some example commands you can use
yamlql discover yourfile.yaml
yamlql sql yourfile.yaml --query "SELECT * FROM metadata"
I’d Love Your Feedback and Contributions
- What would make this more useful in your workflow?
- What’s missing before you’d use this in CI/CD?
- What would you want to see in YamlQL?
Leave a comment, open an issue, or just ping me.
I’m building this in the open, hoping it will help someone, and with your feedback and contributions it can go further.
Thanks
Sarav
Find me on LinkedIn
https://www.linkedin.com/in/aksarav/