Why I Built This
Large‑language‑model agents love data. Give them access to your enterprise warehouse and they’ll start generating SQL faster than any analyst.
That’s exciting — until an agent decides to run a DELETE FROM across an entire dataset or a multi‑terabyte query that costs hundreds of dollars.
To keep LLMs productive but safe, you need a BigQuery client with built‑in constraints, cost checks, and auditability.
Meet agentic‑bq.
What It Is
agentic‑bq is an agent‑safe BigQuery client that injects common‑sense guardrails for AI‑driven data access.
pip install agentic-bq
Features
The agentic‑bq library delivers a complete safety layer for AI agents that interact with BigQuery:
- Parameterized queries only – every query uses bound parameters, eliminating risky string concatenation and SQL injection.
- Denylist engine – destructive SQL verbs like DROP, DELETE, ALTER, or TRUNCATE are blocked automatically, shielding enterprise datasets from unintended modifications.
- Automatic row limits – a LIMIT N clause is injected (or overridden) on any SELECT statement, preventing resource exhaustion.
- Dry‑run cost checks – bytes processed are estimated before execution, giving you clear visibility into potential spend and stopping expensive scans.
- Agent‑readable JSON results – clean output that chains easily into other LLM tools or workflows.
- Audit‑ready structured logging – full traceability and compliance reporting for every query an agent issues.
Together, these features turn BigQuery into a controlled, cost‑aware, and secure environment for agentic AI operations.
Getting Started
pip install agentic-bq
from agentic_bq import AgenticBQ
bq = AgenticBQ(project="my-gcp-project")
query = """
SELECT name, total_sales
FROM retail_dataset.sales
WHERE region = @region
ORDER BY total_sales DESC
"""
params = {"region": "US"}
result = bq.safe_query(query, params=params, limit=100)
print(result.to_json())
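To see why bound parameters matter, contrast the query above with naive string concatenation. This is a generic illustration, not agentic‑bq API; the crafted value is made up:

```python
# A crafted "region" value rewrites a concatenated query entirely
region = "US' OR '1'='1"
unsafe_sql = f"SELECT name FROM retail_dataset.sales WHERE region = '{region}'"

# With a named placeholder the SQL text stays fixed; the value is bound
# separately by the driver and can never alter the statement itself
safe_sql = "SELECT name FROM retail_dataset.sales WHERE region = @region"
params = {"region": region}
```

The injected value lands inside `unsafe_sql` but never appears in `safe_sql`, which is exactly the property the client enforces.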
Use cases
The agentic‑bq library unlocks several practical use cases for organizations building AI agents that interact with enterprise data warehouses like BigQuery.
- An LLM data assistant can safely execute parameterized SELECT queries without risking SQL injection or schema damage, letting natural‑language agents explore structured data securely.
- FinOps‑aware data agents use the built‑in dry‑run checks to estimate bytes processed before execution, so expensive queries can be blocked or rerouted to summary tables, protecting cloud budgets in real time.
- Compliance data bots benefit from audit‑ready logging and denylist enforcement: every query structure, parameter, and cost estimate can be recorded automatically for governance or internal review.
- Teams exposing enterprise analytics APIs can front BigQuery with an agentic‑bq layer so every API‑generated query follows consistent safety, cost, and logging policies, giving external or internal agents controlled, policy‑compliant access to corporate data.
In a single safe_query call:
- dynamic parameters are safely bound,
- a LIMIT 100 is injected if missing,
- any forbidden statements trigger an exception,
- a dry‑run is performed to measure cost (bytes processed),
- then the job executes only if safe.
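The sequence above can be sketched as one pipeline function. This is a hypothetical illustration of the ordering, not the library's actual internals: safe_query_sketch, run, and estimate_bytes are invented names, and the real client talks to BigQuery rather than taking injected callables.

```python
FORBIDDEN = ("DROP", "DELETE", "ALTER", "TRUNCATE")

def safe_query_sketch(sql, params, run, estimate_bytes,
                      limit=100, max_bytes=5 * 10**9):
    """Guardrail ordering: denylist -> LIMIT -> dry run -> execute."""
    upper = sql.upper()
    # 1. forbidden statements raise before anything else happens
    if any(verb in upper for verb in FORBIDDEN):
        raise ValueError("forbidden SQL verb")
    # 2. inject a LIMIT if the SELECT lacks one
    if upper.lstrip().startswith("SELECT") and "LIMIT" not in upper:
        sql = sql.rstrip().rstrip(";") + f" LIMIT {limit}"
    # 3. the dry run measures cost; too-expensive queries never execute
    if estimate_bytes(sql, params) > max_bytes:
        raise RuntimeError("query exceeds cost cap")
    # 4. only now does the job run, with parameters bound by the driver
    return run(sql, params)
```

The point of the ordering is that the cheap, local checks (denylist, LIMIT) run before the network round trip, and the dry run runs before any bytes are billed.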
Under the Hood
- Wraps Google Cloud BigQuery Python Client
- Uses BigQuery’s QueryJobConfig(dry_run=True) for cost estimation
- Enforces SQL‑level regex guards before submission
- Applies automatic row limits and parameter binding
- Exposes results via pandas, JSON, or Pydantic‑style objects
- Supports async execution for agent pipelines
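The cost check reduces to a single call in the official client: QueryJobConfig(dry_run=True) asks BigQuery to plan the query and report total_bytes_processed without executing it. The sketch below takes the client and config class as arguments so it stays dependency‑free; in real use you would pass google.cloud.bigquery's Client() and QueryJobConfig.

```python
def cost_gate(client, job_config_cls, sql, max_bytes):
    """Dry-run `sql` and raise if the estimate exceeds max_bytes.

    `client` and `job_config_cls` stand in for bigquery.Client and
    bigquery.QueryJobConfig from google-cloud-bigquery.
    """
    cfg = job_config_cls(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=cfg)  # plans the query, never runs it
    if job.total_bytes_processed > max_bytes:
        raise RuntimeError(
            f"estimated {job.total_bytes_processed} bytes exceeds {max_bytes}"
        )
    return job.total_bytes_processed
```

Disabling the query cache matters here: a cached result would report zero bytes and hide the true cost of a fresh scan.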
Built for LLM Agents
When integrated into an agent framework (LangChain, CrewAI, AutoGen):
tool("bq_query", bq.safe_query, description="Run cost‑controlled BigQuery SQL")
Agents can then generate queries freely within your safety envelope.
You keep cost, security, and data integrity under control.
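The tool() registration above is schematic. Here is a framework‑free sketch of the same idea, with a stand‑in for bq.safe_query so it runs on its own; real frameworks wrap this shape in their own classes (e.g. LangChain's Tool).

```python
# Minimal name -> tool registry that an agent loop could consult
TOOLS = {}

def tool(name, func, description=""):
    """Register a callable as an agent-invokable tool (sketch)."""
    TOOLS[name] = {"func": func, "description": description}

def fake_safe_query(sql, **kwargs):
    # stand-in for bq.safe_query; returns agent-readable JSON-like data
    return {"sql": sql, "rows": []}

tool("bq_query", fake_safe_query,
     description="Run cost-controlled BigQuery SQL")
```

The agent only ever sees the tool's name and description; every actual query still flows through the guarded callable.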
Example: Dry‑Run Validation
info = bq.dry_run(
"SELECT COUNT(*) FROM massive_table"
)
print(f"Estimated bytes processed: {info.estimated_bytes_processed/1e9:.2f} GB")
Output before execution might read:
Estimated bytes processed: 3.25 GB

That number lets you cap budgets or throttle agent jobs dynamically.
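A minimal cap over that estimate might look like the following sketch; the 5 GB default mirrors the max_cost_gb=5 setting shown in the Configuration section, and the function name is hypothetical.

```python
def within_budget(estimated_bytes, max_gb=5.0):
    """True if a dry-run byte estimate fits under a gigabyte budget."""
    return estimated_bytes / 1e9 <= max_gb
```

A 3.25 GB estimate passes the default cap, while a 6 GB scan would be denied or rerouted.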
Configuration
bq = AgenticBQ(
project="my-project",
max_cost_gb=5, # deny if > 5 GB processed
enforce_limit=200, # default limit
denylist=["DELETE", "DROP", "UPDATE"],
log_dir="/var/log/agentic_bq"
)
Security Model
- No ad‑hoc string execution
- Fully parameterized queries
- Pre‑execution dry runs to detect cost risk
- Optional IAM role binding per agent service account
Design Principles
- Least Privilege for Data Agents
- Predictable Cost Profiles
- Composability – works as a drop‑in LangChain tool
- Transparency – logs intent before execution
Publishing (Developers)
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade build twine
python -m build
twine upload --repository testpypi dist/*
twine upload dist/*
Future Roadmap
- v0.1 – Parameter binding + LIMIT enforcement
- v0.2 – Async API support
- v0.3 – Adaptive budgeting via BigQuery reservations API
- v0.4 – LLM query explain‑plan visualizer
- v1.0 – Production stability and OpenTelemetry metrics
Closing Thought
LLMs are evolving into full‑blown data agents. With agentic‑bq, you can let them explore BigQuery freely — without risking your budget, your data, or your sleep.
pip install agentic-bq