
Vijay K Joseph

The Zero-Trust Data Stack: Building Deterministic Agents with Google ADK

The "Trusted Analyst" Problem

When business stakeholders ask, “What was our Q3 revenue?” a hallucinating agent can invent numbers or misread columns. In an enterprise context that becomes legal and financial risk, not a cute demo problem. We need answers that are correct, explainable, reproducible, and auditable. Google's Agent Development Kit (ADK) is designed for building code-first multi-agent systems that make these properties achievable.

The Architecture: A 6-Agent Assembly Line

Design idea: split responsibility across small, well-specified agents. Each agent has a single purpose and uses deterministic tools wherever possible.
The six agents (a wiring sketch follows the list):

  1. Ingestor — accepts file/database input, infers schema, profiles data (row counts, types, nulls).
  2. Schema & Typing Validator — verifies column types, mappings (e.g., date, currency, customer_id), rejects ambiguous or malformed inputs; produces a canonical schema.
  3. Deterministic Query Generator — produces SQL using templates and a deterministic engine (no freeform LLM SQL). Uses parameterized templates, typed placeholders, and a SQL AST builder.
  4. Executor (Safe Runner) — runs SQL queries in a sandboxed environment (read-only, statement limits, sampling) and returns raw results plus execution plan, row counts, and checksums.
  5. Statistical Checker / Data Profiler — runs deterministic checks on results (aggregates vs. expected ranges, null ratios, duplicates, join cardinality checks), flags anomalies.
  6. Synthesizer/Reporter (+ Provenance) — creates the human answer, attaches provenance (original query, SQL text, execution plan, checksums, dataset snapshot index), and a confidence score with specific reason string(s).
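
Here is a minimal wiring sketch of that assembly line using ADK's LlmAgent and SequentialAgent. The agent names, instructions, model string, and tool stub are illustrative assumptions; only the composition pattern is the point, and class/parameter names should be checked against the ADK docs.

```python
# Minimal sketch: composing the assembly line with ADK's SequentialAgent.
# Agent names, instructions, the model string, and the tool stub below are
# illustrative assumptions; verify parameter names against the ADK docs.
from google.adk.agents import LlmAgent, SequentialAgent


def profile_dataset(path: str) -> dict:
    """Deterministic profiling tool stub: row count, column types, null ratios."""
    # A real implementation would read the file/table and compute the profile.
    return {"rows": 0, "columns": {}, "null_ratios": {}}


ingestor = LlmAgent(
    name="ingestor",
    model="gemini-2.0-flash",
    instruction="Load the input, call profile_dataset, and return the data profile.",
    tools=[profile_dataset],
    output_key="profile",
)

schema_validator = LlmAgent(
    name="schema_validator",
    model="gemini-2.0-flash",
    instruction="Check column types against the profile; reject ambiguous mappings "
                "and emit a canonical schema.",
    output_key="canonical_schema",
)

# query_generator, executor, checker, and synthesizer are defined the same way,
# each with deterministic tools and an output_key the next agent reads.

pipeline = SequentialAgent(
    name="trusted_analyst_pipeline",
    sub_agents=[ingestor, schema_validator],  # ...plus the remaining four agents
)
```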

Visualizing the Flow

The pipeline runs left to right, each agent handing its validated output (plus provenance) to the next:
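
```
Ingestor → Schema & Typing Validator → Deterministic Query Generator →
Executor (Safe Runner) → Statistical Checker / Profiler → Synthesizer / Reporter (+ Provenance)
```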

The "Secret Sauce": Deterministic Tools

The single biggest defense against hallucination is to remove nondeterminism wherever possible. Examples:

  • SQL templating + AST builders: Build queries by composing AST nodes (no freeform LLM text). This guarantees syntactic correctness and limits injection and bad reads (see the sketch after this list).

  • Linters and static analyzers (e.g., SQLFluff or a SQL AST validator) to ensure queries follow best practices.

  • Deterministic transform functions for parsing dates/numbers: use typed parsers with fallback rules, not heuristic prose.

  • Unit tests & snapshot tests: For every analytic endpoint, run a deterministic suite (e.g., expected aggregate for a fixed seed dataset).

  • Checksums and sampling: the executor returns a hash of the resultset (or top-N canonical sample) so consumers can verify the same result is being used.
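
As a concrete sketch of the templating and checksum ideas above: a fixed template with allow-listed identifiers and typed, driver-bound parameters, plus a canonical hash of the resultset. The table/column names, allow-lists, and date range are illustrative assumptions.

```python
# Sketch of deterministic SQL generation + resultset checksums.
# The ALLOWED_* sets, the template, and the Q3 date range are illustrative;
# the point is: fixed templates, allow-listed identifiers, typed bound parameters.
import hashlib
import json

ALLOWED_TABLES = {"orders"}
ALLOWED_COLUMNS = {"revenue", "order_date", "customer_id"}


def build_quarter_revenue_sql(table: str, amount_col: str, date_col: str) -> tuple[str, dict]:
    """Compose SQL from a fixed template; identifiers must pass the allow-list,
    and values are bound as typed parameters by the database driver."""
    if table not in ALLOWED_TABLES or {amount_col, date_col} - ALLOWED_COLUMNS:
        raise ValueError("identifier not in allow-list")
    sql = (
        f"SELECT SUM({amount_col}) AS total "
        f"FROM {table} "
        f"WHERE {date_col} >= :start AND {date_col} < :end"
    )
    params = {"start": "2024-07-01", "end": "2024-10-01"}
    return sql, params


def resultset_checksum(rows: list[dict]) -> str:
    """Canonicalize rows (sorted keys, sorted rows) and hash them so any downstream
    consumer can verify it is using exactly the result the Executor produced."""
    canonical = json.dumps(
        sorted(rows, key=lambda r: json.dumps(r, sort_keys=True, default=str)),
        sort_keys=True,
        default=str,
    )
    return hashlib.sha256(canonical.encode()).hexdigest()
```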

Handling the "Messy Reality" of Data

Real data is rarely clean. Practical pieces to implement:

  • Auto profiling: detect suspicious columns (e.g., numeric stored as string), outlier rates, rate of parsing failures.

  • Column mapping UI / schema prompts: when the validator finds ambiguity (e.g., “is this column revenue or discount?”), surface a small human-in-the-loop verification step or a concise confirmation prompt.

  • Robust parsing rules: explicit locale/format rules for dates, currencies, and thousands separators; fallback strategies that are logged (see the sketch after this list).

  • Progressive aggregation: for very large datasets, run sampling + incremental aggregations and compare to full-run results.

  • Explainable imputations: if imputing missing data, log method and effect size (e.g., “imputed 12% of revenue rows with median per-customer”).

  • Escape hatches: allow a user to pin schema or upload a small canonical sample to avoid repeated ambiguity.
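
A sketch of the robust-parsing bullet: explicit, ordered formats and logged fallbacks instead of heuristic guessing. The format list and the dollar-only currency handling are illustrative assumptions.

```python
# Sketch of typed, locale-aware parsing with logged fallbacks.
# The accepted formats and the "$"-only currency handling are illustrative.
import logging
from datetime import datetime
from decimal import Decimal, InvalidOperation

log = logging.getLogger("parsing")

DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y"]  # explicit and ordered; no guessing


def parse_date(raw: str) -> datetime | None:
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue
    log.warning("date parse failure: %r", raw)  # every failure is logged, not hidden
    return None


def parse_currency(raw: str) -> Decimal | None:
    """Strip a known currency symbol and thousands separators, then parse as Decimal."""
    cleaned = raw.strip().lstrip("$").replace(",", "")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        log.warning("currency parse failure: %r", raw)
        return None
```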

The "Evaluator" Loop: AI Checking AI

An evaluator agent runs a suite of automated checks after the initial run (a sketch of these checks follows the list):

  1. Replay deterministic steps: re-run SQL templates with the same parameters and confirm checksum matches the Executor’s output.

  2. Sanity tests: compare top-line aggregates against last known stable values (e.g., month-over-month bounds), check for impossible values (negative revenue), ensure join cardinalities make sense.

  3. Counterfactual queries: run alternate queries that test assumptions (e.g., run with and without particular filters, or with an extra grouping to see if aggregation collapses).

  4. Explainability check: ensure every synthesized sentence references one or more provenance items (e.g., “Q3 revenue = $X (SQL: …, checksum: …)”).

  5. Score and veto: if checks fail, the evaluator returns a veto with a clear failure reason and suggested remedial action (human review, improved schema mapping, or re-run with different filters).
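
A sketch of those checks condensed into one deterministic function. The field names, the 50% month-over-month bound, and the rule that the answer must quote the checksum are illustrative assumptions, not part of ADK.

```python
# Sketch of the evaluator's deterministic checks: replay, sanity bounds,
# provenance reference, then score-and-veto. Thresholds and field names
# are illustrative assumptions.
def evaluate(run: dict, replayed_checksum: str, last_stable_total: float) -> dict:
    failures = []

    # 1. Replay check: same template + params must hash to the same resultset.
    if replayed_checksum != run["checksum"]:
        failures.append("replay checksum mismatch")

    # 2. Sanity tests: impossible values and month-over-month bounds.
    if run["total"] < 0:
        failures.append("negative revenue")
    if last_stable_total and abs(run["total"] - last_stable_total) / last_stable_total > 0.5:
        failures.append("total moved >50% vs last stable value")

    # 3. Explainability: the synthesized answer must cite its provenance.
    if run["checksum"] not in run["answer_text"]:
        failures.append("answer does not reference the resultset checksum")

    # 4. Score and veto: any failure blocks the answer and carries a reason.
    return {"veto": bool(failures), "reasons": failures}
```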
