The EU AI Act doesn't ask whether you thought about risk. It asks you to show the evidence.
Article 9 requires a risk management system. Article 10 requires data governance with measurable quality criteria. Article 11 requires technical documentation that reflects the actual system. Article 12 requires automatic logging. Article 15 requires demonstrated accuracy and robustness.
All of these demand machine-readable artifacts generated from your actual pipeline — not a risk register in SharePoint.
Most organizations building high-risk AI systems today have slide decks where they should have evidence. And the deadline for high-risk systems is August 2, 2026.
This is not a legal problem. It's an infrastructure problem. And we built the plumbing to fix it.
## The full lifecycle in real code
I'll walk through a complete compliance lifecycle — pre-training data audit, mitigation, post-training verification — using a credit scoring model. Not a toy. Real OSCAL policies, real fairness metrics, real evidence collection.
The Venturalitica SDK is open source, Apache 2.0 licensed: `pip install venturalitica`.
## Step 1: Define compliance policy as code
Instead of a Word document listing controls, you write an OSCAL policy file. OSCAL is the NIST standard for compliance-as-code — the same format the US federal government uses for FedRAMP.
`data_policy.oscal.yaml` — Pre-training controls (Article 10: Data Governance):

```yaml
assessment-plan:
  metadata:
    title: Credit Risk Data Policy
  control-implementations:
    - implemented-requirements:
        - control-id: credit-data-imbalance
          description: "Minority class >= 20% of dataset"
          props:
            - name: metric_key
              value: class_imbalance
            - name: threshold
              value: "0.2"
            - name: operator
              value: gt
            - name: "input:target"
              value: target
        - control-id: credit-data-bias
          description: "Gender disparate impact follows Four-Fifths Rule"
          props:
            - name: metric_key
              value: disparate_impact
            - name: threshold
              value: "0.8"
            - name: operator
              value: gt
            - name: "input:target"
              value: target
            - name: "input:dimension"
              value: gender
        - control-id: credit-age-disparate
          description: "Age disparate impact ratio > 0.5"
          props:
            - name: metric_key
              value: disparate_impact
            - name: threshold
              value: "0.50"
            - name: operator
              value: gt
            - name: "input:target"
              value: target
            - name: "input:dimension"
              value: age
```
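For intuition, the disparate_impact metric referenced above is conventionally the ratio of favorable-outcome rates between the least- and most-favored groups (the Four-Fifths Rule passes at 0.8 or above). A minimal sketch of that computation, independent of the SDK:

```python
import pandas as pd

def disparate_impact(df: pd.DataFrame, target: str, dimension: str,
                     favorable=1) -> float:
    """Ratio of the lowest group favorable rate to the highest."""
    rates = df.groupby(dimension)[target].apply(
        lambda s: (s == favorable).mean()
    )
    return rates.min() / rates.max()

# Toy data: group B is approved 3/4 of the time vs group A's 2/4
toy = pd.DataFrame({
    "gender": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "target": [1, 1, 0, 0, 1, 1, 1, 0],
})
print(round(disparate_impact(toy, "target", "gender"), 3))  # 0.667, fails a 0.8 threshold
```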
`model_policy.oscal.yaml` — Post-training controls (Article 15: Accuracy & Robustness):

```yaml
assessment-plan:
  metadata:
    title: "Article 15: Model Accuracy and Fairness"
  control-implementations:
    - implemented-requirements:
        - control-id: model-accuracy
          description: "Model accuracy >= 70%"
          props:
            - name: metric_key
              value: accuracy_score
            - name: threshold
              value: "0.70"
            - name: operator
              value: gte
            - name: "input:target"
              value: target
            - name: "input:prediction"
              value: prediction
        - control-id: model-gender-fairness
          description: "Demographic parity difference < 0.10"
          props:
            - name: metric_key
              value: demographic_parity_diff
            - name: threshold
              value: "0.10"
            - name: operator
              value: lt
            - name: "input:target"
              value: target
            - name: "input:prediction"
              value: prediction
            - name: "input:dimension"
              value: gender
```
These policies are version-controlled, diffable, reviewable in PRs. If your fairness threshold changes from 0.10 to 0.05, that's a git commit — not an update buried in a spreadsheet.
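Because the policies are plain YAML, they are also easy to lint in CI before anyone runs an audit. A hedged sketch: the required prop names (metric_key, threshold, operator) are taken from the files above, but the check itself is illustrative, not part of the SDK:

```python
import yaml

REQUIRED = {"metric_key", "threshold", "operator"}

def lint_policy(doc: dict) -> list[str]:
    """Return control-ids missing any required prop."""
    problems = []
    for impl in doc["assessment-plan"]["control-implementations"]:
        for req in impl["implemented-requirements"]:
            names = {p["name"] for p in req.get("props", [])}
            if not REQUIRED <= names:
                problems.append(req["control-id"])
    return problems

policy = yaml.safe_load("""
assessment-plan:
  metadata:
    title: Credit Risk Data Policy
  control-implementations:
    - implemented-requirements:
        - control-id: credit-data-imbalance
          props:
            - name: metric_key
              value: class_imbalance
            - name: threshold
              value: "0.2"
        - control-id: credit-data-bias
          props:
            - name: metric_key
              value: disparate_impact
            - name: threshold
              value: "0.8"
            - name: operator
              value: gt
""")
print(lint_policy(policy))  # ['credit-data-imbalance']  (operator prop missing)
```

In CI, `assert not lint_policy(...)` fails the build on a malformed control, so a broken policy never silently passes an audit.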
## Step 2: Audit the training data BEFORE training
```python
import venturalitica as vl
from venturalitica.quickstart import load_sample

df = load_sample("loan")  # UCI German Credit, 1000 samples

data_results = vl.enforce(
    data=df,
    target="class",
    gender="Attribute9",
    age="Attribute13",
    policy="data_policy.oscal.yaml",
)

for r in data_results:
    status = "PASS" if r.passed else "FAIL"
    print(f"{r.control_id:<25} {r.actual_value:.3f} {r.operator} {r.threshold}  {status}")
```
Output:

```
credit-data-imbalance     0.429 gt 0.2   PASS
credit-data-bias          0.818 gt 0.8   PASS
credit-age-disparate      0.286 gt 0.5   FAIL
```
The age control fails: the disparate impact ratio between younger and older applicants is 0.286, far below the 0.5 threshold — one age group's approval rate is barely a quarter of the other's. Article 10 requires you to detect this before training — not discover it during an audit 6 months later.
The enforce() engine evaluates 104 registered metrics across fairness (binary, multiclass, causal, intersectional), privacy (k-anonymity, l-diversity, t-closeness), data quality, and model performance. Each result includes group-level breakdowns — an auditor doesn't just see "passed" or "failed", they see that Female approval was 35.16% vs Male 27.68%.
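Of the privacy metrics mentioned, k-anonymity is the easiest to state: every combination of quasi-identifier values must occur at least k times in the dataset. A minimal, SDK-independent sketch:

```python
import pandas as pd

def k_anonymity(df: pd.DataFrame, quasi_identifiers: list[str]) -> int:
    """Smallest equivalence-class size over the quasi-identifier columns."""
    return int(df.groupby(quasi_identifiers).size().min())

# Toy records: the (40-49, 202) combination occurs only twice
records = pd.DataFrame({
    "age_band": ["30-39", "30-39", "30-39", "40-49", "40-49"],
    "zip3":     ["101",   "101",   "101",   "202",   "202"],
})
print(k_anonymity(records, ["age_band", "zip3"]))  # 2
```

A result of k = 2 means at least one person is distinguishable from all but one other record, which is usually below any acceptable re-identification threshold.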
## Step 3: Train with mitigation, then audit the model
```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X = df.select_dtypes(include=["number"]).drop(columns=["class"])
y = df["class"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

test_df = X_test.copy()
test_df["class"] = y_test
test_df["prediction"] = predictions

# Post-training audit against Article 15
model_results = vl.enforce(
    data=test_df,
    target="class",
    prediction="prediction",
    gender="Attribute9",
    policy="model_policy.oscal.yaml",
)
```
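The snippet above trains a plain baseline; the mitigation itself is left to you. One common, SDK-agnostic option is reweighing (Kamiran and Calders), which weights each group/label combination toward statistical independence before refitting and re-auditing. A sketch, with the column names from the example assumed:

```python
import numpy as np
import pandas as pd

def reweigh(groups: pd.Series, labels: pd.Series) -> np.ndarray:
    """Kamiran-Calders reweighing: expected / observed joint frequency."""
    n = len(labels)
    w = np.empty(n)
    for g in groups.unique():
        for y in labels.unique():
            mask = (groups == g) & (labels == y)
            observed = mask.sum() / n
            expected = (groups == g).mean() * (labels == y).mean()
            w[mask.to_numpy()] = expected / observed if observed else 0.0
    return w

# Hypothetical usage with the earlier split (Attribute9 is the gender column):
# weights = reweigh(df.loc[X_train.index, "Attribute9"], y_train)
# model.fit(X_train, y_train, sample_weight=weights)
```

After refitting, run the same vl.enforce() call again so the before/after fairness numbers both land in the evidence trail.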
Now you have a pre-training data audit and a post-training model audit — both stored in `.venturalitica/results.json`. You can show an auditor: "here's the bias we found in the data, here's the mitigation we applied, and here's the measured outcome."
This is what Article 9 (risk management) looks like in practice: identify, mitigate, verify, document. As code.
## The evidence vault: 7 probes, one line of code
Here's the core of it. Wrap your entire pipeline in vl.monitor():
```python
with vl.monitor("loan_full_audit"):
    # --- Article 10: Data Audit ---
    data_results = vl.enforce(
        data=df, target="class",
        gender="Attribute9", age="Attribute13",
        policy="data_policy.oscal.yaml",
    )

    # --- Train ---
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)

    # --- Article 15: Model Audit ---
    model_results = vl.enforce(
        data=test_df, target="class",
        prediction="prediction", gender="Attribute9",
        policy="model_policy.oscal.yaml",
    )
```
That `vl.monitor()` context manager activates 7 concurrent probes that silently collect evidence you'd otherwise need weeks to assemble:
| Probe | What it captures | EU AI Act |
|---|---|---|
| TraceProbe | AST analysis of your code: functions called, libraries imported, model classes detected. Timestamped. | Art 11 |
| ArtifactProbe | SHA-256 hashes of input data and output models. Proves data integrity between runs. | Art 10 |
| BOMProbe | CycloneDX bill of materials — every library, version, and detected model class. Think package-lock.json for ML. | Art 11 |
| IntegrityProbe | Environment fingerprint: OS, Python version, architecture. Detects drift between training and deployment. | Art 15 |
| HardwareProbe | CPU count, peak memory. Documents compute constraints. | Art 11 |
| CarbonProbe | kgCO2 emissions via CodeCarbon. | Art 11 |
| HandshakeProbe | Verifies enforce() was actually called. Catches pipelines that skip compliance. | Art 9 |
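The ArtifactProbe's integrity claim rests on ordinary content hashing, which is exactly what makes it independently checkable: an auditor can recompute the digest without trusting the tool. A sketch (file names are illustrative):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream-hash a file so large datasets don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# An auditor recomputes the hash and compares it to the recorded value:
# assert sha256_of("train.csv") == recorded_hashes["train.csv"]
```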
Evidence lands in `.venturalitica/runs/{run_id}/`:

```
.venturalitica/runs/20260325_143022/
  trace_loan_full_audit.json   # AST + execution context
  results.json                 # All enforce() results with group breakdowns
  bom.json                     # CycloneDX SBOM
```
The trace proves what code ran. The BOM proves which versions were used. The artifact hashes prove data didn't change. The results prove controls were evaluated. This is the kind of evidence that survives an audit — because it was generated by the pipeline, not written by a human after the fact.
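Because the evidence is plain JSON, downstream gates can consume it directly. A hypothetical consumer: the field names mirror the result objects shown earlier (control_id, passed), but the exact on-disk schema of results.json is an assumption here:

```python
import json

def failed_controls(path: str) -> list[str]:
    """Assumed schema: a JSON list of {control_id, passed, ...} objects."""
    with open(path) as f:
        results = json.load(f)
    return [r["control_id"] for r in results if not r["passed"]]

# e.g. gate a deployment on a run's evidence (run id illustrative):
# assert not failed_controls(".venturalitica/runs/20260325_143022/results.json")
```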
If you already use MLflow or WandB, the SDK auto-logs everything there too:
```shell
# Just set the env var — results flow to MLflow automatically
MLFLOW_TRACKING_URI=http://localhost:5000 python train.py
```
Compliance metrics appear in your existing model registry alongside accuracy and loss.
## OSCAL: why we didn't invent our own format
Most AI governance tools invent a proprietary format. We chose OSCAL — the NIST standard for compliance-as-code.
Why? Because interoperability. If your evidence is locked in a proprietary dashboard, you've traded one PDF problem for a vendor lock-in problem. OSCAL policies can be consumed by IBM Compliance Trestle, GovReady-Q, RegScale, and dozens of others.
It's the same bet the industry made on SBOM formats (CycloneDX, SPDX) for software supply chain. Standards beat vendor lock-in.
## What this is NOT
- Not a GRC dashboard. We don't replace your legal team's risk assessment. We give them machine-readable evidence to work with.
- Not a monitoring platform. Evidently and Giskard are great for drift detection and production testing. We complement them — Venturalitica turns monitoring outputs into regulatory artifacts.
- Not a checkbox tool. The SDK generates evidence from your actual pipeline. If you want to fill a form and call it compliant, this isn't for you.
We sit in the gap between "ML engineer training a model" and "compliance officer needing proof."
## Get started
```shell
pip install venturalitica
```
- GitHub: Venturalitica/venturalitica-sdk — Apache 2.0
- Full lifecycle walkthrough: Zero to Annex IV in 15 minutes
- Real scenarios: Loan scoring, medical imaging, vision fairness, financial LLM
- Discord: Join the community
If you're building AI systems in a regulated industry and want to contribute policy templates for your domain, PRs are welcome.
I'm Rodrigo, founder of Venturalitica. We're building the compliance infrastructure layer for AI — so evidence is a byproduct of building, not a separate exercise. If you're hitting the "how do we operationalize this" wall with the EU AI Act, I'd love to hear what's blocking you.