The EU AI Act doesn't ask whether you thought about risk. It asks you to show the evidence.
Article 9 requires a risk management system. Article 10 requires data governance with measurable quality criteria. Article 11 requires technical documentation that reflects the actual system. Article 12 requires automatic logging. Article 15 requires demonstrated accuracy and robustness.
All of these demand machine-readable artifacts generated from your actual pipeline — not a risk register in SharePoint.
Most organizations building high-risk AI systems today have slide decks where they should have evidence. And the deadline for high-risk systems is August 2, 2026.
This is not a legal problem. It's an infrastructure problem. And we built the plumbing to fix it.
## The full lifecycle in real code
I'll walk through a complete compliance lifecycle — pre-training data audit, mitigation, post-training verification — using a credit scoring model. Not a toy. Real OSCAL policies, real fairness metrics, real evidence collection.
The Venturalitica SDK is open source, Apache 2.0 licensed: `pip install venturalitica`.
## Step 1: Define compliance policy as code
Instead of a Word document listing controls, you write an OSCAL policy file. OSCAL is the NIST standard for compliance-as-code — the same format the US federal government uses for FedRAMP.
`data_policy.oscal.yaml` — Pre-training controls (Article 10: Data Governance):

```yaml
assessment-plan:
  metadata:
    title: Credit Risk Data Policy
  control-implementations:
    - implemented-requirements:
        - control-id: credit-data-imbalance
          description: "Minority class >= 20% of dataset"
          props:
            - name: metric_key
              value: class_imbalance
            - name: threshold
              value: "0.2"
            - name: operator
              value: gt
            - name: "input:target"
              value: target
        - control-id: credit-data-bias
          description: "Gender disparate impact follows Four-Fifths Rule"
          props:
            - name: metric_key
              value: disparate_impact
            - name: threshold
              value: "0.8"
            - name: operator
              value: gt
            - name: "input:target"
              value: target
            - name: "input:dimension"
              value: gender
        - control-id: credit-age-disparate
          description: "Age disparate impact ratio > 0.5"
          props:
            - name: metric_key
              value: disparate_impact
            - name: threshold
              value: "0.50"
            - name: operator
              value: gt
            - name: "input:target"
              value: target
            - name: "input:dimension"
              value: age
```
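For intuition, the disparate_impact metric referenced above is conventionally the ratio of favorable-outcome rates between the least- and most-favored groups (the Four-Fifths Rule passes at 0.8 or above). A minimal sketch of that computation, independent of the SDK:

```python
import pandas as pd

def disparate_impact(df: pd.DataFrame, target: str, dimension: str,
                     favorable=1) -> float:
    """Ratio of the lowest group favorable rate to the highest."""
    rates = df.groupby(dimension)[target].apply(
        lambda s: (s == favorable).mean()
    )
    return rates.min() / rates.max()

# Toy data: group B is approved 3/4 of the time vs group A's 2/4
toy = pd.DataFrame({
    "gender": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "target": [1, 1, 0, 0, 1, 1, 1, 0],
})
print(round(disparate_impact(toy, "target", "gender"), 3))  # 0.667, fails a 0.8 threshold
```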
`model_policy.oscal.yaml` — Post-training controls (Article 15: Accuracy & Robustness):

```yaml
assessment-plan:
  metadata:
    title: "Article 15: Model Accuracy and Fairness"
  control-implementations:
    - implemented-requirements:
        - control-id: model-accuracy
          description: "Model accuracy >= 70%"
          props:
            - name: metric_key
              value: accuracy_score
            - name: threshold
              value: "0.70"
            - name: operator
              value: gte
            - name: "input:target"
              value: target
            - name: "input:prediction"
              value: prediction
        - control-id: model-gender-fairness
          description: "Demographic parity difference < 0.10"
          props:
            - name: metric_key
              value: demographic_parity_diff
            - name: threshold
              value: "0.10"
            - name: operator
              value: lt
            - name: "input:target"
              value: target
            - name: "input:prediction"
              value: prediction
            - name: "input:dimension"
              value: gender
```
These policies are version-controlled, diffable, reviewable in PRs. If your fairness threshold changes from 0.10 to 0.05, that's a git commit — not an update buried in a spreadsheet.
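Because the policies are plain YAML, they are also easy to lint in CI before anyone runs an audit. A hedged sketch: the required prop names (metric_key, threshold, operator) are taken from the files above, but the check itself is illustrative, not part of the SDK:

```python
import yaml

REQUIRED = {"metric_key", "threshold", "operator"}

def lint_policy(doc: dict) -> list[str]:
    """Return control-ids missing any required prop."""
    problems = []
    for impl in doc["assessment-plan"]["control-implementations"]:
        for req in impl["implemented-requirements"]:
            names = {p["name"] for p in req.get("props", [])}
            if not REQUIRED <= names:
                problems.append(req["control-id"])
    return problems

policy = yaml.safe_load("""
assessment-plan:
  metadata:
    title: Credit Risk Data Policy
  control-implementations:
    - implemented-requirements:
        - control-id: credit-data-imbalance
          props:
            - name: metric_key
              value: class_imbalance
            - name: threshold
              value: "0.2"
        - control-id: credit-data-bias
          props:
            - name: metric_key
              value: disparate_impact
            - name: threshold
              value: "0.8"
            - name: operator
              value: gt
""")
print(lint_policy(policy))  # ['credit-data-imbalance']  (operator prop missing)
```

In CI, `assert not lint_policy(...)` fails the build on a malformed control, so a broken policy never silently passes an audit.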
## Step 2: Audit the training data BEFORE training
```python
import venturalitica as vl
from venturalitica.quickstart import load_sample

df = load_sample("loan")  # UCI German Credit, 1000 samples

data_results = vl.enforce(
    data=df,
    target="class",
    gender="Attribute9",
    age="Attribute13",
    policy="data_policy.oscal.yaml",
)

for r in data_results:
    status = "PASS" if r.passed else "FAIL"
    print(f"{r.control_id:<25} {r.actual_value:.3f} {r.operator} {r.threshold}  {status}")
```
Output:

```
credit-data-imbalance     0.429 gt 0.2   PASS
credit-data-bias          0.818 gt 0.8   PASS
credit-age-disparate      0.286 gt 0.5   FAIL
```
The age control fails: the disparate impact ratio between younger and older applicants is 0.286, far below the 0.5 threshold — one age group's approval rate is barely a quarter of the other's. Article 10 requires you to detect this before training — not discover it during an audit 6 months later.
The enforce() engine evaluates 104 registered metrics across fairness (binary, multiclass, causal, intersectional), privacy (k-anonymity, l-diversity, t-closeness), data quality, and model performance. Each result includes group-level breakdowns — an auditor doesn't just see "passed" or "failed", they see that Female approval was 35.16% vs Male 27.68%.
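Of the privacy metrics mentioned, k-anonymity is the easiest to state: every combination of quasi-identifier values must occur at least k times in the dataset. A minimal, SDK-independent sketch:

```python
import pandas as pd

def k_anonymity(df: pd.DataFrame, quasi_identifiers: list[str]) -> int:
    """Smallest equivalence-class size over the quasi-identifier columns."""
    return int(df.groupby(quasi_identifiers).size().min())

# Toy records: the (40-49, 202) combination occurs only twice
records = pd.DataFrame({
    "age_band": ["30-39", "30-39", "30-39", "40-49", "40-49"],
    "zip3":     ["101",   "101",   "101",   "202",   "202"],
})
print(k_anonymity(records, ["age_band", "zip3"]))  # 2
```

A result of k = 2 means at least one person is distinguishable from all but one other record, which is usually below any acceptable re-identification threshold.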
## Step 3: Train with mitigation, then audit the model
```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X = df.select_dtypes(include=["number"]).drop(columns=["class"])
y = df["class"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

test_df = X_test.copy()
test_df["class"] = y_test
test_df["prediction"] = predictions

# Post-training audit against Article 15
model_results = vl.enforce(
    data=test_df,
    target="class",
    prediction="prediction",
    gender="Attribute9",
    policy="model_policy.oscal.yaml",
)
```
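The snippet above trains a plain baseline; the mitigation itself is left to you. One common, SDK-agnostic option is reweighing (Kamiran and Calders), which weights each group/label combination toward statistical independence before refitting and re-auditing. A sketch, with the column names from the example assumed:

```python
import numpy as np
import pandas as pd

def reweigh(groups: pd.Series, labels: pd.Series) -> np.ndarray:
    """Kamiran-Calders reweighing: expected / observed joint frequency."""
    n = len(labels)
    w = np.empty(n)
    for g in groups.unique():
        for y in labels.unique():
            mask = (groups == g) & (labels == y)
            observed = mask.sum() / n
            expected = (groups == g).mean() * (labels == y).mean()
            w[mask.to_numpy()] = expected / observed if observed else 0.0
    return w

# Hypothetical usage with the earlier split (Attribute9 is the gender column):
# weights = reweigh(df.loc[X_train.index, "Attribute9"], y_train)
# model.fit(X_train, y_train, sample_weight=weights)
```

After refitting, run the same vl.enforce() call again so the before/after fairness numbers both land in the evidence trail.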
Now you have a pre-training data audit and a post-training model audit — both stored in `.venturalitica/results.json`. You can show an auditor: "here's the bias we found in the data, here's the mitigation we applied, and here's the measured outcome."
This is what Article 9 (risk management) looks like in practice: identify, mitigate, verify, document. As code.
## The evidence vault: 7 probes, one line of code
Here's the core of it. Wrap your entire pipeline in vl.monitor():
```python
with vl.monitor("loan_full_audit"):
    # --- Article 10: Data Audit ---
    data_results = vl.enforce(
        data=df, target="class",
        gender="Attribute9", age="Attribute13",
        policy="data_policy.oscal.yaml",
    )

    # --- Train ---
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)

    # --- Article 15: Model Audit ---
    model_results = vl.enforce(
        data=test_df, target="class",
        prediction="prediction", gender="Attribute9",
        policy="model_policy.oscal.yaml",
    )
```
That `vl.monitor()` context manager activates 7 concurrent probes that silently collect evidence you'd otherwise need weeks to assemble:
| Probe | What it captures | EU AI Act |
|---|---|---|
| TraceProbe | AST analysis of your code: functions called, libraries imported, model classes detected. Timestamped. | Art 11 |
| ArtifactProbe | SHA-256 hashes of input data and output models. Proves data integrity between runs. | Art 10 |
| BOMProbe | CycloneDX bill of materials — every library, version, and detected model class. Think package-lock.json for ML. | Art 11 |
| IntegrityProbe | Environment fingerprint: OS, Python version, architecture. Detects drift between training and deployment. | Art 15 |
| HardwareProbe | CPU count, peak memory. Documents compute constraints. | Art 11 |
| CarbonProbe | kgCO2 emissions via CodeCarbon. | Art 11 |
| HandshakeProbe | Verifies enforce() was actually called. Catches pipelines that skip compliance. | Art 9 |
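The ArtifactProbe's integrity claim rests on ordinary content hashing, which is exactly what makes it independently checkable: an auditor can recompute the digest without trusting the tool. A sketch (file names are illustrative):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream-hash a file so large datasets don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# An auditor recomputes the hash and compares it to the recorded value:
# assert sha256_of("train.csv") == recorded_hashes["train.csv"]
```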
Evidence lands in `.venturalitica/runs/{run_id}/`:

```
.venturalitica/runs/20260325_143022/
  trace_loan_full_audit.json   # AST + execution context
  results.json                 # All enforce() results with group breakdowns
  bom.json                     # CycloneDX SBOM
```
The trace proves what code ran. The BOM proves which versions were used. The artifact hashes prove data didn't change. The results prove controls were evaluated. This is the kind of evidence that survives an audit — because it was generated by the pipeline, not written by a human after the fact.
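Because the evidence is plain JSON, downstream gates can consume it directly. A hypothetical consumer: the field names mirror the result objects shown earlier (control_id, passed), but the exact on-disk schema of results.json is an assumption here:

```python
import json

def failed_controls(path: str) -> list[str]:
    """Assumed schema: a JSON list of {control_id, passed, ...} objects."""
    with open(path) as f:
        results = json.load(f)
    return [r["control_id"] for r in results if not r["passed"]]

# e.g. gate a deployment on a run's evidence (run id illustrative):
# assert not failed_controls(".venturalitica/runs/20260325_143022/results.json")
```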
If you already use MLflow or WandB, the SDK auto-logs everything there too:
```shell
# Just set the env var — results flow to MLflow automatically
MLFLOW_TRACKING_URI=http://localhost:5000 python train.py
```
Compliance metrics appear in your existing model registry alongside accuracy and loss.
## OSCAL: why we didn't invent our own format
Most AI governance tools invent a proprietary format. We chose OSCAL — the NIST standard for compliance-as-code.
Why? Because interoperability. If your evidence is locked in a proprietary dashboard, you've traded one PDF problem for a vendor lock-in problem. OSCAL policies can be consumed by IBM Compliance Trestle, GovReady-Q, RegScale, and dozens of others.
It's the same bet the industry made on SBOM formats (CycloneDX, SPDX) for software supply chain. Standards beat vendor lock-in.
## What this is NOT
- Not a GRC dashboard. We don't replace your legal team's risk assessment. We give them machine-readable evidence to work with.
- Not a monitoring platform. Evidently and Giskard are great for drift detection and production testing. We complement them — Venturalitica turns monitoring outputs into regulatory artifacts.
- Not a checkbox tool. The SDK generates evidence from your actual pipeline. If you want to fill a form and call it compliant, this isn't for you.
We sit in the gap between "ML engineer training a model" and "compliance officer needing proof."
## Get started
```shell
pip install venturalitica
```
- GitHub: Venturalitica/venturalitica-sdk — Apache 2.0
- Full lifecycle walkthrough: Zero to Annex IV in 15 minutes
- Real scenarios: Loan scoring, medical imaging, vision fairness, financial LLM
- Discord: Join the community
If you're building AI systems in a regulated industry and want to contribute policy templates for your domain, PRs are welcome.
I'm Rodrigo, founder of Venturalitica. We're building the compliance infrastructure layer for AI — so evidence is a byproduct of building, not a separate exercise. If you're hitting the "how do we operationalize this" wall with the EU AI Act, I'd love to hear what's blocking you.