Introduction
AI-assisted development has dramatically increased implementation speed.
At the same time, I think it has made one thing harder: treating artifact-only evaluation (repositories, portfolio sites, demos, finished UI screenshots, etc.) as a sufficient signal of engineering work.
Artifacts still matter. Code review, testing, CI, and running systems will remain essential.
The problem is not that artifacts became useless.
The problem is that artifacts alone often do not answer questions like:
- How much of the implementation was actually understood?
- What tools/processes were executed during development?
- Is the development process itself explainable after the fact?
This becomes especially important in situations where implementation output is not the whole story:
- incident response and quality explanation
- hiring assessments
- constrained test environments (e.g., prohibited tools)
- training and education contexts
This is the gap I am trying to frame with DSA (Development-Session Attestation):
an evidence-and-verification layer for the development session itself, not just the final output.
If you want to see a concrete implementation first:
- GitHub (SessionAttested): https://github.com/shizuku198411/SessionAttested
- DSA concept note (in repo): `DSA.md`
- PoC workspace example with WebUI screenshots: `attested_poc/README.md`
What This Article Is (and Isn’t)
This is not an eBPF/LSM implementation deep dive.
I already have implementation-oriented material around SessionAttested, which is one implementation path toward DSA.
This article is about the higher-level idea:
- why artifact-only evaluation is becoming less sufficient
- what DSA is trying to add
- how it relates to existing auditing/provenance approaches
- how SessionAttested fits into that picture
What Is Changing in Artifact-Centric Evaluation
Artifact-centric evaluation is not “wrong.” It is simply becoming less complete in some scenarios.
1) Portfolio artifacts are becoming easier to produce without corresponding understanding
With modern AI tooling and developer assistants, it is increasingly possible to produce polished outputs without deep implementation understanding.
That does not mean the artifact is fake.
It means the artifact alone may no longer be enough to evaluate:
- implementation comprehension
- engineering judgment
- ability to explain trade-offs and failure handling
2) Verifying prohibited tool usage is difficult in practice
In hiring exercises, training, competitions, or controlled environments, people may want to restrict certain tools.
The practical challenge is not only “detection,” but making the result explainable at the level of a development session and its commits.
Common monitoring approaches often leave gaps:
- endpoint-wide monitoring is noisy
- network monitoring does not map cleanly to file/commit changes
- shell history does not capture all delegated/internal process activity (`bash`, `node`, helper processes, IDE internals)
3) Suspicious or unintended process activity during development is hard to explain later
IDE extensions, helper binaries, dependency/tool updates, and delegated subprocesses can all introduce process activity that is difficult to reason about after the fact.
The core problem is often not:
“Can we block everything?”
but rather:
“Can we later explain what ran, what wrote to the workspace, and what commit it relates to?”
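To make that question concrete, here is a minimal sketch of the kind of per-event record that makes "what ran, what wrote, and via which process chain" answerable after the fact. The field names and the `lineage` helper are illustrative assumptions, not SessionAttested's actual event format.

```python
# Hypothetical evidence records: exec events with parent links and executable
# identity (hash, not just a name), plus workspace write events keyed by pid.
from dataclasses import dataclass, field

@dataclass
class ExecEvent:
    pid: int
    ppid: int                 # parent pid, so process lineage can be rebuilt
    exe_path: str             # resolved executable path
    exe_sha256: str           # executable identity, not just a display name
    argv: list[str] = field(default_factory=list)

@dataclass
class WriteEvent:
    pid: int
    path: str                 # workspace-relative file that was written

def lineage(events: list[ExecEvent], pid: int) -> list[str]:
    """Walk ppid links to recover the chain of executables behind a write."""
    by_pid = {e.pid: e for e in events}
    chain = []
    while pid in by_pid:
        chain.append(by_pid[pid].exe_path)
        pid = by_pid[pid].ppid
    return chain

execs = [
    ExecEvent(1, 0, "/usr/bin/bash", "aa11..."),
    ExecEvent(2, 1, "/usr/bin/node", "bb22..."),
    ExecEvent(3, 2, "/usr/local/bin/helper", "cc33..."),
]
write = WriteEvent(3, "src/app.ts")
print(lineage(execs, write.pid))
# ['/usr/local/bin/helper', '/usr/bin/node', '/usr/bin/bash']
```

With records like these, a delegated helper process that wrote to the workspace is no longer an anonymous event: it has an identity and a parent chain that can be reviewed later.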
DSA (Development-Session Attestation): What I Mean by It
By DSA, I mean:
an approach/framework for treating development sessions (the process of development) as evidence that can be collected, bound, verified, and reviewed — in addition to final artifacts
In other words, DSA extends the evaluation surface from:
- artifact only
to:
- artifact + development-session evidence
What Questions DSA Should Help Answer
A DSA-capable system should make it easier to answer questions like:
- What processes executed in this development session?
- Which executable identities wrote to the workspace?
- Which files were changed, and what process lineage touched them?
- Which commit(s) can this session be linked to?
- Did a session violate policy (e.g., prohibited tools)?
- Can these claims be verified later?
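The questions above map naturally onto the shape of a session attestation. The following is an illustrative sketch of such a document, with a stable digest so the claims can be re-checked later; the schema and field names are assumptions for explanation, not SessionAttested's actual format.

```python
# Illustrative attestation shape: one field per question DSA should answer.
import hashlib
import json

attestation = {
    "session_id": "sess-001",
    "executed": ["/usr/bin/bash", "/usr/bin/node"],   # what processes ran
    "workspace_writers": ["/usr/bin/node"],           # which identities wrote
    "changed_files": ["src/app.ts"],                  # what was changed
    "commits": ["3f9c2ab"],                           # commit linkage
    "policy": {"prohibited_observed": []},            # policy result
}

# "Can these claims be verified later?" requires a stable digest over a
# canonical encoding, so any later re-serialization yields the same hash.
canonical = json.dumps(attestation, sort_keys=True, separators=(",", ":"))
digest = hashlib.sha256(canonical.encode()).hexdigest()
print(digest[:16])
```

The point is not this particular schema, but that each question becomes a named, machine-checkable field rather than something reconstructed from memory.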
This is the part that is often missing if we only look at outputs or only look at CI/build provenance.
DSA Is an Evidence and Verification Layer, Not a Skill Scoring System
This distinction matters.
DSA is not:
- an automatic “skill score” engine
- a full replacement for code review or testing
- a universal proof of behavior outside the audited environment
DSA is primarily about:
- evidence collection
- evidence binding
- verification
- reviewability
How that evidence is interpreted (hiring policy, education policy, compliance rules, etc.) remains an organizational decision.
Keeping this separation is important; otherwise the discussion quickly collapses into mere "monitoring" or "tool banning."
How DSA Relates to Existing Approaches
DSA is best understood as a complementary layer, not a replacement.
EDR/XDR / Host Auditing
Strong at:
- endpoint-wide process visibility
Less strong at:
- session-scoped, commit-linked, developer-facing explainability
Network Monitoring (FW / Proxy / NDR)
Strong at:
- traffic visibility
Less strong at:
- process/file/commit linkage for development work
CI / Provenance / Supply Chain Attestation
Strong at:
- build/release provenance
- artifact integrity and pipeline traceability
Less strong at:
- developer-session provenance (what happened while code was being authored)
Code Review / Testing
Strong at:
- output quality and behavior validation
Less strong at:
- process evidence (what tools/processes actually ran)
In that sense, DSA sits in a gap between endpoint auditing and CI provenance:
development-session process evidence
A Practical Layer Model for DSA
To make DSA implementation-oriented (and not only conceptual), I find it useful to split it into layers:
- Collection: session evidence (`exec`, workspace writes, identities)
- Binding: linking session evidence to commits/repositories
- Verification: policy checks, signature checks, integrity checks
- Review: human-facing UI/reports/artifacts
- Evaluation Policy: organizational interpretation and usage of evidence
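A minimal end-to-end sketch of the Collection, Binding, and Verification layers can make the split concrete. Everything here is an assumption for illustration: a real implementation would collect kernel-level events and use asymmetric signatures, not an in-process list and a shared HMAC key.

```python
import hashlib
import hmac
import json

KEY = b"demo-signing-key"  # illustrative only; a real system would use key pairs

def collect() -> dict:
    # Collection: session evidence (exec events, workspace writes)
    return {"execs": ["/usr/bin/bash"], "writes": ["src/main.py"]}

def bind(evidence: dict, commit_sha: str) -> dict:
    # Binding: tie the evidence to a specific commit
    return {"evidence": evidence, "commit": commit_sha}

def sign(bound: dict) -> dict:
    canonical = json.dumps(bound, sort_keys=True).encode()
    return {"payload": bound,
            "sig": hmac.new(KEY, canonical, hashlib.sha256).hexdigest()}

def verify(att: dict) -> bool:
    # Verification: integrity + signature check over the canonical encoding
    canonical = json.dumps(att["payload"], sort_keys=True).encode()
    expected = hmac.new(KEY, canonical, hashlib.sha256).hexdigest()
    return hmac.compare_digest(att["sig"], expected)

att = sign(bind(collect(), "3f9c2ab"))
print(verify(att))   # True
att["payload"]["commit"] = "deadbee"
print(verify(att))   # False: tampering with the bound commit breaks verification
```

Note what is deliberately absent: there is no Review UI and no Evaluation Policy in the code. Those layers sit above the tooling, which is exactly the separation the list above argues for.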
This layering helps separate:
- what the tooling should implement
- what organizations/teams should define as policy
Where SessionAttested Fits
SessionAttested is one implementation path toward DSA.
It currently provides a concrete stack for:
- host-side auditing of dev-container sessions
- executable identity aggregation (`exec`/`writer`)
- commit binding
- signed attestations and verification
- review-oriented outputs and WebUI
So while it can be used for “AI agent detection” policies, I think the more durable framing is:
SessionAttested is a DSA-oriented evidence and verification foundation.
What DSA Changes in Practice (Realistically)
DSA does not magically solve engineering evaluation.
What it can do is improve the quality of evidence available for review.
For example:
- reviewing artifact + development-session evidence together
- treating prohibited tool non-observation as a verifiable claim (within a managed session)
- making incident/quality discussions more evidence-based (“what ran in this session?”)
That is a meaningful shift even before any “scoring” or large-scale standardization exists.
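As a concrete example of the "non-observation as a verifiable claim" point, here is a sketch of a policy check over session evidence. The evidence shape, the `forbidden-tool` name, and the claim wording are all illustrative assumptions.

```python
# Turn "no prohibited tool was observed in this session" into a checkable
# claim over collected exec evidence, rather than an unfalsifiable assertion.
def check_non_observation(observed_exes: list[str], prohibited: set[str]) -> dict:
    hits = sorted({exe for exe in observed_exes
                   if exe.split("/")[-1] in prohibited})
    return {
        "claim": "no prohibited tool observed in this session",
        "holds": not hits,       # True only if nothing prohibited appeared
        "violations": hits,      # evidence for the reviewer when it fails
    }

evidence = ["/usr/bin/bash", "/usr/bin/git", "/usr/bin/node"]
result = check_non_observation(evidence, {"forbidden-tool"})
print(result["holds"])   # True
```

The important property is scope: the claim holds only for the audited session, which is exactly the "within a managed session" qualifier above.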
Limits (Important)
DSA should be discussed with explicit limits, otherwise it becomes over-claiming.
At least in the current framing (and in SessionAttested’s current implementation), DSA does not prove:
- that no prohibited tools were used outside the audited environment/session
- code quality or architecture quality by itself
- a person’s skill level as a standalone conclusion
It provides:
high-confidence, verifiable process evidence for managed development sessions
That framing is narrower, but much more defensible.
Why I Think DSA Is Worth Exploring
I see DSA as a way to update our mental model of engineering evaluation in the AI-assisted era.
Not by abandoning artifact review, but by adding a process-evidence layer where it matters.
That is the motivation behind SessionAttested, and why I think “development-session attestation” is a useful concept to name and refine.
References
- SessionAttested (GitHub): https://github.com/shizuku198411/SessionAttested
- DSA concept note (repo): `DSA.md`
- SessionAttested README: `README.md`
- PoC example (VS Code forbidden-tool comparison + WebUI): `attested_poc/README.md`