Write notes the way you always do — structure comes out afterwards

#ai #productivity #science #architecture

If you're going to record what you tried, what you really want to write is something close to a flowchart: what you used, how you tried it, what came out. Keep that machine-readable and you can replay or compare runs later. The value is obvious.

But writing a flowchart for every attempt is heavy work. While you're actually trying things, you'd rather scribble a sentence. Pure prose, on the other hand, is hard to pull machine-readable structure out of afterwards. Can I write loose prose and still get structured provenance out of it? That's roughly the question I had in mind for Graphium.

Take a sentence like "Dissolved 5 g NaCl in 80 °C water, obtained a clear solution." I'd like to write that as prose, and still extract afterwards: what was the material, what was the condition, what was the output.

Older Graphium was "one label per block." Input block: NaCl. Parameter block: 80 °C. Output block: clear solution. Provenance was clean — but from the writer's side, a single sentence had to be broken apart into four entry fields. It felt less like an experiment note and more like filling out a structured form.

Two perspectives meeting in the same note

The real question here is whether two different demands can coexist in the same note.

One demand is to leave behind a graph-shaped record. If the data survives in a form you can search, aggregate, and compare later, replay and review become much easier. From a data-science perspective, this is the starting point.

The other demand is to capture what's happening in front of you, without losing it. You follow the flow of thought and hand, and write it down naturally as prose. The traditions of recording trial and error (lab notebooks, recipe notes, sketchbooks) all lean this way, valuing a kind of in-the-moment improvisation.

To satisfy both demands in the same place, you need something to bridge them. I put that bridge in the grammar of the document. Writing already comes with grammar (headings, paragraphs, nouns, verbs), and the distinction between act and object is naturally embedded in it. Place labels along that grammar, and the writer is just writing prose, while the reader gets a graph for free. That idea is what convinced me to rebuild Graphium around grammar.

In v0.5.0 I restructured along this principle into three layers: Section (headings) / Phase (plan vs result) / Inline (in-text highlights).

Mapping document grammar onto PROV-DM

The idea is small: pin PROV-DM's ontology onto the grammar of writing.

| PROV-DM | Grammar | Graphium |
| --- | --- | --- |
| Activity (verb / clause) | Heading hierarchy | Section (h1/h2/h3) |
| ※ PROV-DM extension (`graphium:phase`) | Sub-heading | Phase (`[Plan]` / `[Result]`) |
| Entity / Attribute (noun / phrase) | In-text term | Inline (`[Input]` / `[Tool]` / `[Parameter]` / `[Output]`) |

The middle row is the odd one out. Phase is not a standard PROV-DM concept — Graphium adds it as a custom attribute (graphium:phase). The intent behind it comes later in this post.

Verbs become headings, nouns become inline highlights. Hold to that mapping and you can write prose as prose, and provenance falls out for free.

That earlier sentence, under the heading "Dissolving NaCl":

[Parameter]5 g[/] of [Input]NaCl[/] was dissolved in [Parameter]80 °C[/] [Input]water[/] to give a [Output]clear solution[/].

Almost identical to the original — a few small spans of color added. Underneath, Section creates an Activity, Inline creates Entities (NaCl, water, clear solution) and Attributes (5 g, 80 °C), and prov:used / prov:wasGeneratedBy are wired automatically.
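To make that wiring concrete, here is a minimal sketch of the extraction step. It is an illustration, not Graphium's actual code: the `[Label]text[/]` syntax is taken from the example sentence, while the `extract` and `plain` functions, the `activity:` / `entity:` ID prefixes, and the choice to treat `[Tool]` like `[Input]` (both become prov:used targets) are assumptions made for this example.

```python
import re

# Tag syntax as written in the example sentence: [Label]text[/]
TAG = re.compile(r"\[(Input|Tool|Parameter|Output)\](.*?)\[/\]")


def extract(heading, text):
    """Turn one tagged sentence under a heading into PROV-style triples.

    Hypothetical mapping: the heading becomes the Activity, Input/Tool
    spans become used Entities, Output spans become generated Entities,
    and Parameter spans are kept as attributes of the Activity.
    """
    activity = f"activity:{heading}"
    triples = []
    attributes = []
    for label, span in TAG.findall(text):
        if label == "Parameter":
            attributes.append(span)
        elif label == "Output":
            triples.append((f"entity:{span}", "prov:wasGeneratedBy", activity))
        else:  # Input or Tool (assumed to be wired the same way)
            triples.append((activity, "prov:used", f"entity:{span}"))
    return {"activity": activity, "attributes": attributes, "triples": triples}


def plain(text):
    """Strip the tags and return the sentence as ordinary prose."""
    return TAG.sub(r"\2", text)


sentence = (
    "[Parameter]5 g[/] of [Input]NaCl[/] was dissolved in "
    "[Parameter]80 °C[/] [Input]water[/] to give a [Output]clear solution[/]."
)
record = extract("Dissolving NaCl", sentence)
```

Stripping the tags with `plain(sentence)` gives back the untagged prose, which is the point: the highlights sit on top of the sentence rather than replacing it.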

Inline highlights also work inside bullet lists, not just prose. You can dump conditions as a quick list and add highlights afterward, or write everything as flowing prose and highlight later. The 3-layer model only asks that you respect the document's grammar — it stays out of the way of writing style.

Phase as scaffolding for "templates at multiple resolutions"

The middle layer (Phase, [Plan] / [Result]) isn't strictly necessary today. Distinguishing planned values from actual ones can be expressed with Section headings or Inline tags alone.

I still made Phase its own layer because I wanted to pull a process out at multiple resolutions. Step headings alone give you a skeleton template — just the shape of the procedure. Layer Plan on top, and you get a richer template that also fixes the planned values. Fill in Result, and the note becomes a complete execution record. The same note should read at three resolutions: skeleton, skeleton + plan, and the full thing. A replication run reuses Step + Plan; a control experiment edits just part of the Plan. That's the operating model I'm betting on.
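The three resolutions can be pictured as projections of the same data. The sketch below is hypothetical, since the templating workflow is not implemented in Graphium: the `Step` class, the `project` function, and the resolution names `"skeleton"`, `"plan"`, and `"full"` are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One Section of a note: a heading plus optional plan and result values."""
    heading: str
    plan: dict = field(default_factory=dict)
    result: dict = field(default_factory=dict)


def project(steps, resolution):
    """Read the same note at one of three resolutions.

    "skeleton" keeps headings only, "plan" adds planned values,
    and "full" adds recorded results as well.
    """
    if resolution not in ("skeleton", "plan", "full"):
        raise ValueError(f"unknown resolution: {resolution}")
    projected = []
    for step in steps:
        item = {"heading": step.heading}
        if resolution in ("plan", "full"):
            item["plan"] = dict(step.plan)
        if resolution == "full":
            item["result"] = dict(step.result)
        projected.append(item)
    return projected


note = [
    Step("Dissolving NaCl",
         plan={"temperature": "80 °C", "mass": "5 g"},
         result={"appearance": "clear solution"}),
]

skeleton = project(note, "skeleton")
template = project(note, "plan")

# A control experiment starts from the Step + Plan template and edits one value.
control = project(note, "plan")
control[0]["plan"]["temperature"] = "60 °C"
```

Because `project` copies the plan dictionaries, editing the control run leaves the original note untouched.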

Implementation-wise, I don't use PROV-DM's prov:Plan type for this. In PROV-DM, a Plan is an entity representing the set of actions or steps an Agent intends to follow to carry out an Activity, so tagging individual planned values with prov:Plan would be a misuse. Instead, Graphium carries Phase as a custom-namespace attribute (graphium:phase), splits node IDs between plan and execution variants, and connects them with prov:wasDerivedFrom. PROV-DM explicitly allows attributes from application-specific namespaces, so the result stays PROV-DM compliant: I'm layering my own resolution on top, not bending the schema.
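As a rough picture of what that looks like in the graph, here is a small sketch. Only `graphium:phase` and `prov:wasDerivedFrom` come from the description above; the `phase_pair` function, the `/plan` and `/execution` ID suffixes, the use of `prov:value` to hold the value, and the direction of the derivation edge (execution derived from plan) are assumptions for this example and may not match Graphium's real output.

```python
def phase_pair(name, planned, actual):
    """Build a plan node, an execution node, and the edge linking them.

    The ID scheme and edge direction are hypothetical.
    """
    plan_id = f"{name}/plan"
    exec_id = f"{name}/execution"
    nodes = {
        plan_id: {"graphium:phase": "plan", "prov:value": planned},
        exec_id: {"graphium:phase": "execution", "prov:value": actual},
    }
    # Assumed direction: the recorded value is derived from the planned one.
    edges = [(exec_id, "prov:wasDerivedFrom", plan_id)]
    return nodes, edges


nodes, edges = phase_pair("water-temperature", "80 °C", "78 °C")
```

Neither node is typed as prov:Plan; the phase lives entirely in the custom attribute, which is what keeps the standard vocabulary untouched.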

The templating workflow itself isn't implemented yet. Phase is the kind of structural decision you can't easily retrofit later, so I drove the stake in early — before the payoff actually arrives.

Phase is also optional. Skip the Phase headings and everything is treated as execution internally — you can write notes without ever noticing the layer exists. Phase is there for people who want to pull the same note out at multiple resolutions, not something forced on every user.
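The default can be sketched as a small pass over the lines of a note. This is an illustration under assumptions: Markdown-style `#` headings, `[Plan]` / `[Result]` written as sub-heading text, `[Result]` mapping to the execution phase, and the `assign_phases` function itself are all hypothetical, not Graphium's actual parser.

```python
import re

# Assumed syntax: a sub-heading whose whole text is [Plan] or [Result].
PHASE_HEADING = re.compile(r"^#+\s*\[(Plan|Result)\]\s*$")
# Assumed syntax: h1 to h3 headings open a new Section.
SECTION_HEADING = re.compile(r"^#{1,3}\s+(?!\[)(.+)$")


def assign_phases(lines):
    """Label each body line with its Section and phase.

    Lines that follow no Phase heading fall back to "execution".
    """
    section = None
    phase = "execution"
    labelled = []
    for line in lines:
        phase_match = PHASE_HEADING.match(line)
        if phase_match:
            phase = "plan" if phase_match.group(1) == "Plan" else "execution"
            continue
        section_match = SECTION_HEADING.match(line)
        if section_match:
            # A new Section resets the phase to the default.
            section = section_match.group(1).strip()
            phase = "execution"
            continue
        if line.strip():
            labelled.append((section, phase, line.strip()))
    return labelled


note = [
    "## Dissolving NaCl",
    "#### [Plan]",
    "Heat water to 80 °C.",
    "#### [Result]",
    "Water reached 78 °C.",
    "## Filtering",
    "Filtered through paper.",
]
labelled = assign_phases(note)
```

The "Filtering" section has no Phase headings at all, and its line still comes out labelled as execution, so a writer who ignores the layer gets a valid record anyway.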

Why both blocks and inline

A reasonable question: if inline carries the Entity, is block structure still needed?

Yes. After implementing this, my settled answer is that blocks and inline highlights operate at different granularities, and you need both.

Blocks — the unit of editing. The granularity AI uses when you say "rewrite this paragraph."
Inline — the in-text referent identified as a PROV Entity. Phrase-level granularity.

Blocks-only is too coarse to pick up in-text terms; inline-only is too fine to be a useful edit unit. Inline highlights keep their entityId separate from the text range, so identity stays stable even when the range moves under editing (a design point worth its own post). But "from here to here is one unit of work" still needs the coarser frame that blocks provide.
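The separation of identity from range can be shown with a toy model. The `entityId` idea comes from the paragraph above; the `Highlight` class, its field names, and the `shift_after_insert` function are invented for this sketch and say nothing about how Graphium's editor actually tracks positions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Highlight:
    entity_id: str  # stable identity of the PROV Entity
    label: str      # Input / Tool / Parameter / Output
    start: int      # character offsets into the block text
    end: int


def shift_after_insert(highlights, pos, length):
    """Adjust ranges after `length` characters are inserted at `pos`.

    Only start/end change; entity_id is carried over untouched.
    """
    shifted = []
    for h in highlights:
        start = h.start + length if h.start >= pos else h.start
        end = h.end + length if h.end > pos else h.end
        shifted.append(Highlight(h.entity_id, h.label, start, end))
    return shifted


text = "NaCl was dissolved in water"
highlights = [
    Highlight("ent-1", "Input", 0, 4),    # "NaCl"
    Highlight("ent-2", "Input", 22, 27),  # "water"
]

# The writer later prepends the amount to the sentence.
inserted = "5 g of "
new_text = inserted + text
moved = shift_after_insert(highlights, 0, len(inserted))
```

Both ranges move by seven characters, but `ent-1` and `ent-2` still point at "NaCl" and "water", so any graph edges that reference those IDs survive the edit.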

The three-layer structure is a deliberate fix for this gap. Section/Phase says "this region is one act"; Inline says "and these specific things were involved."

Writing order is no longer constrained

A side effect: order doesn't matter anymore.

In the old model, block-level labels nudged you toward "drop the Input block first, then a Step block." The 3-layer model lets you write the whole experiment as prose first, and add headings + highlights afterward. Or start from a heading template and fill in. Either works.

Decoupling writing order from PROV structure moves notes back from form-shaped to essay-shaped.

Where it currently sits

The [Plan] vs [Result] distinction isn't for everyone. Tracking planned vs actual is second nature in some research cultures and feels like overhead in others. Keeping Phase as an optional layer is the place I've landed on for now.

The same goes for how inline highlights get applied. Of the three options (selection + toolbar, slash command, AI auto-tagging), the workflow I've settled into is "AI roughly tags during a draft pass, human refines," and I don't feel a strong need to push past it yet.

The direction (write prose as prose, and still pull machine-readable provenance out of it) fits the larger theme I want to keep chasing, and the 3-layer structure feels like the right answer for it, at least for now.

GitHub: https://github.com/kumagallium/Graphium