DEV Community: Masaya Kumagai

Write notes the way you always do — structure comes out afterwards

Masaya Kumagai — Tue, 05 May 2026 13:44:56 +0000

If you're going to record what you tried, what you really want to write is something close to a flowchart: what you used, how you tried it, what came out. Keep that machine-readable and you can replay or compare runs later. The value is obvious.

But writing a flowchart for every attempt is heavy work. While you're actually trying things, you'd rather scribble a sentence. Pure prose, on the other hand, is hard to extract machine-readable structure from afterwards. Can I write loose prose and still get structured provenance out of it? That's roughly the question I had in mind for Graphium.

Take a sentence like "Dissolved 5 g NaCl in 80 °C water, obtained a clear solution." I'd like to write that as prose, and still extract afterwards: what was the material, what was the condition, what was the output.

Older Graphium was "one label per block." Input block: NaCl. Parameter block: 80 °C. Output block: clear solution. Provenance was clean — but from the writer's side, a single sentence had to be broken apart into four entry fields. It felt less like an experiment note and more like filling out a structured form.

Two perspectives meeting in the same note

The real question here is whether two different demands can coexist in the same note.

One demand is to leave behind a graph-shaped record. If the data survives in a form you can search, aggregate, and compare later, replay and review become much easier. From a data-science perspective, this is the starting point.

The other demand is to capture what's happening in front of you, without losing it. You follow the flow of thought and hand, and write it down naturally as prose. The traditions of recording trial and error (lab notebooks, recipe notes, sketchbooks) all lean this way, valuing a kind of in-the-moment improvisation.

To satisfy both demands in the same place, you need something to bridge them. I put that bridge in the grammar of the document. Writing already comes with grammar (headings, paragraphs, nouns, verbs), and the distinction between act and object is naturally embedded in it. Place labels along that grammar, and the writer is just writing prose, while the reader gets a graph for free. That bridge is what convinced me to rebuild Graphium around grammar.

In v0.5.0 I restructured along this principle into three layers: Section (headings) / Phase (plan vs result) / Inline (in-text highlights).

Mapping document grammar onto PROV-DM

The idea is small: pin PROV-DM's ontology onto the grammar of writing.

| PROV-DM | Grammar | Graphium |
| --- | --- | --- |
| Activity (verb / clause) | Heading hierarchy | Section: h1/h2/h3 |
| PROV-DM extension (`graphium:phase`) | Sub-heading | Phase: `[Plan]` / `[Result]` |
| Entity / Attribute (noun / phrase) | In-text term | Inline: `[Input]` / `[Tool]` / `[Parameter]` / `[Output]` |

The middle row is the odd one out. Phase is not a standard PROV-DM concept — Graphium adds it as a custom attribute (graphium:phase). The intent behind it comes later in this post.

Verbs become headings, nouns become inline highlights. Hold to that mapping and you can write prose as prose, and provenance falls out for free.

That earlier sentence, under the heading "Dissolving NaCl":

[Input]NaCl[/] [Parameter]5 g[/] was dissolved in [Parameter]80 °C[/] [Input]water[/] to give a [Output]clear solution[/].

Almost identical to the original — a few small spans of color added. Underneath, Section creates an Activity, Inline creates Entities (NaCl, water, clear solution) and Attributes (5 g, 80 °C), and prov:used / prov:wasGeneratedBy are wired automatically.
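As a rough illustration of that wiring, here is a minimal Python sketch, not Graphium's actual implementation. It assumes the highlights are serialized as the bracket tags shown above. The function name `extract_provenance`, the `activity:` / `entity:` ID prefixes, and the `example:parameter` relation are all hypothetical, and attaching Parameters to the Activity is only one possible choice.

```python
import re

# Assumed plain-text encoding of Inline highlights: [Label]text[/]
TAG = re.compile(r"\[(Input|Tool|Parameter|Output)\](.+?)\[/\]")


def extract_provenance(heading, sentence):
    """Turn one tagged sentence under a heading into PROV-style statements.

    The heading becomes the Activity; each highlight becomes either an
    Entity linked to it or an attribute value recorded on it.
    """
    activity = f"activity:{heading}"
    statements = []
    for label, text in TAG.findall(sentence):
        if label in ("Input", "Tool"):
            # The Activity used this Entity.
            statements.append((activity, "prov:used", f"entity:{text}"))
        elif label == "Output":
            # This Entity was generated by the Activity.
            statements.append((f"entity:{text}", "prov:wasGeneratedBy", activity))
        else:
            # Parameter: stored as an attribute (hypothetical relation name).
            statements.append((activity, "example:parameter", text))
    return statements
```

Run on the tagged NaCl sentence under the heading "Dissolving NaCl", this yields two prov:used statements (NaCl, water), two parameter attributes (5 g, 80 °C), and one prov:wasGeneratedBy statement for the clear solution.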

Inline highlights also work inside bullet lists, not just prose. You can dump conditions as a quick list and add highlights afterward, or write everything as flowing prose and highlight later. The 3-layer model only asks that you respect the document's grammar — it stays out of the way of writing style.

Phase as scaffolding for "templates at multiple resolutions"

The middle layer (Phase, [Plan] / [Result]) isn't strictly necessary today. The distinction between planned and actual values could be expressed with Section headings or Inline tags alone.

I still made Phase its own layer because I wanted to pull a process out at multiple resolutions. Step headings alone give you a skeleton template — just the shape of the procedure. Layer Plan on top, and you get a richer template that also fixes the planned values. Fill in Result, and the note becomes a complete execution record. The same note should read at three resolutions: skeleton, skeleton + plan, and the full thing. A replication run reuses Step + Plan; a control experiment edits just part of the Plan. That's the operating model I'm betting on.
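The three resolutions can be pictured with a small, purely hypothetical sketch. The `Step` class, the `project` function, and the resolution names `skeleton`, `plan`, and `full` are assumptions made for illustration; none of them come from Graphium's code.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One Section of a note, with optional Plan and Result content."""
    heading: str
    plan: list = field(default_factory=list)
    result: list = field(default_factory=list)


def project(steps, resolution):
    """Return a view of the note at the requested resolution.

    'skeleton' keeps headings only, 'plan' adds planned values,
    'full' adds the recorded results as well.
    """
    if resolution not in ("skeleton", "plan", "full"):
        raise ValueError(f"unknown resolution: {resolution}")
    view = []
    for step in steps:
        item = {"heading": step.heading}
        if resolution in ("plan", "full"):
            item["plan"] = list(step.plan)
        if resolution == "full":
            item["result"] = list(step.result)
        view.append(item)
    return view
```

Under this model, a replication run would start from `project(note, "plan")`, and a control experiment would edit part of each step's `plan` list before filling in results.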

Implementation-wise, I don't use PROV-DM's prov:Plan type for this. In PROV-DM, a Plan describes the whole recipe an Agent follows when performing an Activity, so tagging individual planned values with prov:Plan would be a misuse. Instead, Graphium carries Phase as a custom-namespace attribute (graphium:phase), splits node IDs between plan and execution variants, and connects them with prov:wasDerivedFrom. PROV-DM explicitly allows this kind of custom-attribute extension, so the result remains PROV-DM compliant: I'm layering my own resolution on top, not bending the schema.
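A minimal sketch of that encoding might look like the following. The `#plan` / `#execution` ID suffixes, the attribute values `"plan"` and `"execution"`, the function name `phase_nodes`, and the direction of the derivation edge (execution derived from plan) are all assumptions; only `graphium:phase`, `prov:wasDerivedFrom`, and the standard `prov:value` attribute are taken from the text or from PROV-DM itself.

```python
def phase_nodes(entity_id, planned_value, actual_value):
    """Build plan and execution variants of one node, linked by derivation.

    Returns (nodes, edges): nodes maps node ID to its attributes,
    edges is a list of (subject, relation, object) tuples.
    """
    plan_id = f"{entity_id}#plan"            # hypothetical ID scheme
    exec_id = f"{entity_id}#execution"       # hypothetical ID scheme
    nodes = {
        plan_id: {"prov:value": planned_value, "graphium:phase": "plan"},
        exec_id: {"prov:value": actual_value, "graphium:phase": "execution"},
    }
    # Assumed direction: the executed value is derived from the planned one.
    edges = [(exec_id, "prov:wasDerivedFrom", plan_id)]
    return nodes, edges
```

Because the phase lives in an attribute rather than in a node type, a consumer that does not know about `graphium:phase` still sees ordinary PROV nodes joined by an ordinary derivation.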

The templating workflow itself isn't implemented yet. Phase is the kind of structural decision you can't easily retrofit later, so I drove the stake in early — before the payoff actually arrives.

Phase is also optional. Skip the Phase headings and everything is treated as execution internally — you can write notes without ever noticing the layer exists. Phase is there for people who want to pull the same note out at multiple resolutions, not something forced on every user.

Why both blocks and inline

A reasonable question: if inline carries the Entity, is block structure still needed?

Yes. After implementing this, my settled answer is that blocks and inline highlights operate at different granularities, and you need both.

Blocks — the unit of editing. The granularity AI uses when you say "rewrite this paragraph."
Inline — the in-text referent identified as a PROV Entity. Phrase-level granularity.

Blocks-only is too coarse to pick up in-text terms; inline-only is too fine to be a useful edit unit. Inline highlights keep their entityId separate from the text range, so identity stays stable even when the range moves under editing (a design point worth its own post). But "from here to here is one unit of work" still needs the coarser frame that blocks provide.
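To make the entityId point concrete, here is a small sketch under stated assumptions: a highlight is modeled as an ID plus a half-open character range, and only text insertion is handled. The `Highlight` class and `shift_after_insert` function are hypothetical names, not Graphium's data model.

```python
from dataclasses import dataclass


@dataclass
class Highlight:
    entity_id: str  # stable identity of the PROV Entity
    start: int      # range inside the block text, half-open [start, end)
    end: int


def shift_after_insert(highlight, position, length):
    """Return the highlight after `length` characters are inserted at `position`.

    The range moves or grows, but entity_id is carried over unchanged.
    """
    if position <= highlight.start:
        # Insertion before the highlight: the whole range shifts right.
        return Highlight(highlight.entity_id,
                         highlight.start + length,
                         highlight.end + length)
    if position < highlight.end:
        # Insertion inside the highlight: the range grows.
        return Highlight(highlight.entity_id, highlight.start, highlight.end + length)
    # Insertion after the highlight: nothing changes.
    return highlight
```

If "NaCl" is highlighted at range 0 to 4 and the writer prepends "5 g " to the sentence, the range becomes 4 to 8 while the entity ID, and therefore every provenance edge that points at it, stays the same.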

The three-layer structure is a deliberate fix for this gap. Section/Phase says "this region is one act"; Inline says "and these specific things were involved."

Writing order is no longer constrained

A side effect: order doesn't matter anymore.

In the old model, block-level labels nudged you toward "drop the Input block first, then a Step block." The 3-layer model lets you write the whole experiment as prose first, and add headings + highlights afterward. Or start from a heading template and fill in. Either works.

Decoupling writing order from PROV structure moves notes back from form-shaped to essay-shaped.

Where it currently sits

The [Plan] vs [Result] distinction isn't for everyone. Tracking planned vs actual is second nature in some research cultures and feels like overhead in others. Keeping Phase as an optional layer is the place I've landed on for now.

The same goes for how inline highlights get applied. There are several routes (selection + toolbar, slash command, AI auto-tagging), and the workflow I've settled into is "AI roughly tags during a draft pass, human refines." I don't feel a strong need to push past it yet.

The direction — write prose as prose, and still pull machine-readable provenance out of it — fits the larger theme I want to keep chasing, and the 3-layer structure feels like a settled answer for it, at least for now.

GitHub: https://github.com/kumagallium/Graphium

Why we built provenance into a notes app

Masaya Kumagai — Mon, 04 May 2026 15:20:15 +0000

Notes written while you're in the middle of trying something out are good at recording results and reflections, but keeping the flow that led to those results as structure is surprisingly hard. The procedure that lived only in someone's head at the time, the implicit assumptions that didn't make it onto the page, the judgment calls that got summarized away in meeting decks — when you read the notes back years later, those rarely survive together.

I felt this myself. More than ten years after stepping away from active experimental work, I tried to recall the flow of one of those experiments. What I had left were fragmented notes and a few meeting decks. The results were there, but the flow that led to them had thinned out over time.

There is advice that says: write the procedure as a flowchart in your notebook. Even so, notes written in the moment tend to be results-centric, and keeping the flow recorded as structure on top of that takes more effort than expected. For reproducing or revisiting work later, having a setup that records the flow as structured data alongside the results would be valuable.

Japanese cooking has a saying, "sa-shi-su-se-so": add sugar, salt, vinegar, soy sauce, and miso in that order, because the order changes the taste. The same is true anywhere you're trying things: experiments, recipes, code, drafts. What you add, when, in what order, and how long you wait all shape the outcome. The flow makes the result.

So I wanted a way to keep "what came from what, through what flow" — that causal data — as a structured record. There is also another reason: this kind of data feels especially interesting in an era when AI starts handling the data of trial-and-error work itself.

PROV-DM as the missing model

That is when I came across PROV-DM (Provenance Data Model), a W3C standard for describing what was made, from what, and how. It defines three primitives — Entity, Activity, Agent — and the relations between them.

Academic data systems use it. Personal notes apps don't, as far as I can tell. But the daily output of anyone trying things on the ground already fits this shape. A researcher writing "I heat-treated Sample A and got Sample B" and a baker writing "I added sugar before salt and the texture changed" both describe, in PROV-DM terms, "Entity B was generated by an Activity from Entity A." With this, I had a way to keep the flow of trial and error — the part that used to live only in my head — as structured data.

Another layer of provenance — same model for edit history

Looking deeper into PROV-DM, I realized the same data model also fits document edit history, not only content provenance. In fact, that may be closer to what PROV-DM was originally designed for.

So in Graphium, I track provenance in two layers:

Layer 1: Content provenance — the experimental workflow (Sample A flowing into Sample B, and so on)
Layer 2: Document edit provenance — who edited what, and when

Layer 2 maps the editor (human or AI) to prov:Agent, edit operations to prov:Activity, and document revisions to prov:Entity. The fact that both layers describe themselves in the same PROV-DM vocabulary felt like the right design choice to me.
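The Layer 2 mapping can be sketched as follows. This is an illustrative assumption about how one edit could be recorded, not Graphium's storage format: the function name `record_edit` and the ID prefixes are hypothetical, while `prov:Person` and `prov:SoftwareAgent` are the standard PROV agent types.

```python
def record_edit(edit_id, editor, is_ai, revision_before, revision_after):
    """Describe one document edit in PROV terms.

    editor -> prov:Agent, edit operation -> prov:Activity,
    document revisions -> prov:Entity.
    """
    agent = f"agent:{editor}"
    activity = f"activity:{edit_id}"
    before = f"entity:{revision_before}"
    after = f"entity:{revision_after}"
    return {
        "agents": {
            # Recording the agent type is what lets a reader tell AI edits apart.
            agent: {"prov:type": "prov:SoftwareAgent" if is_ai else "prov:Person"},
        },
        "statements": [
            (activity, "prov:wasAssociatedWith", agent),
            (activity, "prov:used", before),
            (after, "prov:wasGeneratedBy", activity),
            (after, "prov:wasDerivedFrom", before),
        ],
    }
```

The statements have the same shape as the content-provenance ones (used, wasGeneratedBy, wasDerivedFrom), which is the sense in which both layers share one vocabulary.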

Both layers gain weight in the AI era

I think both layers grow in value as AI becomes part of the picture.

Layer 1 keeps the procedure inside a note as causal structure. When inputs, steps, and outputs are connected as a causal flow within the same note, you — or an AI reading that note later — can trace the flow at a higher resolution. As AI starts handling this kind of trial-and-error data, structured procedures themselves gain value as material that can be analyzed or reused.

Layer 2 matters when AI-written and human-written content start mixing in the same notes. As AI writes parts, summarizes, or edits, being able to tell later "this was my own observation, this was added by AI" matters when you re-read your own work or share it with others. Just having "AI or not" recorded as Agent gives that distinction a place to live.

The same "what came from what" question applies far beyond lab work. Recipes, software change histories, medical records — the shape is the same. That's why the header image of this post is a bread-making note rather than a chemistry experiment.

Recording it without making the user edit a graph

Asking users to author a graph directly is a non-starter. So Graphium maps PROV-DM onto the grammar of the document itself: headings become Activities, and short inline highlights within a heading's scope turn the named terms ("NaCl", "80 °C", "clear solution") into Entities.

The writing experience stays "type a heading, write a paragraph, occasionally highlight a word." The provenance graph is a computed view — never something you edit by hand.

Coexisting with knowledge links

Not every link should be causal. "This paper was interesting" or "this concept resembles that one" are non-directional. Forcing causality on them is unnatural.

So Graphium splits them. @ mentions default to knowledge links (no direction, cycles allowed). Relations between inline highlights inside heading scopes are provenance links (directed, acyclic). Same act of writing, two different graphs underneath.
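The difference between the two graphs can be sketched like this. The `NoteGraphs` class and its method names are hypothetical, and the cycle check is a plain depth-first reachability test chosen for brevity; how Graphium actually enforces acyclicity is not stated here.

```python
def creates_cycle(edges, source, target):
    """True if adding the directed edge source -> target would close a cycle.

    That happens exactly when source is already reachable from target.
    """
    stack, seen = [target], set()
    while stack:
        node = stack.pop()
        if node == source:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(t for s, t in edges if s == node)
    return False


class NoteGraphs:
    def __init__(self):
        self.knowledge = set()   # undirected links, stored as frozensets
        self.provenance = []     # directed links, stored as (source, target)

    def add_knowledge_link(self, a, b):
        # No direction and no cycle check: any two notes may be linked.
        self.knowledge.add(frozenset((a, b)))

    def add_provenance_link(self, source, target):
        # Directed and acyclic: reject an edge that would loop back.
        if creates_cycle(self.provenance, source, target):
            raise ValueError(f"{source} -> {target} would create a cycle")
        self.provenance.append((source, target))
```

Three notes may reference each other in a ring through knowledge links, but a provenance link from "Sample B" back to "Sample A" is rejected once "Sample A" already flows into "Sample B".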

GitHub: https://github.com/kumagallium/Graphium

I Want to Build a Note App Where Discoveries Happen

Masaya Kumagai — Thu, 23 Apr 2026 00:37:01 +0000

I like the moment when something clicks.

When you've been tweaking a recipe and finally nail the flavor. When you're debugging and suddenly see the root cause. When you're reading a paper and a concept from a completely different field connects. The domain changes, but that "this is it" feeling is the same.

I wanted to build a note app that supports the trial and error behind those moments. That's why I started building Graphium.

The limits of lab notebooks

When I was doing materials science research, I kept running into the same problem: experimental processes were hard to record and even harder to pass on. In grad school, different senior researchers taught me different preprocessing procedures for the same sample material. The know-how lived in people's heads, not in any shared system. We wrote things down in paper lab notebooks, but describing a process precisely is tedious, and searching through those notes later is nearly impossible. On top of that, the data that makes it into publications is just the tip of the iceberg — the trial-and-error data sitting in each lab usually goes unused and eventually disappears.

In 2019, I had a vague idea: "If we could record experimental processes in a structured way, maybe AI could make use of them." (I wrote about this problem in a blog post and a FIT2020 presentation at the time — both in Japanese.) That was the starting point for Graphium. But I had no concrete vision of what it would look like.

Pieces that didn't fit yet

Over the next few years, the pieces I needed started coming together in my hands, one by one.

In 2023, around the time GPT-4 came out, I discovered Niklas Luhmann's Zettelkasten — a method where you write individual notes in your own words, link them to each other, and let a network of thought grow over time. I was drawn to the idea that connecting notes could lead to new insights. At the same time, I felt the weight of "permanent notes" — the process of abstracting and restructuring raw notes into refined knowledge. "Could AI handle this part?" I thought. But I couldn't see how this connected to my 2019 problem yet.

In 2024, I came across BlockNote.js, a block-based editor framework. Each element — text, images, data — exists as an independent block with its own ID. I sensed potential in its extensibility. At that point, it felt like "an interesting piece of technology" to me personally, but I didn't yet have a concrete sense of how it would tie into my 2019 problem.

In 2025, I learned about PROV-DM, the W3C standard for describing provenance. It models relationships between Entities (things), Activities (actions), and Agents (actors) in a structured way. That year, a colleague and I wrote a paper called MatPROV, applying PROV-DM to structure the provenance of materials synthesis. It was accepted at a NeurIPS 2025 workshop. For the first time, the vague 2019 idea — "structured recording of experimental processes" — had a formal framework to attach to.

The pieces were accumulating. But I still couldn't see how to put them together into one thing.

The moment everything connected

In January 2026, a colleague suggested: "What if you could attach context labels to blocks?" A simple idea — give semantic meaning to individual blocks via labels. But the moment I heard it, the scattered pieces clicked into a single picture. Blocks have IDs. IDs mean you can attach labels. Labels mean you can auto-generate a PROV-DM provenance graph. Links between blocks become a Zettelkasten network. And if AI layers a knowledge base on top of that network, you can break through the permanent-note bottleneck.

It also helped that AI-assisted "vibe coding" had become practical by then. Once the ideas connected, I could immediately start building. I went from incubating a concept to implementing it almost overnight.

Then in April 2026, Andrej Karpathy proposed a design pattern called LLM Wiki — an approach where LLMs continuously build and update a Markdown wiki. As he put it: "LLMs don't get bored, don't forget to update a cross-reference, and can touch 15 files in one pass." The idea I'd been carrying since 2023 — "let AI handle Zettelkasten's permanent notes" — suddenly had a concrete implementation pattern.

Three layers of Graphium

These pieces combined into a structure where Graphium supports discovery through three layers.

Organize your thinking. Type @ to reference another note, and a network between your notes starts growing. Zettelkasten's "connect your thoughts through links" philosophy, built directly into the editor.

Accelerate discovery. As you accumulate daily notes, AI reads across them and auto-generates a knowledge layer draft. For example, scattered experimental findings across multiple notes get organized into a single synthesized page, which you can then review and edit. With minimal effort, a knowledge base equivalent to Zettelkasten's permanent notes grows over time.

Protect your discoveries. Attach labels like #Input or #Output to blocks, and a provenance graph is auto-generated, showing what went in, what steps were taken, and what came out. Back in grad school, different senior researchers taught me different preprocessing procedures and I couldn't reliably reproduce any of them — this is exactly the kind of tool I wished I'd had back then. Provenance is described using the W3C standard PROV-DM, and when you quote or derive new notes from existing blocks, those connections are structurally recorded too.

Without any action, it stays a simple note app

A key design principle: these features only appear when you need them.

Without labels, Graphium is a simple note app. Start using @ references and the link network becomes visible. Add # labels and provenance graphs appear. Enable AI and the knowledge layer starts growing. Complexity scales with your actions, step by step.

Structure reveals itself gradually, only to the extent you need it — a progressive design I want to preserve.

Just getting started

The first commit was on March 23, 2026 — just about a month ago. There's plenty left to build, but ideas I'd been carrying in fragments since 2019 are coming together into a single product, and that process itself has been a chain of discoveries.

Right now Graphium is built for individual discovery. But I have a feeling this structure could eventually extend toward formalizing tacit knowledge within teams and growing collective intelligence.

In this series, I'll walk through Graphium's design decisions one by one. Why I built it this way, what I chose not to build, what I'm still figuring out. I want to share the development process as it actually is.

GitHub: https://github.com/kumagallium/Graphium