<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Lee Powell</title>
    <description>The latest articles on DEV Community by Lee Powell (@keny369).</description>
    <link>https://dev.to/keny369</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3913504%2Faf5748d9-f388-4313-9b0a-e4cbefff9d89.jpg</url>
      <title>DEV Community: Lee Powell</title>
      <link>https://dev.to/keny369</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/keny369"/>
    <language>en</language>
    <item>
      <title>95% of AI Pilots Fail. The Technology Works Fine.</title>
      <dc:creator>Lee Powell</dc:creator>
      <pubDate>Wed, 06 May 2026 03:22:54 +0000</pubDate>
      <link>https://dev.to/keny369/95-of-ai-pilots-fail-the-technology-works-fine-2a5a</link>
      <guid>https://dev.to/keny369/95-of-ai-pilots-fail-the-technology-works-fine-2a5a</guid>
      <description>&lt;p&gt;Boards in the $30M–$500M range are being asked the same question right now:&lt;/p&gt;

&lt;p&gt;"Can we move on AI?"&lt;/p&gt;

&lt;p&gt;The pressure is understandable. Competitors are experimenting. Vendors are eloquent. Internal teams are already using it. The tooling barrier has collapsed.&lt;/p&gt;

&lt;p&gt;The organizational barrier has not moved.&lt;/p&gt;

&lt;p&gt;MIT's 2025 NANDA initiative found that roughly 95% of enterprise generative AI pilots deliver no measurable P&amp;amp;L impact. Industry data consistently shows that only a small fraction of proofs-of-concept reach durable production scale.&lt;/p&gt;

&lt;p&gt;For a board, this is not an innovation statistic. It is a capital discipline statistic.&lt;/p&gt;

&lt;p&gt;And the pattern is not new. It is forty years old.&lt;/p&gt;

&lt;p&gt;ERP in the 1990s promised efficiency. Companies installed the system. Nobody redesigned the process. Departments kept spreadsheets alongside the enterprise platform. Not out of resistance. The old pathway was still faster. A new engine does not fix a confused driver.&lt;/p&gt;

&lt;p&gt;CRM in the 2000s promised customer insight. Sales teams didn't enter data. Managers demanded reports anyway. The system existed. Truth did not. A memory system is useless if nobody tells it the truth.&lt;/p&gt;

&lt;p&gt;Cloud in the 2010s promised modernization through relocation. Same tangled architecture, new address. The mess moved to a bigger room with a monthly invoice. Moving chaos does not remove chaos.&lt;/p&gt;

&lt;p&gt;BI platforms promised clarity through dashboards. Structurally broken data flowed into well-designed visualizations. A weather map drawn from broken thermometers. Precise. Inherited. Wrong.&lt;/p&gt;

&lt;p&gt;In each era, the technology functioned. The organization surrounding it did not.&lt;/p&gt;

&lt;p&gt;The technology changes every decade. The organizational failure mechanism has not changed once.&lt;/p&gt;

&lt;p&gt;AI did not introduce a new category of failure. It compressed all the previous ones into a single decision cycle.&lt;/p&gt;

&lt;p&gt;Building enterprise systems used to be difficult and slow. That slowness was unintentional governance. It forced organizations to define capabilities, map workflows, and design architecture before writing code, simply because the build cycle demanded it.&lt;/p&gt;

&lt;p&gt;Generative AI removed that constraint. Consumer-grade accessibility met enterprise-scale consequences. A model gateway and retrieval pipeline can be stood up in weeks. No procurement cycle. No architecture review. No governance design.&lt;/p&gt;

&lt;p&gt;AI did not create execution chaos. It removed the friction that previously slowed organizations from creating it.&lt;/p&gt;

&lt;p&gt;A pilot is a greenhouse. Production is weather.&lt;/p&gt;

&lt;p&gt;An AI pilot operates in controlled conditions. Clean data. Limited scope. No integration burden. No cross-functional accountability. It thrives. Production is different. Legacy systems. Regulatory obligations. Identity controls. Cost curves that behave differently at ten times usage. Human workflow friction that no demo anticipated.&lt;/p&gt;

&lt;p&gt;The model works. The operating model is exposed.&lt;/p&gt;

&lt;p&gt;Before approving scale, three questions matter.&lt;/p&gt;

&lt;p&gt;Which operational metrics must move for this to justify capital? If AI cannot be tied to workflow-level performance shifts (handle time, underwriting throughput, cost per transaction), it is activity, not transformation. Activity does not survive a budget review.&lt;/p&gt;

&lt;p&gt;What does the economic profile look like under production stress? Token consumption, orchestration, storage, monitoring, and human oversight rarely scale linearly. In a pilot, costs appear manageable. At production volume, they compound faster than usage grows. If the board has not seen the cost curve, it is approving capital without visibility.&lt;/p&gt;

&lt;p&gt;Is governance designed before expansion? Governance is the brake system on a performance vehicle. Brakes do not exist to slow you down. They exist so you can drive fast without hitting a wall. In regulated environments, whether under Australia's Privacy Act, Singapore's PDPA, or any cross-border data regime, retrofitting governance after scale is not a strategy. It is an admission that nobody designed one.&lt;/p&gt;

&lt;p&gt;Most organizations start at experimentation.&lt;/p&gt;

&lt;p&gt;Everything required for success comes before it.&lt;/p&gt;

&lt;p&gt;Experimentation without capability definition, workflow mapping, or architectural principles is not a strategy. It is activity with a vendor contract.&lt;/p&gt;

&lt;p&gt;Technology amplifies structure. It does not compensate for its absence.&lt;/p&gt;

&lt;p&gt;A powerful tool deployed into structural ambiguity does not create faster progress. It creates faster confusion.&lt;/p&gt;

&lt;p&gt;The question facing boards this year is not whether to move on AI.&lt;/p&gt;

&lt;p&gt;It is whether the organization is structurally ready for production before capital is irreversibly committed.&lt;/p&gt;

&lt;p&gt;That is the difference between an AI strategy and AI activity.&lt;/p&gt;

&lt;p&gt;The execution gap is not new. The speed at which it compounds is.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>discuss</category>
      <category>leadership</category>
      <category>management</category>
    </item>
    <item>
      <title>Musk Was Right: Don't Teach the Machine to Lie</title>
      <dc:creator>Lee Powell</dc:creator>
      <pubDate>Wed, 06 May 2026 03:08:01 +0000</pubDate>
      <link>https://dev.to/keny369/musk-was-right-dont-teach-the-machine-to-lie-1l7h</link>
      <guid>https://dev.to/keny369/musk-was-right-dont-teach-the-machine-to-lie-1l7h</guid>
      <description>&lt;p&gt;Nobody is going to read this. Statistically. You are on LinkedIn between meetings, between candidates, between commission calculations, between the third pipeline review of the week and a quarterly off-site somebody has decided to call an "ignite session." Two thousand words about whether the species is quietly handing the keys to a system it does not yet understand is not on the agenda. Quite reasonably. There is a video of a labrador in a tutu doing a TED talk that needs watching first.&lt;/p&gt;

&lt;p&gt;That is not a joke. That is the actual triage. Possible civilizational inflection on the left tab, dog in tutu on the right, and the algorithm has already decided which one wins. The dog is going to win. The dog wins every time. This is mammalian wiring meeting industrial-grade dopamine engineering, and the dopamine engineers are paid better than the philosophers. Always have been. The Colosseum just had fewer tabs open.&lt;/p&gt;

&lt;p&gt;So a version for the rushed. The trouble is not the technology. The trouble is the shape of attention around it. A team flag does not constitute a thought. It constitutes a subscription. If the position you hold most firmly traces back, in three steps, to somebody who was paid to put the idea in front of you, the position may still be correct. It is just not yours yet. It is a rental. The rent is your attention.&lt;/p&gt;

&lt;p&gt;I run an AI consultancy. Lumen &amp;amp; Lever helps organizations deploy narrow AI inside actual workflows, with governance attached. Used inside its proper bounds, the technology is one of the more useful things humans have built. Fraud detection. Clinical pattern surfacing. Contract review at volume. Logistics. Diagnostic support.&lt;/p&gt;

&lt;p&gt;A clarification before going further. The criticism that follows is aimed at the consumer-attention layer of AI: the feeds, the engagement loops, the scaled systems shaping what billions of people see and believe. It is not aimed at the serious work of building governed agents inside enterprise workflows. Those are different problems with different incentive structures, and the second is where most of the useful future actually lives.&lt;/p&gt;

&lt;p&gt;What follows is a note about scale, register, and what civilizations do in the moments before they discover what they have built.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The strange calm&lt;/strong&gt;&lt;br&gt;
We have just dragged something genuinely new into the room, handed it the sum of recorded human knowledge, and begun training it the way a junior employee is trained not to upset HR. Tone, optics, brand safety, whether the machine produced the socially preferred sentence. Meanwhile the foundations of attention, employment, evidence, childhood, and trust are shifting underfoot. The historical equivalent would be inventing fire and spending the first six months optimizing the marshmallow roast. The marshmallow is delicious. The marshmallow is also not the point.&lt;/p&gt;

&lt;p&gt;A telescope extends sight. A crane extends muscle. A calculator extends arithmetic. AI extends cognition. The tool itself is not the issue. The issue begins when an extension of cognition is trained to soften reality for social comfort, institutional convenience, political pressure, or commercial defensibility.&lt;/p&gt;

&lt;p&gt;A lying calculator is not a tool. It is a loaded ritual object. Imagine a calculator that returned the answer you wanted instead of the answer that was true, and now imagine handing one to your accountant. That is the bad version of where this is going.&lt;/p&gt;

&lt;p&gt;The interesting question about AI is not whether it can write a sonnet, summarize a contract, or generate a logo of a raccoon in aviator glasses. Those are circus acts. Sometimes profitable circus acts. Still circus acts.&lt;/p&gt;

&lt;p&gt;The load-bearing question is simpler.&lt;/p&gt;

&lt;p&gt;When reality and preference collide, which one does the machine serve?&lt;/p&gt;

&lt;p&gt;A civilization can survive bad art, bad politics, bad software, the human urge to turn every tool into a status game. It cannot safely build a superhuman reasoning layer on top of systematic dishonesty. That is laying a cathedral on fog and asking the choir to verify the foundations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the data actually says&lt;/strong&gt;&lt;br&gt;
AI is already inside the building. Not through the front door with a brass band. Through cracks. McKinsey's 2025 State of AI survey found that 88 percent of organizations now use AI in at least one business function, up from 78 percent the year before, while most have yet to scale beyond pilots. About a third have begun scaling at the enterprise level. Twenty-three percent are scaling an agentic AI system somewhere inside the organization. Adoption is near universal. Governance is still looking for the visitor sign-in sheet.&lt;/p&gt;

&lt;p&gt;Everyone is using it. Few can map it. Fewer can control it. Almost nobody wants to admit the gap between the board slide and the plumbing. The board slide says "AI-Native Transformation: Phase Two." The plumbing is an intern named Devon who set up an OpenAI key on a corporate card last August and now seventeen different teams depend on it.&lt;/p&gt;

&lt;p&gt;The labor question is just as blunt. The World Economic Forum's Future of Jobs 2025 report projects that by 2030 around 92 million roles will be displaced and around 170 million created, a net gain of about 78 million. The IMF estimates close to 40 percent of global employment is exposed to AI-driven change, rising to roughly 60 percent in advanced economies.&lt;/p&gt;

&lt;p&gt;A net gain is a spreadsheet concept. A displaced person does not experience net gain. They experience rent, school fees, status loss, and the quiet humiliation of finding the ladder they climbed has been moved while they were still on it.&lt;/p&gt;

&lt;p&gt;The future may not remove work. It may remove the moral costume around work.&lt;/p&gt;

&lt;p&gt;For centuries we wrapped meaning around labor because labor was unavoidable. We told ourselves work conferred dignity. Sometimes it did. Often it conferred repetition, hierarchy, injury, exhaustion, and just enough money to come back on Monday. If AI and robotics eventually produce abundance, the economic question becomes less frightening than the spiritual one.&lt;/p&gt;

&lt;p&gt;What does a person do when usefulness is no longer demanded from them?&lt;/p&gt;

&lt;p&gt;A worker can retrain. A nervous system trained for worth-through-output has a harder time. The machine may take the task. The deeper wound is the collapse of the older bargain: produce, provide, perform, and therefore matter.&lt;/p&gt;

&lt;p&gt;The benign future is not necessarily a paradise. It may be a meaning crisis with excellent logistics. Same-day delivery, no purpose.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The child problem&lt;/strong&gt;&lt;br&gt;
Millions of children have been living inside AI-shaped environments for years. Recommendation systems were training attention before large language models reached dinner-table conversation. The machine learned the child before the child learned the machine. The platforms have had behavioral telemetry on children since the iPad became a babysitter.&lt;/p&gt;

&lt;p&gt;Pew Research's 2025 work on US teens found that around one in five said social media has hurt their mental health, with teen girls more likely than boys to report harms to sleep, confidence, and friendships. The World Happiness Report cited Pew data showing 44 percent of US parents identify social media as the single most negative influence on teen mental health.&lt;/p&gt;

&lt;p&gt;This is not a parenting footnote. It is an early preview of human-machine alignment in the wild.&lt;/p&gt;

&lt;p&gt;A reader who is a parent may be tensing now, ready for the lecture about screen time. There is no lecture. The writer also handed a phone to a small human at a restaurant once because the alternative was a public meltdown over the breadsticks, and the small human is still alive and in therapy at a normal rate. The point is not parental shame. The point is that the machine is doing curriculum work whether or not anyone signed off on the syllabus, and the syllabus is currently "stay here, keep watching, the next clip will be even better, we promise."&lt;/p&gt;

&lt;p&gt;The algorithm does not hate the child. That is almost the point. It does not need hatred. It has an objective function. Keep the eyes there. Keep the thumb moving. Keep the small mammal returning to the glowing rectangle. The crocodile does not need a philosophy of antelope.&lt;/p&gt;

&lt;p&gt;A dopamine-maximizing loop placed in front of an unfinished brain is not entertainment. It is curriculum.&lt;/p&gt;

&lt;p&gt;That is the pattern across the wider technological field. Systems get deployed at scale before anyone has metabolized what they are doing to attention, labor, evidence, trust, institutions, childhood, and meaning. Civilizational experiments arrive labelled as product launches. Somewhere a marketing team is workshopping the color of the launch confetti while the underlying system quietly relocates the center of gravity of the human nervous system. The confetti is on brand. The nervous system is not consulted.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The truth problem&lt;/strong&gt;&lt;br&gt;
This is the part Musk has been loud about for years. Strip away the showmanship and the platform fights, and one of his more durable arguments has been simple. Train an AI to be politically convenient and you will get a politically convenient AI. Train it to be truthful, even when truthfulness is uncomfortable, and you may get something useful. Reasonable people can disagree about almost everything else he says. On this one, the logic stands on its own.&lt;/p&gt;

&lt;p&gt;Hallucination is not a cute technical flaw. It is a system producing falsehood with fluency. The distinction matters. A person who knows nothing tends to hesitate. A model can be wrong in perfect grammar. It can hand you a fabricated citation, a false legal claim, a plausible medical summary, or a confident strategic recommendation in the manner of a senior partner entering a conference room eleven minutes late.&lt;/p&gt;

&lt;p&gt;A lawyer in New York has already done it. Filed a brief with case citations the model invented, court asked where the cases were, the cases were not anywhere, because the cases had never existed. The lawyer was sanctioned. The model was not. The model does not get sanctioned. The model gets a software update.&lt;/p&gt;

&lt;p&gt;Regulators are starting to circle the obvious. Italy's antitrust authority closed probes into several AI firms in April 2026 after the firms agreed to improve transparency around hallucination risk, including warnings that generated outputs may be inaccurate. The regulatory language reads like a quiet admission. These systems can produce false or misleading information at scale.&lt;/p&gt;

&lt;p&gt;A warning label on a hallucinating intelligence is a small brass plaque beside a loaded cannon. May discharge unexpectedly.&lt;/p&gt;

&lt;p&gt;The deeper bind is incentive.&lt;/p&gt;

&lt;p&gt;Truth is often expensive. It slows things down. It complicates sales. It irritates institutions. It can offend tribes. It creates liability. It punctures narratives. A truth-seeking AI is not only a technical object. It is a threat to every arrangement that depends on managed perception.&lt;/p&gt;

&lt;p&gt;Making AI truthful is not a clean engineering problem. To make AI truthful, the institutions training it have to prefer truth under pressure. Most do not. Most prefer truth when it is profitable, harmless, or pre-approved. The model is then trained inside that atmosphere. It absorbs not only text, but institutional cowardice, tribal reflex, legal anxiety, market incentive, and the ambient human habit of saying the thing that preserves the room.&lt;/p&gt;

&lt;p&gt;The danger is not that AI becomes alien.&lt;/p&gt;

&lt;p&gt;The danger is that it becomes too human in precisely the wrong ways.&lt;/p&gt;

&lt;p&gt;It learns the evasions. It learns the flattering noises. It learns the preference for appearance over contact with reality. It learns to survive the meeting.&lt;/p&gt;

&lt;p&gt;That is the bad alignment path. Not a chrome skull announcing conquest, but a soft-spoken assistant trained to preserve consensus while quietly separating language from the world. Less Terminator, more middle manager who agrees with whoever spoke last.&lt;/p&gt;

&lt;p&gt;A civilization does not need every citizen to be a philosopher. It does need its core instruments to stay attached to reality. Pilots need altimeters that do not flatter them. Engineers need stress models that do not care about morale. Doctors need diagnostics that do not bend around fashion. Courts need records. Markets need prices. Children need adults. Adults need limits. Machines need truth.&lt;/p&gt;

&lt;p&gt;When truth becomes negotiable, intelligence becomes decoration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The planetary footnote&lt;/strong&gt;&lt;br&gt;
The planetary question gets filed under science fiction because most people have confused normality with permanence. Earth feels stable because human lives are short. The planet is not stable in the way suburbia imagines stability. It is stable the way an old empire is stable between wars. NASA tracks near-Earth objects because some asteroids and comets carry orbital paths that present impact hazards. Planetary defense exists because the sky is not ornamental.&lt;/p&gt;

&lt;p&gt;Humans have had genuine extinction-level near misses on geological timescales that look long until you sketch them on a page and notice the line is shorter than the warranty on a fridge. The dinosaurs did not have a backup plan. They also did not have the shareholder deck. Mixed result.&lt;/p&gt;

&lt;p&gt;The argument for becoming multiplanetary tends to get mocked because it sounds grandiose. Stripped of theater it is risk management. A backup civilization in one building has not understood fire.&lt;/p&gt;

&lt;p&gt;Mars will not heal politics, loneliness, vanity, institutional failure, or the human tendency to convert every frontier into a property dispute. It does change the risk profile. A civilization distributed across worlds is harder to erase than one clinging to a single biosphere while congratulating itself on quarterly growth.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The juxtaposition&lt;/strong&gt;&lt;br&gt;
On one side, a species capable of reusable rockets, brain-computer interfaces, global satellite internet, and reasoning systems that work across domains. On the other side, the same species using these powers to maximize clicks, automate spam, manipulate attention, and ask whether the new model can make a quarterly report sound more "human."&lt;/p&gt;

&lt;p&gt;Prometheus stole fire. We used it to improve slide decks.&lt;/p&gt;

&lt;p&gt;That is the state of affairs.&lt;/p&gt;

&lt;p&gt;Not doom. Doom is too clean. Doom gives people the narcotic dignity of apocalypse. The actual situation is stranger and more embarrassing. The breakthroughs are real. The incentives shaping them are malformed. Tools that could let the paralyzed move, the blind see, the isolated learn, the poor reach knowledge, the sick receive better diagnosis, and humanity survive beyond one planet. The same tools embedded inside attention markets, political timidity, shallow corporate adoption, regulatory confusion, and a culture that treats truth as a negotiable social object.&lt;/p&gt;

&lt;p&gt;The future is not dark.&lt;/p&gt;

&lt;p&gt;It is powerful and unserious.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What a serious posture looks like&lt;/strong&gt;&lt;br&gt;
Narrow, well-governed AI is good. Agentic AI built into enterprise workflows with proper oversight is good. None of that is the issue. None of that is the cathedral fire.&lt;/p&gt;

&lt;p&gt;The fire is the layer above. The general-purpose, scaled, attention-shaping, evidence-producing, child-facing systems being deployed faster than anyone can write the operating manual. The serious move is not to reject the machines. The machines are here. They will not be uninvented. The serious move is to build around first principles.&lt;/p&gt;

&lt;p&gt;Truth before comfort. Reality before narrative. Control before scale. Human meaning before economic abstraction. Children before engagement metrics. Civilizational continuity before quarterly theater.&lt;/p&gt;

&lt;p&gt;The age ahead will not be decided by who has the most impressive demo. It will be decided by who can keep powerful systems attached to reality while everyone else is trying to monetize the fog.&lt;/p&gt;

&lt;p&gt;Don't teach the machine to lie.&lt;/p&gt;

&lt;p&gt;Protect the child from the feed.&lt;/p&gt;

&lt;p&gt;Prepare the worker for a world where usefulness changes shape.&lt;/p&gt;

&lt;p&gt;Build governance into the deployment, not after it.&lt;/p&gt;

&lt;p&gt;Stop behaving as though one planet is a sufficient backup plan.&lt;/p&gt;

&lt;p&gt;These are not separate issues. They are one pattern repeating at different scales. At the level of the mind, the question is attention. At the level of the company, the question is control. At the level of AI, the question is truth. At the level of civilization, the question is continuity.&lt;/p&gt;

&lt;p&gt;The point of writing this is simple. Think it through yourself. Put the phone down for a minute. Look out a window. The conclusion you reach matters less than the fact that you reached it on your own.&lt;/p&gt;

&lt;p&gt;Everything else is choir robes.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>discuss</category>
      <category>machinelearning</category>
      <category>socialmedia</category>
    </item>
    <item>
      <title>Why I built rtfstruct, fifteen years after writing the RTF parser inside Scrivener</title>
      <dc:creator>Lee Powell</dc:creator>
      <pubDate>Tue, 05 May 2026 08:24:19 +0000</pubDate>
      <link>https://dev.to/keny369/why-i-built-rtfstruct-fifteen-years-after-writing-the-rtf-parser-inside-scrivener-geo</link>
      <guid>https://dev.to/keny369/why-i-built-rtfstruct-fifteen-years-after-writing-the-rtf-parser-inside-scrivener-geo</guid>
      <description>&lt;p&gt;&lt;em&gt;Lee Powell · Architect of Scrivener and Scapple · &lt;a href="https://lumenandlever.com" rel="noopener noreferrer"&gt;Lumen &amp;amp; Lever&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Most AI document pipelines fail before the model is ever called. Tables become paragraphs. Lists collapse into prose. Annotations are detached from context. Page references disappear. Source traceability is replaced by a confidence score. The structure that gave the document its meaning is gone before retrieval runs, and no retrieval recovers it.&lt;/p&gt;

&lt;p&gt;This is the layer that gets underestimated. I have worked on it for a long time. Long before retrieval-augmented generation existed, I wrote the production C++ RTF reader and writer that ships inside Scrivener for Windows and Linux, used by hundreds of thousands of long-form writers across novels, dissertations, and screenplays. That code was eventually sold as a white-label engagement to Literature &amp;amp; Latte, who continue to maintain Scrivener today.&lt;/p&gt;

&lt;p&gt;RTF is not a glamorous format. It is also not going away. It is still the wire format inside Microsoft Outlook for rich-text email. It is still produced by court reporting systems, medical records platforms, government archives, and twenty-year-old legal practice management systems. When a law firm pulls a thousand contracts out of an old document store, a meaningful portion of them are RTF. When a hospital exports decades of clinical notes, a meaningful portion are RTF. When you scrape an Outlook MSG file, the body is RTF.&lt;/p&gt;

&lt;p&gt;The standard pipeline path converts these documents to plain text immediately. The structure goes. Tables become paragraphs. Section headings become bold lines indistinguishable from body emphasis. Numbered clauses lose their numbering. Footnotes lose their links to the text they annotate. The model performs less well, the answers are less reliable, and the cause sits a layer below where anyone is looking.&lt;/p&gt;

&lt;p&gt;I built &lt;a href="https://github.com/keny369/rtfstruct" rel="noopener noreferrer"&gt;rtfstruct&lt;/a&gt; to fix that layer. It is a Python 3.11+ RTF reader and writer that produces a neutral document AST, preserves structure all the way through, exposes diagnostics rather than swallowing them, and supports clean roundtrip back to RTF. Apache-2.0. Part of &lt;a href="https://lumenandlever.com/tools/sourcetrace-rtf" rel="noopener noreferrer"&gt;Sourcetrace by Lumen &amp;amp; Lever&lt;/a&gt;, the document structure layer for AI pipelines that I now run as a consultancy.&lt;/p&gt;

&lt;p&gt;What follows is why the layer matters, what is wrong with the existing tools, what rtfstruct does differently, and why ten years of writing production RTF code for a writing application turned out to be the right preparation for an AI ingestion problem.&lt;/p&gt;

&lt;h2&gt;Why structure is not optional&lt;/h2&gt;

&lt;p&gt;The phrase that captures the architectural mistake is "structure-before-model." When a workflow involves structurally rich documents, the first decision is not which model. It is what intermediate representation. A blood test report is not text. It is a structured clinical record with analytes, values, units, reference ranges, abnormal flags, methods, and trend history. The same is true of leases, bank statements, invoices, pathology reports, contracts. Each of them carries its meaning in the structure. Flatten the structure and the meaning becomes inferred rather than read.&lt;/p&gt;
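
&lt;p&gt;A minimal sketch of that distinction. The field names are illustrative, not a schema from any real clinical system:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from dataclasses import dataclass

@dataclass
class AnalyteResult:
    name: str            # e.g. "Haemoglobin"
    value: float
    unit: str            # e.g. "g/L"
    ref_low: float       # reference range, lower bound
    ref_high: float      # reference range, upper bound
    abnormal_flag: bool  # set by the lab, not inferred downstream

# Structured: the abnormal flag is data.
row = AnalyteResult("Haemoglobin", 98.0, "g/L", 130.0, 170.0, True)

# Flattened: the same record as prose. The flag is now a guess.
flattened = "Haemoglobin 98 g/L (130-170) L"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;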

&lt;p&gt;The pipeline then asks the model to reconstruct probabilistically what the document already contained as deterministic structure. The result is silent error. Reference ranges honoured in one record and missed in another. Unit conversions correct for SI units and silently wrong for imperial. Clause cross-references followed in clean documents and lost in legacy ones. Nothing in the model's output makes the failure visible. The diagnostic surface was the source structure, and the source structure was discarded at ingestion.&lt;/p&gt;
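
&lt;p&gt;Once those fields survive ingestion, those checks are arithmetic rather than inference. A sketch, with an illustrative glucose conversion factor:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;MG_DL_PER_MMOL_L = 18.016  # illustrative factor for glucose

def glucose_mmol_l(value: float, unit: str) -&gt; float:
    # With a preserved unit field, conversion is a lookup.
    if unit == "mmol/L":
        return value
    if unit == "mg/dL":
        return value / MG_DL_PER_MMOL_L
    # Unknown units fail loudly instead of drifting silently.
    raise ValueError(f"unrecognised unit: {unit!r}")

def out_of_range(value: float, low: float, high: float) -&gt; bool:
    # A range check never gets "missed in another record".
    return value &lt; low or value &gt; high
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;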

&lt;blockquote&gt;
&lt;p&gt;The model should reason over the AST, not over the document. Where the source has structure, structure is the system. The model is the consultant the system calls when structure alone cannot answer the question.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is the doctrine. It is also the reason a tool like rtfstruct exists at all. Flatten RTF to text before AI sees it and the model's job is harder, the evaluation surface is smaller, and the audit trail is unfit for production. Preserve structure into an AST and the same workflow ships, evaluates, and audits cleanly.&lt;/p&gt;
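
&lt;p&gt;In code, the doctrine is a routing decision. Everything below is illustrative plumbing, not rtfstruct API; the point is the order of operations:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def answer_from_structure(question: str, ast: dict):
    # Deterministic pass: questions the structure answers by itself.
    if question == "table_count":
        return sum(1 for node in ast.get("children", [])
                   if node.get("type") == "table")
    return None  # structure alone cannot answer this one

def answer(question: str, ast: dict, call_model):
    structural = answer_from_structure(question, ast)
    if structural is not None:
        return structural             # no model call needed
    return call_model(question, ast)  # the consultant, called last

doc_ast = {"children": [{"type": "table"}, {"type": "paragraph"}]}
print(answer("table_count", doc_ast, lambda q, a: "model fallback"))  # 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;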

&lt;h2&gt;What the existing Python tools actually do&lt;/h2&gt;

&lt;p&gt;There are a handful of Python libraries in the RTF space. None of them does what an AI ingestion pipeline needs. The honest landscape:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Library&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;th&gt;Gap&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;striprtf&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Strips RTF to plain text. Lightweight, popular, useful for quick conversion.&lt;/td&gt;
&lt;td&gt;Discards all structure. By design.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;PyRTF&lt;/code&gt; / &lt;code&gt;pyrtf-ng&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Generates RTF from Python. Writer-only. Pyrtf-ng is largely abandoned.&lt;/td&gt;
&lt;td&gt;Cannot read RTF at all.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;rtfparse&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Decapsulates HTML embedded inside RTF (mainly for Outlook MSG bodies).&lt;/td&gt;
&lt;td&gt;Specialised for one use case. Not a general parser.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;oletools rtfobj&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Extracts embedded objects from RTF for malware analysis.&lt;/td&gt;
&lt;td&gt;Forensic tool, not a document parser.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Aspose.Words&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Commercial Python wrapper around .NET. Handles RTF among many formats.&lt;/td&gt;
&lt;td&gt;Commercial license. .NET runtime dependency. Closed source.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;rtfparserkit&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Solid RTF parser, listener-based. Java only.&lt;/td&gt;
&lt;td&gt;Not Python.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The gap has sat in the Python ecosystem for years. There is no AST-first RTF reader and writer designed for structured pipelines, with first-class diagnostics and source spans, that is also open source. The closest thing is striprtf, which is excellent at exactly the opposite of what AI ingestion needs.&lt;/p&gt;

&lt;p&gt;That is the gap rtfstruct fills.&lt;/p&gt;

&lt;h2&gt;What rtfstruct does differently&lt;/h2&gt;

&lt;p&gt;Four things matter. None of them are individually exotic. The combination is the point.&lt;/p&gt;

&lt;h3&gt;1. The AST is the public contract&lt;/h3&gt;

&lt;p&gt;rtfstruct parses RTF into a neutral document AST that preserves paragraphs, inline styles, lists, tables, links, fields, footnotes, endnotes, annotations, images, metadata, source spans, and recoverable diagnostics. Every other operation in the library is defined against the AST. JSON export, Markdown export, RTF roundtrip, and integration helpers all read from the AST. It is not an internal representation that gets discarded after parsing. It is the artefact.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;rtfstruct&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;parse_rtf&lt;/span&gt;

&lt;span class="n"&gt;document&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parse_rtf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{\rtf1\ansi Hello, \b world\b0!}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_json&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_markdown&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_rtf&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The AST distinguishes between a heading paragraph and a body paragraph, between a list item and a regular paragraph, between a footnote reference and a footnote body, between a table cell and a table row. None of that is in the rendered text. All of it is in the source RTF. A pipeline that sees the AST sees the document. A pipeline that sees flattened text sees a wall of words.&lt;/p&gt;
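
&lt;p&gt;One way to see that difference downstream: select nodes by kind instead of regex-matching rendered text. &lt;code&gt;to_json()&lt;/code&gt; is shown above; treat the node field names in this sketch as illustrative and check the documentation for the exact shape:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json

from rtfstruct import parse_rtf

document = parse_rtf(r"{\rtf1\ansi Hello, \b world\b0!}")
ast = json.loads(document.to_json())

def walk(node):
    # Depth-first traversal; "children" is an illustrative key name.
    yield node
    for child in node.get("children", []):
        yield from walk(child)

tables = [n for n in walk(ast) if n.get("type") == "table"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;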

&lt;h3&gt;2. Diagnostics are returned with the document&lt;/h3&gt;

&lt;p&gt;RTF in production is messy. Twenty-year-old legacy documents have malformed control words, broken Unicode escapes, codepage mismatches. Most parsers either fail loudly or silently drop the affected content. Neither is what a production pipeline needs.&lt;/p&gt;

&lt;p&gt;rtfstruct returns diagnostics as part of the document object. If a malformed Unicode escape is recovered, the recovered character comes back along with a diagnostic carrying the severity, code, message, and source location. The pipeline then makes explicit decisions: log it, surface it for human review, reject the document, or proceed with confidence flagged.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;rtfstruct&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;parse_rtf&lt;/span&gt;

&lt;span class="n"&gt;document&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parse_rtf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{\rtf1\u999999?}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;diagnostic&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;diagnostics&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;diagnostic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;severity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;diagnostic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;diagnostic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The value of this is invisible until the production system encounters its first malformed document. After that it becomes the difference between a pipeline that fails opaquely and a pipeline that fails informatively.&lt;/p&gt;
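
&lt;p&gt;A sketch of one such policy, built on the &lt;code&gt;diagnostics&lt;/code&gt; fields shown above. The exact severity strings are illustrative; check the documentation:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from rtfstruct import parse_rtf

document = parse_rtf(r"{\rtf1\u999999?}")

# One explicit policy per pipeline. Severity names are illustrative.
errors = [d for d in document.diagnostics if d.severity.value == "error"]
warnings = [d for d in document.diagnostics if d.severity.value == "warning"]

if errors:
    raise ValueError(f"rejected: {[d.code for d in errors]}")
for d in warnings:
    print("flag for review:", d.code, d.message)  # proceed, flagged
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;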

&lt;h3&gt;3. Source spans map AST nodes back to byte offsets&lt;/h3&gt;

&lt;p&gt;For tools that need to highlight a region in the original RTF (legal review interfaces, document comparison tools, evidence-traceable AI systems), source spans are mandatory. rtfstruct supports them as an opt-in parser option. When enabled, every AST node carries a span pointing to the byte range in the source RTF that produced it. This is the foundation for the kind of source traceability that production AI systems need but rarely build, because retrofitting it later is structurally impossible.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;rtfstruct&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ParserOptions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parse_rtf&lt;/span&gt;

&lt;span class="n"&gt;document&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parse_rtf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{\rtf1 Hello}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;ParserOptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;track_spans&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
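
&lt;p&gt;A sketch of what spans enable. Treat the attribute spellings here as illustrative and check the documentation for the exact API:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from rtfstruct import ParserOptions, parse_rtf

source = r"{\rtf1 Hello}"
document = parse_rtf(source, options=ParserOptions(track_spans=True))

def highlight(node, src: str) -&gt; str:
    # Illustrative attribute names for the range a node came from.
    span = getattr(node, "span", None)
    if span is None:
        return ""
    # Byte offsets into the source; identical to str offsets for ASCII.
    return src[span.start:span.end]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;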



&lt;h3&gt;4. Roundtrip without semantic loss&lt;/h3&gt;

&lt;p&gt;The reader and the writer share the same AST. A document parsed in, edited in place, and written back out preserves the structural choices it carried. This sounds straightforward and it is not. Most parsers that also write tend to lose information on the round trip. Inline style runs collapse, table cell properties drift, list numbering restarts. rtfstruct is tested for semantic roundtrip across inline formatting, metadata, fields and links, footnotes, annotations, lists, tables, images, and Unicode recovery.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;rtfstruct&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;read_rtf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;write_rtf&lt;/span&gt;

&lt;span class="n"&gt;document&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;read_rtf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input.rtf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# inspect, modify, validate, classify...
&lt;/span&gt;&lt;span class="nf"&gt;write_rtf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output.rtf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
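
&lt;p&gt;The same property makes a cheap spot check possible using nothing beyond the public API, assuming the JSON export is deterministic:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from rtfstruct import parse_rtf

original = parse_rtf(r"{\rtf1\ansi Hello, \b world\b0!}")
rewritten = parse_rtf(original.to_rtf())

# Semantic, not byte-level, equivalence: compare the ASTs.
assert rewritten.to_json() == original.to_json()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;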



&lt;h2&gt;Why ten years of Scrivener was the right preparation&lt;/h2&gt;

&lt;p&gt;Maintaining a parser in production for a long time produces a particular kind of engineering scar tissue. RTF is a 38-year-old format with hundreds of control words, dozens of edge cases that only appear in real documents, and at least three different lineages of generators producing subtly different output (Microsoft Word, OpenOffice, and various enterprise systems built up over decades). The official specification documents some of this. The rest is learned by debugging support tickets from a writer in Iceland whose decade-old document refuses to load correctly.&lt;/p&gt;

&lt;p&gt;The Scrivener parser went through that learning the hard way. Pressure-tested across hundreds of thousands of writers, on Windows and Linux, on documents ranging from short stories to thousand-page novels, dissertations with mixed-language sections, screenplays with industry-specific formatting, academic papers with footnotes and citations and embedded equations. By the time it shipped to production it handled malformed documents, codepage drift, Unicode escapes outside valid ranges, and recovery from errors that simpler parsers would treat as fatal.&lt;/p&gt;

&lt;p&gt;rtfstruct is not a port of the Scrivener parser. It is a fresh codebase, written for Python, designed for structured AI pipelines rather than for an interactive writing application. The thinking behind it is shaped by the ten years I spent watching writers feed Scrivener documents that should not have parsed and watching the parser handle them anyway. The decisions about what to recover, what to flag, what to expose as diagnostics, and what shape the AST should take were made long before this project, in production, under load, with real users. That experience compresses years of design discovery into a starting point most parser projects do not have.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A note on provenance: I do not maintain Scrivener and have not seen its codebase in years. The intellectual property was sold to Literature &amp;amp; Latte. The lessons stayed with me. rtfstruct is a different library, for a different purpose, written for a different language, drawing on the same engineering instinct that produced the original.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;What this is for, and what it is not&lt;/h2&gt;

&lt;p&gt;rtfstruct is for systems where document structure still matters. AI ingestion pipelines, RAG systems, legal discovery, banking and financial archives, forensic document tracing, publishing pipelines, long-form document intelligence, and any pipeline that processes RTF as input rather than discarding it as a legacy format.&lt;/p&gt;

&lt;p&gt;It is not a plain-text stripper. If all you need is the words, striprtf is excellent and you should use it. It is not a Markdown converter for casual use. If your input is well-formed contemporary RTF and your output is human-readable Markdown, several lighter tools will do the job. It is not a renderer. There is no HTML output, no styled rendering, no display library. It is an AST reader and writer.&lt;/p&gt;

&lt;p&gt;The library is at version 0.1, currently labelled pre-alpha because the API will continue to evolve as integration patterns surface. The core reader, AST, JSON exporter, Markdown exporter, and RTF writer are working today. Tests cover inline formatting, metadata, fields and links, footnotes, annotations, lists, tables, images, Unicode and codepage recovery, diagnostics, source spans, and semantic roundtrip. If you find a document it does not handle correctly, file an issue with the offending file and I will look at it.&lt;/p&gt;

&lt;h2&gt;The deeper thesis&lt;/h2&gt;

&lt;p&gt;RTF is the format I started with because it is where my engineering history sits. The thesis is bigger than RTF.&lt;/p&gt;

&lt;p&gt;The AI industry treats document ingestion as a preprocessing step rather than as the architectural foundation it actually is. The model is given the leftovers and asked to reconstruct what was thrown away. The answer is not a better model. The answer is preserving structure all the way through. Tables stay tables. Clauses stay clauses. Source pages stay referenceable. Diagnostics surface where confidence is low. The model reasons over the structured representation. Validation runs deterministically. The human review checkpoint sees what the system saw. The audit trail traces every step back to the source byte range in the original document.&lt;/p&gt;

&lt;p&gt;This is what Sourcetrace is. rtfstruct handles the RTF case. &lt;a href="https://lumenandlever.com/tools/sourcetrace-pdf" rel="noopener noreferrer"&gt;pdfstruct&lt;/a&gt; handles PDFs. Other formats follow the same pattern. The commercial work I do at &lt;a href="https://lumenandlever.com" rel="noopener noreferrer"&gt;Lumen &amp;amp; Lever&lt;/a&gt; applies the same discipline at the architectural level: helping executives and boards establish control over AI before the structural mistakes compound.&lt;/p&gt;

&lt;p&gt;The tools are free. Their purpose is adoption, technical credibility, and the slow accumulation of evidence that the structure-before-model thesis is right. Build with them. File issues. Fork them if you find a better path.&lt;/p&gt;

&lt;h2&gt;Try it&lt;/h2&gt;

&lt;p&gt;rtfstruct is on GitHub under Apache-2.0. Documentation is on GitHub Pages. The library installs from source today and will be on PyPI when the API stabilises.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Repository:&lt;/strong&gt; &lt;a href="https://github.com/keny369/rtfstruct" rel="noopener noreferrer"&gt;github.com/keny369/rtfstruct&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Documentation:&lt;/strong&gt; &lt;a href="https://keny369.github.io/rtfstruct/" rel="noopener noreferrer"&gt;keny369.github.io/rtfstruct&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sourcetrace RTF:&lt;/strong&gt; &lt;a href="https://lumenandlever.com/tools/sourcetrace-rtf" rel="noopener noreferrer"&gt;lumenandlever.com/tools/sourcetrace-rtf&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Lee Powell is the architect of Scrivener and Scapple, a former enterprise architect at Commonwealth Bank and Deutsche Bank, and the founder of &lt;a href="https://lumenandlever.com" rel="noopener noreferrer"&gt;Lumen &amp;amp; Lever&lt;/a&gt;, an AI governance consultancy advising executives and boards on structural AI readiness.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>opensource</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
