<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Kat Laszlo</title>
    <description>The latest articles on DEV Community by Kat Laszlo (@kat_laszlo).</description>
    <link>https://dev.to/kat_laszlo</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3832776%2F9683a03f-431d-4e26-abef-c8145cbbf89a.png</url>
      <title>DEV Community: Kat Laszlo</title>
      <link>https://dev.to/kat_laszlo</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kat_laszlo"/>
    <language>en</language>
    <item>
      <title>A single HTML file for architecture docs</title>
      <dc:creator>Kat Laszlo</dc:creator>
      <pubDate>Mon, 27 Apr 2026 17:42:56 +0000</pubDate>
      <link>https://dev.to/kat_laszlo/a-single-html-file-for-architecture-docs-3jkp</link>
      <guid>https://dev.to/kat_laszlo/a-single-html-file-for-architecture-docs-3jkp</guid>
      <description>&lt;p&gt;I maintain a single HTML file as an architecture board for the product I'm building. Definitions, data models, API surface, routes, changelogs, postmortems. One file, opens in a browser, shows the state of things at a glance.&lt;/p&gt;

&lt;p&gt;I've been using it to stay oriented while developing Tanso, and it's been useful enough that I wanted to share it in case anyone else has been thinking along the same lines.&lt;/p&gt;

&lt;p&gt;The origin was simple: I kept asking AI tools to generate structured documents (ERDs, changelog tables, postmortem writeups) and discarding them after a single look. But the output was already organized. The data was already there. The only waste was starting from zero every time.&lt;/p&gt;

&lt;p&gt;So I stopped starting from zero and made a template: &lt;a href="https://github.com/katrinalaszlo/human-docs" rel="noopener noreferrer"&gt;human-docs&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's in the repo
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;template.html   Empty scaffold with section markers. Fork and fill.
PROMPT.md       Instructions for any AI tool to generate or update the doc.
example.html    Cal.com's architecture, fully filled in.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;

&lt;p&gt;Point any AI tool (Claude Code, Cursor, Codex) at your codebase with &lt;code&gt;PROMPT.md&lt;/code&gt; as the prompt. It fills in the template. Open the result in a browser.&lt;/p&gt;

&lt;p&gt;When something changes, you don't rebuild the whole file. The template uses &lt;code&gt;&amp;lt;!-- SECTION:name --&amp;gt;&lt;/code&gt; markers so updates are surgical. The AI edits only the sections that are stale, leaves everything else intact. Similar to how you'd review a git diff: only what actually changed.&lt;/p&gt;
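
&lt;p&gt;Mechanically, a surgical update is just a string replacement between two markers. A minimal sketch, assuming paired markers (&lt;code&gt;&amp;lt;!-- SECTION:name --&amp;gt;&lt;/code&gt; plus a matching closing marker; the closing-marker convention is my assumption, not taken from the template):&lt;/p&gt;

```javascript
// Sketch: swap the text between two section markers and leave the rest of the
// document byte-for-byte intact. Marker strings are passed in as parameters;
// the exact marker convention is assumed, not copied from the template.
function replaceBetween(doc, openMarker, closeMarker, newBody) {
  const start = doc.indexOf(openMarker);
  const end = doc.indexOf(closeMarker, start + openMarker.length);
  if (start === -1 || end === -1) return doc; // markers missing: change nothing
  return doc.slice(0, start + openMarker.length) + '\n' + newBody + '\n' + doc.slice(end);
}
```

&lt;p&gt;An agent doing the update only has to regenerate &lt;code&gt;newBody&lt;/code&gt; for the stale section.&lt;/p&gt;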

&lt;p&gt;Over time the file compounds. Each deploy adds a changelog entry. Each bug adds a postmortem. Each migration updates the ERD. The document gets more useful the longer you maintain it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The sections
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Section&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Definitions&lt;/td&gt;
&lt;td&gt;Clarify overloaded or ambiguous terms in the codebase&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Model&lt;/td&gt;
&lt;td&gt;Tables, relationships, ERDs grouped by domain&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pages &amp;amp; Routes&lt;/td&gt;
&lt;td&gt;Every user-facing route and what it does&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API Surface&lt;/td&gt;
&lt;td&gt;Endpoints grouped by domain&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Changelog&lt;/td&gt;
&lt;td&gt;What shipped, filterable by type&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Postmortems&lt;/td&gt;
&lt;td&gt;What broke, root cause, fix, prevention rule&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Postmortems are the most valuable section. Changelogs record what happened. Postmortems record what not to do again.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why HTML
&lt;/h2&gt;

&lt;p&gt;HTML is self-rendering. Open it in any browser, no preprocessing needed. And because it's structured markup, an AI agent can parse it and update individual sections without regenerating the whole document. It serves both audiences (human readers, AI agents) without needing two versions of the same information.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'm actually thinking about
&lt;/h2&gt;

&lt;p&gt;The template is the lightweight part. The more interesting part is the questions behind it.&lt;/p&gt;

&lt;p&gt;As AI accelerates execution and one person takes on more, I think the constraint increasingly becomes the human context window. Not writing the code, but keeping track of everything across workstreams. Having a single page where I can see the architecture of a project without reconstructing it from memory has been more valuable than I expected.&lt;/p&gt;

&lt;p&gt;And that opens up some directions I keep coming back to:&lt;/p&gt;

&lt;p&gt;What if this auto-updated on every commit? A git hook diffs the changes, maps them to sections, and the doc stays current without anyone prompting it.&lt;/p&gt;

&lt;p&gt;What if it replaced tools like Swagger or TypeDoc instead of sitting alongside them? The AI is already reading your codebase to fill sections. It can generate API references in the same file, next to the architectural context those tools never capture.&lt;/p&gt;

&lt;p&gt;Is HTML even the right format long-term, or just the right format for now? It's self-rendering and structured enough for agents to parse. But if other tools need to consume the data downstream, maybe the source should be structured data with HTML as one view.&lt;/p&gt;

&lt;p&gt;I don't have answers to these yet. But I think the questions are worth thinking about, and I wanted to put this out there in case anyone else has been working through the same stuff.&lt;/p&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/katrinalaszlo/human-docs" rel="noopener noreferrer"&gt;github.com/katrinalaszlo/human-docs&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Open &lt;code&gt;example.html&lt;/code&gt; to see what a filled-in doc looks like.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>ai</category>
      <category>documentation</category>
      <category>webdev</category>
    </item>
    <item>
      <title>I repurposed Karpathy's LLM Wiki for product discovery. It worked surprisingly well.</title>
      <dc:creator>Kat Laszlo</dc:creator>
      <pubDate>Mon, 27 Apr 2026 17:33:02 +0000</pubDate>
      <link>https://dev.to/kat_laszlo/i-repurposed-karpathys-llm-wiki-for-product-discovery-it-worked-surprisingly-well-2ngn</link>
      <guid>https://dev.to/kat_laszlo/i-repurposed-karpathys-llm-wiki-for-product-discovery-it-worked-surprisingly-well-2ngn</guid>
      <description>&lt;p&gt;I was playing with &lt;a href="https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f" rel="noopener noreferrer"&gt;Karpathy's LLM Wiki&lt;/a&gt; and realized it could be re-applied to my manual workflow as a PM.&lt;/p&gt;

&lt;p&gt;Normally I identify quotes from transcripts, create user stories, group them into features, and prioritize based on effort, impact, dependencies. It's tedious and error-prone, especially across 10+ interviews.&lt;/p&gt;

&lt;p&gt;I tried using a wiki instead of my manual process for customer interviews and it worked surprisingly well.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;

&lt;p&gt;Before running it, you can edit the prompt to decide what's worth tagging and how user stories should be written. When you re-run it, it does not blindly overwrite your edits or duplicate prior work.&lt;/p&gt;

&lt;p&gt;One piece I especially like is the ability to view the connections as a graph and drill down from a user story to the actual customer quotes behind it. And if you're using AI to code, you can feed that evidence in as context. It builds better when it understands &lt;em&gt;why&lt;/em&gt; you're building something.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyb42o8bn8mathx1x9xlm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyb42o8bn8mathx1x9xlm.png" alt=" " width="800" height="518"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7qx0hlhl7hc1hafnyd4b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7qx0hlhl7hc1hafnyd4b.png" alt=" " width="800" height="518"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it now
&lt;/h2&gt;

&lt;p&gt;The repo ships with 3 fictional transcripts and a pre-built wiki (3 customers, 2 stories, 2 features) so you can explore the output immediately. Open &lt;code&gt;wiki/&lt;/code&gt; in Obsidian to see the graph.&lt;/p&gt;

&lt;p&gt;When you're ready to use your own data, drop transcripts into &lt;code&gt;raw/&lt;/code&gt; and ingest. Your data lives alongside the examples. Delete the example files whenever you want. They won't affect your wiki.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who else this might be useful for
&lt;/h2&gt;

&lt;p&gt;I built this for product discovery, but I imagine it could work for customer success, customer research, or design: anywhere you're trying to surface themes across qualitative data.&lt;/p&gt;


&lt;p&gt;If you haven't used Claude Code or Codex before, I'm happy to lend a hand. It's less technical than it looks.&lt;/p&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/katrinalaszlo" rel="noopener noreferrer"&gt;
        katrinalaszlo
      &lt;/a&gt; / &lt;a href="https://github.com/katrinalaszlo/buildnext-oss" rel="noopener noreferrer"&gt;
        buildnext-oss
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Karpathy-style evidence wiki for product development. Turn customer signal into grounded user stories. No database, no hosting — just markdown and an LLM.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;BuildNext&lt;/h1&gt;
&lt;/div&gt;
&lt;p&gt;Turn raw customer signal into evidence-grounded user stories. No database, no hosting — just markdown and an LLM.&lt;/p&gt;

&lt;p&gt;&lt;a rel="noopener noreferrer" href="https://github.com/katrinalaszlo/buildnext-oss/docs/graph-view.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fkatrinalaszlo%2Fbuildnext-oss%2FHEAD%2Fdocs%2Fgraph-view.png" alt="Obsidian graph view" width="700"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a rel="noopener noreferrer" href="https://github.com/katrinalaszlo/buildnext-oss/docs/story-view.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fkatrinalaszlo%2Fbuildnext-oss%2FHEAD%2Fdocs%2Fstory-view.png" alt="Story page with evidence" width="700"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;How it works&lt;/h2&gt;
&lt;/div&gt;

&lt;p&gt;BuildNext is a Karpathy-style LLM wiki for product development. You give it raw customer signal (call transcripts, support tickets, notes). An AI agent reads, extracts, and synthesizes it into a structured knowledge base you can browse in Obsidian or query from any agent.&lt;/p&gt;

&lt;div class="snippet-clipboard-content notranslate position-relative overflow-auto"&gt;&lt;pre class="notranslate"&gt;&lt;code&gt;raw/              # paste transcripts here (input)
wiki/             # agent-maintained output
  customers/      # one page per customer, extracted quotes
  stories/        # synthesized user stories
  features/       # story groupings
  index.md        # catalog of all pages
  log.md          # what changed and when
CLAUDE.md         # schema — rules for how the agent operates
config.md         # internal speakers to filter, evidence tags
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Three layers: &lt;strong&gt;raw&lt;/strong&gt; (immutable input), &lt;strong&gt;wiki&lt;/strong&gt; (LLM-maintained output), &lt;strong&gt;schema&lt;/strong&gt; (rules).&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Prerequisites&lt;/h2&gt;
&lt;/div&gt;


&lt;ul&gt;

&lt;li&gt;Git&lt;/li&gt;

&lt;li&gt;An AI coding tool that reads &lt;code&gt;CLAUDE.md&lt;/code&gt; — &lt;a href="https://docs.anthropic.com/en/docs/claude-code" rel="nofollow noopener noreferrer"&gt;Claude Code&lt;/a&gt;, &lt;a href="https://cursor.sh" rel="nofollow noopener noreferrer"&gt;Cursor&lt;/a&gt;, &lt;a href="https://openai.com/index/codex/" rel="nofollow noopener noreferrer"&gt;Codex&lt;/a&gt;, or…&lt;/li&gt;

&lt;/ul&gt;
&lt;/div&gt;
&lt;br&gt;
  &lt;/div&gt;
&lt;br&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/katrinalaszlo/buildnext-oss" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;br&gt;
&lt;/div&gt;
&lt;br&gt;
 

</description>
      <category>opensource</category>
      <category>ai</category>
      <category>productmanagement</category>
      <category>llm</category>
    </item>
    <item>
      <title>What Claude Code stores on your machine (and how to see it)</title>
      <dc:creator>Kat Laszlo</dc:creator>
      <pubDate>Tue, 31 Mar 2026 18:31:18 +0000</pubDate>
      <link>https://dev.to/kat_laszlo/what-claude-code-stores-on-your-machine-and-how-to-see-it-17bb</link>
      <guid>https://dev.to/kat_laszlo/what-claude-code-stores-on-your-machine-and-how-to-see-it-17bb</guid>
      <description>&lt;p&gt;Claude Code keeps a lot of data in &lt;code&gt;~/.claude/&lt;/code&gt; that most people never look at. I wanted to know what was there, so I built a scanner.&lt;/p&gt;

&lt;p&gt;On my machine it found:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;76 persistent memory files across 10 projects&lt;/li&gt;
&lt;li&gt;4,445 session transcripts totaling 1.8GB&lt;/li&gt;
&lt;li&gt;2.2GB total data footprint&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The memory files are markdown with frontmatter, organized by type: what Claude thinks your role is, feedback you've given, project context, reference links. It remembers more than you'd expect.&lt;/p&gt;

&lt;h2&gt;
  
  
  The tool
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx agentlens scan
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No API keys, no accounts. Reads local files only.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scan commands
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;agentlens memory&lt;/code&gt; — what Claude remembers about you&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;agentlens costs&lt;/code&gt; — token usage by model and project&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;agentlens features&lt;/code&gt; — active feature flags on your account&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;agentlens sessions&lt;/code&gt; — transcript stats and tool usage&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;agentlens privacy&lt;/code&gt; — total data footprint&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Action commands
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;agentlens clean --dry-run&lt;/code&gt; — preview which memories would be deleted&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;agentlens redact&lt;/code&gt; — find secrets that leaked into memory files&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;agentlens diff save&lt;/code&gt; — snapshot current state, then &lt;code&gt;diff show&lt;/code&gt; to compare later&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;agentlens export&lt;/code&gt; — dump everything to portable JSON&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What surprised me
&lt;/h2&gt;

&lt;p&gt;The sensitivity scanner flagged 13 potential secrets in my memory files. Most were false positives (the word "token" in pricing discussions), but some were file paths and references I wouldn't want in a shared context.&lt;/p&gt;
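
&lt;p&gt;A naive version of that kind of scan looks like this. The patterns are illustrative guesses, not the heuristics agentlens actually ships; pattern-based scanning is cheap but noisy, which is where false positives come from:&lt;/p&gt;

```javascript
// Flag strings that look like credentials in a block of text. These regexes
// are illustrative placeholders, not agentlens's real heuristics.
const PATTERNS = [
  /sk-[A-Za-z0-9]{20,}/g,                 // OpenAI-style secret keys
  /AKIA[0-9A-Z]{16}/g,                    // AWS access key IDs
  /-----BEGIN [A-Z ]*PRIVATE KEY-----/g,  // PEM private key headers
];

function findPotentialSecrets(text) {
  // Concatenate all matches across patterns; empty result means nothing flagged.
  return PATTERNS.flatMap(p => text.match(p) || []);
}
```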

&lt;p&gt;Session transcripts contain everything: every file you read, every bash command you ran, every edit you made. If you've ever read a &lt;code&gt;.env&lt;/code&gt; file during a Claude Code session, it's in there.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stack
&lt;/h2&gt;

&lt;p&gt;~900 lines of TypeScript. Node 18+. Dependencies: chalk, commander, glob, yaml. MIT licensed.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/katrinalaszlo/agentlens" rel="noopener noreferrer"&gt;https://github.com/katrinalaszlo/agentlens&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Would love to hear what you think!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>typescript</category>
      <category>opensource</category>
      <category>devtools</category>
    </item>
    <item>
      <title>Track Your AI Costs Per Customer in One API Call</title>
      <dc:creator>Kat Laszlo</dc:creator>
      <pubDate>Wed, 25 Mar 2026 18:27:44 +0000</pubDate>
      <link>https://dev.to/tanso/track-your-ai-costs-per-customer-in-one-api-call-32c7</link>
      <guid>https://dev.to/tanso/track-your-ai-costs-per-customer-in-one-api-call-32c7</guid>
      <description>&lt;p&gt;If you're building on top of AI APIs, you're probably calling OpenAI, Anthropic, Cohere, and a few others depending on the task. Each one bills differently. Costs shift when model prices change. And at some point, someone asks: "Are we making money on this customer?"&lt;/p&gt;

&lt;p&gt;Without instrumentation, that question takes a day to answer. With it, it takes a query.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;You ship a feature. It works. Customers use it. Then pricing changes, or a customer's usage spikes, or you realize token costs for one feature are eating your margin on that plan tier.&lt;/p&gt;

&lt;p&gt;The issue isn't that you don't have cost data. Your AI provider invoices you every month. The issue is that data isn't broken down by your customers or your features. You see what you spent. You don't see what each customer cost you.&lt;/p&gt;

&lt;p&gt;To know your margin per customer, you need to capture cost at the moment the AI call happens, with context attached.&lt;/p&gt;

&lt;h2&gt;
  
  
  One event call
&lt;/h2&gt;

&lt;p&gt;After each AI API response, send one event to Tanso:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;http://localhost:8080/api/v1/client/events&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Authorization&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Bearer sk_test_67d9fb04f0344036ba92ecc973f1445a&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;eventIdempotencyKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;crypto&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randomUUID&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="na"&gt;eventName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;chat_completion&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;customerReferenceId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cus_123&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;featureKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ai_summarization&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;costInput&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;gpt-4o&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;modelProvider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;openai&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;usageUnits&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;total_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;costAmount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.05&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;A few things worth noting:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;eventIdempotencyKey&lt;/code&gt;&lt;/strong&gt; is required. If the same event is sent twice (network retry, duplicate webhook), Tanso deduplicates silently. Use &lt;code&gt;crypto.randomUUID()&lt;/code&gt; per call, or derive it from your own request ID if you need stable deduplication.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;customerReferenceId&lt;/code&gt;&lt;/strong&gt; is your customer's ID, whatever you already use. Tanso maps this to their account.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;featureKey&lt;/code&gt;&lt;/strong&gt; ties the event to a specific feature on the customer's plan. This is how Tanso separates cost for &lt;code&gt;ai_summarization&lt;/code&gt; from &lt;code&gt;document_export&lt;/code&gt; from &lt;code&gt;chat_completion&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;costAmount&lt;/code&gt;&lt;/strong&gt; is in dollars, not cents. Pass &lt;code&gt;0.05&lt;/code&gt;, not &lt;code&gt;5&lt;/code&gt;, for 5 cents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;costInput&lt;/code&gt;&lt;/strong&gt; holds model and provider metadata. It's used for cost attribution when you want Tanso to calculate costs from usage rather than passing them explicitly.&lt;/p&gt;
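
&lt;p&gt;If you compute &lt;code&gt;costAmount&lt;/code&gt; yourself instead, it's just token counts times a per-token price. A sketch with placeholder prices (the numbers below are illustrative, not current provider pricing):&lt;/p&gt;

```javascript
// Compute a dollar costAmount from token counts. The price table is a
// placeholder; real per-million-token prices change and belong in config.
const PRICE_PER_MILLION_TOKENS = {
  'openai/gpt-4o': { input: 2.5, output: 10.0 }, // illustrative numbers only
};

function costAmountFor(provider, model, inputTokens, outputTokens) {
  const p = PRICE_PER_MILLION_TOKENS[provider + '/' + model];
  if (!p) throw new Error('no price configured for ' + provider + '/' + model);
  return (inputTokens * p.input + outputTokens * p.output) / 1e6;
}
```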

&lt;p&gt;That's it. No batch jobs. No ETL. No scraping provider invoices.&lt;/p&gt;
&lt;h2&gt;
  
  
  What you get
&lt;/h2&gt;

&lt;p&gt;Once events are flowing, Tanso aggregates them in real time by customer and feature.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Margin by customer.&lt;/strong&gt; You know what &lt;code&gt;cus_123&lt;/code&gt; costs you this billing period, broken down by feature. Compare that to what they're paying. That's your margin.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Margin by feature.&lt;/strong&gt; If &lt;code&gt;ai_summarization&lt;/code&gt; on your Starter plan is consistently underwater, you see it before it shows up as a bad quarter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Usage against plan limits.&lt;/strong&gt; Tanso tracks &lt;code&gt;usageUnits&lt;/code&gt; against the feature's limit on the customer's plan. The entitlement check API (&lt;code&gt;POST /api/v1/client/entitlements/check&lt;/code&gt;) returns current usage and limit in real time, so you can gate access before a customer blows past their quota.&lt;/p&gt;
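
&lt;p&gt;A sketch of gating on that response. The field names (&lt;code&gt;isAllowed&lt;/code&gt;, &lt;code&gt;currentUsage&lt;/code&gt;, &lt;code&gt;limit&lt;/code&gt;) are assumed from the description above, not copied from the API reference:&lt;/p&gt;

```javascript
// Decide locally whether a request fits inside the customer's remaining
// quota, given an entitlement-check response. Field names are assumptions.
function shouldAllow(entitlement, requestedUnits) {
  if (!entitlement.isAllowed) return false;
  // Deny when the request would push usage past the plan limit.
  return entitlement.limit - entitlement.currentUsage >= requestedUnits;
}
```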

&lt;p&gt;&lt;strong&gt;Automatic Stripe sync.&lt;/strong&gt; Events flow through to Stripe Billing Meters and land on the customer's next invoice. You don't manage the billing side separately.&lt;/p&gt;
&lt;h2&gt;
  
  
  Works from anywhere
&lt;/h2&gt;

&lt;p&gt;You don't need a dedicated service to send events. Tanso's event API is an HTTP POST. That means it works from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Your backend:&lt;/strong&gt; add the call after any AI API response&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Your terminal:&lt;/strong&gt; curl for one-off testing or backfilling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;An AI agent:&lt;/strong&gt; Tanso exposes an MCP server, so Claude Code and other agents can instrument their own AI calls natively, or query cost data while they work
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:8080/api/v1/client/events &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer sk_test_67d9fb04f0344036ba92ecc973f1445a"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "eventIdempotencyKey": "550e8400-e29b-41d4-a716-446655440000",
    "eventName": "chat_completion",
    "customerReferenceId": "cus_123",
    "featureKey": "ai_summarization",
    "costInput": {
      "model": "gpt-4o",
      "modelProvider": "openai"
    },
    "usageUnits": 1847,
    "costAmount": 0.05
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Get started
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Get an API key from the &lt;a href="https://dashboard.tansohq.com" rel="noopener noreferrer"&gt;Tanso dashboard&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Configure a feature on your plan (e.g., &lt;code&gt;ai_summarization&lt;/code&gt;) with the expected pricing model&lt;/li&gt;
&lt;li&gt;Add the event call after each AI API response in your code&lt;/li&gt;
&lt;li&gt;Check the dashboard; cost and usage data appears immediately&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you want to verify an event landed correctly, the dashboard shows raw event history per customer. You can also check a customer's current entitlement state at any time via the entitlement API.&lt;/p&gt;



&lt;p&gt;Tanso is built for teams shipping on top of multiple AI APIs who need to know their economics at the customer level, not just at the invoice level. The event API is the foundation. Everything else (entitlement checks, metered billing, plan enforcement) runs on the same data.&lt;/p&gt;

&lt;p&gt;Would love to hear if this is helpful or not. &lt;a href="https://cal.com/katrina-laszlo/meeting?duration=15&amp;amp;overlayCalendar=true" rel="noopener noreferrer"&gt;Happy to chat!&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Docs:&lt;br&gt;
&lt;/p&gt;
&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
      &lt;div class="c-embed__body flex items-center justify-between"&gt;
        &lt;a href="https://tanso-core.readme.io/reference/createcustomer" rel="noopener noreferrer" class="c-link fw-bold flex items-center"&gt;
          &lt;span class="mr-2"&gt;tanso-core.readme.io&lt;/span&gt;
          

        &lt;/a&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;



</description>
      <category>ai</category>
      <category>saas</category>
      <category>billing</category>
      <category>devops</category>
    </item>
    <item>
      <title>We built an MCP server so AI coding tools can set up your entire billing system</title>
      <dc:creator>Kat Laszlo</dc:creator>
      <pubDate>Wed, 18 Mar 2026 21:55:09 +0000</pubDate>
      <link>https://dev.to/tanso/we-built-an-mcp-server-so-ai-coding-tools-can-set-up-your-entire-billing-system-3npd</link>
      <guid>https://dev.to/tanso/we-built-an-mcp-server-so-ai-coding-tools-can-set-up-your-entire-billing-system-3npd</guid>
      <description>

&lt;p&gt;Every AI and API company we've talked to ends up building the same internal system. Redis counters tracking credits. Cron jobs reconciling usage. If-statements scattered across the codebase gating feature access. It starts as a quick hack when you launch your first pricing tier and becomes a permanent headache every time pricing changes.&lt;/p&gt;

&lt;p&gt;Billing platforms like Stripe handle what happens after usage. They meter events, generate invoices, process payments. But nothing in that stack answers the question that matters at runtime: should this request be allowed to run?&lt;/p&gt;

&lt;p&gt;Does this customer have enough credits? Are they within their plan limits? Is this feature included in their subscription? That logic is still on your engineering team.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;That's what we built Tanso for.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One API call before the request runs. Subscription state, usage limits, and credit balance checked simultaneously. Allow or deny. If denied, no compute runs. No cost incurred. No surprise invoice.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Check before running compute&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;isAllowed&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;tanso&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;entitlements&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;customerReferenceId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;cust_482&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;featureKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ai-analysis&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;usageUnits&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Denied? No compute runs. No cost incurred.&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;isAllowed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;403&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Safe to run. This request is billable.&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;The MCP server&lt;/h2&gt;

&lt;p&gt;We shipped an MCP server with 34 tools so coding agents can configure the whole thing. Claude Code, Cursor, VS Code, Windsurf, and ChatGPT can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create plans with graduated pricing tiers&lt;/li&gt;
&lt;li&gt;Link features with usage limits and cost models&lt;/li&gt;
&lt;li&gt;Onboard customers and create subscriptions&lt;/li&gt;
&lt;li&gt;Check entitlements and ingest usage events&lt;/li&gt;
&lt;li&gt;Set up model-aware cost tracking for LLM spend&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No dashboard clicking required. Point your coding agent at the MCP and describe what you want.&lt;/p&gt;

&lt;p&gt;MCP config:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tanso"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://api.tansohq.com/mcp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"headers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"X-API-Key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sk_live_your_api_key_here"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
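&lt;p&gt;If you use Claude Code specifically, the same server can be registered from the CLI instead of editing JSON by hand; this assumes a recent Claude Code install and its &lt;code&gt;claude mcp add&lt;/code&gt; syntax.&lt;/p&gt;

```shell
# Register the Tanso MCP server in Claude Code (one-off, per project or user)
claude mcp add --transport http tanso https://api.tansohq.com/mcp \
  --header "X-API-Key: sk_live_your_api_key_here"
```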



&lt;p&gt;We also have AI-native docs so coding agents can read the full API in context: &lt;a href="https://tansohq.com/llms-mcp.txt" rel="noopener noreferrer"&gt;https://tansohq.com/llms-mcp.txt&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;How it fits with Stripe&lt;/h2&gt;

&lt;p&gt;Tanso plugs into your existing Stripe setup. You define plans and features in Tanso. Invoices push to Stripe. Payment status flows back automatically. Your payment processor still handles transactions. Tanso controls the pricing logic in your product.&lt;/p&gt;

&lt;h2&gt;What we're seeing&lt;/h2&gt;

&lt;p&gt;Every team we talk to has some version of this held together with internal tools. The pattern is always the same:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Ship a simple pricing tier&lt;/li&gt;
&lt;li&gt;Hardcode some limits&lt;/li&gt;
&lt;li&gt;Pricing changes, nothing updates&lt;/li&gt;
&lt;li&gt;Customers exceed limits, nobody notices until the invoice&lt;/li&gt;
&lt;li&gt;Spend weeks rebuilding&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If that sounds familiar, we'd love to hear what you ended up building. What broke first?&lt;/p&gt;

&lt;p&gt;Self-serve is live with a free tier at &lt;a href="https://tansohq.com?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=mcp-launch" rel="noopener noreferrer"&gt;tansohq.com&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>saas</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
