Building Marksmith: lessons from making Markdown bearable in VS Code

#vscode #markdown #llm #showdev

I'll start with the moment that pushed me to build this.

I was editing a 1,200-line README. Spotted a typo in the preview pane. Scrolled the source to find it. Lost my place. Scrolled more. Pasted a table from a Google Sheet and got back a wall of tab characters. Closed VS Code in frustration and went to make coffee.

That week I started building Marksmith, a VS Code extension that tries to fix the small things that quietly drain you when writing Markdown. This is a retrospective: what I shipped, the parts that were harder than expected, and the security work that ended up being the largest chunk of the project.

The problem isn't Markdown — it's the workflow around it

The syntax is fine. The friction lives next to it:

Pasting a table from Excel or Sheets gives you tab-separated garbage
Pasting a URL means manually typing [text](url)
Editor and preview scroll independently, so a long file turns into needle-in-haystack
You have no idea how heavy your doc is until you paste it into Claude or GPT and watch the token count blow up

Each one is small. Together they're death by a thousand context switches.

Smart Paste: making the clipboard do the work

The first thing I built was clipboard interception. When you paste, Marksmith inspects the clipboard before VS Code's default handler runs:

import type { TextEditor } from 'vscode';

// Returns the text that should actually be inserted for this paste.
async function handlePaste(clipboardText: string, editor: TextEditor) {
  // Tab-separated, multi-line → Excel/Sheets table
  if (isTabularData(clipboardText)) {
    return convertToMarkdownTable(clipboardText);
  }

  // URL pasted over a text selection → auto-link
  if (isURL(clipboardText) && !editor.selection.isEmpty) {
    const selected = editor.document.getText(editor.selection);
    return `[${selected}](${clipboardText})`;
  }

  // Anything else falls through unchanged
  return clipboardText;
}

The Excel-to-Markdown conversion is the one users mention most. It's not complicated — split by tabs, normalize column widths, generate the |---| separator. But the impact on writing flow is disproportionate to the implementation effort.
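To make those three steps concrete, here is a minimal sketch of a tab-separated-to-table converter. It is not Marksmith's actual code: the function name `tsvToMarkdownTable`, the choice to treat the first row as the header, and the pipe escaping are all assumptions made for illustration.

```typescript
// Hypothetical sketch: convert tab-separated clipboard text into a Markdown table.
// Assumes the first row is the header row.
function tsvToMarkdownTable(tsv: string): string {
  // Spreadsheets usually append a trailing newline; drop it before splitting.
  const rows = tsv
    .replace(/[\r\n]+$/, "")
    .split(/\r?\n/)
    .map((line) =>
      // Escape literal pipes so they do not break the table layout.
      line.split("\t").map((cell) => cell.trim().replace(/\|/g, "\\|"))
    );

  const columnCount = Math.max(...rows.map((row) => row.length));

  // Pad short rows with empty cells so every row has the same column count.
  const padded = rows.map((row) => {
    const copy = row.slice();
    while (copy.length < columnCount) copy.push("");
    return copy;
  });

  // Column width = widest cell, with a minimum of 3 for the "---" separator.
  const widths: number[] = [];
  for (let col = 0; col < columnCount; col++) {
    widths.push(Math.max(3, ...padded.map((row) => row[col].length)));
  }

  const pad = (text: string, width: number) =>
    text + " ".repeat(Math.max(0, width - text.length));
  const formatRow = (row: string[]) =>
    "| " + row.map((cell, col) => pad(cell, widths[col])).join(" | ") + " |";

  const separator = "| " + widths.map((w) => "-".repeat(w)).join(" | ") + " |";
  const [header, ...body] = padded;
  return [formatRow(header), separator, ...body.map(formatRow)].join("\n");
}
```

Padding cells to a common width is purely cosmetic, since Markdown renderers ignore it, but it keeps the source readable when someone edits the table by hand later.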

Bi-directional sync: scrolling that actually works

This one took a few iterations.

The first version synced scroll position by percentage. Useless for long files, because Markdown blocks have wildly different rendered heights (a one-line image reference is tiny in source, huge in preview).

The fix was source-mapping. When rendering Markdown to HTML, attach a data-line attribute to every top-level element pointing back to the source line:

<h2 data-line="42">Bi-directional sync</h2>
<p data-line="44">This one took a few iterations.</p>

Scroll handlers on both sides then walk the DOM and align based on those anchors. Clicking any element in the preview jumps the cursor to that source line. Not novel — most modern Markdown editors do something similar — but getting it smooth inside a VS Code webview took some debouncing to avoid feedback loops where each side's scroll event triggered the other infinitely.
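The two ideas in that paragraph, anchor-based alignment and breaking the feedback loop, can be sketched as pure logic without any webview plumbing. Everything here is hypothetical: the `Anchor` shape, the linear interpolation between anchors, and the 50 ms suppression window are assumptions, not the extension's real implementation.

```typescript
// One entry per rendered element that carries a data-line attribute.
interface Anchor {
  line: number; // source line taken from data-line
  top: number;  // element's vertical offset in the preview, in pixels
}

// Map a preview scroll offset to a (possibly fractional) source line by
// interpolating between the two anchors that surround the offset.
// Assumes anchors are sorted by ascending `top`.
function lineForScrollTop(anchors: Anchor[], scrollTop: number): number {
  if (anchors.length === 0) return 0;
  if (scrollTop <= anchors[0].top) return anchors[0].line;
  for (let i = 0; i < anchors.length - 1; i++) {
    const a = anchors[i];
    const b = anchors[i + 1];
    if (scrollTop < b.top) {
      const t = (scrollTop - a.top) / (b.top - a.top);
      return a.line + t * (b.line - a.line);
    }
  }
  return anchors[anchors.length - 1].line;
}

// Suppresses the "echo" scroll event that a programmatic scroll produces,
// so editor -> preview -> editor does not loop forever.
class ScrollEchoGuard {
  private ignoreUntil = 0;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(windowMs: number = 50, now: () => number = () => Date.now()) {
    this.windowMs = windowMs;
    this.now = now;
  }

  // Call right before scrolling this side from code.
  markProgrammatic(): void {
    this.ignoreUntil = this.now() + this.windowMs;
  }

  // Call at the top of the scroll handler; false means "this is our own echo".
  shouldHandle(): boolean {
    return this.now() >= this.ignoreUntil;
  }
}
```

In a real handler, each side would call `markProgrammatic()` before applying a synced scroll and bail out early when `shouldHandle()` returns false. The clock is injectable only so the guard can be exercised deterministically in tests.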

Document X-Ray: knowing the cost before you paste

If you draft in Markdown and feed it to an LLM, you eventually hit context limits. Marksmith has a sidebar showing word count, readability score, and an estimated LLM token count.

The token estimate is a heuristic — running a real tokenizer in the extension host would be too heavy on every keystroke. Instead it uses a character-and-whitespace approximation calibrated against tiktoken output on a documentation corpus. It gets within ~5% on doc-style text, which is enough to catch "this prompt is too big" before you actually paste it somewhere.

I'd like to move this to a real tokenizer in a worker thread eventually. For now the heuristic is good enough.
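For a sense of what a character-and-whitespace estimate can look like, here is a rough sketch. The coefficients are placeholders based on the common "about four characters per token" rule of thumb for English text; they are not Marksmith's calibrated values, and `estimateTokens` is a hypothetical name.

```typescript
// Placeholder coefficients, NOT calibrated: a real version would be fitted
// against tokenizer output on a representative corpus.
const CHARS_PER_TOKEN = 4;
const TOKENS_PER_WHITESPACE_RUN = 0.25;

// Cheap token estimate: one pass over the text, no tokenizer required.
function estimateTokens(text: string): number {
  if (text.length === 0) return 0;
  const nonWhitespaceChars = text.replace(/\s+/g, "").length;
  const whitespaceRuns = (text.match(/\s+/g) || []).length;
  return Math.ceil(
    nonWhitespaceChars / CHARS_PER_TOKEN +
      whitespaceRuns * TOKENS_PER_WHITESPACE_RUN
  );
}
```

Because this is a couple of regex passes, it is cheap enough to rerun on every document change, which is the trade-off the heuristic is making against accuracy.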

The part I underestimated: security

I thought I was building an editor extension. I ended up spending more time on security than on features. Three classes of vulnerability mattered:

XSS in the webview. Markdown allows raw HTML, so a <script> tag in someone's README would execute in the webview context. I run all rendered HTML through DOMPurify with a strict allowlist, while preserving the elements I actually want (Mermaid SVG, syntax-highlighted spans):

const sanitized = DOMPurify.sanitize(html, {
  ALLOWED_TAGS: [...defaultTags, 'svg', 'path', 'g', 'rect', 'text'],
  ALLOWED_ATTR: [...defaultAttrs, 'data-line', 'viewBox', 'xmlns'],
  FORBID_TAGS: ['script', 'style', 'iframe'],
});

RCE through PDF export. Marksmith uses Puppeteer to render to PDF. In the first version I hadn't verified the Chromium sandbox was actually engaging on user machines. A crafted Markdown file with malicious HTML could, in theory, escape the renderer. I now explicitly verify sandbox mode at launch and fail closed if it isn't available.

SSRF in URL preview. Smart Paste optionally fetches the title of a URL you paste. Without filtering, a user could be tricked into pasting http://169.254.169.254/... (the AWS metadata endpoint) or http://192.168.1.1/admin, and the extension would dutifully fetch it. The fix is IP filtering before the request, plus a redirect cap:

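import * as dns from 'node:dns/promises';

async function safeFetch(url: string) {
  // Resolve first so the check runs against the actual IP, not the hostname
  const { address } = await dns.lookup(new URL(url).hostname);
  if (isPrivateIP(address) || isLoopback(address)) {
    throw new Error('Blocked: private/loopback address');
  }
  // 'manual' stops fetch from following redirects on its own,
  // so each hop can be re-checked and counted against the cap
  return fetch(url, { redirect: 'manual', /* + cap follows */ });
}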
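The address check itself is where most of the detail lives. Below is a minimal sketch of one, under loud assumptions: `isBlockedIPv4` is a hypothetical helper that folds the private and loopback checks into one function, it only understands dotted-quad IPv4, and it fails closed on anything else (including IPv6), which a real implementation would need to handle properly.

```typescript
// Parse a dotted-quad IPv4 string into four octets, or null if malformed.
function parseIPv4(address: string): number[] | null {
  const parts = address.split(".");
  if (parts.length !== 4) return null;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  return octets.every((o) => o >= 0 && o <= 255) ? octets : null;
}

// Hypothetical helper: true if the address must NOT be fetched.
// Anything that is not plain IPv4 is blocked (fail closed).
function isBlockedIPv4(address: string): boolean {
  const octets = parseIPv4(address);
  if (octets === null) return true;
  const [a, b] = octets;
  return (
    a === 0 ||                            // 0.0.0.0/8, "this network"
    a === 10 ||                           // 10.0.0.0/8, private
    a === 127 ||                          // 127.0.0.0/8, loopback
    (a === 100 && b >= 64 && b <= 127) || // 100.64.0.0/10, carrier-grade NAT
    (a === 169 && b === 254) ||           // 169.254.0.0/16, link-local (cloud metadata)
    (a === 172 && b >= 16 && b <= 31) ||  // 172.16.0.0/12, private
    (a === 192 && b === 168)              // 192.168.0.0/16, private
  );
}
```

Two caveats apply to any check of this kind. It has to run again on every redirect hop, since a public URL can redirect to an internal one. And checking a resolved address and then fetching by hostname leaves a gap if the name resolves differently the second time, so a hardened version would connect to the address it already vetted.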

None of this is novel security work. What surprised me was how much attack surface a "simple" Markdown extension exposes once it starts being helpful — fetching URLs, rendering arbitrary HTML, spawning headless browsers.

What I'd do differently

A few things in hindsight:

Threat-model first, features second. I retrofitted protections; should have designed for them from day one.
The token counter should have been a worker from the start. Moving it off the main thread later was awkward and broke a few assumptions in the sidebar UI.
Test on Windows earlier. Path handling for PDF export gave me grief that I could have caught much sooner.

Try it

Source: GitHub
Homepage: Marksmith
VS Code Marketplace: Marksmith on VS Code Marketplace
OpenVSX Marketplace: Marksmith on OpenVSX

If you've shipped a VS Code extension and have war stories about webview security or Puppeteer sandboxing, I'd genuinely like to hear them — drop a comment.