DEV Community: Takeshi Fuchi

I Built a Service That Actually Converts PDFs to Markdown Correctly

Takeshi Fuchi — Tue, 02 Jun 2026 05:25:07 +0000

Have you ever copy-pasted from a PDF only to get mangled line breaks, tables collapsed into a single line, formulas turned into gibberish, and figure captions floating somewhere completely wrong?

You want to summarize a PDF with an LLM, organize old papers in Notion, or dump internal docs into a knowledge base — the goal is simple. But the moment you hit "PDF text extraction," everything falls apart before you even start.

So I built pdfmd.net — upload a PDF, get back a properly structured Markdown file with headings, paragraphs, tables, LaTeX formulas, and figure references all intact.

"Why not just attach the PDF to GPT-5.5?"

Fair question. For a 1–2 page document, that works fine. Here's where the approaches differ:

| | ① Text Extraction Tools | ② GPT-5.5 / Claude directly | ③ pdfmd.net |
|---|---|---|---|
| How it works | Reads character coordinates from PDF internals | Attach PDF, ask "convert to Markdown" | Page PNG → tuned LLM → Markdown + images as ZIP |
| Speed / Cost | ✅ Fast, free | ❌ High-end models get expensive fast | ✅ Cheap model by default, you choose |
| Structure preserved | ❌ Tables collapse, formulas break, 2-column layout mixes | ✅ Mostly, but varies with prompt | ✅ Consistent every time |
| Long documents | ✅ No page limit | ❌ Quality drops past ~50 pages | ✅ 200+ pages handled reliably |
| Images / figures | ❌ Zero information | ⚠️ Possible but you have to engineer it | ✅ Extracted and linked in ZIP automatically |
| Batch processing | ✅ | ❌ One file at a time, manually | ✅ |

The real gap with "just using GPT-5.5" is that you have to re-engineer it every time:

Prompt required every time — "format tables in Markdown", "use LaTeX for formulas", "extract figures with filenames", "bundle as ZIP" — skip any of these and output quality varies unpredictably.
Long documents degrade — 50–200 page papers hit context limits, get cut off, or have visibly worse quality in the second half.
One file at a time — converting 10 papers means 10 upload-ask-copy cycles.
Cost adds up — processing large volumes through GPT-5.5 / Claude Sonnet class models gets expensive quickly.

How pdfmd works

For each page, pdfmd extracts two things:

PNG image (full-page render) → primary input to the Vision LLM
Text (via PyMuPDF) → hints to help the LLM read characters accurately

The Vision LLM uses the image as the ground truth for structure and the text as a supplement. This is why 2-column layouts don't bleed into each other, tables stay intact, and formulas come out as LaTeX — structure is understood visually, not guessed from character positions.
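As a rough illustration of the two-input idea, here is a minimal sketch of how one page's image and text hint might be bundled for a vision model. The function name, field names, and instruction wording are all assumptions made for this example, not pdfmd's actual request format. With PyMuPDF, the two inputs would typically come from `page.get_pixmap().tobytes("png")` and `page.get_text()`; the sketch uses stand-in bytes so it runs without any dependencies.

```python
import base64
import json


def build_page_request(page_png: bytes, hint_text: str, page_number: int) -> dict:
    """Bundle one page's render and text hint into a provider-agnostic payload.

    The payload shape is hypothetical; adapt it to whatever API you call.
    """
    return {
        "page": page_number,
        "instructions": (
            "Convert this page to Markdown. Treat the image as ground truth "
            "for layout; use the text hint only to spell characters correctly."
        ),
        "image_png_base64": base64.b64encode(page_png).decode("ascii"),
        "text_hint": hint_text,
    }


# Stand-in bytes: a real pipeline would render the page to PNG and
# extract its text (for example with PyMuPDF) before this step.
fake_png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8
request = build_page_request(fake_png, "Eq. 1 min G max D V(D, G)", page_number=1)
print(json.dumps(request, indent=2)[:120])
```

Keeping the image and the hint as separate fields makes the priority explicit: the prompt can tell the model which one wins when they disagree.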

Figures and graphs are cropped from the page, saved as image files, and embedded in the Markdown as ![caption](./images/page3_fig1.png). The final output is a ZIP containing the .md file and all images.
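The output layout described above (one .md file plus an images folder, zipped) can be sketched with Python's standard library. The function name and the default `document.md` filename are assumptions for this example; only the `./images/...` reference style comes from the article.

```python
import io
import zipfile


def bundle_output(markdown: str, images: dict[str, bytes],
                  md_name: str = "document.md") -> bytes:
    """Write the Markdown and its images into an in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(md_name, markdown)
        for filename, data in images.items():
            # Stored under images/ so "./images/<name>" links resolve
            # once the archive is unpacked next to the .md file.
            archive.writestr(f"images/{filename}", data)
    return buffer.getvalue()


markdown = "![caption](./images/page3_fig1.png)\n"
zip_bytes = bundle_output(markdown, {"page3_fig1.png": b"png-bytes"})

with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
    names = archive.namelist()
print(names)  # ['document.md', 'images/page3_fig1.png']
```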

On models and cost: pdfmd runs on your own API key. You choose the model; API costs go directly to the provider. The default is a Gemini Flash Lite-class model — heavily tuned prompts and pipeline squeeze high-quality output from cheap models. Switch to a more capable model anytime when precision matters. This — a tuned pipeline running on your key — is the fundamental difference from asking GPT-5.5 directly.

Just upload. A 200-page paper produces the same quality result every time, no prompt crafting required.

Seeing it in action

Complex formulas: raw extraction vs. pdfmd

Using a machine learning paper (Conditional GAN, arxiv:1906.05596) as the test case.

Eq. 1 — GAN minimax objective

Here's what the formula looks like in the PDF:

Text extraction shatters it across lines. pdfmd reconstructs it as complete LaTeX:

\mathbb{E} (expectation), bold vector \mathbf{x}, nested brackets: all recovered exactly. Text extraction can't tell min from a subscript or an equation number from a line break.

Eq. 9 — Piecewise (cases) function

In the PDF:

Text extraction leaves bare brackets with misaligned text. pdfmd uses \begin{cases}:

Paste this into Obsidian, Notion, or a Jupyter notebook and the formula renders correctly.

Eq. 5 — Fraction + Frobenius norm

Compound expressions with fractions, multi-level subscripts, and norm notation:

\frac{1}{WH}, subscript \theta_G, Frobenius norm \|\cdot\|_F^2, \Omega_{256} with subscripts: all correct. Text extraction splits numerator and denominator onto separate lines and drops the norm delimiters entirely.

Tables: subscripts intact

Using an ADC datasheet (LTC2228/2227/2226) as the example. Electrical spec tables are full of subscripted symbols: VCM, VIH, IIN, IOUT.

The original PDF page:

Text extraction strips subscripts: VCM, VIH, IIN stay as plain text. pdfmd renders them all as proper LaTeX:

VCM → $V_{CM}$, VIH → $V_{IH}$, IIN → $I_{IN}$, IOUT = 0 → $I_{OUT} = 0$.

In a hardware spec, VCM and $V_{CM}$ mean the same thing to a human but are different strings to any downstream system — search, LLM context, RAG retrieval. The subscript isn't cosmetic; it's semantic.
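To make the "different strings" point concrete, the short sketch below shows a plain substring search missing the LaTeX form. It also shows one possible workaround: a hypothetical `search_key` helper that collapses simple `X_{YZ}` subscripts so an index can match either spelling. The helper is an assumption for this example, not something pdfmd provides.

```python
import re

# Matches simple subscripts such as V_{CM} or I_{OUT}.
_SUBSCRIPT = re.compile(r"([A-Za-z]+)_\{([A-Za-z0-9]+)\}")


def search_key(text: str) -> str:
    """Collapse X_{YZ} to XYZ and drop $ delimiters for plain-text matching."""
    return _SUBSCRIPT.sub(r"\1\2", text).replace("$", "")


row = "$V_{CM}$ | Common mode bias | $I_{OUT} = 0$"

print("VCM" in row)              # False: the LaTeX form is a different string
print("VCM" in search_key(row))  # True: the normalized key matches
```

Indexing the normalized key alongside the original Markdown would keep the subscript information while still letting plain-text queries find the row.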

Page breaks + figure interruptions: text reconnected

Using a 2-column academic paper from SIGMOD'18 (GPU parallel top-k algorithm). A paragraph spans two pages with a figure inserted mid-sentence.

The source PDF (two pages stitched):

Text extraction output:

The page footer cuts in while the paragraph is still mid-sentence:

…This results in two bitonic sequences,
(S[0], ..S[l/2 −1]) and (S[l/2], ...S[l −1])
where all the elements in the first subsequence
are smaller than any element in the second
subsequence.
Research 15: Databases for Emerging Hardware     ← page footer
SIGMOD'18, June 10-15, 2018, Houston, TX, USA   ← session header
1558                                             ← page number
Phase  Step
1      1
2      1  2  3  4                                ← figure data
(a) Algorithm.
Unsorted Input
After Phase 1
…
In the second step, the same procedure           ← paragraph resumes here
is applied to both the subsequences…

pdfmd output:

…This results in two bitonic sequences, $(S[0], \dots S[l/2 - 1])$ and
$(S[l/2], \dots S[l - 1])$ where all the elements in the first subsequence
are smaller than any element in the second subsequence. In the second step,
the same procedure is applied to both the subsequences, resulting in four
bitonic sequences…

![Bitonic Sorting Network](./images/gputopk_sigmod18_page3_fig1.png)
**Figure 3:** Bitonic Sorting Network

Page footer, session header, page number, and raw figure data all stripped. Paragraph text flows continuously. Figure placed at the right position with a proper reference.

Complex multi-column Japanese layout

Multi-column Japanese documents — municipal newsletters, magazine spreads, textbooks mixing vertical and horizontal text — are the hardest layout class for text extraction. Columns bleed into each other, vertical text comes out one character per line, and decorative borders become garbage characters.

Here's a one-page municipal newsletter put through pdfmd:

PyMuPDF text extraction output:

まちだ市民大学ＨＡＴＳ
まちだ市民大学ＨＡＴＳ
まちだ市民大学ＨＡＴＳ      ← heading repeated 3× (three columns)
受講生募集
襖
鴬                           ← decorative characters garbled
横横横横横横横横横横横横横横  ← border lines as characters

対市内在住、在勤、在学の、原則、全回出席できる方
場陶芸に関する講座＝陶芸スタジオ（下小山田町）、

日９月１０日～１２月１７日の月曜日     ← different section mixed in
人間関係学～人間関係の多様性と向き合   ← reading order scrambled

玉                           ← vertical text: one char per line
川
学
園
子
ど
も
…

pdfmd output:

## 催し ご参加を

### 玉川学園子どもクラブ ころころ児童館

#### 【7月のわくわくWeek「水鉄砲合戦～広場決戦の巻」】
自分の水鉄砲を持ってきて参加できます。びしょぬれになるので、
着替えが必要な方はお持ち下さい。
*   **対** 小学生以上の方
*   **日** 7月23日(月)～8月3日(金)、いずれも午後3時30分～5時(雨天中止)

### まちだ市民大学HATS 2012年度後期講座 受講生募集

*   **対** 市内在住、在勤、在学の、原則、全回出席できる方
*   **申** 7月25日正午～8月24日に電話でイベントダイヤル(📞724・5656)へ。
*   **問** 生涯学習センター 📞728・0071 FAX728・0073

Vertical text reconstructed, 3-column layout correctly ordered, ## → ### → #### heading hierarchy preserved, phone numbers and labels intact.

Multilingual: works the same way

Because the approach is Vision LLM-based, language is not a special case. Here's a Chinese academic paper:

# CUDA 并行计算技术在情报信息研判中的应用

**摘要：** 文章在研究公安情报信息研判技术的基础上…

$$W_{ik} = \frac{tf_{ik} \log(N/n_k + 0.01)}{\sqrt{\sum_{k=1}^n [tf_{ik} \log(N/n_k + 0.01)]^2}} \tag{1}$$

Supported languages: English, Japanese, Simplified Chinese, Traditional Chinese, Spanish, French, Portuguese, Russian, German, Turkish, Korean, Italian, Dutch — and the UI is translated into all of them.

Figures and graphs described, not just extracted

For charts and graphs — content that can't be extracted as text — pdfmd saves the image and writes a caption describing what it shows:

![Boxplots of Dice scores](./images/page7_fig2.png)
**Fig. 5.** Boxplots of Dice scores for various anatomical structures for ANTs,
NiftyReg, and VoxelMorph. Structures are ordered by average ANTs Dice score.

Ask an LLM "what does Fig. 5 show?" later, and it can actually answer — because the description is in the Markdown alongside the image reference.
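Because the caption sits directly under the image reference, a downstream script can pair the two before indexing. The sketch below assumes the layout shown in the sample (an image line followed by caption lines, ended by a blank line); the function name and the returned dictionary keys are hypothetical.

```python
import re

# An image link followed by one or more non-empty caption lines.
FIGURE_BLOCK = re.compile(
    r"!\[(?P<alt>[^\]]*)\]\((?P<path>[^)\s]+)\)\n"
    r"(?P<caption>(?:[^\n]+(?:\n|$))+)"
)


def extract_figures(markdown: str) -> list[dict]:
    """Return the alt text, image path, and caption for each figure block."""
    figures = []
    for match in FIGURE_BLOCK.finditer(markdown):
        figures.append({
            "alt": match.group("alt"),
            "path": match.group("path"),
            # Join wrapped caption lines into a single line of text.
            "caption": " ".join(match.group("caption").split()),
        })
    return figures


sample = (
    "Some text.\n\n"
    "![Boxplots of Dice scores](./images/page7_fig2.png)\n"
    "**Fig. 5.** Boxplots of Dice scores for various anatomical structures for ANTs,\n"
    "NiftyReg, and VoxelMorph. Structures are ordered by average ANTs Dice score.\n"
    "\n"
    "Next paragraph.\n"
)

figures = extract_figures(sample)
print(figures[0]["path"])
print(figures[0]["caption"])
```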

When to use it

Moving papers, contracts, or internal docs into Notion or Obsidian
Preprocessing PDFs as RAG source documents for LLM pipelines
Quoting from a PDF in a blog post or report — paste the Markdown directly
Cleaning up multilingual material before sending to DeepL or GPT

If you just need text search inside a PDF, a viewer does that fine. pdfmd is for when you need to do something with the Markdown afterward.

Try it

pdfmd.net — sign up and get 50 pages free. 1 point = 1 page. Upload a file, wait a moment, download the ZIP. That's it.

The API key is free too. pdfmd runs on your own API key, but Google AI Studio lets you generate a Gemini API key at no cost. The default model (Gemini Flash Lite class) runs within AI Studio's free tier — so those 50 signup pages are completely free end-to-end, API costs included. No credit card required anywhere.

If you've ever fed a badly-extracted PDF into an LLM and gotten back a confused or hallucinated answer, try running the same PDF through pdfmd first. The difference is usually immediate.

I Built a VS Code Extension from Scratch with Claude Code — and Got 10+ Downloads Within the First Hour of Publishing

Takeshi Fuchi — Fri, 29 May 2026 17:56:46 +0000

Introduction

"I just want to peek inside this zip file without actually extracting it." — Every developer has been there. But VS Code's default behavior is either showing raw binary or launching an external app. Unzip, check, delete the folder — that friction adds up. I built this extension to eliminate it.

The entire project was developed through conversation with Claude Code, starting from zero. I also wired in an ad-based monetization model. Then I published it to the VS Code Marketplace without any promotion whatsoever — and watched 10+ downloads roll in within the first hour.

Here's the full story.

What Is "Zip & Archive Viewer"?

Zip & Archive Viewer (publisher: Takeshi-Fuchi) is a VS Code extension that lets you browse and preview the contents of archive files directly in the editor — no extraction required.

Select a zip file and it immediately renders as a file tree. Press and hold the mouse button on any filename in the list to preview its contents inline; release to close. No temp folders, no cleanup. For the job of "just checking what's inside," this is the shortest possible path.

Supported Formats

.zip / .7z / .tar / .tar.gz / .tgz / .tar.xz / .tar.bz2 / .tar.zst — 17 compression formats in total, including modern ones like Zstandard.

Key Features

File Tree View
Selecting an archive displays filenames, sizes, and timestamps in a collapsible tree.
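The extension itself is written in TypeScript; the sketch below uses Python's standard library only to illustrate the idea of turning an archive's flat entry list into a nested tree. The function name and the tuple used for file leaves are assumptions for this example.

```python
import io
import zipfile


def build_tree(archive: zipfile.ZipFile) -> dict:
    """Nest archive entries by path: dicts are folders, tuples are files."""
    tree: dict = {}
    for info in archive.infolist():
        parts = [part for part in info.filename.split("/") if part]
        if not parts:
            continue
        node = tree
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        if info.is_dir():
            node.setdefault(parts[-1], {})
        else:
            # Leaf: (size in bytes, modification time as a 6-tuple).
            node[parts[-1]] = (info.file_size, info.date_time)
    return tree


# Build a small archive in memory so the example is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("docs/readme.md", "hello")
    archive.writestr("src/app/main.py", "print(1)")

with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as archive:
    tree = build_tree(archive)

print(sorted(tree))                    # ['docs', 'src']
print(tree["docs"]["readme.md"][0])    # 5
print(tree["src"]["app"]["main.py"][0])  # 8
```

Only the archive's directory listing is read here; no member is decompressed to build the tree.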

Instant File Preview (No Extraction)
Hold the mouse button down on any filename to preview its content; release to dismiss. Shows the first N lines of text (configurable, default 20). Nothing is extracted to disk.
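Previewing the first N lines without extraction amounts to streaming a member and stopping early. This Python sketch shows the idea for ZIP files; it is not the extension's TypeScript code, and the function name is made up for this example. The default of 20 lines mirrors the default mentioned above.

```python
import io
import zipfile
from itertools import islice


def preview_member(archive: zipfile.ZipFile, name: str,
                   max_lines: int = 20) -> list[str]:
    """Read at most max_lines lines of a member; nothing is written to disk."""
    with archive.open(name) as raw:
        text = io.TextIOWrapper(raw, encoding="utf-8", errors="replace")
        return [line.rstrip("\n") for line in islice(text, max_lines)]


# Build a 100-line text file inside an in-memory archive.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("notes.txt", "\n".join(f"line {i}" for i in range(100)))

with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as archive:
    preview = preview_member(archive, "notes.txt", max_lines=3)

print(preview)  # ['line 0', 'line 1', 'line 2']
```

Because the member is read as a stream, only as much data as the preview needs is decompressed, which keeps previews cheap even for large files.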

Image Preview (No Extraction)
JPG / PNG / GIF / WebP / BMP / SVG / ICO / TIFF / AVIF render inline inside the viewer. See exactly what's in the archive without touching your filesystem.

Markdown Preview (No Extraction)
Rendered Markdown is displayed directly from within the archive. Images referenced inside the archive are automatically resolved and embedded. Great for checking documentation in a release package.

Nested Archive Browsing
A zip inside a tar.gz? Click through and browse it. Still no extraction needed.
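One way to browse a nested archive without touching the filesystem is to read the inner archive's bytes from the outer one and open them from memory. The Python sketch below illustrates that approach for a zip inside a tar.gz; it is an assumption about how such a feature could work, not a description of the extension's internals.

```python
import io
import tarfile
import zipfile


def list_nested_zip(outer_tar_gz: bytes, inner_name: str) -> list[str]:
    """List the entries of a zip stored inside a tar.gz, entirely in memory."""
    with tarfile.open(fileobj=io.BytesIO(outer_tar_gz), mode="r:gz") as outer:
        member = outer.extractfile(inner_name)
        if member is None:
            raise ValueError(f"{inner_name} is not a regular file")
        # ZIP needs random access, so the inner archive is buffered in memory.
        inner_bytes = member.read()
    with zipfile.ZipFile(io.BytesIO(inner_bytes)) as inner:
        return inner.namelist()


# Build the inner zip, then wrap it in a tar.gz, all in memory.
inner_buffer = io.BytesIO()
with zipfile.ZipFile(inner_buffer, "w") as inner_zip:
    inner_zip.writestr("a.txt", "alpha")
    inner_zip.writestr("docs/b.md", "# beta")
inner_data = inner_buffer.getvalue()

outer_buffer = io.BytesIO()
with tarfile.open(fileobj=outer_buffer, mode="w:gz") as outer_tar:
    info = tarfile.TarInfo(name="release/inner.zip")
    info.size = len(inner_data)
    outer_tar.addfile(info, io.BytesIO(inner_data))

entries = list_nested_zip(outer_buffer.getvalue(), "release/inner.zip")
print(entries)  # ['a.txt', 'docs/b.md']
```

Buffering the inner archive in memory is fine for small packages; very large nested archives would need a different strategy.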

Password-Protected Archives
ZIP and 7Z password-protected archives are fully supported. Passwords are remembered for the session.

Selective Extraction
Once you've confirmed what you need, right-click to extract individual files or entire folders. Multi-select with checkboxes for batch extraction. Extraction is an intentional step — not the default.

Building It from Scratch with Claude Code

How It Started

One day I received a zip file, needed to quickly verify its contents, and found myself going through the whole unzip → check → delete cycle just for a 30-second confirmation. "This should just work inside VS Code," I thought.

I looked for existing extensions. Some were unmaintained, some only supported zip, some had clunky UIs. Close enough didn't cut it, so I decided to build one. The catch: I had never built a VS Code extension before. The Webview API, CustomEditorProvider, extension context lifecycle — unfamiliar territory. That's where Claude Code came in.

How Development Progressed

The workflow was straightforward. I described what I wanted — "show a tree view when a zip file is selected in VS Code, without extracting it" — and Claude Code generated the scaffolding. From there I layered in features: "add 7z support," "handle password-protected archives," "render Markdown from inside the archive."

The main file, src/extension.ts, is about 2,000 lines. Stream-based reading for each archive format, HTML generation for the Webview, preview logic, password dialogs, batch extraction — all in one file.

I wrote tests too: unit tests, integration tests against real archive files, and end-to-end tests with Playwright. When something broke, I'd pass the failure back — "this test is failing, find out why" — and get a precise fix. The debug cycle was noticeably faster than working alone.

Looking at the git history: from Initial commit through feature additions, bug fixes, and version bumps, the entire arc from zero to published was one developer in conversation with an AI.

What I Took Away from the Experience

The biggest win with Claude Code was not having to look up unfamiliar APIs. Things like using vscode.window.showInputBox() for a password prompt, or retainContextWhenHidden to preserve Webview state across tab switches — I could just describe what I wanted and get working code, without context-switching to docs.

Not everything worked on the first try. Images broke due to a Content Security Policy misconfiguration. The tar tree hierarchy got mangled at one point. Small bugs, real bugs. But the pattern of "here's the failing test, fix it" reliably produced accurate fixes. The time I spent debugging dropped significantly.

Monetizing a VS Code Extension with Ads

Before publishing, I added something unusual: designated ad space within the extension UI.

The README states it plainly:

This extension may display advertisements in designated advertising spaces within the interface. Revenue from advertisements is used for the development, maintenance, and improvement of the extension.

In-app advertising is standard in mobile apps and web services, but it's rare in developer tooling. The VS Code extension ecosystem is almost entirely donation-ware or subscription-based. I wanted to test whether an ad model could work here.

Some pushback is expected — developers tend to have low tolerance for ads in their tools. The constraints I set for myself:

Non-intrusive: Ads sit at the edge of the UI and don't interfere with archive browsing
Fully local: No file data is sent externally, stated explicitly in the privacy policy
Transparent: The README discloses ad presence before anyone installs

Whether the developer community accepts this or not is the experiment. The results will tell me something either way.

10+ Downloads in the First Hour — Without Any Promotion

I had zero expectations at launch. No tweets, no Reddit posts, no Product Hunt submission. I published to the VS Code Marketplace and that was it.

Within an hour, downloads had crossed 10.

I was genuinely surprised.

In hindsight, it makes sense. The VS Code Marketplace has a massive, active user base searching for extensions every day. A new listing surfaces at the top when someone searches "zip viewer" or "archive viewer." No SEO work, no ad spend: the marketplace's own search index does the distribution.

Launching a web service from scratch is a cold start problem. You need SEO, social posts, community engagement, and paid acquisition just to get visible. A marketplace is different. The audience is already there. The infrastructure connecting supply and demand already exists.

The App Store and Google Play have the same structure, but the VS Code Marketplace is narrower: everyone visiting is a developer looking for a tool. The match rate between "I want to preview zip contents in VS Code" and this extension is near-perfect. A real solution to a real need gets found automatically.

Where It Is Now

Current version: 2.1.2. It started at 1.0.0 with preview support for 5 ZIP/TAR formats, expanded to 17 formats with a major UI overhaul in 2.0.0, and added Markdown preview, image preview, and batch extraction in 2.1.0. The scope has grown considerably from what I originally had in mind.

How the ad model performs, how downloads trend, how the developer community responds — I'll write a follow-up when there's something worth reporting.

If you've ever caught yourself unzipping just to check a file and immediately deleting the folder, give it a try.

Search "Zip & Archive Viewer" by Takeshi-Fuchi on the VS Code Marketplace

Source code is available at https://github.com/t-fuchi/zip-viewer