Converting a PDF to a Word document is one of those tasks that sounds simple until you try to do it privately. Most converters upload your file to a server, process it, and send it back. That works, but it means trusting someone else with your document.
I wanted a converter that runs entirely in the browser. The result is en.sotool.top/pdf-to-word/. Here's how I built it.
The Goal
Extract selectable text from a PDF and package it into a .docx file, without ever sending the PDF to a server.
The scope is intentionally narrow:
- Text-only output
- No layout preservation
- No image extraction
- No OCR for scanned PDFs
This covers a lot of real use cases — contracts, reports, essays, meeting notes — while staying fast and private.
The Stack
- Vue 3: UI and state management
- pdfjs-dist: Extract text from each PDF page
- docx: Generate .docx files in the browser
- File API + Blob: Read input and trigger downloads
```bash
npm install pdfjs-dist docx
```
Loading the PDF
pdfjs-dist parses PDFs in a Web Worker, so it needs the URL of its worker script. I point it at a CDN copy that matches the installed library version instead of bundling the large worker file.
```typescript
import * as pdfjs from 'pdfjs-dist'

// Load the worker that matches the installed pdfjs-dist version.
pdfjs.GlobalWorkerOptions.workerSrc =
  `https://cdnjs.cloudflare.com/ajax/libs/pdf.js/${pdfjs.version}/pdf.worker.min.mjs`
```
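Then load the document from a file:

```typescript
// Reads the File into memory and hands the bytes to pdf.js.
// Returns the loaded document; text is pulled out page by page afterwards.
async function extractText(file: File) {
  const arrayBuffer = await file.arrayBuffer()
  const pdf = await pdfjs.getDocument({ data: arrayBuffer }).promise
  return pdf
}
```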
Extracting Text Page by Page
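For each page, page.getTextContent() returns a list of text items. I join their strings, using each item's hasEOL flag to restore line breaks, and treat blank lines as paragraph breaks. (Joining the items with plain spaces would remove every newline, so a split on blank lines would never fire and each page would become a single paragraph.)

```typescript
const pdf = await extractText(file)
const paragraphs: string[] = []

for (let i = 1; i <= pdf.numPages; i++) {
  const page = await pdf.getPage(i)
  const content = await page.getTextContent()

  // Skip marked-content entries (they have no `str`); use `hasEOL` to restore line breaks.
  const text = content.items
    .map(item => ('str' in item ? item.str + (item.hasEOL ? '\n' : ' ') : ''))
    .join('')

  // A blank line marks a paragraph break; other whitespace collapses to one space.
  for (const block of text.split(/\n\s*\n/)) {
    const paragraph = block.replace(/\s+/g, ' ').trim()
    if (paragraph) paragraphs.push(paragraph)
  }
}
```

The result is an array of paragraph strings. Paragraph detection depends on how the PDF encodes its line breaks, and exact layout is lost, but the text content is preserved.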
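Splitting on blank lines only helps when the extracted text actually contains them, and many PDFs don't. A common alternative, sketched here under assumptions, is to group lines by vertical position: start a new paragraph when the gap between consecutive baselines is clearly larger than the line height. The `Item` type mirrors the `str`, `hasEOL`, `transform` and `height` fields of a pdf.js TextItem; `groupByGap` and its 1.5× threshold are hypothetical, not part of the tool. Call it once per page, since y restarts at the top of every page.

```typescript
// Subset of pdf.js TextItem fields used here. transform is [a, b, c, d, e, f];
// f (index 5) is the baseline y in PDF user space, which increases upward.
interface Item {
  str: string
  hasEOL: boolean
  transform: number[]
  height: number
}

type Line = { text: string; y: number; height: number }

// Hypothetical helper: join items into lines, then start a new paragraph when
// the drop from one baseline to the next exceeds gapFactor x the line height.
function groupByGap(items: Item[], gapFactor = 1.5): string[] {
  // 1. Build lines, closing each one at an item flagged with hasEOL.
  const lines: Line[] = []
  let current: Line | null = null
  for (const item of items) {
    if (!current) {
      current = { text: '', y: item.transform[5], height: item.height }
    }
    current.text += item.str
    current.height = Math.max(current.height, item.height)
    if (item.hasEOL) {
      lines.push(current)
      current = null
    }
  }
  if (current) lines.push(current)

  // 2. Merge lines into paragraphs based on the vertical gap.
  const paragraphs: string[] = []
  let buffer: string[] = []
  let prev: Line | null = null
  for (const line of lines) {
    const text = line.text.trim()
    if (!text) continue // empty lines carry no text, so ignore them
    if (prev && prev.y - line.y > prev.height * gapFactor && buffer.length > 0) {
      paragraphs.push(buffer.join(' '))
      buffer = []
    }
    buffer.push(text)
    prev = line
  }
  if (buffer.length > 0) paragraphs.push(buffer.join(' '))
  return paragraphs
}
```

With 10-unit text, lines 12 units apart stay in one paragraph, while a 28-unit gap starts a new one. Columns, tables, and hyphenated words are not handled.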
Building the DOCX File
The docx library lets you create a Word-compatible document without a backend.
```typescript
import { Document, Paragraph, Packer } from 'docx'

const doc = new Document({
  sections: [{
    properties: {},
    children: paragraphs.map(text => new Paragraph({ text })),
  }],
})

const blob = await Packer.toBlob(doc)
```
Packer.toBlob() resolves to a Blob that you can download with a simple anchor element.
Downloading the Result
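```typescript
function downloadBlob(blob: Blob, filename: string) {
  const url = URL.createObjectURL(blob)
  const a = document.createElement('a')
  a.href = url
  a.download = filename
  // Attaching the link to the page first keeps click() working in older browsers.
  document.body.appendChild(a)
  a.click()
  a.remove()
  // Revoke after a short delay: revoking right after click() can
  // interrupt the download in some browsers.
  setTimeout(() => URL.revokeObjectURL(url), 1000)
}
```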
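The post doesn't show how the output filename is chosen. One option, sketched with a hypothetical helper `toDocxName`, is to reuse the uploaded file's name and swap a trailing `.pdf` (in any case) for `.docx`:

```typescript
// Hypothetical helper: derive the download name from the uploaded file's name.
// A trailing ".pdf" (any case) is replaced; other names just get ".docx" appended.
function toDocxName(pdfName: string): string {
  const base = pdfName.replace(/\.pdf$/i, '')
  // Fall back to a generic name if nothing is left (e.g. a file named ".pdf").
  return `${base || 'document'}.docx`
}
```

Called as `downloadBlob(blob, toDocxName(file.name))`, this keeps the original name recognizable in the user's downloads.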
UI Considerations
Set expectations early. We show a clear message that conversion is text-only and that scanned PDFs won't work.
Preview the first three pages. Users can see the extracted text before downloading, which builds trust and lets them catch problems early.
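A preview like this only needs the text of the first few pages. As a sketch, assuming the per-page text has already been extracted into a string array, a hypothetical `buildPreview` takes the first three pages and caps each one so a very long page doesn't flood the UI (the 2,000-character cap is an arbitrary assumption):

```typescript
// Hypothetical preview builder: the first `maxPages` pages, each trimmed
// and truncated to `maxChars` characters with an ellipsis.
function buildPreview(pageTexts: string[], maxPages = 3, maxChars = 2000): string[] {
  return pageTexts.slice(0, maxPages).map(text => {
    const trimmed = text.trim()
    return trimmed.length > maxChars ? trimmed.slice(0, maxChars) + '…' : trimmed
  })
}
```

Because only the first pages are shown, the preview can also be rendered before the remaining pages have been processed.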
Affiliate guidance for complex needs. If a user needs layout preservation, images, or OCR, we recommend a desktop tool. We use a CJ Affiliate link for Wondershare PDFelement with rel="noopener sponsored".
Lessons Learned
Text extraction is easy; layout preservation is hard. Trying to keep columns, tables, and images in a pure browser tool quickly becomes a research project. Text-only is a pragmatic cutoff.
Scanned PDFs are the biggest support burden. Users expect any "PDF to Word" tool to handle scanned documents. We detect low or zero text content and show a specific message explaining the limitation.
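The post doesn't spell out the exact rule. One plausible heuristic, sketched here with a hypothetical `checkTextLayer` function and an assumed threshold of 20 characters per page, is to count non-whitespace characters across the extracted pages and flag the file when the average is very low:

```typescript
// Result of the hypothetical check.
interface TextLayerCheck {
  likelyScanned: boolean
  avgCharsPerPage: number
}

// Counts non-whitespace characters per page; below `minAvgChars` on average,
// the PDF is treated as probably scanned (image-only, no text layer).
function checkTextLayer(pageTexts: string[], minAvgChars = 20): TextLayerCheck {
  if (pageTexts.length === 0) return { likelyScanned: true, avgCharsPerPage: 0 }
  const total = pageTexts.reduce((sum, t) => sum + t.replace(/\s/g, '').length, 0)
  const avg = total / pageTexts.length
  return { likelyScanned: avg < minAvgChars, avgCharsPerPage: avg }
}
```

Using an average rather than requiring zero text means a scanned document with a stray page number or header still gets the "scanned PDF" message.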
Preview reduces disappointment. Letting users see the first few pages of extracted text before downloading prevents the "this output is broken" reaction.
Worker source matters. Bundling the pdf.js worker (pdf.worker.min.mjs) adds a large chunk to the build. Pointing workerSrc at a CDN copy keeps the initial bundle smaller.
Try It
The tool is live at en.sotool.top/pdf-to-word/.
Free, no signup, no upload. Full source is on GitHub.
Need Full Formatting?
For complex documents with tables, images, or scanned pages, a desktop tool is still the better option. Wondershare PDFelement converts PDFs to Word while preserving formatting and includes OCR.
This post contains affiliate links.
Have you built document conversion tools in the browser? What trade-offs did you make?