Recently I was working on my online tools website toolsonline.run and decided to build a Text Similarity Checker — a tool that compares two texts and shows you exactly how similar they are.
Why I Built This
I kept running into situations where I needed to compare two blocks of text:
- Writing: Checking if I accidentally repeated paragraphs in a long article
- SEO: Detecting duplicate content across pages
- Translation: Comparing different versions of a translated text
- Code review: Spotting similar code snippets
Most existing tools require registration, file uploads, or charge a fee. I wanted something simpler: paste two texts, get instant results, no strings attached.
How It Works
Paste your two texts into the "Text A" and "Text B" fields, and the tool instantly calculates:
- Similarity percentage — based on the Sørensen–Dice coefficient
- Shared word count — how many identical words exist between the two texts
- Text overlap rate — the coverage of content overlap
Results update in real time as you type or edit.
The Technical Challenge: Chinese Tokenization
The biggest challenge was Chinese text segmentation. English splits on spaces, but Chinese has no spaces between words. I used Intl.Segmenter, a browser-native API that performs locale-aware word segmentation for languages including Chinese, Japanese, Korean, and Thai. Because it runs in the browser, no backend is needed.
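As a rough sketch of what word segmentation with Intl.Segmenter can look like: the `tokenize` helper, the default "zh" locale, and the lowercasing step are my own illustrative assumptions here, not necessarily how the tool does it.

```javascript
// Minimal sketch: split text into word tokens with the built-in Intl.Segmenter.
// `tokenize`, the default locale, and lowercasing are illustrative assumptions.
function tokenize(text, locale = "zh") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  const tokens = [];
  for (const { segment, isWordLike } of segmenter.segment(text)) {
    // isWordLike is false for whitespace and punctuation, so those are skipped.
    if (isWordLike) tokens.push(segment.toLowerCase());
  }
  return tokens;
}

console.log(tokenize("Hello, world!", "en")); // [ 'hello', 'world' ]
console.log(tokenize("我喜欢编程", "zh"));     // exact word boundaries depend on the runtime's dictionary
```

The same function works for English and Chinese input; only the locale hint changes. How Chinese text is split into words depends on the segmentation data shipped with the browser or runtime, so results can differ slightly between environments.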
The algorithm uses the Sørensen–Dice coefficient, a classic text similarity metric. It takes twice the number of words the two texts share (counted with their frequencies) and divides it by the total number of words in both texts, producing a 0–100% score.
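To make the scoring concrete, here is a minimal sketch of a Sørensen–Dice score over word counts. It assumes the texts are already split into word arrays; the function names and the choice to score two empty texts as 100% are assumptions for illustration, not the tool's actual code.

```javascript
// Count how often each word occurs in a token list.
function countWords(tokens) {
  const counts = new Map();
  for (const token of tokens) counts.set(token, (counts.get(token) ?? 0) + 1);
  return counts;
}

// Sørensen–Dice on word multisets: 2 * shared / (|A| + |B|), as a percentage.
function diceSimilarity(tokensA, tokensB) {
  const total = tokensA.length + tokensB.length;
  if (total === 0) return 100; // assumption: two empty texts count as identical
  const countsA = countWords(tokensA);
  const countsB = countWords(tokensB);
  let shared = 0;
  for (const [word, countA] of countsA) {
    // A word is shared as many times as it appears in both texts.
    shared += Math.min(countA, countsB.get(word) ?? 0);
  }
  return ((2 * shared) / total) * 100;
}

const textA = ["the", "cat", "sat", "on", "the", "mat"];
const textB = ["the", "cat", "lay", "on", "the", "rug"];
console.log(diceSimilarity(textA, textB).toFixed(1) + "%"); // 66.7%
```

In this example the texts share 4 word occurrences ("the" twice, "cat", "on") out of 12 words in total, so the score is 2 × 4 / 12 ≈ 66.7%.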
Key Features
- Real-time processing — results update as you type
- Multi-language support — works with CJK languages via Intl.Segmenter
- 100% client-side — no data sent to any server
- Free & no registration — just open and use
Try It Out
It's completely free, open-source friendly, and respects your privacy. All calculations happen locally in your browser.
If you find it useful, feel free to share it. Questions or suggestions? Drop a comment below!