<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Hady Khaled Hammad</title>
    <description>The latest articles on DEV Community by Hady Khaled Hammad (@mrquite).</description>
    <link>https://dev.to/mrquite</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3594342%2F2531825b-b47f-4ae6-9e61-fa075d7366f5.jpg</url>
      <title>DEV Community: Hady Khaled Hammad</title>
      <link>https://dev.to/mrquite</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mrquite"/>
    <language>en</language>
    <item>
      <title>🧠 Smart Text Matching: RapidFuzz vs Difflib</title>
      <dc:creator>Hady Khaled Hammad</dc:creator>
      <pubDate>Mon, 03 Nov 2025 17:17:00 +0000</pubDate>
      <link>https://dev.to/mrquite/smart-text-matching-rapidfuzz-vs-difflib-ge5</link>
      <guid>https://dev.to/mrquite/smart-text-matching-rapidfuzz-vs-difflib-ge5</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdg15ql3f25imlhxpsr1m.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdg15ql3f25imlhxpsr1m.jpg" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
Have you ever typed a restaurant name into a food app — only to realize you misspelled it, used a local nickname, or added an extra space? Yet somehow, the app still finds what you meant.&lt;br&gt;
That’s fuzzy matching in action — a smart way of connecting almost-right text to the right result.&lt;/p&gt;

&lt;p&gt;In the era of big data and AI, matching text strings intelligently is a superpower. Whether deduplicating records, powering search engines, or normalizing messy user input, smart text matching goes beyond exact equality. It understands similarities despite typos, reorderings, or variations—turning chaos into clarity.&lt;/p&gt;

&lt;p&gt;In this article, we’ll explore how to build a simple but powerful fuzzy matching system for restaurant names, using two different Python approaches:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Part 1: Using RapidFuzz for lightning-fast, token-based similarity&lt;/li&gt;
&lt;li&gt;Part 2: Using Difflib, Python’s built-in (but slower) alternative&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We’ll walk through real examples and see how both methods handle variations, typos, and multilingual data.&lt;/p&gt;


&lt;h2&gt;
  
  
  🍽️ The Scenario: Matching Restaurant Names
&lt;/h2&gt;

&lt;p&gt;Let’s say you have a list of restaurants from multiple regions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;restaurants = [
    {"name": "The Shawarma Professor"},
    {"name": "Dr. Falafel"},
    {"name": "Taco Republic (Mexican Grill)"},
    {"name": "مطعم الدكتور فلافل"},
    {"name": "Shawarma Prof."},
]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now a hungry user types:&lt;br&gt;
&lt;code&gt;query = "shawarma prof"&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;We want to find the closest matching restaurants, even if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;They typed “prof” instead of “professor”&lt;/li&gt;
&lt;li&gt;The name has parentheses, prefixes, or translation&lt;/li&gt;
&lt;li&gt;It’s written in another language (like Arabic)&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;
  
  
  Part 1: ⚡ Using RapidFuzz for Fast and Accurate Matching
&lt;/h1&gt;

&lt;p&gt;Why RapidFuzz?&lt;br&gt;
&lt;a href="https://rapidfuzz.github.io/RapidFuzz/" rel="noopener noreferrer"&gt;RapidFuzz&lt;/a&gt; is a high-performance string-matching library implemented in C++.&lt;br&gt;
It suits large-scale search tasks because it offers token-based comparison, speed, and Unicode support, all of which matter when dealing with real-world restaurant names.&lt;br&gt;
A quick demo:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from rapidfuzz import fuzz

score = fuzz.token_sort_ratio("The Shawarma Professor", "Shawarma Prof.")
print(round(score, 2))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Output
72.22
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;RapidFuzz scores the two names as fairly similar (about 72 out of 100), even though one is abbreviated and drops the leading "The".&lt;/p&gt;
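
&lt;p&gt;To make that score less of a black box, here is a minimal pure-Python sketch of the idea behind a token-sort comparison: split each string on whitespace, sort the tokens, rejoin them, and score the results as 2 × LCS / (combined length) × 100. The helper names &lt;code&gt;lcs_length&lt;/code&gt; and &lt;code&gt;token_sort_similarity&lt;/code&gt; are hypothetical, and this only approximates what RapidFuzz computes (its real implementation is optimized C++ and may differ in details).&lt;/p&gt;

```python
def lcs_length(a, b):
    # Classic dynamic-programming longest common subsequence, one row at a time.
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def token_sort_similarity(a, b):
    # Sort whitespace-separated tokens so word order no longer matters.
    sorted_a = " ".join(sorted(a.split()))
    sorted_b = " ".join(sorted(b.split()))
    total = len(sorted_a) + len(sorted_b)
    if total == 0:
        return 100.0
    # Share of characters that take part in the common subsequence, as a percentage.
    return 200.0 * lcs_length(sorted_a, sorted_b) / total


print(round(token_sort_similarity("The Shawarma Professor", "Shawarma Prof."), 2))
print(token_sort_similarity("Professor Shawarma", "Shawarma Professor"))
```

&lt;p&gt;For the demo pair this sketch gives about 72.22, and it gives 100 for two names that differ only in word order, which is the point of sorting the tokens first.&lt;/p&gt;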

&lt;h2&gt;
  
  
  Building the Matcher
&lt;/h2&gt;

&lt;p&gt;Let’s build a lightweight matcher using RapidFuzz:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import re
from rapidfuzz import fuzz

class RestaurantMatcher:

    def _normalize(self, text):
        text = ' '.join(text.strip().split())
        text = re.sub(r'[\u064B-\u065F\u0670]', '', text)  # remove Arabic diacritics
        return text

    def _calculate_similarity(self, a, b):
        return fuzz.token_sort_ratio(a, b)

    def match(self, query, restaurants):
        query_norm = self._normalize(query)
        results = []
        for r in restaurants:
            name = self._normalize(r["name"])
            score = self._calculate_similarity(query_norm, name)
            results.append((r["name"], score))
        return results
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
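
&lt;p&gt;As a quick check of the normalization step on its own, the sketch below repeats the same two operations in a standalone function (the name &lt;code&gt;normalize&lt;/code&gt; and the sample strings are only for illustration). It collapses extra whitespace and strips Arabic diacritics, so a vocalized spelling compares equal to the plain one.&lt;/p&gt;

```python
import re

# Same character range as in _normalize: Arabic combining marks
# (U+064B to U+065F) plus the superscript alef (U+0670).
ARABIC_DIACRITICS = re.compile(r"[\u064B-\u065F\u0670]")


def normalize(text):
    # Collapse runs of whitespace and trim both ends.
    text = " ".join(text.strip().split())
    # Drop diacritics so vocalized and plain spellings match.
    return ARABIC_DIACRITICS.sub("", text)


# The Arabic word for "restaurant", written with and without diacritics.
vocalized = "\u0645\u064E\u0637\u0652\u0639\u064E\u0645"
plain = "\u0645\u0637\u0639\u0645"

print(repr(normalize("  Dr.    Falafel  ")))
print(normalize(vocalized) == plain)
```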



&lt;p&gt;Let's use it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if __name__ == "__main__":
    restaurants = [
        {"name": "The Shawarma Professor"},
        {"name": "Dr. Falafel"},
        {"name": "Taco Republic (Mexican Grill)"},
        {"name": "مطعم الدكتور فلافل"},
        {"name": "Shawarma Prof."},
    ]
    matcher = RestaurantMatcher()
    matches = matcher.match("shawarma prof", restaurants)
    if not matches:
        print("No matches found")
    else:
        for name, score in matches:
            print(f"{name}: {score:.1f}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The Shawarma Professor: 62.9
Dr. Falafel: 33.3
Taco Republic (Mexican Grill): 14.3
مطعم الدكتور فلافل: 6.5
Shawarma Prof.: 81.5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;✅ RapidFuzz ranks "Shawarma Prof." (81.5) and "The Shawarma Professor" (62.9) well above the unrelated names, even with the abbreviation and the missing word. The comparison here is case-sensitive, so the lowercase query loses a few points against the capitalized names.&lt;br&gt;
Note: there are several ways to improve these results.&lt;br&gt;
For example, you can use the following match function to filter out low-scoring results and sort the rest:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Remember to set your own threshold in self.HIGH_CONFIDENCE
# and to define the self._remove_titles helper used below.
def match(self, query, restaurants):
    query_norm = self._normalize(self._remove_titles(query))
    results = []
    for r in restaurants:
        name = self._normalize(self._remove_titles(r["name"]))
        score = self._calculate_similarity(query_norm, name)
        # Keep only matches at or above the confidence threshold
        if score &amp;gt;= self.HIGH_CONFIDENCE:
            results.append((r["name"], score))
    # Best match first
    return sorted(results, key=lambda x: x[1], reverse=True)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
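
&lt;p&gt;The filtered match function relies on a &lt;code&gt;_remove_titles&lt;/code&gt; helper and a &lt;code&gt;HIGH_CONFIDENCE&lt;/code&gt; value that the article does not define. One possible sketch, written as a standalone function, is shown here. The list of titles and the cutoff of 80 are assumptions for illustration, not values from the original code, so tune both for your own data.&lt;/p&gt;

```python
import re

# Hypothetical list of leading words to ignore; extend it for your own data.
TITLE_PATTERN = re.compile(r"^(?:the|dr|mr|mrs|ms)\b\.?\s+", re.IGNORECASE)

# Assumed cutoff on the 0-100 score scale; not a value from the article.
HIGH_CONFIDENCE = 80


def remove_titles(text):
    # Strip one leading article or honorific such as "The" or "Dr.".
    return TITLE_PATTERN.sub("", text.strip(), count=1)


for name in ["The Shawarma Professor", "Dr. Falafel", "Shawarma Prof."]:
    print(repr(name), "becomes", repr(remove_titles(name)))
```

&lt;p&gt;Inside the class, the same logic could live in a &lt;code&gt;_remove_titles&lt;/code&gt; method, with &lt;code&gt;HIGH_CONFIDENCE&lt;/code&gt; set as a class attribute or in &lt;code&gt;__init__&lt;/code&gt;.&lt;/p&gt;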



&lt;h2&gt;
  
  
  RapidFuzz Advantages
&lt;/h2&gt;

&lt;p&gt;⚡ Super fast (C++ backend)&lt;br&gt;
🔤 Handles multilingual data like Arabic and English&lt;br&gt;
🔁 Understands reordered words&lt;br&gt;
🎯 Ideal for search engines, autocompletes, or recommendation systems&lt;/p&gt;
&lt;h1&gt;
  
  
  Part 2: 🐢 Difflib - The Simpler, Built-In Alternative
&lt;/h1&gt;

&lt;p&gt;Now let's assume we can't install external libraries, perhaps because our system runs in a restricted environment.&lt;br&gt;
Enter Difflib, a standard-library Python module for comparing sequences.&lt;br&gt;
Let's swap the matching engine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import re
from difflib import SequenceMatcher

class RestaurantMatcher2:

    def _normalize(self, text):
        text = ' '.join(text.strip().split())
        text = re.sub(r'[\u064B-\u065F\u0670]', '', text)  # remove Arabic diacritics
        return text

    def _calculate_similarity(self, a, b):
        return SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100

    def match(self, query, restaurants):
        query_norm = self._normalize(query)
        results = []
        for r in restaurants:
            name = self._normalize(r["name"])
            score = self._calculate_similarity(query_norm, name)
            results.append((r["name"], score))
        return results
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



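&lt;p&gt;Same structure, different engine. Note that this version lowercases both strings before comparing, which the RapidFuzz version above does not do.&lt;br&gt;
Let's use it:&lt;br&gt;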
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if __name__ == "__main__":
    restaurants = [
        {"name": "The Shawarma Professor"},
        {"name": "Dr. Falafel"},
        {"name": "Taco Republic (Mexican Grill)"},
        {"name": "مطعم الدكتور فلافل"},
        {"name": "Shawarma Prof."},
    ]
    matcher = RestaurantMatcher2()
    matches = matcher.match("shawarma prof", restaurants)
    for name, score in matches:
        print(f"{name}: {score:.1f}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The Shawarma Professor: 74.3
Dr. Falafel: 25.0
Taco Republic (Mexican Grill): 14.3
مطعم الدكتور فلافل: 6.5
Shawarma Prof.: 96.3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
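
&lt;p&gt;To see where a number like 96.3 comes from: the difflib documentation defines &lt;code&gt;ratio()&lt;/code&gt; as 2.0 × M / T, where M is the number of matched characters and T is the combined length of both strings. The short sketch below recomputes that by hand for the query and the lowercased "Shawarma Prof." (the variable names are only for illustration).&lt;/p&gt;

```python
from difflib import SequenceMatcher

query = "shawarma prof"
name = "Shawarma Prof.".lower()

matcher = SequenceMatcher(None, query, name)

# Each block is (start_in_a, start_in_b, size); the last one is a zero-size sentinel.
matched = sum(block.size for block in matcher.get_matching_blocks())
total = len(query) + len(name)

print(matched, total)                    # matched characters and combined length
print(round(200 * matched / total, 1))   # the 2 * M / T formula, as a percentage
print(round(matcher.ratio() * 100, 1))   # what the matcher class reports
```

&lt;p&gt;Only the trailing period fails to match, so 13 of the 27 characters' worth of pairs line up on each side and both lines print 96.3.&lt;/p&gt;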



&lt;h2&gt;
  
  
  When Difflib Is Enough
&lt;/h2&gt;

&lt;p&gt;✅ Built into Python - no installation needed&lt;br&gt;
✅ Perfect for small datasets&lt;br&gt;
✅ Easier to debug and explain to beginners&lt;/p&gt;

&lt;p&gt;If your use case is simple - like comparing a few names or verifying duplicates - Difflib might be all you need.&lt;/p&gt;
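
&lt;p&gt;For those simple cases, the standard library also ships &lt;code&gt;difflib.get_close_matches&lt;/code&gt;, which returns the best candidates above a cutoff without a custom class. Here is a small sketch. The wrapper name &lt;code&gt;closest_names&lt;/code&gt; is hypothetical, and note that &lt;code&gt;cutoff&lt;/code&gt; is on a 0 to 1 scale rather than 0 to 100.&lt;/p&gt;

```python
from difflib import get_close_matches

names = [
    "The Shawarma Professor",
    "Dr. Falafel",
    "Taco Republic (Mexican Grill)",
    "مطعم الدكتور فلافل",
    "Shawarma Prof.",
]


def closest_names(query, names, n=3, cutoff=0.6):
    # Compare lowercase forms, then map each hit back to its original spelling.
    lookup = {name.lower(): name for name in names}
    hits = get_close_matches(query.lower(), list(lookup), n=n, cutoff=cutoff)
    return [lookup[hit] for hit in hits]


print(closest_names("shawarma prof", names))
```

&lt;p&gt;With the default cutoff of 0.6, this returns "Shawarma Prof." first and "The Shawarma Professor" second, and drops the unrelated names.&lt;/p&gt;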

&lt;h2&gt;
  
  
  In short:
&lt;/h2&gt;

&lt;p&gt;Use RapidFuzz if you need speed and accuracy&lt;br&gt;
Use Difflib if you need simplicity and zero dependencies&lt;/p&gt;

&lt;h2&gt;
  
  
  🎯 Conclusion: Smarter Search Starts with Fuzzy Matching
&lt;/h2&gt;

&lt;p&gt;Smart text matching levels up your data game, and RapidFuzz vs Difflib is a duel worth watching. RapidFuzz brings speed and token-aware scoring, which makes it a good fit for demanding apps. Difflib holds the fort with zero-setup reliability for everyday tasks.&lt;/p&gt;

&lt;p&gt;In short, fuzzy matching makes search feel effortless.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RapidFuzz gives you the power to scale - fast, accurate, and language-aware.&lt;/li&gt;
&lt;li&gt;Difflib gives you a lightweight fallback for simpler projects.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Which side are you on - RapidFuzz or Difflib? Drop your benchmarks in the comments.&lt;/p&gt;

</description>
      <category>python</category>
      <category>programming</category>
      <category>textmatching</category>
      <category>rapidfuzz</category>
    </item>
  </channel>
</rss>
