<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Matt Joshi</title>
    <description>The latest articles on DEV Community by Matt Joshi (@mattjoshi).</description>
    <link>https://dev.to/mattjoshi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3941349%2F4b1bfd9c-7e21-4ee2-aae7-3fb2c7d2829d.webp</url>
      <title>DEV Community: Matt Joshi</title>
      <link>https://dev.to/mattjoshi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mattjoshi"/>
    <language>en</language>
    <item>
      <title>How I Used Python Fuzzy Matching to Detect Duplicate Content for SEO</title>
      <dc:creator>Matt Joshi</dc:creator>
      <pubDate>Wed, 03 Jun 2026 04:32:54 +0000</pubDate>
      <link>https://dev.to/mattjoshi/how-i-used-python-fuzzy-matching-to-detect-duplicate-content-for-seo-20ah</link>
      <guid>https://dev.to/mattjoshi/how-i-used-python-fuzzy-matching-to-detect-duplicate-content-for-seo-20ah</guid>
      <description>&lt;p&gt;Struggling with duplicate content across your site? I wrote a Python script that uses fuzzy matching to find near-duplicate pages. It's been a lifesaver for my SEO audits:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;from difflib import SequenceMatcher

import requests
from bs4 import BeautifulSoup


def get_page_text(url):
    """Download a page and return its visible text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # don't compare error pages
    soup = BeautifulSoup(response.text, 'html.parser')
    # Remove script and style contents, which get_text() would otherwise include
    for tag in soup(['script', 'style']):
        tag.decompose()
    # Join text nodes with spaces, then collapse runs of whitespace
    return ' '.join(soup.get_text(separator=' ').split())


def similarity_ratio(text1, text2):
    """Return a similarity score between 0.0 and 1.0 (1.0 means identical)."""
    return SequenceMatcher(None, text1, text2).ratio()


urls = ['https://example.com/page1', 'https://example.com/page2']
texts = [get_page_text(url) for url in urls]
ratio = similarity_ratio(texts[0], texts[1])
print(f'Similarity: {ratio:.2%}')

if ratio &amp;gt; 0.8:
    print('Warning: Possible duplicate content!')
&lt;/code&gt;&lt;/pre&gt;
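
&lt;p&gt;The script above compares exactly two pages. To check a batch of URLs, a minimal sketch like the one below compares every pair. It assumes you have already extracted each page's text (for example with &lt;code&gt;get_page_text()&lt;/code&gt;) into a dict keyed by URL; &lt;code&gt;find_near_duplicates&lt;/code&gt; is a hypothetical helper, not a library function. It makes two deliberate changes: it compares lowercase words instead of characters, and it passes &lt;code&gt;autojunk=False&lt;/code&gt;, because difflib's default junk heuristic ignores items that are very frequent in sequences of 200 or more elements, which on a long page can mean most common letters. Its scores are therefore not directly comparable with the character-level ratio. Since &lt;code&gt;quick_ratio()&lt;/code&gt; is an upper bound on &lt;code&gt;ratio()&lt;/code&gt;, it is used to skip pairs that cannot pass the threshold.&lt;/p&gt;

```python
from difflib import SequenceMatcher
from itertools import combinations


def find_near_duplicates(pages, threshold=0.8):
    """Return (url_a, url_b, ratio) for every pair of pages scoring above threshold.

    pages: dict mapping each URL to its extracted page text (hypothetical input
    shape; build it however you collect the text).
    """
    # Compare lowercase word tokens instead of characters
    tokens = {url: text.lower().split() for url, text in pages.items()}
    matches = []
    for url_a, url_b in combinations(sorted(tokens), 2):
        # autojunk=False: don't let difflib treat frequent words as junk
        matcher = SequenceMatcher(None, tokens[url_a], tokens[url_b], autojunk=False)
        # quick_ratio() is a cheap upper bound on ratio(), so it filters out
        # pairs that cannot exceed the threshold before the slower ratio() call
        if matcher.quick_ratio() > threshold:
            ratio = matcher.ratio()
            if ratio > threshold:
                matches.append((url_a, url_b, ratio))
    return matches
```

&lt;p&gt;For example, &lt;code&gt;find_near_duplicates({url: get_page_text(url) for url in urls})&lt;/code&gt; checks every pair from the list above. The number of pairs grows as n(n-1)/2, so this stays a quick-check tool for a modest number of URLs.&lt;/p&gt;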

&lt;p&gt;For deeper analysis across large sites, a dedicated tool such as &lt;strong&gt;&lt;a href="https://serpspur.com" rel="noopener noreferrer"&gt;SERPSpur's&lt;/a&gt;&lt;/strong&gt; content audit feature can identify duplicates at scale, but this script is great for quick checks. How do you handle duplicate content issues?&lt;/p&gt;

</description>
      <category>micropython</category>
      <category>programming</category>
      <category>aws</category>
      <category>python</category>
    </item>
    <item>
      <title>How to Check Google Penalties Automatically Using SERPSpur API and Python Scripts</title>
      <dc:creator>Matt Joshi</dc:creator>
      <pubDate>Tue, 02 Jun 2026 11:29:59 +0000</pubDate>
      <link>https://dev.to/mattjoshi/how-to-check-google-penalties-automatically-using-serpspur-api-and-python-scripts-k7l</link>
      <guid>https://dev.to/mattjoshi/how-to-check-google-penalties-automatically-using-serpspur-api-and-python-scripts-k7l</guid>
      <description>&lt;p&gt;I recently noticed a sudden drop in organic traffic for one of my sites, and I suspected a search engine penalty. Instead of guessing, I used &lt;strong&gt;&lt;a href="https://serpspur.com/tool/search-engine-penalty-radar" rel="noopener noreferrer"&gt;SERPSpur’s Search Engine Penalty Radar&lt;/a&gt;&lt;/strong&gt; to check for indexing issues and blacklist signals. Here’s how I integrated it into my monitoring workflow:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;import requests

# Assumes this endpoint returns JSON with 'penalty_detected' and 'details' keys
url = 'https://serpspur.com/tool/search-engine-penalty-radar/'
params = {'domain': 'example.com'}

response = requests.get(url, params=params, timeout=30)
response.raise_for_status()  # stop on HTTP errors instead of parsing an error page
data = response.json()

if data.get('penalty_detected'):
    print(f'Penalty found: {data.get("details")}')
else:
    print('No penalty detected')
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;It flagged a manual action from Google that I hadn’t noticed in Search Console. The tool also checks for blacklist signals from sources like Google Safe Browsing and Spamhaus. Since adding it to my weekly checks, I’ve caught two potential issues before they affected rankings. I highly recommend it for anyone managing multiple sites. #python #seo #monitoring&lt;/p&gt;
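
&lt;p&gt;If you manage several sites, a loop like the one below runs the same check for each domain and collects only the domains that need attention. This is a minimal sketch: &lt;code&gt;check_domains&lt;/code&gt; and &lt;code&gt;fetch_report&lt;/code&gt; are hypothetical helper names, and it assumes the endpoint returns the same JSON fields (&lt;code&gt;penalty_detected&lt;/code&gt;, &lt;code&gt;details&lt;/code&gt;) used in the snippet above. The fetcher is passed in as a parameter so the loop can be tested without network access, and a failed request is recorded rather than aborting the whole run.&lt;/p&gt;

```python
def check_domains(domains, fetch_report):
    """Run a penalty check for each domain and return the ones that need attention.

    fetch_report (hypothetical): a callable that takes a domain and returns the
    parsed JSON report as a dict with 'penalty_detected' and 'details' keys.
    """
    flagged = {}
    for domain in domains:
        try:
            report = fetch_report(domain)
        except Exception as exc:  # record the failure instead of aborting the run
            flagged[domain] = f'check failed: {exc}'
            continue
        if report.get('penalty_detected'):
            flagged[domain] = report.get('details')
    return flagged


def fetch_report(domain):
    """Hypothetical fetcher reusing the endpoint from the snippet above."""
    import requests  # imported lazily so check_domains() has no third-party dependency

    response = requests.get(
        'https://serpspur.com/tool/search-engine-penalty-radar/',
        params={'domain': domain},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()
```

&lt;p&gt;For a weekly run, call &lt;code&gt;check_domains(['example.com', 'example.org'], fetch_report)&lt;/code&gt; from cron or a scheduled CI job and alert on any non-empty result.&lt;/p&gt;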

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8nee2nnqpru2mygp9fuv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8nee2nnqpru2mygp9fuv.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>javascript</category>
      <category>python</category>
      <category>elasticsearch</category>
    </item>
  </channel>
</rss>
