<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Saverio Bertocci</title>
    <description>The latest articles on DEV Community by Saverio Bertocci (@x4v1er94).</description>
    <link>https://dev.to/x4v1er94</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3827334%2Fd22b58cc-86b2-4384-a86c-e2a9dc70768f.jpg</url>
      <title>DEV Community: Saverio Bertocci</title>
      <link>https://dev.to/x4v1er94</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/x4v1er94"/>
    <language>en</language>
    <item>
      <title>Stop using Regex for E-commerce scraping. I built an AI API that normalizes product data instantly.</title>
      <dc:creator>Saverio Bertocci</dc:creator>
      <pubDate>Tue, 17 Mar 2026 13:32:21 +0000</pubDate>
      <link>https://dev.to/x4v1er94/stop-using-regex-for-e-commerce-scraping-i-built-an-ai-api-that-normalizes-product-data-instantly-2467</link>
      <guid>https://dev.to/x4v1er94/stop-using-regex-for-e-commerce-scraping-i-built-an-ai-api-that-normalizes-product-data-instantly-2467</guid>
      <description>&lt;p&gt;If you've ever built a scraper, a dropshipping importer, or a PIM (Product Information Management) system, you know the absolute nightmare of dealing with unstructured product data.&lt;/p&gt;

&lt;p&gt;You scrape a supplier's website expecting a clean table with sizes and colors, but instead, you get this raw text string:&lt;/p&gt;

&lt;p&gt;"Nike Air Max mens sneakers size 42 blue synthetic material"&lt;/p&gt;

&lt;p&gt;Or even worse, it's in a foreign language:&lt;/p&gt;

&lt;p&gt;"Zapatillas de running Nike Air Max uomo blu taglia 42"&lt;/p&gt;

&lt;p&gt;The old way: The Regex Nightmare ❌&lt;br&gt;
Historically, we had to write dozens of regular expressions to catch variations of "Size", "SZ", "Taglia", or map 50 different color names to a standard English list. One typo from the supplier, and the script breaks. Your Shopify catalog ends up with weird tags like Color: blu scuro impermeabile.&lt;/p&gt;

&lt;p&gt;The new way: Structured AI Outputs ✅&lt;br&gt;
I got tired of fixing broken parsers, so I built a dedicated backend using Node.js, Express, and GPT-4o-mini with strict JSON schemas.&lt;/p&gt;

&lt;p&gt;Instead of searching for keywords, the LLM reads the context, translates everything to standard English, and maps it to specific e-commerce attributes.&lt;/p&gt;
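
&lt;p&gt;To make "strict JSON schemas" a bit more concrete, here is a minimal sketch of what such a schema could look like for the fields this API returns. The field names come from the sample response in this post; the schema object, the &lt;code&gt;matchesProductSchema&lt;/code&gt; helper, and the choice to make every field nullable are assumptions for illustration, not the production code.&lt;/p&gt;

```javascript
// Hypothetical schema: every field is either a string or null.
const nullableString = { type: ["string", "null"] };

const productSchema = {
  type: "object",
  additionalProperties: false,
  required: [
    "brand", "model", "category", "gender", "size",
    "color", "material", "pack_size", "normalized_title",
  ],
  properties: {
    brand: nullableString,
    model: nullableString,
    category: nullableString,
    gender: nullableString,
    size: nullableString,
    color: nullableString,
    material: nullableString,
    pack_size: nullableString,
    normalized_title: nullableString,
  },
};

// Minimal structural check (not a full JSON Schema validator):
// all required keys present, no unknown keys, values are string or null.
function matchesProductSchema(data) {
  if (data === null || typeof data !== "object" || Array.isArray(data)) return false;
  const keys = Object.keys(data);
  const hasAllRequired = productSchema.required.every((key) => keys.includes(key));
  const hasNoExtras = keys.every((key) => key in productSchema.properties);
  const valuesOk = keys.every((key) => data[key] === null || typeof data[key] === "string");
  return [hasAllRequired, hasNoExtras, valuesOk].every(Boolean);
}
```

&lt;p&gt;Listing every property as required and forbidding extra keys is what keeps the shape stable: a missing attribute comes back as &lt;code&gt;null&lt;/code&gt; instead of silently disappearing.&lt;/p&gt;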

&lt;p&gt;If you send the messy text from above, the API returns this exact JSON structure:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "success": true,
  "data": {
    "brand": "Nike",
    "model": "Air Max",
    "category": "sneakers",
    "gender": "men",
    "size": "42",
    "color": "blue",
    "material": "synthetic",
    "pack_size": null,
    "normalized_title": "Nike Air Max sneakers men blue size 42"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;I wrapped it into a public API&lt;br&gt;
Since building the prompt logic, handling LLM latency, and hosting the infrastructure all take a lot of time, I wrapped everything into a plug-and-play API.&lt;/p&gt;

&lt;p&gt;If you are building an automated Shopify importer, doing local SEO catalogs, or just formatting messy supplier CSVs with Python or Zapier, you can use it right now.&lt;/p&gt;

&lt;p&gt;👉 Check out E-commerce Product Normalizer (AI) on RapidAPI&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://rapidapi.com/x4v1er94/api/e-commerce-product-data-normalizer-ai" rel="noopener noreferrer"&gt;https://rapidapi.com/x4v1er94/api/e-commerce-product-data-normalizer-ai&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;There is a free tier available (50 calls/month) so you can test it directly in the RapidAPI playground without any commitment.&lt;/p&gt;

&lt;p&gt;I'd love to hear your feedback! How do you guys currently handle messy product feeds from clients or suppliers?&lt;/p&gt;

</description>
      <category>ecommerce</category>
      <category>ai</category>
      <category>node</category>
      <category>api</category>
    </item>
    <item>
      <title>How to Extract Structured Contact Data from Messy Emails using AI (and Validate Italian VATs)</title>
      <dc:creator>Saverio Bertocci</dc:creator>
      <pubDate>Mon, 16 Mar 2026 16:18:26 +0000</pubDate>
      <link>https://dev.to/x4v1er94/how-to-extract-structured-contact-data-from-messy-emails-using-ai-and-validate-italian-vats-54f6</link>
      <guid>https://dev.to/x4v1er94/how-to-extract-structured-contact-data-from-messy-emails-using-ai-and-validate-italian-vats-54f6</guid>
      <description>&lt;p&gt;As developers, we’ve all been there: a client asks you to build a system to capture leads from incoming emails, WhatsApp messages, or a generic "Contact Us" text area.&lt;/p&gt;

&lt;p&gt;You expect structured data, but what you actually get from users is this:&lt;/p&gt;

&lt;p&gt;"Hi, I'm Mario Rossi from Milan. I need a quote. You can call me at 333 12 34 567. My company VAT is 12345678901. Thanks."&lt;/p&gt;

&lt;p&gt;Good luck parsing that with Regex! 😅&lt;br&gt;
Phone numbers have random spaces, names are mixed with cities, and validating the VAT number usually requires writing a custom Modulo 10 algorithm.&lt;/p&gt;

&lt;p&gt;The Solution: AI + Mathematical Validation&lt;br&gt;
I got tired of maintaining fragile regular expressions, so I decided to build a dedicated backend using Node.js, Express, and OpenAI's GPT-4o-mini.&lt;/p&gt;

&lt;p&gt;The goal was simple: send raw text in, get a guaranteed clean JSON out.&lt;/p&gt;

&lt;p&gt;Instead of just relying on the LLM to guess if a VAT number is valid, I built a hybrid system:&lt;/p&gt;

&lt;p&gt;The AI extracts the entities (Name, Phone, City, VAT, Intent).&lt;/p&gt;

&lt;p&gt;The Node.js backend then runs the extracted VAT number through the official Modulo 10 checksum to verify that its check digit is mathematically consistent.&lt;/p&gt;

&lt;p&gt;The phone number is automatically stripped of spaces and formatted with the international +39 prefix.&lt;/p&gt;
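
&lt;p&gt;The phone clean-up step can be sketched in a few lines. This is an illustration under stated assumptions, not the service's actual code: the function name &lt;code&gt;normalizeItalianPhone&lt;/code&gt; is made up, the input is assumed to be an Italian number, and the 6 to 11 digit length window is a rough guess rather than an official rule.&lt;/p&gt;

```javascript
// Hypothetical helper: strips separators and adds the +39 prefix.
// Returns null when the input does not look like a phone number.
function normalizeItalianPhone(raw) {
  // Keep only digits and plus signs (drops spaces, dots, dashes, text).
  const compact = raw.trim().replace(/[^\d+]/g, "");

  // Remove an existing country prefix so it is not added twice.
  let national = compact;
  if (compact.startsWith("+39")) national = compact.slice(3);
  else if (compact.startsWith("0039")) national = compact.slice(4);

  // Assumed length window; anything else (or a stray "+") is rejected.
  if (!/^\d{6,11}$/.test(national)) return null;

  // Italian landlines keep their leading 0 after the country code.
  return "+39" + national;
}
```

&lt;p&gt;With the message from the example, &lt;code&gt;normalizeItalianPhone("333 12 34 567")&lt;/code&gt; returns &lt;code&gt;"+393331234567"&lt;/code&gt;.&lt;/p&gt;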

&lt;p&gt;What the output looks like&lt;br&gt;
If you send the messy text from the example above, the system returns this clean JSON:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "success": true,
  "extracted_data": {
    "person_name": "Mario Rossi",
    "city": "Milan",
    "phone": "+393331234567",
    "vat_number": "12345678901",
    "intent": "quote",
    "is_vat_valid": false
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;(Notice how it automatically detected the VAT is fake because it failed the Modulo 10 math check!)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuy94wcu77obz0uihrkw0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuy94wcu77obz0uihrkw0.png" alt=" " width="800" height="271"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I made it available as an API&lt;br&gt;
Since building the infrastructure, handling the OpenAI prompts for structured outputs, and hosting the server all take time, I wrapped the whole thing into a plug-and-play API.&lt;/p&gt;

&lt;p&gt;If you are building a bot, automating leads with Zapier/n8n, or just handling messy inputs, you can use it right now.&lt;/p&gt;

&lt;p&gt;👉 Smart Contact Extractor (Italian AI) on RapidAPI &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://rapidapi.com/x4v1er94/api/smart-contact-extractor-italian-ai" rel="noopener noreferrer"&gt;https://rapidapi.com/x4v1er94/api/smart-contact-extractor-italian-ai&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;There is a free basic tier available, so you can test it directly in the RapidAPI playground without pulling out your credit card.&lt;/p&gt;

&lt;p&gt;I also published a lighter, free-forever API just for strict validation (without the AI extraction part) if you already have structured forms: Italian Data Normalizer.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://rapidapi.com/x4v1er94/api/italian-data-normalizer" rel="noopener noreferrer"&gt;https://rapidapi.com/x4v1er94/api/italian-data-normalizer&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let me know what you think in the comments! How do you currently handle unstructured leads in your projects?&lt;/p&gt;

</description>
      <category>node</category>
      <category>javascript</category>
      <category>ai</category>
      <category>api</category>
    </item>
    <item>
      <title>Why Regex Is Never Enough for Italian Forms (And How to Fix It with an API)</title>
      <dc:creator>Saverio Bertocci</dc:creator>
      <pubDate>Mon, 16 Mar 2026 12:59:05 +0000</pubDate>
      <link>https://dev.to/x4v1er94/why-regex-is-never-enough-for-italian-forms-and-how-to-fix-it-with-an-api-4eg3</link>
      <guid>https://dev.to/x4v1er94/why-regex-is-never-enough-for-italian-forms-and-how-to-fix-it-with-an-api-4eg3</guid>
      <description>&lt;p&gt;If you've ever built a checkout form or a CRM for the Italian market, you know the struggle. &lt;/p&gt;

&lt;p&gt;You ask the user for a phone number, an address, or a VAT Number (Partita IVA), and you get a wild mix of formats. People write "v.le" instead of "Viale", add random spaces in their phone numbers, and type 10 digits for a VAT number instead of 11.&lt;/p&gt;

&lt;p&gt;The standard developer reaction is to write a massive Regex. But here is the problem: &lt;strong&gt;Regex is not enough.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;The "Modulo 10" Problem&lt;/h3&gt;

&lt;p&gt;For example, the Italian VAT Number (Partita IVA) is 11 digits long. A simple &lt;code&gt;/^[0-9]{11}$/&lt;/code&gt; regex will let any random string of 11 digits pass.&lt;br&gt;
However, the Italian Revenue Agency uses the &lt;strong&gt;Luhn Algorithm (Modulo 10)&lt;/strong&gt; to validate VAT numbers. The 11th digit is actually a check digit calculated mathematically from the first 10.&lt;/p&gt;

&lt;p&gt;If you don't validate it mathematically, your database will be filled with fake or mistyped VAT numbers.&lt;/p&gt;
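
&lt;p&gt;For reference, the checksum described above fits in a short function. This is a minimal sketch of the Luhn (Modulo 10) check applied to an 11-digit string; the name &lt;code&gt;isValidPartitaIva&lt;/code&gt; is hypothetical, and the function only verifies the math, not whether the number is actually registered.&lt;/p&gt;

```javascript
// Hypothetical helper: Luhn (Modulo 10) check for an 11-digit Partita IVA.
function isValidPartitaIva(value) {
  // Format gate first: exactly 11 digits, nothing else.
  if (!/^\d{11}$/.test(value)) return false;

  let sum = 0;
  [...value].forEach((char, index) => {
    let digit = Number(char);
    // Double the 2nd, 4th, ... 10th digits (odd zero-based indexes);
    // a two-digit result is reduced by subtracting 9.
    if (index % 2 === 1) {
      digit *= 2;
      if (digit > 9) digit -= 9;
    }
    sum += digit;
  });

  // The check digit (11th) is included in the sum, so a valid
  // number always adds up to a multiple of 10.
  return sum % 10 === 0;
}
```

&lt;p&gt;For instance, &lt;code&gt;12345678901&lt;/code&gt; adds up to 48 and is rejected, while the same ten digits with a check digit of 3 (&lt;code&gt;12345678903&lt;/code&gt;) add up to 50 and pass.&lt;/p&gt;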

&lt;h3&gt;The Solution: Offload the dirty work&lt;/h3&gt;

&lt;p&gt;I got tired of copy-pasting the Modulo 10 algorithm and address-cleaning functions into every new Node.js project. So, over a weekend, I decided to package all these rules into a single micro-service.&lt;/p&gt;

&lt;p&gt;I built the &lt;strong&gt;Italian Data Normalizer API&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;It takes messy inputs like this:&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "street": "v.le trastevere 10",
  "city": "ROMA",
  "province": "rm",
  "zip": "153"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;And returns beautifully formatted data, calculating the Modulo 10 for VATs and cleaning the strings:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "street": "Viale Trastevere, 10",
  "city": "Roma",
  "province": "RM",
  "zip": "00153"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;Try it for free&lt;/h3&gt;

&lt;p&gt;Instead of keeping it private, I published it on RapidAPI. There is a free tier (100 requests/month) which is more than enough for testing or small projects.&lt;/p&gt;

&lt;p&gt;You don't even have to write the fetch requests yourself. I made a tiny open-source wrapper in JavaScript.&lt;/p&gt;

&lt;p&gt;Check out the wrapper on GitHub and grab your free API key from the README:&lt;br&gt;
👉 &lt;a href="https://github.com/x4v1er94/italian-data-utils-js.git" rel="noopener noreferrer"&gt;https://github.com/x4v1er94/italian-data-utils-js.git&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I'd love to hear your feedback. Try to break it with weird inputs and let me know if I missed any edge cases!&lt;/p&gt;
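
&lt;p&gt;If you are curious what the address clean-up from the example involves, here is a rough sketch of the same transformation. Everything in it is an assumption for illustration: the abbreviation table, the helper names, and the title-casing rule are made up and far simpler than what a real service needs (for instance, it would wrongly capitalize small words like "di" or "del").&lt;/p&gt;

```javascript
// Hypothetical abbreviation table; a real list would be much longer.
const STREET_PREFIXES = new Map([
  ["v.le", "Viale"],
  ["v.", "Via"],
  ["p.zza", "Piazza"],
  ["c.so", "Corso"],
]);

// Lowercases the text, then capitalizes the first letter of each word.
function titleCase(text) {
  return text
    .toLowerCase()
    .split(/\s+/)
    .filter(Boolean)
    .map((word) => word[0].toUpperCase() + word.slice(1))
    .join(" ");
}

function normalizeAddress({ street, city, province, zip }) {
  const words = street.trim().split(/\s+/);

  // Expand a known abbreviation in the first position.
  const prefix = STREET_PREFIXES.get(words[0].toLowerCase());
  if (prefix) words[0] = prefix;

  // Treat a trailing token such as "10" or "10a" as the house number.
  const last = words[words.length - 1];
  const hasNumber = words.length > 1 ? /^\d+[a-z]?$/i.test(last) : false;
  const name = titleCase((hasNumber ? words.slice(0, -1) : words).join(" "));

  return {
    street: hasNumber ? `${name}, ${last}` : name,
    city: titleCase(city),
    province: province.trim().toUpperCase(),
    // Italian postal codes have 5 digits; restore dropped leading zeros.
    zip: zip.trim().padStart(5, "0"),
  };
}
```

&lt;p&gt;Feeding it the messy input from the example yields the same four cleaned fields shown in the formatted output.&lt;/p&gt;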

</description>
      <category>javascript</category>
      <category>webdev</category>
      <category>api</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
