As developers, we’ve all been there: a client asks you to build a system to capture leads from incoming emails, WhatsApp messages, or a generic "Contact Us" text area.
You expect structured data, but what you actually get from users is this:
"Hi, I'm Mario Rossi from Milan. I need a quote. You can call me at 333 12 34 567. My company VAT is 12345678901. Thanks."
Good luck parsing that with Regex! 😅
Phone numbers come with arbitrary spacing, names are mixed in with cities, and validating an Italian VAT number (Partita IVA) means writing your own modulo 10 (Luhn) checksum.
The Solution: AI + Mathematical Validation
I got tired of maintaining fragile regular expressions, so I decided to build a dedicated backend using Node.js, Express, and OpenAI's GPT-4o-mini.
The goal was simple: send raw text in, get clean, predictable JSON out.
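To make the "raw text in, JSON out" idea concrete, here is a minimal sketch of what the extraction call could look like. It is not the code behind the service: the prompt wording and the helper names `buildExtractionRequest` and `extractEntities` are assumptions, and it uses OpenAI's JSON mode on the Chat Completions endpoint, which may differ from the actual setup. It also assumes Node.js 18+ for the built-in `fetch`.

```javascript
// Hypothetical sketch: the key names mirror the sample output in this post,
// but the prompt and helper names are assumptions, not the real service code.
const OPENAI_URL = "https://api.openai.com/v1/chat/completions";

// Build the request body separately so it can be inspected without a network call.
function buildExtractionRequest(rawText) {
  return {
    model: "gpt-4o-mini",
    temperature: 0,
    // JSON mode: the reply is constrained to a syntactically valid JSON object.
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content:
          "Extract contact details from the user's message. Reply with a JSON " +
          "object with the keys person_name, city, phone, vat_number, intent. " +
          "Use null for anything that is missing.",
      },
      { role: "user", content: rawText },
    ],
  };
}

// Send the request and parse the model's reply (requires a valid API key).
async function extractEntities(rawText, apiKey) {
  const res = await fetch(OPENAI_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(buildExtractionRequest(rawText)),
  });
  if (!res.ok) throw new Error(`OpenAI request failed: ${res.status}`);
  const data = await res.json();
  // The JSON object arrives as a string in the first choice's message.
  return JSON.parse(data.choices[0].message.content);
}
```

JSON mode only guarantees that the reply parses; it does not guarantee that every key is present or correct, which is why the values still need checking on the backend.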
Instead of just relying on the LLM to guess if a VAT number is valid, I built a hybrid system:
1. The AI extracts the entities (Name, Phone, City, VAT, Intent).
2. The Node.js backend runs the VAT number through the modulo 10 checksum to confirm it is formally valid.
3. The phone number is stripped of spaces and formatted with the international +39 prefix.
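The two deterministic steps above can be sketched in a few lines of plain JavaScript. This is an illustration under stated assumptions, not the service's actual code: `isValidPartitaIva` and `normalizeItalianPhone` are hypothetical names, the VAT check is the Luhn (modulo 10) formula applied to an 11-digit Partita IVA, and any number without an international prefix is assumed to be Italian.

```javascript
// Hypothetical helper: Luhn (modulo 10) check for an 11-digit Italian VAT number.
function isValidPartitaIva(vat) {
  if (!/^\d{11}$/.test(vat)) return false;
  let sum = 0;
  for (let i = 0; i < 11; i++) {
    let d = Number(vat[i]);
    // Digits in even positions (2nd, 4th, ...) are doubled; results above 9 lose 9.
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  // The 11th digit is the check digit, so the total must be a multiple of 10.
  return sum % 10 === 0;
}

// Hypothetical helper: strip separators and return a "+"-prefixed number.
function normalizeItalianPhone(raw) {
  const compact = raw.replace(/[\s().\-\/]/g, "");
  if (/^\+\d+$/.test(compact)) return compact; // already international
  if (/^00\d+$/.test(compact)) return "+" + compact.slice(2); // 00 prefix
  if (/^\d+$/.test(compact)) return "+39" + compact; // assumed Italian national number
  return null; // not recognisable as a phone number
}

console.log(normalizeItalianPhone("333 12 34 567")); // "+393331234567"
console.log(isValidPartitaIva("12345678901")); // false
console.log(isValidPartitaIva("12345678903")); // true
```

Keeping these checks outside the LLM means the same input always produces the same verdict. Note that the checksum only proves a number is well formed, not that it is registered to a real company.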
What the output looks like
If you send the messy text from the example above, the system returns this clean JSON:
```json
{
  "success": true,
  "extracted_data": {
    "person_name": "Mario Rossi",
    "city": "Milan",
    "phone": "+393331234567",
    "vat_number": "12345678901",
    "intent": "quote",
    "is_vat_valid": false
  }
}
```
(Notice how it flagged the VAT number as invalid: the example number fails the modulo 10 checksum.)
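To see why the example number fails, it helps to print the intermediate values of the checksum. The helper below is a hypothetical illustration (the name `checksumBreakdown` is an assumption, and it uses the Luhn weighting for an 11-digit Partita IVA): it weights the first ten digits and derives the check digit they imply.

```javascript
// Hypothetical helper: show the arithmetic behind the modulo 10 check.
function checksumBreakdown(vat) {
  const digits = [...vat].map(Number);
  // Weight the first ten digits: positions 2, 4, 6, 8, 10 are doubled
  // (minus 9 when the result exceeds 9); the others count as they are.
  const weightedSum = digits.slice(0, 10).reduce((acc, d, i) => {
    if (i % 2 === 0) return acc + d;
    const doubled = d * 2;
    return acc + (doubled > 9 ? doubled - 9 : doubled);
  }, 0);
  return {
    weightedSum,
    // The check digit is whatever brings the total up to a multiple of 10.
    expectedCheckDigit: (10 - (weightedSum % 10)) % 10,
    actualCheckDigit: digits[10],
  };
}

console.log(checksumBreakdown("12345678901"));
// { weightedSum: 47, expectedCheckDigit: 3, actualCheckDigit: 1 }
```

The first ten digits add up to 47, so the final digit would have to be 3. The sample ends in 1, which is why `is_vat_valid` comes back as `false`.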
I made it available as an API
Since building the infrastructure, tuning the OpenAI prompts for structured outputs, and hosting the server all take time, I wrapped the whole thing into a plug-and-play API.
If you are building a bot, automating leads with Zapier/n8n, or just handling messy inputs, you can use it right now.
👉 Smart Contact Extractor (Italian AI) on RapidAPI
https://rapidapi.com/x4v1er94/api/smart-contact-extractor-italian-ai
There is a free basic tier available, so you can test it directly in the RapidAPI playground without pulling out your credit card.
I also published a lighter, free-forever API just for strict validation (without the AI extraction part) if you already have structured forms: Italian Data Normalizer.
Let me know what you think in the comments! How do you currently handle unstructured leads in your projects?
