I recently built a set of Python tools that automate data collection, extract company information, and generate summaries using AI.
Instead of spending hours collecting data by hand, the tools process everything automatically.
Here’s a quick breakdown of how it works 👇
I’m a Python developer focused on automation, data extraction, and building practical tools for business workflows.
Instead of writing simple scripts, I build solutions that actually save time — collecting company data, automating repetitive processes, and integrating AI into real tasks.
If you want to see real examples of what I build, you can check my work here:
👉 https://dibara512.github.io/my-site/
What I build
Here are the main types of solutions I work on:
- Web scraping tools (registries, directories, company databases)
- Browser automation (forms, dashboards, repetitive workflows)
- Excel and database processing tools
- AI-powered data analysis (LLMs, summaries, classification)
- Internal tools with GUI for teams
Real examples
1. Company data extraction from registries
I’ve built parsers for multiple national registries:
- Finland
- France (SIREN extraction)
- Belgium, Austria, UK, Poland
- Japan, Iceland
Typical workflow:
- Read company list from Excel
- Automatically search registry
- Match exact company names
- Extract registration numbers
- Export results back to Excel
This allows processing hundreds of companies in minutes instead of hours.
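The core of that workflow is the exact-name matching step. Here is a minimal sketch of how it can work; the field names, normalization rules, and example registry hits are illustrative, not the actual tool's schema:

```python
# Minimal sketch of exact company-name matching against registry search hits.
# The dict keys ("name", "reg_number") are hypothetical placeholders.
def normalize(name):
    # case-insensitive, whitespace-insensitive comparison
    return " ".join(name.lower().split())

def match_exact(excel_name, registry_hits):
    """registry_hits: list of {'name': ..., 'reg_number': ...} dicts."""
    for hit in registry_hits:
        if normalize(hit["name"]) == normalize(excel_name):
            return hit["reg_number"]
    return None  # no exact match -> flag the row for manual review

hits = [
    {"name": "Nokia Solutions Oy", "reg_number": "1234567-8"},
    {"name": "Nokia Oyj", "reg_number": "0112038-9"},
]
print(match_exact("  nokia  oyj ", hits))  # -> 0112038-9
```

Normalizing both sides before comparing avoids false negatives from casing or stray whitespace, while still rejecting near-matches like a parent company with a similar name.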
2. Advanced contact scraper
One of my tools focuses on extracting contact data from websites:
- Phone numbers (tel:, JSON-LD, raw HTML)
- Company websites
- Structured data
It includes filtering and validation, so the output is clean and ready to use.
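A simplified sketch of extraction from the three sources mentioned above; the regexes and the digit-count validation threshold are placeholder assumptions, not the production patterns:

```python
import json
import re

def extract_phones(html):
    """Pull phone candidates from tel: links, JSON-LD, and raw text."""
    phones = set()
    # 1. tel: links
    phones.update(re.findall(r'href=["\']tel:([+\d][\d\s().-]{5,})["\']', html))
    # 2. JSON-LD "telephone" fields (top-level dicts only, for brevity)
    for block in re.findall(r'<script[^>]*application/ld\+json[^>]*>(.*?)</script>',
                            html, re.S):
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue
        if isinstance(data, dict) and "telephone" in data:
            phones.add(str(data["telephone"]))
    # 3. raw-text fallback: loose international-looking numbers
    phones.update(re.findall(r'\+\d[\d\s().-]{7,}\d', html))
    # basic validation: keep only candidates with at least 7 digits
    return sorted(p.strip() for p in phones if len(re.sub(r'\D', '', p)) >= 7)

html = ('<a href="tel:+358 40 123 4567">Call</a>'
        '<script type="application/ld+json">{"telephone": "+32 2 555 0100"}</script>')
print(extract_phones(html))  # -> ['+32 2 555 0100', '+358 40 123 4567']
```

Deduplicating into a set and validating digit counts is what keeps the output clean when the same number appears in several sources.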
3. AI-powered website analysis
I built a system that:
- Loads multiple pages from a website
- Extracts and aggregates content
- Generates summaries using AI (Groq API)
- Identifies business activity
This is especially useful when working with large datasets of unknown websites.
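The aggregation step before the AI call can be sketched like this. The real tool sends the result to the Groq API; this sketch only collects page texts and builds the prompt, and the prompt wording and character budget are assumptions:

```python
def build_summary_prompt(pages, max_chars=4000):
    """Join extracted page texts and truncate to a rough context budget."""
    combined = "\n\n".join(pages)[:max_chars]
    return ("Summarize the business activity of this company "
            "based on its website content:\n\n" + combined)

# Placeholder page texts standing in for the real multi-page extraction:
pages = [
    "We manufacture industrial pumps and valves.",
    "Contact us for OEM partnerships in Europe.",
]
prompt = build_summary_prompt(pages)
print(prompt.splitlines()[0])
```

Truncating the aggregated content keeps the request inside the model's context window even when a site has dozens of long pages.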
4. Full automation pipeline
One of the most advanced tools I developed combines everything:
- Multi-page company search (up to 20 pages)
- Matching results with Excel datasets
- Collecting website, phone, industry, DUNS
- AI-generated summaries
- Structured export to Excel
This replaces hours of manual research and data entry.
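The pipeline shape can be sketched as a skeleton where each stage is stubbed out; the stage functions, record fields, and sample values below are hypothetical placeholders for the real scraper and AI steps:

```python
import csv
import io

def run_pipeline(companies, search, enrich, summarize):
    """Chain the stages: search -> enrich -> summarize, one row per company."""
    rows = []
    for name in companies:
        hit = search(name)  # stands in for the multi-page search
        record = {"company": name, **(enrich(hit) if hit else {})}
        record["summary"] = summarize(record) if hit else ""
        rows.append(record)
    return rows

def export_csv(rows, fields):
    """Structured export; the real tool writes Excel instead of CSV."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Stub stages so the skeleton runs end to end:
rows = run_pipeline(
    ["Acme GmbH"],
    search=lambda name: {"url": "https://example.com"},
    enrich=lambda hit: {"website": hit["url"], "phone": "+49 30 0000000"},
    summarize=lambda rec: "Industrial supplier.",
)
print(export_csv(rows, ["company", "website", "phone", "summary"]))
```

Passing the stages in as functions keeps the orchestration testable: each scraper or AI step can be swapped for a stub without touching the pipeline itself.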
You can find similar solutions here:
👉 https://dibara512.github.io/my-site/
Example: extracting frequent keywords from company data
Sometimes I need to analyze large datasets of company names or descriptions.
Here’s a simple example:
```python
from collections import Counter
import re

def top_word_frequencies(text, min_len=3, top_n=20):
    text = text.lower()
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text)
    tokens = [t for t in tokens if len(t) >= min_len]
    counts = Counter(tokens)
    return counts.most_common(top_n)

text_data = """
Apple Inc Google LLC Microsoft Corporation Apple Google Apple
"""

top_words = top_word_frequencies(text_data)
for word, freq in top_words:
    print(word, freq)
```
This approach helps me:
- identify patterns in company data
- generate better keywords
- improve search and matching logic

Technologies I use
In most of my projects, I work with:
- Python
- Selenium / BeautifulSoup
- Excel (openpyxl, pandas)
- SQL (SQLite, Firebird)
- APIs (Groq, OpenStreetMap)
- Tkinter (for internal tools)
How I approach projects
My workflow is simple:
- You describe the task
- I propose a practical solution
- I build a working tool
- You get a ready-to-use result
No unnecessary complexity — only tools that solve the problem.
Final thoughts
Automation is not about writing scripts — it’s about saving time and reducing manual work.
If you're working with:
- web scraping
- automation
- data collection
- AI workflows
feel free to explore my work here:
👉 https://dibara512.github.io/my-site/