DEV Community

Cover image for A Better Approach to Data Cleaning in Large Datasets with Python

A Better Approach to Data Cleaning in Large Datasets with Python

Varun Gujarathi on December 06, 2023

As a software engineer and tech enthusiast, I recently embarked on a challenging yet rewarding project. My task involved working with a massive cri...
Collapse
 
viraj63 profile image
Viraj Vhatkar • Edited

I loved the approach of using DOB and then using a fuzzy approach but I would suggest using additional variables for better results.
Eg: if you have 2 individuals "Jame Dov" and "Jane Dov" but the algorithm would give both the same name, so in this case address, city, location too or more specific criteria could be given to avoid changing the names.
But still, I learned something new today.