~ "Garbage In, Garbage Out": Bad data will lead to bad results, plain and simple.
~ Computers can't easily judge on their own whether data makes sense.
~ To get accurate results, you need to remove the errors in your data that confuse the algorithms.
~ It's a time-consuming process, but an important one.
What are the causes?
- Input Errors
- Duplicates
- Mangled Data
- Malfunctioning Sensors
- Lack of Standardization
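Duplicates and lack of standardization often go together. Below is a minimal Python sketch on hypothetical sample records. Three entries for the same person differ only in casing and spacing, so an exact-match check finds no duplicates until the values are standardized. The helpers `exact_duplicates` and `standardize` are made up for illustration.

```python
# Hypothetical sample data: the same customer entered three times
# with inconsistent casing and whitespace.
records = [
    {"name": "Alice Smith", "city": "New York"},
    {"name": "alice smith ", "city": "new york"},
    {"name": "ALICE  SMITH", "city": "New York"},
]

def exact_duplicates(rows):
    """Count rows that are exact copies of an earlier row."""
    seen = set()
    dupes = 0
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            dupes += 1
        seen.add(key)
    return dupes

def standardize(value):
    """Illustrative rule: lowercase and collapse runs of whitespace."""
    return " ".join(value.lower().split())

standardized = [{k: standardize(v) for k, v in row.items()} for row in records]

print(exact_duplicates(records))       # 0: formatting hides the repeats
print(exact_duplicates(standardized))  # 2: two rows repeat the first one
```

Agreeing on a standard format at input time prevents this class of duplicate from appearing in the first place.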
Identifying Problems
- Range Constraints
- Data Type
- Compulsory Constraints
- Unique Constraints
- Cross-Field Constraints
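Each of these constraint types can be expressed as a simple rule applied to every row. The sketch below shows one possible way to do this in plain Python. The field names (`id`, `age`, `email`, `start`, `end`), the 0 to 120 age range, and the `validate` helper are assumptions made up for illustration.

```python
from datetime import date

# Hypothetical rows; field names and limits are illustrative assumptions.
rows = [
    {"id": 1, "age": 34, "email": "a@example.com",
     "start": date(2021, 1, 1), "end": date(2021, 6, 1)},
    {"id": 2, "age": 212, "email": "b@example.com",        # out of range
     "start": date(2021, 1, 1), "end": date(2021, 6, 1)},
    {"id": 3, "age": "forty", "email": None,               # wrong type, missing email
     "start": date(2021, 1, 1), "end": date(2021, 6, 1)},
    {"id": 1, "age": 29, "email": "c@example.com",         # duplicate id
     "start": date(2022, 3, 1), "end": date(2022, 1, 1)},  # end before start
]

def validate(rows):
    """Return a list of (row_index, problem) pairs."""
    problems = []
    seen_ids = set()
    for i, row in enumerate(rows):
        age = row.get("age")
        # Data-type constraint: age must be an integer (bools excluded).
        if not isinstance(age, int) or isinstance(age, bool):
            problems.append((i, "age is not an integer"))
        # Range constraint: assumed plausible human age.
        elif not 0 <= age <= 120:
            problems.append((i, "age out of range"))
        # Compulsory constraint: email must be present and non-empty.
        if not row.get("email"):
            problems.append((i, "email is missing"))
        # Unique constraint: id must not repeat.
        if row["id"] in seen_ids:
            problems.append((i, "duplicate id"))
        seen_ids.add(row["id"])
        # Cross-field constraint: end date cannot precede start date.
        if row["end"] < row["start"]:
            problems.append((i, "end date before start date"))
    return problems

for index, problem in validate(rows):
    print(index, problem)
```

This reports the out-of-range age in row 1, the wrong type and missing email in row 2, and the duplicate id and reversed dates in row 3. Collecting problems instead of stopping at the first one makes it easier to see how dirty a dataset really is.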
Data Cleaning Techniques
- Remove Missing Data
- Direct Correction
- Normalization
- Fix Syntax Errors
- Data Imputation
- Spell Check
- Filter Unwanted Outliers
- Remove Irrelevant Values
- Fix Structural Errors
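Several of these techniques are often chained into a single pass. Below is a minimal sketch in plain Python on made-up weather-station readings. The field names, the `SPELLING_FIXES` table, the -60 to 60 °C plausibility range, and median imputation are all assumptions. "Normalization" is interpreted here as min-max scaling of a numeric column. For outliers it uses a fixed domain range rather than a statistical rule such as IQR or z-scores. Real projects often use a library such as pandas for this instead.

```python
import statistics

# Hypothetical raw readings from several weather stations.
raw = [
    {"station": " Berlin", "temp_c": 21.0, "notes": "ok"},
    {"station": "berlin", "temp_c": 21.0, "notes": "ok"},            # duplicate once standardized
    {"station": "Munchen", "temp_c": 18.0, "notes": "typo"},         # spelling variant
    {"station": "Hamburg", "temp_c": 999.0, "notes": "bad sensor"},  # impossible reading
    {"station": "Hamburg", "temp_c": 15.0, "notes": ""},
    {"station": "Cologne", "temp_c": None, "notes": ""},             # missing temperature
    {"station": None, "temp_c": 17.0, "notes": "?"},                 # missing station
]

# Assumed lookup table of known spelling variants.
SPELLING_FIXES = {"munchen": "munich"}

def clean(records):
    # Remove irrelevant values: drop the free-text "notes" field (copies each row).
    rows = [{k: v for k, v in r.items() if k != "notes"} for r in records]

    # Remove missing data: a row without a station cannot be recovered.
    rows = [r for r in rows if r.get("station")]

    # Fix structural errors: trim whitespace and standardize casing.
    for r in rows:
        r["station"] = r["station"].strip().lower()

    # Spell check / direct correction via the lookup table.
    for r in rows:
        r["station"] = SPELLING_FIXES.get(r["station"], r["station"])

    # Remove duplicates that only became visible after standardizing.
    unique, seen = [], set()
    for r in rows:
        key = (r["station"], r["temp_c"])
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # Filter unwanted outliers: drop physically implausible readings.
    unique = [r for r in unique
              if r["temp_c"] is None or -60 <= r["temp_c"] <= 60]

    # Data imputation: fill missing temperatures with the median.
    median = statistics.median(r["temp_c"] for r in unique if r["temp_c"] is not None)
    for r in unique:
        if r["temp_c"] is None:
            r["temp_c"] = median

    # Normalization (min-max scaling): map temperatures onto [0, 1].
    lo = min(r["temp_c"] for r in unique)
    hi = max(r["temp_c"] for r in unique)
    for r in unique:
        r["temp_scaled"] = (r["temp_c"] - lo) / (hi - lo) if hi > lo else 0.0
    return unique

for row in clean(raw):
    print(row)
```

The order of the steps matters. Fixing structure and spelling first lets the duplicate check catch rows that only differed in formatting. Filtering outliers before imputing keeps the 999 °C reading from shifting the median used as the fill value.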