Data Cleaning & Analysis Templates
A collection of spreadsheet templates with built-in validation formulas, deduplication tools, data normalization functions, and pivot table templates. Clean messy data without writing code — just paste your data and let the formulas work.
Key Features
- Validation Engine — 30+ formulas to detect invalid emails, phone numbers, dates, duplicates, and outliers
- Deduplication Tools — exact match, fuzzy match, and configurable similarity threshold dedup
- Data Normalization — standardize names, addresses, phone formats, dates, and currency values
- Outlier Detection — IQR-based and Z-score methods with automatic flagging
- Pivot Table Templates — 10 pre-built pivot configurations for common analysis patterns
- Audit Trail — track what was cleaned, when, and what the original value was
What's Included
| Sheet Tab | Description |
|---|---|
| Data Input | Paste raw data here — validation runs automatically |
| Validation Report | Summary of all detected issues by column and type |
| Dedup Tool | Duplicate finder with configurable match criteria |
| Normalizer | Side-by-side original vs. cleaned values |
| Outlier Analysis | Statistical outlier detection with visual flags |
| Pivot Templates | 10 ready-to-use pivot table configurations |
| Formula Library | Reference sheet with all 30+ validation formulas |
| Config | Validation rules, thresholds, and format settings |
Quick Start
- Open the Data Input tab and paste your raw dataset (up to 50,000 rows)
- Check the Validation Report tab — issues are flagged automatically with color coding
- Use the Dedup Tool tab to find and mark duplicates
- Run data through the Normalizer tab for standardized output
- Review Outlier Analysis for statistical anomalies
- Export the cleaned data from the Normalizer output columns
Formula Examples
Email Validation
=AND(
ISERROR(FIND(" ", A2)),
LEN(A2) - LEN(SUBSTITUTE(A2, "@", "")) = 1,
FIND("@", A2) > 1,
FIND(".", A2, FIND("@", A2)) > FIND("@", A2) + 1,
LEN(A2) - FIND(".", A2, FIND("@", A2)) >= 2
)
Duplicate Detection (Exact Match)
=IF(COUNTIF($A$2:$A2, A2) > 1, "DUPLICATE", "UNIQUE")
Fuzzy Duplicate Detection (Similarity Score)
=1 - (ABS(LEN(A2)-LEN(B2)) + SUMPRODUCT((MID(UPPER(A2),ROW(INDIRECT("1:"&LEN(A2))),1)<>MID(UPPER(B2),ROW(INDIRECT("1:"&LEN(B2))),1))*1)) / MAX(LEN(A2),LEN(B2))
Phone Number Normalization (US Format)
=IF(LEN(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,"-",""),"(",""),")","")," ",""))=10,
"("&LEFT(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,"-",""),"(",""),")","")," ",""),3)&") "&MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,"-",""),"(",""),")","")," ",""),4,3)&"-"&RIGHT(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,"-",""),"(",""),")","")," ",""),4),
"INVALID"
)
Outlier Detection (IQR Method)
=IF(OR(
A2 < QUARTILE($A$2:$A$1000,1) - 1.5*(QUARTILE($A$2:$A$1000,3)-QUARTILE($A$2:$A$1000,1)),
A2 > QUARTILE($A$2:$A$1000,3) + 1.5*(QUARTILE($A$2:$A$1000,3)-QUARTILE($A$2:$A$1000,1))
), "OUTLIER", "OK")
Date Standardization (to YYYY-MM-DD)
=TEXT(DATEVALUE(SUBSTITUTE(SUBSTITUTE(A2,"/","-"),".","-")), "YYYY-MM-DD")
Name Case Normalization
=PROPER(TRIM(CLEAN(SUBSTITUTE(A2, CHAR(160), " "))))
Whitespace Cleanup
=TRIM(CLEAN(SUBSTITUTE(A2, CHAR(160), " ")))
- Removes leading/trailing spaces, non-printable characters, and non-breaking spaces
Numeric Extraction from Mixed Text
=SUMPRODUCT(MID(0&A2, LARGE(INDEX(ISNUMBER(--MID(A2,ROW(INDIRECT("1:"&LEN(A2))),1))*ROW(INDIRECT("1:"&LEN(A2))),0),ROW(INDIRECT("1:"&LEN(A2))))+1,1)*10^ROW(INDIRECT("1:"&LEN(A2)))/10)
- Extracts "12345" from "Order #12345-A"
Validation Summary Count
=COUNTIF(ValidationReport!$C:$C, "INVALID")
Usage Tips
- Color Coding: Red = invalid/must fix, Yellow = suspicious/review, Green = valid
- Threshold Config: Adjust similarity thresholds in the Config tab (default: 85% for fuzzy match)
- Large Datasets: For 10,000+ rows, copy formulas in batches of 1,000 to avoid slowdowns
- Preserve Originals: The Normalizer always keeps original values in column A; cleaned values appear in column B
- Custom Rules: Add your own validation regex patterns in the Config tab row 20+
Best Practices
- Always keep a backup of your raw data before cleaning
- Validate first, clean second — understand the scope of issues before transforming
- Handle blanks explicitly — decide whether empty cells mean "unknown" or "not applicable"
- Document your cleaning rules — use the Audit Trail tab to record decisions
- Spot-check results — randomly sample 5% of cleaned rows to verify accuracy
This is 1 of 11 resources in the Spreadsheet Tools Pro toolkit. Get the complete [Data Cleaning & Analysis Templates] with all files, templates, and documentation for $29.
Or grab the entire Spreadsheet Tools Pro bundle (11 products) for $149 — save 30%.
Top comments (0)