Thesius Code

Posted on Mar 23 • Originally published at datanest-stores.pages.dev

Data Cleaning & Analysis Templates

#excel #productivity #automation #data

A collection of spreadsheet templates with built-in validation formulas, deduplication tools, data normalization functions, and pivot table templates. Clean messy data without writing code — just paste your data and let the formulas work.

Key Features

Validation Engine: 30+ formulas to detect invalid emails, phone numbers, dates, duplicates, and outliers
Deduplication Tools: exact-match and fuzzy-match dedup with a configurable similarity threshold
Data Normalization: standardize names, addresses, phone formats, dates, and currency values
Outlier Detection: IQR-based and Z-score methods with automatic flagging
Pivot Table Templates: 10 pre-built pivot configurations for common analysis patterns
Audit Trail: track what was cleaned, when, and what the original value was

What's Included

Sheet Tab	Description
Data Input	Paste raw data here — validation runs automatically
Validation Report	Summary of all detected issues by column and type
Dedup Tool	Duplicate finder with configurable match criteria
Normalizer	Side-by-side original vs. cleaned values
Outlier Analysis	Statistical outlier detection with visual flags
Pivot Templates	10 ready-to-use pivot table configurations
Formula Library	Reference sheet with all 30+ validation formulas
Config	Validation rules, thresholds, and format settings

Quick Start

1. Open the Data Input tab and paste your raw dataset (up to 50,000 rows)
2. Check the Validation Report tab, where issues are flagged automatically with color coding
3. Use the Dedup Tool tab to find and mark duplicates
4. Run data through the Normalizer tab for standardized output
5. Review Outlier Analysis for statistical anomalies
6. Export the cleaned data from the Normalizer output columns

Formula Examples

Email Validation

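=IFERROR(AND(
  ISERROR(FIND(" ", A2)),
  LEN(A2) - LEN(SUBSTITUTE(A2, "@", "")) = 1,
  FIND("@", A2) > 1,
  FIND(".", A2, FIND("@", A2)) > FIND("@", A2) + 1,
  LEN(A2) - FIND(".", A2, FIND("@", A2)) >= 2
), FALSE)

The IFERROR wrapper returns FALSE instead of #VALUE! when the cell has no "@" or no "." after it.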
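
To cross-check the formula's verdicts outside the spreadsheet, the same five rules can be sketched in Python. This is a minimal illustration: `is_plausible_email` is a hypothetical helper name that is not part of the templates, and neither version is a complete address validator.

```python
def is_plausible_email(value: str) -> bool:
    """Mirror the spreadsheet checks (hypothetical helper, not shipped with the templates)."""
    if " " in value:                    # rule 1: no spaces anywhere
        return False
    if value.count("@") != 1:           # rule 2: exactly one "@"
        return False
    at = value.index("@")               # 0-based position of "@"
    if at < 1:                          # rule 3: at least one character before "@"
        return False
    dot = value.find(".", at)           # first "." after the "@"
    if dot == -1 or dot <= at + 1:      # rule 4: a "." exists, with a character between "@" and "."
        return False
    return len(value) - (dot + 1) >= 2  # rule 5: at least two characters after that "."
```

For example, `is_plausible_email("jane@example.com")` passes all five rules, while `"jane@.com"` fails rule 4.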

Duplicate Detection (Exact Match)

=IF(COUNTIF($A$2:$A2, A2) > 1, "DUPLICATE", "UNIQUE")
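
The expanding range `$A$2:$A2` means the first occurrence of a value is labeled UNIQUE and only later repeats are labeled DUPLICATE. A rough Python equivalent, assuming text values and using the hypothetical name `flag_duplicates`, looks like this:

```python
def flag_duplicates(values):
    """Label each value the way the running COUNTIF does (illustrative sketch)."""
    seen = set()
    flags = []
    for value in values:
        key = str(value).lower()   # COUNTIF compares text case-insensitively
        flags.append("DUPLICATE" if key in seen else "UNIQUE")
        seen.add(key)
    return flags
```

With the input `["a@x.com", "b@x.com", "A@x.com"]`, only the third entry is flagged, because it repeats the first one apart from letter case.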

Fuzzy Duplicate Detection (Similarity Score)

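=1 - (ABS(LEN(A2)-LEN(B2)) + SUMPRODUCT((MID(UPPER(A2),ROW(INDIRECT("1:"&MIN(LEN(A2),LEN(B2)))),1)<>MID(UPPER(B2),ROW(INDIRECT("1:"&MIN(LEN(A2),LEN(B2)))),1))*1)) / MAX(LEN(A2),LEN(B2))

Both MID arrays span the shorter string's length so that values of different lengths do not produce an #N/A error; the length difference is counted separately by the ABS term.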
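
The score is a position-by-position comparison, not a true edit distance: characters are compared over the shared length, and the difference in length counts as additional mismatches. A small Python sketch of that idea follows; `positional_similarity` is a hypothetical name, and treating two empty strings as identical is an assumption of this sketch.

```python
def positional_similarity(a: str, b: str) -> float:
    """Case-insensitive positional similarity in the range 0..1 (illustrative sketch)."""
    a, b = a.upper(), b.upper()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # assumption: two empty strings count as identical
    # zip stops at the shorter string, so only shared positions are compared
    mismatches = sum(x != y for x, y in zip(a, b))
    return 1 - (abs(len(a) - len(b)) + mismatches) / longest
```

Because positions must line up, a single character inserted at the front shifts everything: "Smith" versus "ASmith" scores 0, so this style of score suits typos and substitutions better than insertions.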

Phone Number Normalization (US Format)

=IF(LEN(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,"-",""),"(",""),")","")," ",""))=10,
  "("&LEFT(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,"-",""),"(",""),")","")," ",""),3)&") "&MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,"-",""),"(",""),")","")," ",""),4,3)&"-"&RIGHT(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,"-",""),"(",""),")","")," ",""),4),
  "INVALID"
)
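
The nested SUBSTITUTE calls are easier to follow when written as a loop. The sketch below mirrors the formula step by step under the same assumptions (hyphens, parentheses, and spaces are stripped; anything that is not 10 characters afterwards is INVALID); `normalize_us_phone` is a hypothetical name.

```python
def normalize_us_phone(raw: str) -> str:
    """Strip separators and format as (XXX) XXX-XXXX (illustrative sketch)."""
    stripped = raw
    for ch in "-() ":                  # the same four characters the formula removes
        stripped = stripped.replace(ch, "")
    if len(stripped) != 10:            # the formula checks length only
        return "INVALID"
    return f"({stripped[:3]}) {stripped[3:6]}-{stripped[6:]}"
```

Like the formula, this checks only the length, so dotted numbers such as "555.123.4567" and numbers with a leading country code come back as INVALID, while any 10 leftover characters are accepted. Adding a `stripped.isdigit()` check would be one way to tighten it.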

Outlier Detection (IQR Method)

=IF(OR(
  A2 < QUARTILE($A$2:$A$1000,1) - 1.5*(QUARTILE($A$2:$A$1000,3)-QUARTILE($A$2:$A$1000,1)),
  A2 > QUARTILE($A$2:$A$1000,3) + 1.5*(QUARTILE($A$2:$A$1000,3)-QUARTILE($A$2:$A$1000,1))
), "OUTLIER", "OK")
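
The IQR rule flags anything more than 1.5 times the interquartile range below the first quartile or above the third. As a sketch for checking a column outside the spreadsheet, Python's `statistics.quantiles` with `method="inclusive"` uses the same interpolation as Excel's QUARTILE; `flag_outliers` is a hypothetical name, and the input needs at least two values.

```python
import statistics

def flag_outliers(values):
    """Label each value OUTLIER or OK using the 1.5 * IQR fences (illustrative sketch)."""
    # method="inclusive" interpolates the same way as Excel's QUARTILE
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return ["OUTLIER" if v < low or v > high else "OK" for v in values]
```

For `[10, 12, 11, 13, 12, 11, 100]` the quartiles are 11 and 12.5, the fences are 8.75 and 14.75, and only 100 is flagged.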

Date Standardization (to YYYY-MM-DD)

=TEXT(DATEVALUE(SUBSTITUTE(SUBSTITUTE(A2,"/","-"),".","-")), "YYYY-MM-DD")
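
DATEVALUE interprets text according to the workbook's locale, so "03/04/2024" can mean March 4 or April 3 depending on regional settings. The sketch below makes that choice explicit. The format list and the name `to_iso_date` are assumptions for illustration, and returning "INVALID" for unparseable input is this sketch's choice (the formula returns an error instead).

```python
from datetime import datetime

# Assumed input formats, tried in order; adjust to match your data's locale.
ASSUMED_FORMATS = ("%Y-%m-%d", "%m-%d-%Y")

def to_iso_date(raw: str) -> str:
    """Unify separators, then parse with the assumed formats (illustrative sketch)."""
    unified = raw.strip().replace("/", "-").replace(".", "-")
    for fmt in ASSUMED_FORMATS:
        try:
            return datetime.strptime(unified, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue                   # try the next assumed format
    return "INVALID"
```

With the formats above, "03/23/2024" and "2024.03.23" both become "2024-03-23", while a day-first value such as "23/03/2024" is rejected and not silently misread.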

Name Case Normalization

=PROPER(TRIM(CLEAN(SUBSTITUTE(A2, CHAR(160), " "))))

Whitespace Cleanup

=TRIM(CLEAN(SUBSTITUTE(A2, CHAR(160), " ")))

Removes leading and trailing spaces, collapses repeated internal spaces to one, strips non-printable characters, and converts non-breaking spaces to regular spaces.
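
The order of operations matters here: non-breaking spaces are converted first so that TRIM can then treat them like ordinary spaces. A rough Python sketch of the cleanup and the name-case variant is shown below. The function names are hypothetical, and `str.title()` only approximates PROPER.

```python
def clean_whitespace(raw: str) -> str:
    """Approximate TRIM(CLEAN(SUBSTITUTE(text, CHAR(160), " "))) (illustrative sketch)."""
    text = raw.replace("\u00a0", " ")                    # CHAR(160) is the non-breaking space
    text = "".join(ch for ch in text if ord(ch) >= 32)   # CLEAN drops control codes 0-31
    # TRIM: drop leading/trailing spaces and collapse runs of spaces to one
    return " ".join(part for part in text.split(" ") if part)

def normalize_name(raw: str) -> str:
    """Approximate PROPER(...) on the cleaned text."""
    return clean_whitespace(raw).title()
```

For instance, `normalize_name("\u00a0 jane   DOE\t ")` yields "Jane Doe".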

Numeric Extraction from Mixed Text

=SUMPRODUCT(MID(0&A2, LARGE(INDEX(ISNUMBER(--MID(A2,ROW(INDIRECT("1:"&LEN(A2))),1))*ROW(INDIRECT("1:"&LEN(A2))),0),ROW(INDIRECT("1:"&LEN(A2))))+1,1)*10^ROW(INDIRECT("1:"&LEN(A2)))/10)

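Returns the number 12345 for "Order #12345-A". Every digit in the cell is concatenated into a single number, so "A1B2" yields 12, and a cell with no digits yields 0.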
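
Since the array formula is hard to read, here is a short sketch of what it computes: keep the digits in their original order and read them as one number. `extract_digits` is a hypothetical name, and returning 0 when no digit is present follows the formula's behavior.

```python
def extract_digits(raw: str) -> int:
    """Concatenate every ASCII digit in the text into one integer (illustrative sketch)."""
    digits = "".join(ch for ch in raw if ch in "0123456789")
    return int(digits) if digits else 0   # no digits at all gives 0
```

Leading zeros are lost in both versions because the result is a number, so identifiers such as ZIP codes are better kept as text.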

Validation Summary Count

=COUNTIF('Validation Report'!$C:$C, "INVALID")

Sheet names that contain a space must be wrapped in single quotes.

Usage Tips

Color Coding: Red = invalid/must fix, Yellow = suspicious/review, Green = valid
Threshold Config: Adjust similarity thresholds in the Config tab (default: 85% for fuzzy match)
Large Datasets: For 10,000+ rows, copy formulas in batches of 1,000 to avoid slowdowns
Preserve Originals: The Normalizer always keeps original values in column A; cleaned values appear in column B
Custom Rules: Add your own validation regex patterns in the Config tab, starting at row 20

Best Practices

Always keep a backup of your raw data before cleaning
Validate first, clean second — understand the scope of issues before transforming
Handle blanks explicitly — decide whether empty cells mean "unknown" or "not applicable"
Document your cleaning rules — use the Audit Trail tab to record decisions
Spot-check results — randomly sample 5% of cleaned rows to verify accuracy
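
One way to pick rows for the 5% spot check without bias is to draw row numbers at random. The sketch below assumes data starts at row 2 under a header row; `spot_check_rows` and its parameters are hypothetical names for illustration.

```python
import random

def spot_check_rows(row_count: int, fraction: float = 0.05, seed=None):
    """Return sorted spreadsheet row numbers to review (illustrative sketch)."""
    rng = random.Random(seed)          # pass a seed to make the sample repeatable
    # at least one row when there is data, never more than the rows available
    sample_size = min(row_count, max(1, round(row_count * fraction)))
    # assumption: row 1 is a header, so data occupies rows 2 .. row_count + 1
    return sorted(rng.sample(range(2, row_count + 2), sample_size))
```

For a 1,000-row dataset this returns 50 distinct row numbers between 2 and 1001.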

This is 1 of 11 resources in the Spreadsheet Tools Pro toolkit. Get the complete Data Cleaning & Analysis Templates with all files, templates, and documentation for $29.

Get the Full Kit →

Or grab the entire Spreadsheet Tools Pro bundle (11 products) for $149 — save 30%.

Get the Complete Bundle →
