Every data professional knows the feeling. You open a new project, and the first thing you need to do is clean, transform, or merge some data. And every time, you end up writing the same boilerplate code.
Load CSV. Drop duplicates. Handle missing values. Merge sheets. Save to Excel.
Sound familiar?
I got tired of rewriting these patterns across projects, so I built a set of reusable Python templates that handle the 10 most common data processing tasks.
What Is DataForge Pro?
DataForge Pro is a collection of 10 production-ready Python templates for data processing. Each template is a standalone script that you can copy, customize, and integrate into your workflow.
The 10 Templates
1. Quick Start — Load, Preview, and Save
The foundation template. Load any file (CSV, Excel, JSON), preview its structure, and save it in a different format.
from core import DataForge

df = (DataForge()
      .load('data.csv')
      .preview()
      .save('output.xlsx'))
2. Data Cleaning
Remove duplicates, handle missing values, trim whitespace, and standardize column names.
df = (DataForge()
      .load('messy_data.csv')
      .remove_duplicates()
      .drop_empty_rows()
      .trim_whitespace()
      .standardize_columns()
      .save('clean_data.xlsx'))
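If you are curious what a cleaning chain like this boils down to, here is a minimal plain-pandas sketch. It is not DataForge's actual implementation: the `clean` function and its column-naming rule (lowercase, underscores) are my own assumptions for illustration. It also trims whitespace before dropping duplicates, so that " Ada " and "Ada" count as the same value.

```python
import pandas as pd


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: a plain-pandas approximation of the cleaning steps."""
    out = df.copy()
    # Standardize column names: strip, lowercase, spaces become underscores.
    out.columns = [str(c).strip().lower().replace(" ", "_") for c in out.columns]
    # Trim whitespace on string values; leave everything else untouched.
    for col in out.columns:
        out[col] = out[col].map(lambda v: v.strip() if isinstance(v, str) else v)
    # Drop rows where every value is missing, then exact duplicate rows.
    out = out.dropna(how="all").drop_duplicates()
    return out.reset_index(drop=True)


raw = pd.DataFrame({
    " First Name ": [" Ada ", "Ada", None, "Grace"],
    "City": ["London", "London", None, " NYC"],
})
cleaned = clean(raw)
print(cleaned)
```

The order of the steps matters: deduplicating before trimming would keep both "Ada" rows, because the raw strings differ.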
3. Format Conversion
Convert between CSV, Excel (.xlsx/.xls), and JSON with a single line.
DataForge().load('data.csv').save('data.xlsx')
DataForge().load('data.xlsx').save('data.json')
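For reference, a one-line conversion like this can be approximated in plain pandas by picking a reader and a writer from the file extension. This is only a sketch under my own assumptions: `convert` and `READERS` are hypothetical names, not part of DataForge, and writing the legacy `.xls` format is left out.

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical lookup table: file extension -> pandas reader.
READERS = {
    ".csv": pd.read_csv,
    ".xlsx": pd.read_excel,  # needs openpyxl installed
    ".xls": pd.read_excel,   # needs xlrd installed
    ".json": pd.read_json,
}


def convert(src: str, dst: str) -> pd.DataFrame:
    """Read `src` and write it to `dst`, choosing the format from each extension."""
    df = READERS[os.path.splitext(src)[1].lower()](src)
    ext = os.path.splitext(dst)[1].lower()
    if ext == ".csv":
        df.to_csv(dst, index=False)
    elif ext == ".xlsx":
        df.to_excel(dst, index=False)
    elif ext == ".json":
        df.to_json(dst, orient="records")
    else:
        raise ValueError(f"unsupported output format: {ext}")
    return df


# Demo on a temporary CSV so the example does not depend on local files.
with tempfile.TemporaryDirectory() as folder:
    src = os.path.join(folder, "data.csv")
    dst = os.path.join(folder, "data.json")
    pd.DataFrame({"id": [1, 2], "name": ["Ada", "Grace"]}).to_csv(src, index=False)
    convert(src, dst)
    with open(dst) as fh:
        records = json.load(fh)
print(records)
```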
4. VLOOKUP — Data Matching
The Excel VLOOKUP equivalent in Python. Match and merge data from two files using a common key column.
df = (DataForge()
      .load('orders.csv')
      .vlookup('customers.xlsx', 'CustomerID', ['Name', 'Email', 'City'])
      .save('enriched_orders.xlsx'))
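In pandas terms, a VLOOKUP is a left join that keeps only the first match per key. The sketch below shows that idea on in-memory data. The standalone `vlookup` function here is a hypothetical stand-in written for illustration, not the DataForge method, and the sample data is made up.

```python
import pandas as pd


def vlookup(left, lookup, key, columns):
    """Left-join selected columns from `lookup` onto `left`, VLOOKUP-style."""
    # Keep the first match per key so rows in `left` are never multiplied.
    right = lookup[[key, *columns]].drop_duplicates(subset=key, keep="first")
    # validate= raises if the lookup side still has duplicate keys.
    return left.merge(right, on=key, how="left", validate="many_to_one")


orders = pd.DataFrame({"OrderID": [1, 2, 3], "CustomerID": [10, 20, 99]})
customers = pd.DataFrame({
    "CustomerID": [10, 20, 20],
    "Name": ["Ada", "Grace", "Grace (duplicate)"],
    "City": ["London", "New York", "Boston"],
})
enriched = vlookup(orders, customers, "CustomerID", ["Name", "City"])
print(enriched)
```

Orders with no matching customer (CustomerID 99 here) stay in the result with missing values, which mirrors the `#N/A` you would get in Excel.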
5. Pivot Tables
Create Excel-style pivot tables with group-by and aggregation functions.
df = (DataForge()
      .load('sales.csv')
      .pivot(group_by=['Region', 'Product'], agg={'Revenue': 'sum', 'Quantity': 'count'})
      .save('pivot_report.xlsx'))
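The same group-by and aggregation can be written directly with pandas `groupby(...).agg(...)`. This is a small sketch on made-up sales data, assuming the `agg` dictionary means the same thing as it does in pandas; I have not confirmed how DataForge interprets it internally.

```python
import pandas as pd

sales = pd.DataFrame({
    "Region": ["East", "East", "East", "West"],
    "Product": ["A", "A", "B", "A"],
    "Revenue": [100.0, 50.0, 30.0, 70.0],
    "Quantity": [2, 1, 3, 5],
})

# One row per (Region, Product): total revenue and number of sales rows.
report = (
    sales.groupby(["Region", "Product"], as_index=False)
    .agg({"Revenue": "sum", "Quantity": "count"})
)
print(report)
```

Note that in pandas `'count'` returns the number of non-missing rows in each group, not the number of units sold. If you want total units, use `'sum'` for `Quantity` instead.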
6. File Comparison
Find differences between two datasets.
diff = DataForge().compare('old_data.csv', 'new_data.csv')
diff.save_report('changes.xlsx')
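One common way to find differences with plain pandas is an outer merge with `indicator=True`, which labels each row by the side it came from. The sketch below uses that approach. It is an illustration only: this `compare` function, the `change` column, and the sample data are my own assumptions, and DataForge may compare files differently.

```python
import pandas as pd


def compare(old: pd.DataFrame, new: pd.DataFrame) -> pd.DataFrame:
    """Return rows that appear in only one of the two frames, labelled by side."""
    # With no `on=` argument, merge joins on all columns the frames share.
    merged = old.merge(new, how="outer", indicator=True)
    diff = merged[merged["_merge"] != "both"].copy()
    labels = {"left_only": "removed", "right_only": "added"}
    diff["change"] = diff["_merge"].astype(str).map(labels)
    return diff.drop(columns="_merge").reset_index(drop=True)


old = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})
new = pd.DataFrame({"id": [1, 2, 4], "value": ["a", "B", "d"]})
diff = compare(old, new)
print(diff)
```

With this approach a modified row (id 2 here) shows up twice, once as removed and once as added, because the comparison is on whole rows rather than on a key column.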
7. Batch Processing
Process multiple files at once.
df = (DataForge()
      .batch_load('data_folder/*.csv')
      .remove_duplicates()
      .save('combined_output.xlsx'))
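Under the hood, batch loading usually means expanding a glob pattern and concatenating the results. Here is a rough plain-pandas sketch of that pattern. The standalone `batch_load` function and the extra `source_file` column are hypothetical additions for illustration, not something the template is confirmed to do.

```python
import glob
import os
import tempfile

import pandas as pd


def batch_load(pattern: str) -> pd.DataFrame:
    """Read every CSV matching `pattern` and stack them into one DataFrame."""
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files match {pattern!r}")
    # Record which file each row came from (an assumption, handy for debugging).
    frames = [pd.read_csv(p).assign(source_file=os.path.basename(p)) for p in paths]
    return pd.concat(frames, ignore_index=True)


# Demo with two temporary CSV files so the example is self-contained.
with tempfile.TemporaryDirectory() as folder:
    pd.DataFrame({"id": [1, 2]}).to_csv(os.path.join(folder, "a.csv"), index=False)
    pd.DataFrame({"id": [2, 3]}).to_csv(os.path.join(folder, "b.csv"), index=False)
    combined = batch_load(os.path.join(folder, "*.csv"))

# Deduplicate on the data column only; source_file differs between files.
deduped = combined.drop_duplicates(subset="id").reset_index(drop=True)
print(deduped)
```

If you track the source file like this, remember to exclude that column when deduplicating, otherwise the same record loaded from two files will never look like a duplicate.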
8-10. Multi-Sheet Excel, CLI Mode, Extension Guide
Work with multiple sheets in one workbook, run the templates from the command line, and learn how to add your own custom transformations.
Key Features
- Chainable API: clean, readable code
- Multiple Formats: CSV, Excel, JSON
- Well Documented: clear docstrings and examples
- Minimal Dependencies: only pandas, openpyxl, and xlrd
Requirements
- Python 3.8+
- pandas, openpyxl, xlrd
Get DataForge Pro
Stop rewriting the same data processing code. Get 10 ready-to-use templates.
Also available on Gumroad.
Questions? Message me anytime. Happy coding!