Six months ago, I started a self-directed data science program from scratch.
Month 1 was Excel. Month 2 was SQL. By Month 6, I was fitting Random Forest models and running t-tests in scipy. The problem? I kept jumping between four different browser tabs just to remember basic syntax.
So I did what any lazy engineer would do — I put everything into one PDF.
What's Inside
The cheat sheet bundle covers the 4 topics I use daily:
1. Pandas — Data Wrangling
Reading/writing (CSV, Excel, SQL, JSON), filtering, groupby, merge, pivot tables, handling nulls, datetime operations.
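To give a feel for the kind of snippet this section covers, here is a minimal sketch (not an excerpt from the PDF). The `orders` and `customers` tables and their column names are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data; table and column names are illustrative only.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 11, 10, 12],
    "amount": [25.0, None, 40.0, 15.0],
    "order_date": ["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-10"],
})
customers = pd.DataFrame({
    "customer_id": [10, 11, 12],
    "region": ["North", "South", "North"],
})

# Handling nulls: treat a missing amount as zero.
orders["amount"] = orders["amount"].fillna(0.0)

# Datetime operations: parse strings, then extract the month number.
orders["order_date"] = pd.to_datetime(orders["order_date"])
orders["month"] = orders["order_date"].dt.month

# Filtering with a boolean mask.
large = orders[orders["amount"] > 20]

# Merge (left join), then groupby and pivot table.
merged = orders.merge(customers, on="customer_id", how="left")
by_region = merged.groupby("region")["amount"].sum()
pivot = merged.pivot_table(
    index="region", columns="month", values="amount",
    aggfunc="sum", fill_value=0,
)
print(by_region)
print(pivot)
```

Filling nulls with zero is just one choice here; dropping the row or imputing a median may suit real data better.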
2. SQL — Queries, Joins & Aggregations
SELECT structure, WHERE operators, GROUP BY + HAVING, all JOIN types, CTEs, window functions (ROW_NUMBER, RANK, LAG/LEAD), data modification.
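As a rough illustration of how several of these pieces fit together (again, not taken from the PDF), the sketch below runs one query through Python's built-in `sqlite3` module. The `sales` table and its values are invented, and window functions assume the bundled SQLite is version 3.25 or newer.

```python
import sqlite3

# In-memory database with a hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (rep TEXT, region TEXT, amount REAL);
INSERT INTO sales VALUES
  ('Ana', 'North', 100), ('Ben', 'North', 250),
  ('Cy', 'South', 300), ('Di', 'South', 50), ('Ed', 'South', 120);
""")

# CTE + GROUP BY/HAVING + JOIN + a RANK() window function in one query.
query = """
WITH region_totals AS (
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    HAVING SUM(amount) > 100
)
SELECT s.rep, s.region, s.amount, t.total,
       RANK() OVER (PARTITION BY s.region ORDER BY s.amount DESC) AS rank_in_region
FROM sales AS s
JOIN region_totals AS t ON t.region = s.region
ORDER BY s.region, rank_in_region;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The same query should carry over to PostgreSQL or MySQL 8+ with little or no change, though that is an assumption worth checking against your own database.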
3. Scikit-learn — Machine Learning
Train/test split, pipelines, preprocessing (StandardScaler, OneHotEncoder, ColumnTransformer), regression/classification/clustering models, evaluation metrics (accuracy, precision, recall, F1, MAE, RMSE, R²), cross-validation, GridSearchCV, feature importance.
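A minimal sketch of how the preprocessing and modelling pieces usually chain together, assuming a made-up churn dataset with one numeric and one categorical column (this is an illustration, not the cheat sheet's own example):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy dataset; far too small for real conclusions.
df = pd.DataFrame({
    "age": [22, 35, 47, 51, 29, 41, 33, 58, 26, 44, 38, 62],
    "plan": ["basic", "pro", "pro", "basic", "basic", "pro",
             "basic", "pro", "basic", "pro", "basic", "pro"],
    "churned": [0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1],
})
X = df[["age", "plan"]]
y = df["churned"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale the numeric column and one-hot encode the categorical one.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["plan"]),
])

# Putting preprocessing inside the pipeline keeps test data out of the fit.
model = Pipeline([
    ("preprocess", preprocess),
    ("forest", RandomForestClassifier(n_estimators=50, random_state=0)),
])
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
cv_scores = cross_val_score(model, X, y, cv=3)
print("hold-out accuracy:", accuracy)
print("cross-validation scores:", cv_scores)
```

Tree models do not strictly need scaled inputs; the scaler is kept only so the same pipeline shape works if the final step is swapped for a linear model.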
4. Statistics — Foundations & Inference
Descriptive stats, probability distributions, CLT, hypothesis testing, t-tests, ANOVA, chi-square, correlation (Pearson/Spearman), confidence intervals, Type I/II errors, key formulae.
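For a sense of what the inference topics look like in code, here is a small sketch using scipy on simulated data. The group means, spread, and sample sizes are arbitrary assumptions, and the example is illustrative rather than lifted from the PDF.

```python
import numpy as np
from scipy import stats

# Simulated samples; the parameters are arbitrary choices for the demo.
rng = np.random.default_rng(42)
group_a = rng.normal(loc=50, scale=5, size=40)
group_b = rng.normal(loc=53, scale=5, size=40)

# Welch's two-sample t-test (does not assume equal variances).
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)

# 95% confidence interval for the mean of group_a, using the t distribution.
mean_a = group_a.mean()
sem_a = stats.sem(group_a)
ci_low, ci_high = stats.t.interval(
    0.95, df=len(group_a) - 1, loc=mean_a, scale=sem_a
)

# Pearson and Spearman correlation between the two samples.
pearson_r, pearson_p = stats.pearsonr(group_a, group_b)
spearman_r, spearman_p = stats.spearmanr(group_a, group_b)

print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
print(f"95% CI for mean of A: ({ci_low:.2f}, {ci_high:.2f})")
print(f"Pearson r = {pearson_r:.3f}, Spearman rho = {spearman_r:.3f}")
```

A p-value below the chosen significance level (commonly 0.05) would lead to rejecting the null hypothesis of equal means, with the usual Type I error risk at that level.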
Why a PDF?
I wanted something I could:
- Keep open on a second monitor while working
- Print and pin to my wall
- Search quickly with Ctrl+F
- Access offline
Where to Get It
I put the bundle on Gumroad for $12. If you're interested: https://deepanshu647.gumroad.com/l/shcxrw
Or if you just want the Pandas section as a free sample, comment below and I'll send it over.
I'm Deepanshu — BTech IT grad, self-taught data scientist, currently on Day 114 of a 12-month program. Building toward freelance data consulting and an MSc in the Netherlands.