If you spend your day in notebooks and dashboards, you already know the bottleneck isn't the analysis — it's all the writing around it. Explaining findings to stakeholders, documenting pipelines, writing model cards, interpreting messy A/B results. These 30 prompts are built for that exact friction.
All of them use {{clipboard}} as a stand-in for your data, code, or context. Paste them into ChatGPT, Claude, or Gemini. Most work equally well across all three.
Exploratory Data Analysis
1. First-look EDA brief
Perform a structured first-look review of this dataset:
{{clipboard}}. Cover data shape, quality issues, distributions, correlations, and three critical questions I should answer before modeling.
2. Data quality audit
Audit this dataset for issues:
{{clipboard}}. Flag missing values, outliers that suggest encoding errors, suspicious cardinality, date range problems, and any columns that look like duplicates or potential leakage.
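These audit prompts work best when {{clipboard}} holds a compact per-column summary rather than thousands of raw rows. A minimal stdlib-only sketch of what you might paste (the `quality_summary` helper and its output format are illustrative, not from any library):

```python
from collections import Counter

def quality_summary(rows, columns):
    """Build a compact per-column summary to paste into the audit prompt.

    `rows` is a list of dicts, e.g. from csv.DictReader.
    """
    missing_tokens = (None, "", "NA")
    lines = []
    for col in columns:
        values = [r.get(col) for r in rows]
        missing = sum(1 for v in values if v in missing_tokens)
        distinct = len(set(values))
        top = Counter(v for v in values if v not in missing_tokens).most_common(3)
        lines.append(f"{col}: {missing}/{len(values)} missing, "
                     f"{distinct} distinct, top values {top}")
    return "\n".join(lines)

rows = [
    {"id": "1", "country": "US", "age": "34"},
    {"id": "2", "country": "US", "age": ""},
    {"id": "3", "country": "us", "age": "212"},  # casing drift + outlier worth flagging
]
print(quality_summary(rows, ["id", "country", "age"]))
```

A summary like this gives the model enough signal to spot the `US`/`us` inconsistency and the implausible age without you pasting the whole table.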
3. Feature engineering suggestions
Based on this dataset schema:
{{clipboard}}. Suggest 5–10 feature engineering ideas. Include interaction terms, time-based features, encoding strategies for high-cardinality columns, and any domain-specific transformations you'd try first.
Model Explanation & Interpretation
4. Explain a model to a non-technical stakeholder
Translate this model description into plain English (max 200 words):
{{clipboard}}. Cover what it predicts, what inputs it uses, how accurate it is, and how the business would actually use it. No jargon.
5. Interpret SHAP values
Here are SHAP values from my model:
{{clipboard}}. Analyze the feature importance patterns, call out anything surprising, and recommend what I should investigate further.
6. Write a model card
Write a model card for this model:
{{clipboard}}. Include: model overview, intended use cases, training data, performance metrics by subgroup, known limitations, and guidance on appropriate use.
SQL Query Generation & Review
7. Generate SQL from plain English
Write a SQL query that does this:
{{clipboard}}. Use standard SQL. Add comments explaining each non-obvious step. Offer an alternative approach if one exists.
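To sanity-check what a prompt like this returns, run it against a throwaway in-memory database before touching production. A sketch using Python's stdlib `sqlite3`, with a hypothetical `orders` table and the kind of commented query the prompt should produce for "revenue per customer, highest first":

```python
import sqlite3

# Hypothetical table for testing generated SQL in isolation.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 10.0), (1, 5.0), (2, 20.0);
""")

query = """
    SELECT customer_id,
           SUM(amount) AS revenue      -- total spend per customer
    FROM orders
    GROUP BY customer_id               -- one row per customer
    ORDER BY revenue DESC;             -- highest spenders first
"""
print(conn.execute(query).fetchall())  # [(2, 20.0), (1, 15.0)]
```

Seeding a tiny fixture like this takes a minute and catches most logic errors before the query ever runs against real data.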
8. Review and optimize a SQL query
Review this query:
{{clipboard}}. Identify logic errors, expensive joins, missing indexes, places where it could be simplified, and any edge cases it doesn't handle.
9. Translate SQL to pandas (or vice versa)
Convert this SQL to idiomatic pandas (or this pandas code to SQL):
{{clipboard}}. Keep the logic identical. Add comments explaining each step.
Data Storytelling & Communication
10. Turn analysis results into an executive summary
Write a 3-paragraph executive summary from these analysis results:
{{clipboard}}. Paragraph 1: the key finding. Paragraph 2: the data story behind it. Paragraph 3: the recommended action.
11. Write a hypothesis for an A/B test
Formalize this A/B test idea into a proper hypothesis:
{{clipboard}}. Include: the proposed change, the metric being measured, expected direction and magnitude of effect, success criteria, and potential confounders.
12. Interpret A/B test results
Here are my A/B test results:
{{clipboard}}. Assess statistical significance, practical significance, relevant caveats (sample size, segments, novelty effects), and make a shipping recommendation.
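It helps to compute the significance numbers yourself before asking the model to interpret them, so it's reasoning from your math rather than re-deriving it. A stdlib-only sketch of a pooled two-proportion z-test (the illustrative numbers show a 5.0% → 5.6% lift that narrowly misses significance):

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test with a pooled standard error.
    A quick check only; a real analysis should also cover segments
    and novelty effects, as the prompt asks."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value

z, p = two_proportion_ztest(conv_a=500, n_a=10_000, conv_b=560, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Pasting the computed z and p alongside the raw counts lets the model focus on practical significance and caveats instead of arithmetic.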
Statistical Interpretation
13. Explain a statistical concept in plain English
Explain
{{clipboard}} to someone who understands basic statistics but not this specific technique. Use a concrete example. Minimize formulas. Focus on practical use cases.
14. Diagnose a regression model
Here is diagnostic output from my regression model:
{{clipboard}}. Interpret the results, assess whether assumptions are being violated, and propose remediation steps.
15. Design an experiment
I want to test this hypothesis:
{{clipboard}}. Recommend the appropriate statistical test, calculate the required sample size (80% power, two-sided α = 0.05), identify control variables, and define what success looks like.
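Sample-size answers are worth double-checking by hand. A stdlib-only sketch using the normal approximation for a two-sample comparison at the prompt's defaults (80% power, two-sided α = 0.05, z-values hardcoded accordingly); it slightly undershoots the exact t-test answer, so treat the result as a floor:

```python
import math

def sample_size_per_group(effect_size):
    """Per-group n for a two-sample test via the normal approximation.
    Fixed at 80% power and two-sided alpha = 0.05."""
    z_alpha = 1.959964  # two-sided alpha = 0.05
    z_beta = 0.841621   # power = 0.80
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

n = sample_size_per_group(0.2)  # small effect, Cohen's d = 0.2
print(n)
```

If the model's recommended n disagrees wildly with this back-of-envelope number, ask it to show its formula.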
Python Code Assistance
16. Debug a Python data error
Here is a Python error from my data pipeline:
{{clipboard}}. Explain what's causing it, provide a fix, and identify any underlying data quality issues that might be the root cause.
17. Write unit tests for a data pipeline
Write pytest tests for this data pipeline function:
{{clipboard}}. Cover normal inputs, edge cases (empty DataFrames, nulls, type mismatches), and assertions on business logic.
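For reference, pytest tests are just `test_*` functions with plain `assert` statements, which is the shape you should expect back. A sketch against a hypothetical `dedupe_orders` pipeline step (pure-Python stand-in for a DataFrame operation, to keep the example self-contained):

```python
def dedupe_orders(orders):
    """Hypothetical pipeline step: drop duplicate order IDs (keep first seen)
    and reject rows with a missing ID."""
    seen, out = set(), []
    for row in orders:
        order_id = row.get("order_id")
        if order_id is None:
            continue  # business rule: rows without an ID are dropped
        if order_id not in seen:
            seen.add(order_id)
            out.append(row)
    return out

# pytest discovers these by name; no framework imports needed.
def test_normal_input():
    rows = [{"order_id": 1}, {"order_id": 2}]
    assert dedupe_orders(rows) == rows

def test_duplicates_keep_first():
    rows = [{"order_id": 1, "v": "a"}, {"order_id": 1, "v": "b"}]
    assert dedupe_orders(rows) == [{"order_id": 1, "v": "a"}]

def test_empty_and_missing_ids():
    assert dedupe_orders([]) == []
    assert dedupe_orders([{"order_id": None}]) == []
```

Generated tests that only cover the happy path are a sign to re-run the prompt with the edge cases spelled out explicitly.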
18. Refactor a messy data script
Refactor this script without changing its outputs:
{{clipboard}}. Extract repeated logic into functions, use descriptive variable names, add inline comments, and organize into logical sections.
Reporting & Documentation
19. Write a data dictionary
Write a data dictionary for this schema:
{{clipboard}}. For each column include: name, data type, plain-English description, example values, and any business rules or constraints that apply.
20. Summarize a research paper's methodology
Summarize the methodology section of this paper:
{{clipboard}}. Cover what they did, what assumptions they made, where the approach could fail, and whether the conclusions actually follow from the method.
21. Write a stakeholder update on a data project
Write a brief stakeholder update (under 150 words, plain language) for this data project status:
{{clipboard}}. Cover: what was planned vs. completed, any blockers, and next steps.
Machine Learning Engineering
22. Review a model training script
Review this training script for issues:
{{clipboard}}. Check for data leakage, improper train/val/test splits, wrong loss function, missing reproducibility seeds, misleading metrics, and any bugs.
23. Write a problem statement for a classification task
Write a formal ML problem statement for this task:
{{clipboard}}. Define the target variable, the classes, why this matters to the business, what metric we're optimizing, and what a reasonable baseline looks like.
24. Evaluate model fairness
Analyze this model's performance across demographic groups:
{{clipboard}}. Identify which groups are most affected by performance disparities, determine likely causes, and recommend mitigations.
Exploration Prompts
25. Generate hypotheses from data patterns
Here are patterns I've noticed in this dataset:
{{clipboard}}. Generate five testable hypotheses, what data each would require to test, and your confidence level in each.
26. Suggest ML approaches for a problem
Recommend 3–5 ML approaches for this problem:
{{clipboard}}. Go from simple to complex. For each: pros/cons, appropriate metrics, and when you'd escalate to the more complex version.
27. Write a data pipeline design doc
Write a design document for this data pipeline:
{{clipboard}}. Structure it as: problem statement, proposed solution, data flow, component responsibilities, error handling, monitoring, and open questions.
Miscellaneous
28. Clean up messy column names
Clean up these column names:
{{clipboard}}. Apply snake_case, maintain descriptiveness, ensure consistency. Return a Python dictionary mapping old names to new names.
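The dictionary the prompt returns should look something like the output of this sketch. The heuristics here are illustrative, not canonical: split camelCase, collapse punctuation and whitespace into single underscores, lower-case everything:

```python
import re

def snake_case_mapping(columns):
    """Return an {old_name: new_name} dict, the format prompt 28 asks for."""
    mapping = {}
    for col in columns:
        name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", col)  # split camelCase
        name = re.sub(r"[^0-9a-zA-Z]+", "_", name)          # punctuation/spaces -> _
        mapping[col] = name.strip("_").lower()
    return mapping

print(snake_case_mapping(["Customer ID", "orderDate", "Total $ Amount"]))
# {'Customer ID': 'customer_id', 'orderDate': 'order_date', 'Total $ Amount': 'total_amount'}
```

Once you have the mapping, applying it in pandas is a one-liner: `df.rename(columns=mapping)`.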
29. Write a README for a data project
Write a README for this data project:
{{clipboard}}. Include: what it does, data sources, how to set up the environment, how to run the pipeline, where outputs go, and known limitations.
30. Post-mortem analysis of a failed model
Write a post-mortem for this model failure:
{{clipboard}}. Cover what went wrong, what warning signs were missed, what we'd do differently, and what monitoring would catch this earlier next time.
How to actually use these
The real unlock isn't having these prompts once — it's having them available in two seconds whenever you need one. Copy-pasting from a doc, hunting through Notion, or re-typing from memory all break your flow.
If you're on a Mac, Promptzy stores prompts as local Markdown files and pastes them directly into any AI app with Cmd+Shift+P. No cloud sync required. The {{clipboard}} variable gets replaced with whatever you've copied before triggering.
Save this entire list as a Promptzy collection — one shortcut to fire any of them.