DEV Community

ClawGear

35 ChatGPT Prompts for Data Scientists (From EDA to Executive Summary)

Data scientists are typically excellent at the technical parts of their job — the modeling, the statistical reasoning, the code. The part that often gets left behind is communication: explaining methodology to non-technical stakeholders, documenting what the analysis actually showed, writing up model cards, translating uncertainty into language a VP can act on.

ChatGPT can't run your models or interpret your data (and you shouldn't let it). But it can help you write the documentation, explanations, and stakeholder reports that turn good analysis into organizational impact.

These 35 prompts are built around the real workflow of data science — from exploration through deployment.

Data security note: Never paste real PII, customer data, proprietary model weights, or internal system details into ChatGPT. Use aggregated, anonymized, or synthetic data in all prompts.


1. Exploratory Data Analysis Communication

Prompt 1 — EDA Summary for Stakeholders

I've completed an exploratory data analysis on [describe dataset: rows, columns, domain]. Key observations: [list 4–5 findings: distribution shapes, missing data patterns, surprising correlations, notable outliers]. Write a non-technical EDA summary that explains what we found and what questions it raises. Audience: product manager or business stakeholder.

Prompt 2 — Data Quality Issue Report

I found data quality issues in our dataset. Issues: [list: missing values in X column at Y%, duplicates in Z, inconsistent formatting in W]. Write a data quality report that: describes each issue, estimates the business impact, recommends a treatment (impute / drop / flag), and flags which issues require upstream fixes.

Prompt 3 — Correlation Finding Explanation

I found a [strong / moderate / weak] correlation between [variable A] and [variable B] (r = X). Write an explanation for a business audience that: describes what this correlation means, what it does NOT mean (causation caveat), and what business hypothesis it supports or contradicts.
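If you want to compute r yourself before filling in the prompt, a minimal sketch with pandas works; the column names and numbers below are invented toy data, per the security note above:

```python
import pandas as pd

# Synthetic example data, never paste real customer data.
df = pd.DataFrame({
    "ad_spend": [10, 20, 30, 40, 50, 60],
    "signups":  [12, 25, 29, 43, 48, 64],
})

# Pearson r between two columns; pandas aligns the pairs for you.
r = df["ad_spend"].corr(df["signups"])
print(f"r = {r:.2f}")  # strong positive correlation in this toy data
```

Remember that even a near-perfect r here says nothing about which variable drives the other, which is exactly the caveat the prompt asks ChatGPT to spell out.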

Prompt 4 — Outlier Report

I identified [N] statistical outliers in [variable/metric]. Here's what I know about them: [describe patterns — are they data errors, edge cases, or genuine extremes?]. Write a brief report explaining: what these outliers are, whether they should be included or excluded from analysis, and the business implication of that decision.
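Before writing the report, it helps to state how the outliers were flagged. One common rule is the 1.5 × IQR fence; a minimal pandas sketch with made-up values:

```python
import pandas as pd

# Synthetic metric readings; substitute your own anonymized series.
s = pd.Series([102, 98, 105, 99, 101, 97, 100, 450])

q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Anything outside the fences is flagged for review, not auto-deleted.
outliers = s[(s < lower) | (s > upper)]
print(outliers.tolist())  # → [450]
```

Whether the 450 reading is a sensor error or a genuine extreme is the business question the prompt asks ChatGPT to help you frame.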

Prompt 5 — Feature Importance Narrative

My model's top 5 features by importance are: [list features and importance scores]. Write an explanation for a business audience that: translates each feature into a business concept, explains why it makes sense (or doesn't) that it's predictive, and raises any concerns about using these features in production.

2. Model Documentation

Prompt 6 — Model Card Draft

Write a model card for a [classification / regression / clustering] model I built to [describe prediction task]. Include sections for: model overview, intended use, performance metrics (placeholders for my actual numbers), limitations, bias and fairness considerations, and training and evaluation data. Use the standard Model Card format.

Prompt 7 — Model Methodology Summary

Write a methodology section for a data science report on a [model type] model. Include: the problem framing (classification/regression/etc.), how I defined the target variable, feature engineering approach (describe generically), model selection rationale, and validation strategy. Audience: technical reviewer but not necessarily ML specialist.

Prompt 8 — Model Limitation Disclosure

Write a clear, honest disclosure of model limitations for a [fraud detection / churn prediction / recommendation / forecasting] model. The known limitations I've identified are: [list]. Format this as the "Limitations" section of a technical report — be specific and don't soften the issues.

Prompt 9 — A/B Test Result Write-Up

Write a summary of an A/B test result. Test: [describe what was tested]. Result: [describe outcome — statistical significance, effect size, confidence interval]. Audience: cross-functional team (product, engineering, marketing). Include: what we tested, what we found, statistical significance and what that means in plain English, business recommendation.
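To sanity-check the confidence interval before you write it up, the difference-in-proportions CI needs nothing beyond the standard library. A sketch with hypothetical counts (replace with your own aggregated numbers):

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical A/B counts; never paste per-user data.
conv_a, n_a = 120, 2400   # control: 5.0% conversion
conv_b, n_b = 156, 2400   # variant: 6.5% conversion

p_a, p_b = conv_a / n_a, conv_b / n_b
diff = p_b - p_a

# Standard error of the difference, then a 95% normal-approximation CI.
se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
z = NormalDist().inv_cdf(0.975)
ci = (diff - z * se, diff + z * se)

print(f"lift = {diff:.1%}, 95% CI = ({ci[0]:.1%}, {ci[1]:.1%})")
```

A CI that excludes zero, as this toy one does, is the plain-English "statistically significant" claim the prompt asks ChatGPT to translate for the cross-functional audience.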

Prompt 10 — Experiment Design Document

Write an experiment design document for a proposed A/B test. We want to test: [describe hypothesis]. Include: objective, metric (primary + guardrail), null hypothesis, statistical power requirements (plain English explanation), minimum detectable effect, sample size rationale, and duration. Flag any design risks.
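The sample-size rationale is easier to defend if you can show the arithmetic. A stdlib-only sketch of the per-arm size for a two-proportion z-test; baseline, MDE, alpha, and power below are hypothetical planning numbers, not recommendations:

```python
from math import ceil
from statistics import NormalDist

# Hypothetical planning inputs, adjust to your own baseline and MDE.
p1 = 0.10               # baseline conversion rate
p2 = 0.12               # minimum detectable effect: +2 points absolute
alpha, power = 0.05, 0.80

z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided test
z_beta = NormalDist().inv_cdf(power)

# Classic per-arm sample size for comparing two proportions.
variance = p1 * (1 - p1) + p2 * (1 - p2)
n = ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)
print(n)  # roughly 3,800+ users per arm for these inputs
```

Dividing that n by your daily traffic gives the test duration the prompt asks you to justify.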

3. Stakeholder Reporting

Prompt 11 — Executive Dashboard Narrative

Write a 1-page executive narrative for these metrics: [list metrics with current values and trends]. For each metric: explain what it measures, whether the trend is positive or concerning, and what action (if any) is recommended. Audience: C-suite, no technical background. Avoid ML jargon.

Prompt 12 — Analysis Request Response

A stakeholder asked: "[paste the question they asked]." My analysis found: [describe what you found]. Write a professional response email that: answers their question directly, provides the key finding and evidence, acknowledges any limitations or caveats, and suggests a follow-up action or decision.

Prompt 13 — Monthly Data Review Slide Narrative

Write speaker notes for a monthly data review slide. The slide shows: [describe the chart or metric]. The story I want to tell: [describe the insight]. What happened, why it matters, and what we should do about it — in under 90 seconds of speaking time.

Prompt 14 — Forecast Explanation with Uncertainty

I need to communicate a forecast with uncertainty. My forecast: [value] with a [confidence interval / range]. The main drivers: [list]. Write an explanation for a business stakeholder that: states the forecast clearly, explains what the uncertainty means in practical terms, describes what conditions would make the high/low scenario more likely.

Prompt 15 — Recommendation from Analysis

Based on my analysis of [describe data/problem], I'm recommending [describe recommendation]. Write the recommendation section of a business report that: states the recommendation clearly, provides the 3 key supporting data points, acknowledges the main counterargument, and proposes a success metric for tracking outcome.

4. Python and SQL Code Support

Prompt 16 — Code Explanation

Explain what this Python/SQL code does, step by step: [paste sanitized code]. Focus on: the data transformation logic, any non-obvious operations, and any potential performance or correctness issues.

Prompt 17 — SQL Query Optimization

Review this SQL query for performance issues: [paste sanitized query]. The table has approximately [N] rows. Identify: any missing indices that might help, unnecessary operations, rewrite opportunities (e.g., window functions vs subqueries), and any correctness issues. Explain each suggestion.

Prompt 18 — Pandas Code Review

Review this pandas code for: correctness, performance (chained indexing, unnecessary copies, vectorization opportunities), and readability: [paste sanitized code]. Flag issues and suggest improvements. I'm using pandas [version].
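If "chained indexing" is the kind of issue you expect the review to surface, here is a minimal before/after sketch on a toy DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"score": [0.2, 0.9, 0.5], "flag": [0, 0, 0]})

# Chained indexing: may silently write to a copy and trigger
# SettingWithCopyWarning.
# df[df["score"] > 0.8]["flag"] = 1          # anti-pattern

# A single .loc call is guaranteed to modify df itself.
df.loc[df["score"] > 0.8, "flag"] = 1
print(df["flag"].tolist())  # → [0, 1, 0]
```

Pasting a sanitized version of your own code plus your pandas version (as the prompt suggests) matters because copy semantics changed across pandas releases.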

Prompt 19 — Function Documentation

Write a docstring for this Python function: [paste sanitized function signature and logic]. Include: what it does, arguments (name, type, description), return value, exceptions raised, and one usage example. Use NumPy docstring format.
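For reference, this is the shape of output to expect. The function below is an invented example, not from any real codebase, documented in NumPy style:

```python
import statistics

def zscore(values):
    """Standardize a sequence of observations.

    Parameters
    ----------
    values : sequence of float
        Raw observations; must contain at least two values.

    Returns
    -------
    list of float
        Each observation expressed in standard deviations from the mean.

    Raises
    ------
    statistics.StatisticsError
        If ``values`` has fewer than two observations.

    Examples
    --------
    >>> zscore([1.0, 2.0, 3.0])
    [-1.0, 0.0, 1.0]
    """
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]
```

The Examples section doubles as a doctest, which is a cheap way to keep the docstring honest.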

Prompt 20 — Unit Test Plan for a Data Pipeline

Generate a test plan (not the code) for a data pipeline that [describe what the pipeline does]. Include: what to test at each stage (input validation, transformation logic, output schema, edge cases), what sample data to use, and any integration test considerations.

5. Machine Learning Concepts and Communication

Prompt 21 — Model Comparison Explanation

Explain the trade-offs between [Model A: e.g., Logistic Regression] and [Model B: e.g., Gradient Boosting] for a [classification / regression] problem in our context: [describe data characteristics and requirements]. Audience: non-ML engineer or product manager. Focus on practical implications, not math.

Prompt 22 — Overfitting Explanation

Write a plain-English explanation of overfitting for a business stakeholder who asked why our model performs well in testing but poorly in production. Use an analogy. Keep it under 150 words. End with what we're doing to address it.

Prompt 23 — Bias and Fairness Audit Write-Up

Write a fairness and bias review section for a model that makes [describe prediction: e.g., loan approval / job screening / content recommendation]. I've analyzed performance across these groups: [list groups]. Findings: [describe any disparities]. Include: what we found, how concerning it is, and what mitigations we're applying or recommending.

Prompt 24 — Cross-Validation Explanation

Explain cross-validation to a product manager who is asking why our model's test set performance was "good" but the results in production are weaker. Keep it non-technical. Use an analogy if helpful. Then explain what we can do to get a more realistic estimate of production performance.
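It also helps to be precise yourself about what cross-validation does mechanically. A pure-Python sketch of the fold rotation (in practice you would use scikit-learn's KFold; the row indices here are stand-ins):

```python
# Minimal k-fold sketch: the point is that every row is held out
# exactly once, and the k scores get averaged.
data = list(range(10))          # stand-in for row indices
k = 5

folds = [data[i::k] for i in range(k)]
for i, holdout in enumerate(folds):
    train = [x for x in data if x not in holdout]
    # fit on `train`, score on `holdout`, then average the k scores
    print(f"fold {i}: holdout={holdout}")
```

Note that if your production data arrives over time, random folds like these can still overstate performance, which is part of the gap the product manager is asking about.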

Prompt 25 — Feature Engineering Rationale

Write the "Feature Engineering" section of a technical report. The features I created are: [list features and how you derived them]. For each: explain the business or statistical rationale, any transformations applied (log, binning, encoding), and any concerns about data leakage or reproducibility.
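The leakage concern is worth illustrating concretely in the report. A stdlib-only sketch with invented numbers: transform parameters must be fitted on training data and then frozen before touching the test set:

```python
import statistics

# Toy feature values; real pipelines would use sklearn's Pipeline.
train = [3.0, 5.0, 7.0, 9.0]
test = [6.0, 12.0]

# Leaky: computing statistics over train + test together.
# mu = statistics.mean(train + test)          # anti-pattern

# Correct: fit the transform on training data only...
mu, sd = statistics.mean(train), statistics.stdev(train)

# ...then apply those frozen parameters to the test set.
test_scaled = [(v - mu) / sd for v in test]
print(test_scaled)
```

The same fit-on-train-only rule applies to imputation values, encoding maps, and binning edges, which is why the prompt asks you to flag each transformation.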

6. Data Strategy and Communication

Prompt 26 — Data Request Template

Write a data access request template for our data team. The requester should specify: what data they need (table/columns), why they need it (use case), who will have access, how long they'll retain it, and any compliance considerations. Make it short enough that people actually fill it out.

Prompt 27 — Data Dictionary Entry

Write a data dictionary entry for [field name]. Include: description (what it represents), data type, possible values or range, source system, update frequency, business owner, and known quality issues. Format as a structured entry suitable for a Confluence or Notion database.

Prompt 28 — Analytics Roadmap Proposal

Write a 1-page analytics roadmap proposal for [team/business area]. The key analytical questions we need to answer: [list 4–5]. For each: describe the analysis needed, the data required, and the expected business value. Prioritize by impact. Audience: head of data or business leadership.

Prompt 29 — Data Incident Report

Write a data incident report for a situation where [describe: e.g., a pipeline broke and stale data was shown in dashboards / PII was exposed in a log / a metric definition changed without notice]. Include: what happened, when it was discovered, business impact, root cause, remediation steps taken, and what we're doing to prevent recurrence.

7. Career and Professional Development

Prompt 30 — Data Science Interview Prep

I'm preparing for a data science interview at a [company type / stage]. Generate 10 technical questions I should expect (ML concepts, statistics, SQL, Python) and 5 behavioral questions. For each, give me a framework for thinking through the answer — not a scripted response.

Prompt 31 — Learning Plan: [Skill]

I want to build proficiency in [skill: e.g., causal inference / MLOps / time series forecasting / NLP] over the next 3 months. My current level: [describe]. Create a learning plan with: week-by-week topics, specific resources (books, courses, papers), and a project to build by the end.

Prompt 32 — Portfolio Project Write-Up

Write a project description for my data science portfolio. Project: [describe what you built, the data, the methods, and the outcome]. Format: problem, approach, key results, tools used, what I learned. Length: 200–300 words. Tone: confident and specific — this is for technical recruiters and hiring managers.

Prompt 33 — Peer Code Review Comment

Help me write constructive code review feedback for a teammate's data science code. I noticed: [describe the issue — e.g., data leakage risk, wrong train/test split, inefficient pandas operation]. Write feedback that: explains the issue clearly, provides the rationale, suggests a fix, and is worded professionally (not condescending).

Prompt 34 — Tech Talk Abstract

Write a conference talk or internal tech talk abstract for a talk I want to give on [topic: describe what you built or learned]. Include: a title (under 10 words), a 100-word abstract, 3 key takeaways the audience will leave with, and the target audience (beginner / intermediate / advanced DS).

Prompt 35 — Feedback on Written Communication

I wrote this data science report / email / summary: [paste text]. Review it from the perspective of a business stakeholder who doesn't have a statistics background. What's unclear? What feels like jargon? What am I assuming they know that I should explain? Give me specific edits, not just general feedback.

Getting the Most From These Prompts

Bring your actual findings. "Summarize my EDA" with no context gets nothing. "Here are my 4 key findings, now write the stakeholder summary" gets something useful. ChatGPT can structure and communicate; it cannot analyze data it hasn't seen.

Use it for the translation layer. Your biggest leverage is in converting technical work into business language. That's where hours disappear, and where ChatGPT helps most.

Always verify code suggestions. ChatGPT code is a starting point. Review it for correctness, data leakage, and edge cases before running it on production data.

Don't feed it real data. Synthetic examples, anonymized schemas, and aggregated summaries are safe inputs. Real customer data is not.


Your Complete Data Scientist Prompt Toolkit

Want all 35 prompts organized by workflow — from EDA through executive reporting?

The ChatGPT Prompt Toolkit for Data Scientists includes:

  • All 35 prompts in a PDF and Notion dashboard
  • Fill-in-the-blank templates for model cards, A/B test write-ups, and stakeholder narratives
  • Bonus section: 10 prompts for ML engineers and analytics engineers
  • Prompt chaining guide: from raw findings to executive-ready report in 3 steps

Get the Data Scientist Prompt Toolkit — $14.99

Pull it up before your next stakeholder meeting.
