DEV Community

ClawGear

35 ChatGPT Prompts for Data Engineers: Pipeline Docs, Stakeholder Communication, and Data Quality Done Faster

Data engineers build the infrastructure that everything else depends on. You design pipelines, manage ingestion layers, optimize query performance, maintain data quality, and support a dozen stakeholders who all need their data yesterday—all while keeping up with a fast-moving ecosystem of tools and frameworks.

The technical work is the job. But the surrounding work—documentation, stakeholder updates, incident runbooks, code review comments, architecture decisions—takes more time than most data engineers expect, and most of it doesn't require your deepest thinking.

These 35 prompts are built for working data engineers. Not data analysts, not data scientists—data engineers doing the actual DE job: pipelines, platforms, data quality, and the organizational work that keeps it all running.


Pipeline Documentation

Prompt 1 — Document a data pipeline

Write technical documentation for a data pipeline. Pipeline name: [name]. What it does: [describe the purpose]. Source: [source system/table]. Transformations: [describe key steps]. Destination: [target system/table]. Schedule: [frequency]. SLA: [latency requirement]. Known limitations or edge cases: [describe]. Audience: other engineers who will maintain this pipeline. Include: overview, data flow diagram (ASCII), field mappings summary, failure modes, and on-call runbook.

Prompt 2 — Write a data dictionary entry

Write a data dictionary entry for the following table/dataset: [table name]. Schema: [paste DDL or field list with types]. Business context: [what does this table represent, who owns it, how is it used]. Key fields to explain: [list important columns]. Include: table purpose, field descriptions, business rules, known data quality issues, and which teams/systems read from this table.

Prompt 3 — Explain a complex SQL query

Explain this SQL query in plain English for a non-technical stakeholder: [paste query]. The context: [what business question is this query answering]. Include: what the query does step by step, what the output represents, any important filters or logic to highlight, and what the results should be used for.

Prompt 4 — Write a README for a data repo

Write a README for a data engineering repository. Repo purpose: [describe]. Tech stack: [tools/languages]. Key components: [list main folders or scripts]. How to run locally: [setup steps]. How pipelines are deployed: [describe]. Monitoring/alerting setup: [describe]. How to contribute: [conventions]. Who maintains it: [team/role]. Keep it practical—this is what onboarding engineers will read first.

Prompt 5 — Write an Architecture Decision Record (ADR)

Write an Architecture Decision Record for the following decision: [describe the decision]. Context: [what problem or constraint drove this decision]. Options considered: [list 2-3 alternatives with pros/cons each]. Decision made: [what was chosen and why]. Consequences: [what this decision enables, what it forecloses, and what tradeoffs were accepted]. Status: [proposed / accepted / superseded].

Data Quality and Observability

Prompt 6 — Write a data quality check spec

Write a data quality check specification for [table/pipeline]. Checks to implement: [freshness, completeness, uniqueness, referential integrity, value ranges, null rates — specify which apply]. For each check: name, description, SQL or pseudocode logic, expected threshold, and what action to take on failure (alert, quarantine, fail the pipeline). Target tool/framework: [Great Expectations / dbt tests / custom / other].
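To make the spec concrete, here is a minimal Python sketch of what the generated checks might compile down to. Function names, row shapes, and thresholds are illustrative assumptions, not any particular framework's API — in practice you would map these onto dbt tests or Great Expectations.

```python
from datetime import datetime, timedelta, timezone

def check_freshness(latest_loaded_at: datetime, max_age_hours: float) -> bool:
    """Pass if the newest row landed within the allowed window."""
    age = datetime.now(timezone.utc) - latest_loaded_at
    return age <= timedelta(hours=max_age_hours)

def check_null_rate(rows: list[dict], column: str, max_null_rate: float) -> bool:
    """Pass if the share of NULLs in `column` is at or below the threshold."""
    if not rows:
        return False  # an empty table fails completeness outright
    nulls = sum(1 for r in rows if r.get(column) is None)
    return nulls / len(rows) <= max_null_rate

def check_uniqueness(rows: list[dict], key: str) -> bool:
    """Pass if `key` is unique across all rows."""
    values = [r[key] for r in rows]
    return len(values) == len(set(values))
```

Each check returns a boolean so the calling pipeline can decide the failure action — alert, quarantine, or hard-fail — per the spec.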

Prompt 7 — Write a data incident postmortem

Write a data incident postmortem. Incident: [describe what went wrong — data outage, bad data in production, SLA breach]. Timeline: [when discovered, when acknowledged, when resolved]. Root cause: [technical cause]. Impact: [who was affected, which dashboards/reports/systems, business impact if known]. What worked well: [describe]. What didn't: [describe]. Action items: [list specific remediation steps with owners]. Format for sharing with stakeholders and the data team.

Prompt 8 — Write a data quality SLA document

Write a data quality SLA for the following dataset: [dataset name]. Owner team: [team]. Consumers: [list downstream teams or systems]. Commitments: freshness SLA [X hours], completeness SLA [X%], availability SLA [X%]. Measurement: [how each metric is measured]. Breach escalation path: [who gets notified, at what threshold, how quickly]. Review cadence: [quarterly, etc.]. Format as a contract-style document for alignment between data producers and consumers.

Prompt 9 — Write a monitoring alert runbook

Write an on-call runbook for the following data monitoring alert: [alert name and description]. What the alert means: [what failure or anomaly triggered it]. First steps: [what to check first]. Diagnostic queries: [SQL or commands to run]. Common causes: [list 3-5 common root causes and how to identify each]. Resolution steps for each cause: [describe]. Escalation: [when to escalate and to whom]. Post-resolution: [steps after fixing, communication required].

Prompt 10 — Write a data contract

Draft a data contract between [producer team] and [consumer team] for [dataset name]. Cover: schema definition (field names, types, descriptions), freshness commitment, null/completeness guarantees, change notification process (how consumers will be warned of schema changes), breaking vs. non-breaking change definitions, and SLA breach escalation. Formal but readable — this will be signed off by both team leads.
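A data contract can also live as code, which makes the breaking vs. non-breaking distinction mechanically checkable. The sketch below is a hypothetical illustration — the dataclass fields and the breaking-change rules are assumptions you would align with your own contract definitions.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class FieldSpec:
    name: str
    dtype: str
    nullable: bool = True
    description: str = ""

@dataclass
class DataContract:
    dataset: str
    producer: str
    consumer: str
    freshness_hours: int
    fields: list[FieldSpec] = field(default_factory=list)

def is_breaking_change(old: DataContract, new: DataContract) -> bool:
    """Breaking if a field disappears, changes type, or a
    non-nullable guarantee is weakened. Added fields are non-breaking."""
    new_by_name = {f.name: f for f in new.fields}
    for f in old.fields:
        match = new_by_name.get(f.name)
        if match is None:
            return True  # removed field
        if match.dtype != f.dtype:
            return True  # type change
        if match.nullable and not f.nullable:
            return True  # guarantee weakened
    return False
```

Wiring a check like this into CI on the producer's repo turns the change-notification process from a promise into a gate.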

Stakeholder Communication

Prompt 11 — Write a data platform roadmap update

Write a quarterly roadmap update for the data platform team. Audience: [data consumers, business stakeholders, or leadership]. Completed this quarter: [list]. In progress: [list with expected completion]. Planned next quarter: [list]. Known blockers or dependencies: [describe]. Keep it non-technical — stakeholders care about outcomes, not tools. Under 400 words.

Prompt 12 — Explain a pipeline delay to stakeholders

Write an email explaining a data pipeline delay to stakeholders. Pipeline: [name and what it feeds]. Expected vs. actual delivery: [how late]. Root cause: [technical explanation in plain English]. Impact: [which reports or decisions are affected]. Resolution time: [expected fix]. Interim workaround: [if any]. What's being done to prevent recurrence: [brief]. Professional, direct, no jargon.

Prompt 13 — Write a data migration communication

Write a stakeholder communication for an upcoming data migration. What's changing: [source, destination, or schema change]. Who is affected: [downstream teams, dashboards, reports]. Timeline: [key dates — announcement, cutover, deprecation]. What stakeholders need to do: [specific actions required before cutover]. Who to contact with questions: [name/team]. Keep it brief and action-focused — stakeholders need to know what to do, not why it's technically necessary.

Prompt 14 — Write a data request intake response

Write a professional response to the following stakeholder data request: [paste the request]. Assessment of the request: [can it be fulfilled as stated? Any clarifications needed? Complexity?]. What I can deliver: [describe the output]. What I need from the requester: [specific questions to answer]. Estimated timeline: [honest estimate]. Any limitations or caveats: [e.g., data not available for certain periods, approximation required]. Collaborative, not dismissive.

Prompt 15 — Write a data catalog entry

Write a business-friendly data catalog entry for the following asset: [table/dataset name]. What it is: [plain-English description]. How to use it: [key fields to query, common use cases]. How fresh is it: [update frequency]. Who owns it: [team]. Known limitations: [anything a consumer should know before using it]. Where to get access: [process or contact]. Audience: analysts and business users — avoid internal technical jargon.

Code Review and Technical Writing

Prompt 16 — Write a code review comment

Write professional code review comments for the following [SQL / Python / Spark / dbt] code: [paste code]. Focus areas: [correctness, performance, readability, maintainability, security]. For each issue: identify the specific line or section, explain the problem clearly, and suggest a concrete fix. Tone: constructive and educational, not critical. Include at least one positive observation if the code has strengths worth noting.

Prompt 17 — Explain a performance issue

Explain the following data pipeline performance problem to my team: [describe the issue — slow query, full table scans, skewed partitions, etc.]. Root cause: [technical explanation]. Impact: [query time, cost, downstream latency]. Fix applied or proposed: [describe]. How to detect this class of issue in the future: [monitoring approach]. Written for an engineering audience — technical but clear.

Prompt 18 — Write a design document for a new pipeline

Write a design document for a new data pipeline. Business requirement: [what downstream use case does this serve]. Proposed approach: [source, transformations, destination, schedule]. Tech stack: [tools]. Alternatives considered: [1-2 alternatives and why they were rejected]. Open questions: [unresolved decisions that need input]. Non-goals: [what this pipeline explicitly will not do]. Risks: [data quality, latency, cost, dependencies]. Format for team review and sign-off.

Prompt 19 — Write a dbt model documentation block

Write dbt model documentation for the following model: [model name]. What it represents: [business entity]. Source tables: [list]. Key transformations: [describe grain, filters, joins, aggregations]. Column descriptions: [paste schema or list columns needing description]. Who uses it: [downstream models or dashboards]. Write as YAML for schema.yml, following dbt documentation conventions. Include a model-level description and column-level descriptions for all key fields.

Prompt 20 — Review a data model for issues

Review the following data model for potential issues: [paste ERD, DDL, or table descriptions]. Identify: normalization issues, missing indexes or partition keys, potential performance bottlenecks, foreign key or referential integrity gaps, naming inconsistencies, and fields that may cause downstream confusion. For each issue: explain the problem and suggest a concrete improvement.

Incident Response and Operations

Prompt 21 — Write a pipeline failure alert message

Write a clear, actionable alert message for a data pipeline failure. Pipeline: [name]. Failure type: [describe — timeout, schema change, upstream data missing, etc.]. Time of failure: [timestamp]. Impact: [which downstream consumers are affected]. Immediate actions required: [list]. Who to page: [on-call rotation or team]. Keep it under 150 words — this goes to Slack/PagerDuty and needs to be scannable under pressure.
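If you like the structure the prompt produces, you can freeze it into a small formatter so every alert is scannable in the same way. This is a minimal sketch with made-up field names — adapt it to whatever your Slack or PagerDuty integration actually expects.

```python
def format_pipeline_alert(pipeline: str, failure_type: str,
                          failed_at: str, impact: str,
                          actions: list[str], oncall: str) -> str:
    """Render a short, scannable pipeline-failure alert."""
    lines = [
        f":rotating_light: *{pipeline}* failed ({failure_type})",
        f"When: {failed_at}",
        f"Impact: {impact}",
        "Do now:",
        *[f"  - {a}" for a in actions],
        f"Page: {oncall}",
    ]
    return "\n".join(lines)
```

Keeping the template in code rather than in each engineer's head is what makes alerts readable at 3 a.m.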

Prompt 22 — Write a root cause analysis template

Create a root cause analysis template for data engineering incidents. Include sections for: incident summary (5 lines max), timeline (discovery → acknowledgment → resolution), root cause (technical), contributing factors, impact (data freshness, consumers affected, business impact if known), what went well, what didn't, and action items (with owner and due date). Include guidance notes for each section so any engineer can fill it out.

Prompt 23 — Write a data deprecation notice

Write a deprecation notice for the following data asset: [table/pipeline/API name]. What's being deprecated: [description]. Why: [migration to new system, data quality issues, low usage, etc.]. Timeline: [deprecation date, support end date]. Migration path: [what replaces it and how to migrate]. Who to contact: [owner]. Impact assessment: [known consumers, required actions]. Send to: [data catalog, Slack channel, email to known consumers]. Professional and complete — deprecations cause real pain if poorly communicated.

Prompt 24 — Respond to a data quality escalation

Write a response to the following data quality escalation from a business stakeholder: [paste or describe their complaint]. My investigation findings: [what I found — the issue, its scope, when it started]. Root cause: [technical explanation in plain English]. Impact: [what data was affected and for how long]. Fix status: [resolved / in progress / workaround in place]. Prevention: [what we're doing so this doesn't recur]. Professional, takes ownership, and ends with clear next steps.

Career and Team Development

Prompt 25 — Write a data engineering onboarding guide

Write a 30-day onboarding guide for a new data engineer joining [team type — startup DE team, large platform team, etc.]. Include: first week focus (orientation, tool setup, codebase tour), second week (first small task, shadow on-call), weeks 3-4 (own first pipeline, understand data model). Key people to meet, key documentation to read, key questions to ask. Format as a practical checklist, not a reading list.

Prompt 26 — Write a technical interview take-home brief

Write a take-home technical assessment brief for a data engineering candidate. Role level: [mid / senior / staff]. Time limit: [2-4 hours]. Scenario: [describe the data problem — pipeline design, data modeling, or quality challenge]. What we're evaluating: [list the skills]. Deliverables: [what the candidate should submit]. Evaluation rubric: [how submissions will be scored]. Clear, realistic, and respectful of the candidate's time.

Prompt 27 — Write a promotion case for a data engineer

Write a promotion case document for a data engineer being considered for [current level → next level]. Their work: [list 2-3 major projects and contributions]. Business impact: [outcomes, not just outputs]. Evidence of next-level behavior: [specific examples]. Peer/stakeholder feedback themes: [summarize]. Areas for continued growth: [honest and constructive]. Format for an engineering promotion committee review.

Prompt 28 — Write a "what I learned" retrospective

Write a project retrospective for a recently completed data engineering initiative. Project: [describe]. What went well: [list]. What was harder than expected: [list]. What I'd do differently: [list]. Key technical lessons: [specific learnings about tools, patterns, or architecture]. Team process lessons: [collaboration, communication, planning]. One recommendation for the next similar project. Format as a private document for personal reflection and team sharing.

Tools, Evaluation, and Strategy

Prompt 29 — Write a vendor evaluation framework

Write an evaluation framework for selecting a [data tool category — orchestration tool, data catalog, transformation layer, data warehouse, etc.]. Evaluation criteria: [list your requirements — scalability, cost, OSS vs. managed, integration with existing stack, support, vendor lock-in risk, etc.]. Scoring matrix: [weight each criterion]. Process: [how to run a proof of concept, who to involve, how to make the final decision]. Format as a decision document for team alignment.

Prompt 30 — Write a data platform cost optimization report

Write a cost optimization report for our data platform. Current spend areas: [list — compute, storage, egress, tooling licenses]. Identified inefficiencies: [describe specific findings — idle clusters, over-provisioned warehouses, unused tables, expensive queries]. Recommended actions: [list with estimated savings and effort level]. Quick wins vs. longer-term projects: [categorize]. Who owns each action: [assign]. Format for presenting to engineering leadership.
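The quick-win vs. longer-term split in that report can be made explicit with a couple of small helpers. This is an illustrative sketch — the cost fields and the two-day effort cutoff are assumptions; substitute your warehouse's actual query-history metrics and your team's own thresholds.

```python
def top_cost_queries(query_stats: list[dict], n: int = 3) -> list[dict]:
    """Rank queries by estimated spend (runtime x per-second rate)."""
    ranked = sorted(
        query_stats,
        key=lambda q: q["runtime_s"] * q["cost_per_s"],
        reverse=True,
    )
    return ranked[:n]

def classify_action(estimated_monthly_savings: float, effort_days: float) -> str:
    """Bucket a recommendation as a quick win or a longer-term project."""
    if effort_days <= 2 and estimated_monthly_savings > 0:
        return "quick win"
    return "longer-term project"
```

Even a rough ranking like this gives the report a defensible ordering instead of a gut-feel list.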

Prompt 31 — Write a "state of data" quarterly report

Write a "State of Data" quarterly report for [company / data team]. Audience: [executive team / data consumers / engineering leadership]. What we shipped: [list major pipelines, platform improvements, data products]. Data health metrics: [pipeline uptime, data freshness SLAs met, incidents]. Stakeholder satisfaction: [qualitative / quantitative]. What's coming next quarter: [priorities]. Ask from leadership: [resources, decisions, or unblocking needed]. Under 500 words, outcome-focused.

Prompt 32 — Explain a data architecture to non-technical leadership

Explain the following data architecture to a non-technical executive audience: [paste diagram description or architecture summary]. Translate into business terms: what it enables, why it's designed this way, what the key tradeoffs were, and what would happen if we didn't invest in it. Avoid acronyms and tool names unless necessary. The goal is to help leadership understand why data infrastructure investment matters, not to teach them engineering.

Professional Development

Prompt 33 — Write a conference talk proposal

Write a conference talk proposal for a data engineering conference (e.g., Data Council, dbt Coalesce, Current, Spark + AI Summit). Talk title: [idea]. Core story: [what problem, what we built or learned, what the audience takes away]. Why this audience should care: [relevance and novelty]. Speaker credentials: [your relevant experience]. Format: [talk length — 30 or 45 min]. Abstract (250 words), outline (3-5 key points), and speaker bio.

Prompt 34 — Summarize a technical paper or blog post

Summarize the following data engineering paper/blog post for sharing with my team: [paste URL or content]. Key takeaways: [3-5 most important points]. How it applies to our work: [specific relevance to our stack or problems]. What I'd want to investigate further: [follow-up questions or experiments]. Format as a brief Slack message or internal wiki post — enough to help teammates decide if they want to read the full piece.

Prompt 35 — Write a data engineering job description

Write a job description for a [mid-level / senior / staff] data engineer. Team context: [describe the team, stage of company, and data infrastructure]. Key responsibilities: [list what this person will actually do — not generic bullets]. Tech stack: [what they'll work with]. What makes this role interesting: [honest pitch for why a strong candidate should apply]. Must-haves vs. nice-to-haves: [differentiate clearly]. Avoid credential-inflating language — write for the candidate, not to filter them out.

Getting the Most From These Prompts

Include your stack. These prompts work best when you specify tools—Airflow vs. Dagster, dbt vs. custom SQL, Snowflake vs. BigQuery, Spark vs. DuckDB. Generic tool references produce generic output.

Paste real context. Paste your actual DDL, error messages, or stakeholder requests. The more specific the input, the more usable the output.

Use these for first drafts, not final drafts. Data documentation and stakeholder communication always benefit from your review. AI output is a fast starting point—your engineering judgment is what makes it accurate.


The Complete Data Engineer AI Toolkit

These 35 prompts cover the full data engineer workflow. If you want the complete system—advanced pipeline documentation templates, data quality frameworks, incident runbooks, stakeholder communication scripts, and architecture decision record templates organized by use case—the Data Engineer AI Toolkit has everything.

Get the Data Engineer AI Toolkit →


Bookmark this page. Share it with your data team. Use one prompt today—you'll spend less time on documentation and more time building.
