DEV Community

Abdelrahman Adnan
Module 2 Summary - Workflow Orchestration with Kestra Part 3

Part 3: AI Integration & Best Practices

Using AI for Data Engineering

AI tools help data engineers by:

  • Generating workflows faster - Describe tasks in natural language
  • Avoiding errors - Get syntax-correct code following best practices

Key Insight: AI is only as good as the context you provide.

Context Engineering with LLMs

Problem: Generic AI assistants (like ChatGPT without context) may produce:

  • Outdated plugin syntax
  • Incorrect property names
  • Hallucinated features that don't exist

Why? LLMs are trained on data up to a knowledge cutoff date, so they know nothing about plugin releases or syntax changes made after it.

Solution: Provide proper context to AI!
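To make "provide proper context" concrete, here is a minimal sketch of assembling a prompt from a task description plus reference snippets. The function name `build_prompt`, the instruction wording, and the sample snippet are all hypothetical; this illustrates the idea and is not how any particular assistant builds its prompts.

```python
def build_prompt(task: str, context_snippets: list[str]) -> str:
    """Combine a task description with reference snippets into one prompt.

    Hypothetical helper for illustration only.
    """
    if not context_snippets:
        # Without context the model can only rely on its training data.
        return task
    # Number each snippet so the model (and a reviewer) can refer back to it.
    context = "\n\n".join(
        f"[{i}] {snippet.strip()}"
        for i, snippet in enumerate(context_snippets, start=1)
    )
    return (
        "Use only the reference material below when naming plugins and properties.\n\n"
        f"Reference material:\n{context}\n\n"
        f"Task: {task}"
    )


if __name__ == "__main__":
    # The snippet text here is made up; in practice it would come from current docs.
    print(build_prompt(
        "Write a flow that downloads a CSV file every day.",
        ["Excerpt from the current plugin documentation goes here."],
    ))
```

The same task sent without snippets returns the bare task string, which is the "generic assistant" case described above.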

Kestra AI Copilot

Kestra's built-in AI Copilot is designed specifically for generating Kestra flows with:

  • Full context about latest plugins
  • Correct workflow syntax
  • Current best practices

Setup Requirements:

  1. Get a Gemini API key from Google AI Studio
  2. Pass the key to the Kestra service in docker-compose.yml via GEMINI_API_KEY
  3. Open the Copilot via the sparkle icon (✨) in the Kestra UI

Retrieval Augmented Generation (RAG)

RAG is a technique that:

  1. Retrieves relevant information from data sources
  2. Augments the AI prompt with this context
  3. Generates responses grounded in real data

RAG Process in Kestra:

  1. Ingest documents (documentation, release notes)
  2. Create embeddings (vector representations)
  3. Store embeddings in KV Store or vector database
  4. Query with context at runtime
  5. Generate accurate, context-aware responses
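The five steps above can be sketched in plain Python. This is a toy under stated assumptions: `embed` uses word counts as a stand-in for a real embedding model, `InMemoryStore` stands in for the KV Store or a vector database, the sample documents are made up, and the final generation step is left as a prompt string that would be sent to an LLM. It does not use Kestra's own plugins.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryStore:
    """Hypothetical stand-in for a KV Store or vector database."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        # Steps 1-3: ingest a document, embed it, store the embedding.
        self._items.append((text, embed(text)))

    def search(self, query: str, k: int = 2) -> list[str]:
        # Step 4: rank stored documents by similarity to the query.
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def augment(question: str, retrieved: list[str]) -> str:
    """Step 5 input: a prompt that carries the retrieved context."""
    context = "\n".join(f"- {doc}" for doc in retrieved)
    return f"Context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    store = InMemoryStore()
    # Made-up example documents, not real release notes.
    store.add("Release notes: the scheduler now supports backfills from the UI.")
    store.add("Plugin guide: the HTTP download task saves a file to internal storage.")
    store.add("FAQ: change the server port when it conflicts with another service.")

    question = "How do I run backfills?"
    print(augment(question, store.search(question, k=1)))
```

Swapping `embed` for a real embedding model and `InMemoryStore` for a persistent store changes the quality of retrieval, not the shape of the pipeline.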

RAG Best Practices:

  • Keep documents updated regularly
  • Chunk large documents appropriately
  • Test retrieval quality
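Two of these practices lend themselves to a short sketch: chunking with overlap, and a simple retrieval check. The chunk size, the overlap, the `hit_rate` metric, and the keyword-based `keyword_search` helper are illustrative assumptions, not recommended values or a Kestra feature.

```python
def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into word-based chunks that share `overlap` words with the previous chunk."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def keyword_search(chunks: list[str]):
    """Build a naive search function that ranks chunks by shared words (illustration only)."""
    def search(query: str, k: int) -> list[str]:
        terms = set(query.lower().split())
        ranked = sorted(chunks, key=lambda c: len(terms & set(c.lower().split())), reverse=True)
        return ranked[:k]
    return search


def hit_rate(search, cases: list[tuple[str, str]], k: int = 3) -> float:
    """Fraction of (query, expected_text) cases where expected_text appears in the top-k chunks."""
    if not cases:
        return 0.0
    hits = sum(
        any(expected in chunk for chunk in search(query, k))
        for query, expected in cases
    )
    return hits / len(cases)


if __name__ == "__main__":
    text = " ".join(f"w{i}" for i in range(10))
    print(chunk_words(text, size=4, overlap=1))
```

Rerunning a small set of `hit_rate` cases whenever the documents are refreshed gives an early signal that retrieval has degraded.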

Deployment & Production

For production deployment:

  • Deploy Kestra on Google Cloud
  • Sync workflows from Git repository
  • Use Secrets and KV Store for sensitive data
  • Never commit API keys to Git
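As a small illustration of keeping keys out of Git, the sketch below reads a key from the environment at runtime instead of hardcoding it. The helpers `require_env` and `mask` are hypothetical names; GEMINI_API_KEY is the variable mentioned in the setup steps. Inside Kestra itself, Secrets or the KV Store would play this role.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, failing loudly if it is missing."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


def mask(secret: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters so the key is safe to log."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


if __name__ == "__main__":
    try:
        api_key = require_env("GEMINI_API_KEY")
        print("Loaded key:", mask(api_key))
    except RuntimeError as err:
        print(err)
```

The key then lives in the shell, a `.env` file excluded from version control, or a secret manager, and never appears in the repository.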

Troubleshooting Tips

| Issue | Solution |
|---|---|
| Port conflict with pgAdmin | Change Kestra port to 18080 |
| CSV column mismatch in BigQuery | Rerun entire execution including re-download |
| Container issues | Stop, remove, and restart containers |

Recommended Docker Images:

  • kestra/kestra:v1.1 (stable version)
  • postgres:18

Key Takeaways

  1. Workflow orchestration is essential for managing complex data pipelines
  2. Kestra provides a flexible, scalable solution with YAML-based flows
  3. ETL transforms data before loading and suits local processing; ELT loads first and transforms in the warehouse, using cloud computing power
  4. Scheduling and backfills enable automated and historical data processing
  5. AI Copilot accelerates workflow development with proper context
  6. RAG reduces AI hallucinations by grounding responses in retrieved data

#dezoomcamp
