Om Shree

Posted on • Originally published at glama.ai

Case Study – Automating an ETL Pipeline with MCP

This case study demonstrates how the Model Context Protocol (MCP) allows AI agents to automate complete ETL workflows without manual scripting. By exposing data pipelines as structured tools, MCP enables agents to extract, transform, and load data by following natural-language prompts. This approach reduces integration complexity and helps teams move from code-heavy pipelines to fully orchestrated, agent-driven automation.


Real-World Example: Keboola MCP Server in Action

Keboola’s MCP server turns Keboola pipelines into AI-callable tools. Agents can manage storage, run SQL transformations, trigger jobs, and access metadata, all through natural language. For example, a prompt like “Segment customers with frequent purchases and run that job daily” launches a full ETL workflow with built-in logging and error handling [1][2].

# Example: initiating a Keboola MCP client
# NOTE: illustrative sketch; the exact MCPClient.create signature depends on
# your MCP client library version, and TOKEN stands in for a real Keboola
# Storage API token.
from mcp_agent import MCPClient

client = MCPClient.create(
    "url",
    server_url="https://mcp.eu.keboola.com/sse",
    auth_token="TOKEN",
)

This remote connection supports SSE transport and OAuth authentication. The agent can call tools such as create_transformation, run_job, or list_jobs, with Keboola returning structured results in JSON [1].
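
Under the hood, each of these tool invocations travels as a JSON-RPC 2.0 tools/call request, the wire format defined by the MCP specification. Here is a minimal sketch of building such a request; the configuration_id argument name is hypothetical, not Keboola's actual parameter:

```python
import json

def build_tool_call(tool_name: str, arguments: dict, request_id: int = 1) -> str:
    """Serialize an MCP-style JSON-RPC 2.0 'tools/call' request."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(payload)

# 'configuration_id' is an illustrative argument name for the run_job tool.
request = build_tool_call("run_job", {"configuration_id": "customer-segmentation"})
print(request)
```

The server validates the arguments against the tool's declared input schema before executing, which is what makes these calls safe to generate from natural language.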

Building a Pipeline with AI Prompts

Here is how a natural‑language pipeline prompt might look:

"Create a daily transformation that segments customers who spent over $100 last month. Then save results to a CSV and update the dashboard."

Keboola’s MCP server interprets this, builds the SQL transformation, schedules the job, and monitors execution. Results and logs are returned as MCP responses, making monitoring and error tracking directly accessible to agents [2].
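
As a sketch of what the agent receives back, the snippet below parses a hypothetical job result. The content list and isError flag follow the MCP tool-result shape; the job fields inside the text payload are invented for illustration:

```python
import json

# Hypothetical tool result for a run_job call. The outer shape (content list
# plus isError flag) follows the MCP tool-result format; the inner job fields
# are illustrative.
raw = json.dumps({
    "content": [{"type": "text", "text": '{"job_id": "123", "status": "success"}'}],
    "isError": False,
})

def extract_job_status(result_json: str) -> str:
    """Pull the job status out of a structured MCP tool result."""
    result = json.loads(result_json)
    if result.get("isError"):
        raise RuntimeError("tool call failed")
    body = json.loads(result["content"][0]["text"])
    return body["status"]

print(extract_job_status(raw))  # -> success
```

Because results come back as structured data rather than console logs, the agent can branch on a failed job, retry, or surface the error to the user.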

Multi-Platform ETL: Confluent + Keboola

For hybrid workflows, Keboola and Confluent MCP servers work together. Agents can fetch real-time Kafka topics via Confluent, then route cleaned data into Keboola for transformation and loading into a Delta Lake. Calls like list_topics, consume_message, and run_transformation integrate across platforms via a standardized MCP interface [3].

# Agent orchestration with multiple MCP endpoints
# NOTE: illustrative sketch; the create_chat_agent call below is pseudocode
# (exact agent APIs vary by Semantic Kernel version) and the Confluent
# endpoint URL is a placeholder.
import asyncio

from semantic_kernel import Kernel
from semantic_kernel.connectors.mcp import MCPSsePlugin

plugin1 = MCPSsePlugin(name="confluent", url="http://conf-mcp.local:9001")
plugin2 = MCPSsePlugin(name="keboola", url="https://mcp.eu.keboola.com/sse")

kernel = Kernel()
kernel.add_plugin(plugin1)
kernel.add_plugin(plugin2)
agent = kernel.create_chat_agent(service_id="openai", model_id="gpt-4")

# invoke_async is a coroutine, so it must run inside an event loop
response = asyncio.run(
    agent.invoke_async(
        "Ingest new Kafka events, transform with Keboola daily, "
        "and deliver summary as CSV"
    )
)
print(response.content)

This shows how a single agent orchestrates real-time ingestion and transformation across MCP-managed platforms [3].

Behind the Scenes


Each tool exposed by the MCP servers is defined with metadata for name, description, input schema, and output format. When an agent calls a tool, the MCP server validates inputs, executes the operation in Keboola or Confluent, and returns structured responses.
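
A minimal sketch of what such a tool definition and input validation could look like; the field names mirror MCP tool metadata, but the run_transformation schema below is illustrative, not Keboola's actual definition:

```python
# Illustrative tool metadata record. The keys (name, description, inputSchema)
# mirror how MCP servers describe tools; the schema content is invented.
TOOL = {
    "name": "run_transformation",
    "description": "Execute a saved SQL transformation.",
    "inputSchema": {
        "type": "object",
        "properties": {"transformation_id": {"type": "string"}},
        "required": ["transformation_id"],
    },
}

def validate_input(tool: dict, arguments: dict) -> bool:
    """Minimal check that required fields are present with the right type."""
    schema = tool["inputSchema"]
    for field in schema.get("required", []):
        if field not in arguments:
            return False
        expected = schema["properties"][field]["type"]
        if expected == "string" and not isinstance(arguments[field], str):
            return False
    return True

print(validate_input(TOOL, {"transformation_id": "tr-42"}))  # -> True
print(validate_input(TOOL, {}))                              # -> False
```

In practice the server would run a full JSON Schema validator rather than this hand-rolled check, but the flow is the same: reject malformed agent input before it ever reaches the pipeline.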

Both Keboola and Confluent support async-first architectures, enabling concurrent agent workflows without blocking. Keboola supports HTTP+SSE or CLI transport (with uv), making it compatible with both desktop agents and cloud-based clients [1][4]. Logs are tracked separately to maintain clean JSON output while providing auditability and observability.
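
The async-first point can be sketched with plain asyncio: two hypothetical tool calls run concurrently instead of blocking each other. The server and tool names echo the examples above, and the sleep stands in for a real network round trip:

```python
import asyncio

async def call_tool(server: str, tool: str) -> str:
    """Stand-in for an MCP tool call over SSE; the sleep simulates latency."""
    await asyncio.sleep(0.1)
    return f"{server}:{tool}:ok"

async def main() -> list:
    # Both calls are in flight at once; total wall time is ~0.1s, not ~0.2s.
    return await asyncio.gather(
        call_tool("confluent", "list_topics"),
        call_tool("keboola", "list_jobs"),
    )

results = asyncio.run(main())
print(results)
```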

My Thoughts

This ETL automation case shows how MCP can turn natural-language intent into reliable data operations. Agents can create pipelines, schedule jobs, fetch logs, and produce dashboards with clarity and repeatability. For teams working across domains, it removes engineering bottlenecks and lets agents do real data work.

That said, governance and control are essential. Limit write operations to reviewed tools. Validate SQL logic via pre-run checks. Use policy-based controls and log audits, especially in production environments. When implemented carefully, MCP delivers automation, safety, and speed in ETL workflows.
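
One simple form such a pre-run check could take is a keyword guard that rejects agent-generated SQL containing write statements. This is a deliberately naive sketch; a production check would parse the SQL properly rather than match substrings:

```python
# Illustrative pre-run guard for agent-generated SQL. Substring matching is
# crude (it cannot distinguish keywords from identifiers), so treat this as
# a sketch of the policy, not a real SQL validator.
FORBIDDEN = ("drop ", "delete ", "truncate ", "alter ", "insert ", "update ")

def is_read_only(sql: str) -> bool:
    """Return False if the statement appears to contain a write operation."""
    lowered = sql.lower()
    return not any(keyword in lowered for keyword in FORBIDDEN)

print(is_read_only("SELECT customer_id FROM orders WHERE total > 100"))  # -> True
print(is_read_only("DROP TABLE customers"))                              # -> False
```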

References


  1. Keboola MCP Server: AI-Powered ETL Workflow Automation Overview – Keboola Blog

  2. Keboola MCP Server Turns AI Agents into Data Engineers – SuperbCrew

  3. Powering AI Agents with Real-Time Data using MCP – Confluent Blog

  4. Keboola MCP Server Architecture and Best Practices – Keboola Blog
