DEV Community: Tayfun Yalcinkaya

Part 2: Event Pipeline Design: The Real-Time Data Lifecycle from Kafka to Lakehouse

Tayfun Yalcinkaya — Thu, 04 Jun 2026 18:35:50 +0000

In the first article, we covered the core concepts of Event Driven Architecture, topic/channel design on Kafka, the distinction between events and commands, schema contracts, and the producer-consumer relationship. In this article, we go one step further and look at the lifecycle of an event inside the platform.

In EDA design, producing events is not the hard part. The hard part is making the produced events reliable, traceable, reprocessable, enrichable, and usable by different consumers.

This article focuses on the following questions:

What happens when a raw event arrives at the platform?
How is an event validated, enriched, and made consumable?
How should raw, validated, enriched, and curated topics be positioned?
How does this structure relate to the Medallion approach in modern lakehouse architectures?
When do DLQ and alert topics come into play?
How should replay, idempotency, monitoring, security, and governance be approached?

What Is an Event Pipeline?

In EDA architectures, especially in data platform projects, events usually pass through a lifecycle.

This lifecycle can be modeled as follows:

raw -> validated -> enriched -> curated
                  |             |
                  v             v
                 dlq          alert

This structure lets the data flow mature stage by stage.

The raw topic carries the original event as it arrives from the source. The validated topic contains events that have passed schema and basic quality checks. The enriched topic holds the event after it has been enriched with reference data or other data sources. The curated topic represents trusted, normalized events whose business meaning is clear to consumers.

The Relationship Between the Event Pipeline and Medallion Architecture

This structure closely resembles the Medallion approach commonly used in modern lakehouse architectures.

On the lakehouse side, the Bronze layer represents raw data, the Silver layer represents cleaned and enriched data, and the Gold layer represents data products ready for business consumption. The raw, validated, enriched, and curated topics on Kafka apply a similar maturity logic to data in motion.

There is an important distinction, however: on the Kafka side these stages work as event channels, while on the lakehouse side they become persistent, queryable, and governed datasets.

We can think of it like this:

Kafka raw topic       -> Lakehouse bronze table
Kafka enriched topic  -> Lakehouse silver table
Kafka curated topic   -> Lakehouse gold table

This is not a one-to-one or mandatory mapping. Not every topic has to become a separate table in the lakehouse. As an architectural mindset, however, both sides aim to move data from its raw state to a trusted and consumable state.

A more accurate distinction is:

Data in motion: Kafka topics
Data at rest: Lakehouse tables

Kafka provides real-time flow, consumer independence, fan-out, low latency, and replay capability. The lakehouse is positioned for persistent storage, historical analysis, SQL access, BI/ML consumption, data governance, and analytical data products.

Raw Topic

The raw topic is the first channel to which the original event from the source is written.

raw.payment.transaction

At this stage, data is stored exactly as it arrives from the source. The format may be broken, fields may be missing, or the message may not fully match the expected schema. The raw layer is especially valuable for replay, audit, and troubleshooting.

The raw topic can also be the source of the events written to the bronze layer in the lakehouse. This links the event flow on the streaming side to the raw data archive on the analytics side.

Validated Topic

The validated topic contains events that have passed schema and basic data quality checks.

validated.payment.transaction

Typical checks at this stage are:

Are the required fields present?
Does the event conform to the schema?
Are date, number, and currency formats correct?
Is there an event ID?
Is the event time valid?
Is a duplicate check required?

A validation-service that consumes the raw topic can write successful events to the validated topic and failed events to a DLQ topic.

raw.payment.transaction
        |
        v
validation-service
        |
        +--> validated.payment.transaction
        +--> dlq.payment.transaction.validation

The validated topic prevents downstream processes from working directly with raw, unreliable data.

Enriched Topic

The enriched topic holds the validated event after additional information has been added to it.

enriched.payment.transaction

For example, the raw event may contain only the customer ID and the transaction details:

{
  "transaction_id": "T1001",
  "customer_id": "C789",
  "merchant_id": "M456",
  "amount": 950,
  "currency": "TRY"
}

After enrichment, the event may carry the customer segment, merchant category, country, risk score, or location information:

{
  "transaction_id": "T1001",
  "customer_id": "C789",
  "customer_segment": "premium",
  "merchant_id": "M456",
  "merchant_category": "electronics",
  "country": "TR",
  "amount": 950,
  "currency": "TRY",
  "risk_score": 42
}

At this stage, the consumer no longer just moves data; it joins with other sources, performs lookups, or matches reference data.

From a lakehouse perspective, enriched events are often a meaningful input for the silver layer, because the data is no longer raw, has passed certain quality checks, and has become more usable for analytics.

Curated Topic

The curated topic contains trusted, normalized events whose business meaning is clear to consumers.

curated.payment.transaction

The curated layer is the level at which we want most downstream consumers to connect. Consumers such as BI, dashboards, data lake ingestion, machine learning feature pipelines, notifications, and operational analytics should mostly use curated events.

On the lakehouse side, curated events can feed the data products in the gold layer. For example, operational dashboards, customer segment analysis, fraud KPIs, or metrics such as payment success rate can be derived from curated events.

Alert Topic

The alert topic is the channel for events that require action, such as alarms, fraud, threshold breaches, or anomalies.

alert.payment.fraud
alert.machine.temperature
alert.network.anomaly

The alert topic does not always have to be the last step of a linear pipeline. It can be produced by a separate stream processing job fed from the validated, enriched, or curated topic.

For example:

enriched.payment.transaction
        |
        v
fraud-detection-service
        |
        +--> alert.payment.fraud

Alert topics are designed for operational reaction. For this reason, retention, consumer SLAs, retries, and notification integrations should be considered separately for them.

DLQ Topic

The DLQ, or Dead Letter Queue, is the channel to which failed or unprocessable events are sent.

dlq.payment.transaction.validation
dlq.payment.transaction.enrichment

A DLQ record should carry not only the failed payload but also the reason for the failure.

{
  "original_topic": "raw.payment.transaction",
  "original_partition": 3,
  "original_offset": 982133,
  "error_type": "VALIDATION_ERROR",
  "error_message": "customer_id is missing",
  "failed_at": "2026-05-13T12:30:00Z",
  "payload": {
    "transaction_id": "T1001",
    "amount": 950
  }
}

This approach prevents problematic records from being lost and allows them to be corrected, investigated, or replayed later.

In EDA projects designed without a DLQ, failed events either disappear silently or are retried endlessly on the consumer side, slowing down the entire pipeline.

Does Every Stage Have to Be a Separate Topic?

No. The raw, validated, enriched, and curated stages do not each have to be a physical Kafka topic.

There are three basic approaches.

1. Stage-Based Pipeline

Each stage is modeled as a separate topic.

raw -> validated -> enriched -> curated

This model is strong for traceability, replay, audit, and separation of responsibilities between teams. However, it increases the number of topics and the operational complexity.

2. Single Processor, Multi-Output

A single stream processing job reads the raw topic, performs validation, enrichment, and curation, and writes only to the result topics.

raw.payment.transaction
        |
        v
stream-processing-job
        |
        +--> curated.payment.transaction
        +--> alert.payment.fraud
        +--> dlq.payment.transaction

This model needs fewer topics and offers lower latency. However, because the intermediate stages are not visible, debugging and replay can be harder.

3. Branching Pipeline

Multiple consumers are fed from the same topic for different purposes.

validated.payment.transaction
        |
        +--> enrichment-service
        +--> audit-sink-service
        +--> fraud-service
        +--> realtime-dashboard-service

This is one of EDA’s greatest strengths: the same event can be used by multiple independent consumers for different purposes.

How Are Kafka and the Lakehouse Positioned Together?

In a well-designed EDA architecture, Kafka and the lakehouse are not alternatives to each other; they complement each other.

Kafka is used for:

Event transport.
Low-latency processing.
Consumer fan-out.
Operational reaction.
Replay.
Loose coupling between systems.

The lakehouse is used for:

Persistent storage.
Historical analysis.
SQL analytics.
BI and dashboards.
Machine learning feature generation.
Governance, lineage, and data products.

An example flow might look like this:

raw.payment.transaction
        |
        +--> bronze.payment_transaction_raw
        |
        v
validation-service
        |
        +--> validated.payment.transaction
        +--> dlq.payment.transaction
        |
        v
enrichment-service
        |
        +--> enriched.payment.transaction
        +--> silver.payment_transaction_enriched
        |
        v
curation-service
        |
        +--> curated.payment.transaction
        +--> gold.payment_transaction_analytics
        +--> alert.payment.fraud

In this model, Kafka manages the data-in-motion side and the lakehouse manages the data-at-rest side.

The Same Scenario as a Pipeline Journey: From Streaming to Archive and Analytics

In the first article, we looked at the anonymized field example from the energy distribution domain in terms of channel-based topic design. This time, let’s look at the same scenario from the event pipeline perspective.

In this architecture, XML-based meter and activity messages from different sources were first received in a secure Kafka ingress layer. Because of security and operational isolation requirements, a controlled ingress approach was preferred over taking data from external sources directly into the internal platform. In setups like this, separating the data landing layer that is open or semi-open to the outside world from the internal event backbone is a significant advantage for both security and operational manageability.

In the next step of the flow, the stream processing layer read the messages from Kafka, parsed the XML content, interpreted it, and routed it to the target systems. Data for operational processes was loaded into a relational database, while a separate big data/archiving layer was fed for raw data and historical analysis needs. This way, the same event flow could serve both day-to-day operational needs and retrospective query and analytics scenarios.

This setup is a good field example of the relationship between EDA and the lakehouse. Events flowing through Kafka provided real-time transport, parsing, and consumer independence, while the archive and analytics layer supported raw data retention, historical querying, anomaly detection, and forecasting. In modern lakehouse terms, raw meter messages correspond to a bronze-like archive layer, parsed and interpreted data to a silver-like analytics layer, and operational dashboards and reporting outputs to gold-like consumption layers.

Kafka raw meter topics
        |
        +--> Raw archive / bronze-like layer
        |
        v
stream processing / parsing / enrichment
        |
        +--> Operational database
        +--> Search and historical query layer
        +--> Analytics and forecasting layer
        +--> Dashboard and alert mechanisms

In this scenario, Kafka retention was also a critical part of the architecture. When a target system slowed down, became briefly unavailable, or went into maintenance, Kafka acted as a controlled durability layer so that no data was lost. Once the system was healthy again, consumers resumed reading from the offset where they had left off and caught up with the flow. This shows why retention and replay design in EDA is not just a technical setting but a business continuity decision.

Operational visibility was at least as important as data transport. Monitoring how many messages arrived in the last seconds and the last hours per distribution company and data type, tracking whether data was flowing at the expected frequency, and raising alerts on flow interruptions turned the platform into a system that was not only running but also observable.

The key takeaway from this example is this: EDA is not just systems writing events to Kafka. A true enterprise data flow architecture emerges when the secure data ingress layer, topic design, stream processing, retention, replay, operational monitoring, archiving, and the analytics layer are designed together.

In such a flow, unparseable XML messages, missing meter information, unexpected format changes, or write failures on target systems must not stop the main pipeline. For this reason, DLQ or quarantine-style error flows should be treated as a natural part of the design in large-scale architectures of this kind, not as an improvement added later.

Why Is Consumer Idempotency Important?

In Kafka-based systems, the possibility that a consumer processes the same event more than once must always be considered. Duplicate processing can occur after network errors, retries, offset commit problems, or application restarts.

Consumers should therefore be designed to be idempotent.

In other words, processing the same event twice must not change the result.

For example, if a payment event arrives twice, the customer must not be refunded twice. To prevent this, duplicate checks should be performed on event_id, transaction_id, or a business key.

Idempotency is especially critical for the following consumers:

Consumers that perform financial transactions.
Services that send notifications.
Sinks that write to an operational database.
Systems that generate alerts.
Jobs that upsert or merge into lakehouse tables.
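
The duplicate check described above can be sketched as follows. This is a minimal illustration rather than a production pattern: the `RefundConsumer` class, its in-memory `processed_ids` set, and the `event_id` field on the payment event are assumptions made for the example. A real consumer would keep the processed keys in a durable store and record the key together with the side effect.

```python
class RefundConsumer:
    """Toy consumer that issues each refund at most once per event_id."""

    def __init__(self) -> None:
        self.processed_ids = set()  # stand-in for a durable dedup store (assumption)
        self.refunds = []           # stand-in for the real refund side effect

    def handle(self, event: dict) -> bool:
        """Apply the refund once; return False when the event is a duplicate."""
        event_id = event["event_id"]
        if event_id in self.processed_ids:
            return False  # already processed: do nothing, result is unchanged
        self.refunds.append((event["transaction_id"], event["amount"]))
        self.processed_ids.add(event_id)
        return True


consumer = RefundConsumer()
event = {"event_id": "E1", "transaction_id": "T1001", "amount": 950}
consumer.handle(event)  # first delivery: refund is issued
consumer.handle(event)  # redelivery: ignored
```

The choice of key matters: deduplicating on event_id protects against redelivery of the same message, while a business key such as transaction_id also protects against a producer that emits the same business fact twice.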

Replay Design

One of the strengths of EDA is that events are kept in Kafka for a certain period and can be read again when needed.

Replay may be needed in the following situations:

A new consumer wants to process past events from the beginning.
Faulty enrichment logic is fixed and events are reprocessed.
Events that were not fully written to the data lake are reloaded.
A machine learning feature pipeline is rebuilt.
Records in the DLQ are corrected and fed back into the flow.

However, replay is as risky as it is powerful. During a replay, downstream systems must not produce duplicate records, and irreversible actions such as financial transactions must not be triggered again.

The following points should be considered up front when designing replay:

Is the Kafka retention period long enough?
Are raw events archived in the lakehouse bronze layer?
Is the consumer idempotent?
Will the replay run with a separate consumer group?
Will replayed events trigger operational actions again?
Which topics are considered safe to replay?
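
One way to keep a replay from re-triggering operational actions is to separate the repeatable part of a handler from the irreversible part. The sketch below assumes a hypothetical `is_replay` flag that the replay job sets (for example because it runs as a separate job under its own consumer group); the `handle` function, the in-memory `table`, and the `notifications` list are illustrative stand-ins, not part of any Kafka API.

```python
def handle(event: dict, table: dict, notifications: list, is_replay: bool = False) -> None:
    """Upsert the event; trigger the notification side effect only for live traffic."""
    # Keyed upsert: writing the same event again leaves the table unchanged.
    table[event["event_id"]] = event
    # Irreversible action (stand-in for a notification or payment): skipped on replay.
    if not is_replay:
        notifications.append(event["event_id"])


table, notifications = {}, []
event = {"event_id": "E1", "transaction_id": "T1001", "amount": 950}
handle(event, table, notifications)                  # live processing
handle(event, table, notifications, is_replay=True)  # replay: data is rebuilt, no new action
```

With this split, the data side can be replayed freely, and the question "which topics are safe to replay" reduces to which consumers still have unguarded side effects.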

Monitoring and Operations

In EDA projects, operational visibility should be part of the design.

The key metrics to track are:

Topic throughput.
Producer error rate.
Consumer lag.
Consumer processing time.
Failed event count.
DLQ event count.
Partition skew.
Broker disk usage.
Replication status.
End-to-end latency.

It is not enough for the Kafka cluster to be up. It must be possible to see how long an event takes to travel from the source to the target consumer and at which stage it waits.

To achieve this visibility, monitoring design should not be left to infrastructure metrics alone. The event itself must also be traceable. This requires carrying a set of standard fields in the event payload or event metadata.

The following fields in particular become critical for tracking an event’s journey through the platform:

event_id          -> Unique identifier of the event
event_type        -> Type of the event
event_time        -> Time the event occurred in the source system
source_system     -> Source system that produced the event
correlation_id    -> Identifier that links events belonging to the same business flow
schema_version    -> Version of the event schema
processing_stage  -> Stage of the event within the pipeline

With these fields, we can track not only whether an event was written to a Kafka topic but also which stages it passed through inside the platform.

For example, several timestamps can be kept for a single event:

event_time                -> Time the event occurred in the source system
ingestion_time            -> Time it was first written to Kafka
validation_time           -> Time it passed the validation stage
enrichment_time           -> Time it passed the enrichment stage
curation_time             -> Time it was written to the curated topic
downstream_delivery_time  -> Time it reached the target system

With this approach, end-to-end latency can be measured more accurately. It becomes clearer whether the delay occurred in the source system, in the Kafka topic, in the stream processing job, in the downstream sink, or in the lakehouse write layer.
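
Given the timestamps listed above, per-stage and end-to-end latency can be derived with a small helper. This is a sketch under assumptions: the `stage_latencies` function is hypothetical, timestamps are assumed to be ISO 8601 strings with an explicit UTC offset, and stages missing from an event are simply skipped.

```python
from datetime import datetime

# Stage timestamps in pipeline order, using the field names from the list above.
STAGES = [
    "event_time", "ingestion_time", "validation_time",
    "enrichment_time", "curation_time", "downstream_delivery_time",
]

def stage_latencies(timestamps: dict) -> dict:
    """Return seconds spent between consecutive recorded stages, plus end_to_end."""
    present = [
        (name, datetime.fromisoformat(timestamps[name]))
        for name in STAGES if name in timestamps
    ]
    result = {}
    for (prev_name, prev_ts), (name, ts) in zip(present, present[1:]):
        result[f"{prev_name} -> {name}"] = (ts - prev_ts).total_seconds()
    if len(present) >= 2:
        result["end_to_end"] = (present[-1][1] - present[0][1]).total_seconds()
    return result


latencies = stage_latencies({
    "event_time": "2026-05-13T12:30:00+00:00",
    "ingestion_time": "2026-05-13T12:30:02+00:00",
    "validation_time": "2026-05-13T12:30:03+00:00",
    "curation_time": "2026-05-13T12:30:07+00:00",
})
```

In this example the largest gap (4 seconds) sits between validation and curation, which points at the enrichment or curation step rather than at the source system or Kafka ingestion.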

In practice, monitoring design should be addressed at three levels:

1. Platform level
   Broker health
   Disk usage
   Replication status
   Partition distribution

2. Flow level
   Topic throughput
   Consumer lag
   Failed event count
   DLQ count

3. Event lifecycle level
   End-to-end latency
   Stage latency
   Correlation ID based tracing
   Event freshness

Managing this level of visibility entirely with manual scripts, separate dashboards, or scattered alarm rules can become difficult over time. Especially in enterprise environments, relying on platforms with advanced monitoring, observability, alerting, and governance capabilities for Kafka operations reduces this burden.

For example, Cloudera Kafka, Confluent Platform, or similar enterprise Kafka distributions can provide a more centralized management approach for topic-level monitoring, consumer lag tracking, cluster health checks, capacity planning, security integration, audit, and operational alarm management.

The goal here is not just to collect metrics but to make Kafka-based event flows operationally manageable. In an EDA architecture, monitoring is not about seeing a few charts on a dashboard; it is a mechanism for continuously verifying that the data flow is healthy, secure, traceable, and within its SLAs.

In addition, alarm thresholds should be defined up front for every critical pipeline. For example, an alert should be raised automatically if data does not arrive from a given source within the expected time, if consumer lag exceeds a certain threshold, if the DLQ count rises above normal behavior, or if end-to-end latency exceeds the SLA value.
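
The alarm conditions above can be expressed as a small threshold table that is evaluated against current metrics. The metric names and limits below are illustrative assumptions, not recommended values; in practice they would come from the monitoring platform and the pipeline’s SLA.

```python
# Hypothetical per-pipeline thresholds; the values are placeholders for illustration.
THRESHOLDS = {
    "seconds_since_last_event": 120,  # source went silent
    "consumer_lag": 10_000,           # consumer is falling behind
    "dlq_events_per_minute": 50,      # error rate above normal behavior
    "end_to_end_latency_seconds": 30, # SLA breach
}

def breached(metrics: dict) -> list:
    """Return the names of all metrics that exceed their threshold, sorted by name."""
    return sorted(
        name for name, limit in THRESHOLDS.items()
        if metrics.get(name, 0) > limit
    )


alerts = breached({
    "seconds_since_last_event": 15,
    "consumer_lag": 25_000,
    "dlq_events_per_minute": 3,
    "end_to_end_latency_seconds": 42,
})
# alerts == ["consumer_lag", "end_to_end_latency_seconds"]
```

Keeping the thresholds as data rather than scattering them across scripts makes it easier to review them per pipeline and to version them alongside the topic definitions.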

A good EDA monitoring design should therefore answer not only the question “Is Kafka up?” but also the following:

Did the event arrive at the expected time?
Did the event pass through the right stage?
At which stage was it delayed?
Which consumer fell behind?
On which topic did the error rate increase?
Did the DLQ count deviate from normal behavior?
Did the curated event reach the downstream consumer on time?

In stage-based pipeline designs in particular, latency and error metrics should be tracked separately for each stage:

raw -> validated latency
validated -> enriched latency
enriched -> curated latency
curated -> downstream latency

Security and Governance

In enterprise EDA designs, security and governance should not be added as an afterthought.

Points to consider:

Who can write to which topic?
Who can read which topic?
Is sensitive data carried inside events?
Is masking or tokenization required?
Is event lineage tracked?
Which consumer consumes which data?
Are audit logs kept?
Is retention compliant with regulations?
Are table permissions on the lakehouse side consistent with topic permissions?

EDA is not just a technical streaming project; it is a platform approach that requires data governance, security, and operational discipline.

On the Kafka side, topic-level access control, schema governance, and audit are important; on the lakehouse side, table-level policies, lineage, the data catalog, and data classification become critical.
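
The “who can write, who can read” questions can be captured as an explicit access model that is reviewed like any other design artifact. The sketch below is a deny-by-default lookup with hypothetical principal names; it illustrates the model only and is not how Kafka ACLs are configured or enforced.

```python
# Hypothetical access model: topic -> allowed principals per action.
ACL = {
    "raw.payment.transaction": {
        "write": {"payment-gateway"},
        "read": {"validation-service"},
    },
    "curated.payment.transaction": {
        "write": {"curation-service"},
        "read": {"bi-dashboard", "fraud-service"},
    },
}

def is_allowed(principal: str, action: str, topic: str) -> bool:
    """Deny by default: unknown topics, actions, or principals get no access."""
    return principal in ACL.get(topic, {}).get(action, set())


is_allowed("bi-dashboard", "read", "curated.payment.transaction")  # True
is_allowed("bi-dashboard", "read", "raw.payment.transaction")      # False
```

A table like this also gives a starting point for checking that lakehouse table permissions stay consistent with topic permissions.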

Where to Start with EDA?

When starting with EDA, the first goal should not be to make the entire organization event-driven. A better approach is to pick a pilot use case with clear business value.

Good starting use cases:

Fraud detection.
Real-time transaction monitoring.
IoT sensor monitoring.
Customer activity tracking.
Payment event streaming.
Security log analytics.
Real-time operational dashboard.

The first phase should target the following deliverables:

Event domain model.
Topic naming standard.
Schema standard.
Producer and consumer contracts.
Raw, validated, enriched, curated approach.
DLQ and retry strategy.
Lakehouse bronze/silver/gold mapping principle.
Monitoring approach.
Security and access model.
Replay strategy.

Sample Reference Architecture

A simple EDA + lakehouse reference architecture might look like this:

Source Systems
      |
      v
Source system adapters / Custom Producers
      |
      v
Kafka Event Channels
      |
      +--> Stream Processing
      |        |
      |        +--> Validated / Enriched / Curated Topics
      |        +--> Alert Topics
      |        +--> DLQ Topics
      |
      +--> Lakehouse Bronze / Silver / Gold
      +--> Operational Database
      +--> BI / Dashboard
      +--> Machine Learning Pipelines
      +--> Notification Services

Here Kafka provides the event transport backbone. Source system adapters or custom producer applications carry data into the Kafka event channels. The stream processing layer is used to validate, enrich, and route events and to generate alerts. The lakehouse is positioned as the persistent analytical data layer.

Conclusion

Producing events is not enough to succeed with Event Driven Architecture. How an event matures inside the platform, how it is validated, how it is enriched, how failed records are handled, and how it is delivered reliably to downstream consumers matter at least as much as the event model itself.

The raw, validated, enriched, and curated approach lets flowing data mature inside the platform in a controlled way. This structure shares a similar mindset with the Bronze, Silver, and Gold layers of modern lakehouse architectures. On the Kafka side these layers appear as event channels; on the lakehouse side they appear as persistent, queryable datasets.

A well-designed EDA + lakehouse architecture gives organizations two important capabilities at the same time: real-time action and reliable historical analytics.

When designed poorly, the result is a hard-to-trace tangle of Kafka topics, stream processing jobs, lakehouse tables, DLQs, and consumers.

For this reason, EDA projects should start not by simply creating topics but by designing the event lifecycle, the lakehouse relationship, the replay strategy, the monitoring approach, and the governance model together.

English version of this article is also available on my profile.

Part 2: Event Pipeline Design: The Real-Time Data Lifecycle from Kafka to Lakehouse

Tayfun Yalcinkaya — Thu, 04 Jun 2026 18:01:50 +0000

In the first article, we covered the core concepts of Event Driven Architecture, topic/channel design on Kafka, the difference between events and commands, schema contracts, and producer-consumer relationships. In this article, we will take one step further and look at the lifecycle of an event inside the platform.

In EDA design, the real challenge is not only producing events. It is making those events reliable, traceable, reprocessable, enrichable, and usable by different consumers.

In this article, we will focus on these questions:

What happens when a raw event arrives in the platform?
How is an event validated, enriched, and made consumable?
How should raw, validated, enriched, and curated topics be positioned?
How can this structure be related to the Medallion approach in modern lakehouse architectures?
When do DLQ and alert topics come into play?
How should replay, idempotency, monitoring, security, and governance be designed?

What Is an Event Pipeline?

In EDA architectures, especially in data platform projects, events usually pass through a lifecycle.

This lifecycle can be modeled like this:

raw -> validated -> enriched -> curated
                  |             |
                  v             v
                 dlq          alert

This structure allows the data flow to mature step by step.

The raw topic carries the original event from the source. The validated topic contains events that have passed schema and basic quality checks. The enriched topic contains events enriched with reference data or other data sources. The curated topic represents trusted, normalized, and business-ready events for consumers.

The Relationship Between Event Pipeline and Medallion Architecture

This structure has a natural similarity with the Medallion approach often used in modern lakehouse architectures.

In the lakehouse world, the Bronze layer represents raw data, the Silver layer represents cleaned and enriched data, and the Gold layer represents business-ready data products. Kafka raw, validated, enriched, and curated topics apply a similar maturity logic to data in motion.

However, there is an important difference. On the Kafka side, these stages work as event channels. On the lakehouse side, they become persistent, queryable, and governed datasets.

We can think about it like this:

Kafka raw topic       -> Lakehouse bronze table
Kafka enriched topic  -> Lakehouse silver table
Kafka curated topic   -> Lakehouse gold table

This is not a one-to-one or mandatory mapping. Not every topic must become a separate lakehouse table. But from an architectural point of view, both approaches aim to move data from a raw state to a trusted and consumable state.

A more accurate distinction is:

Data in motion: Kafka topics
Data at rest: Lakehouse tables

Kafka provides real-time flow, consumer independence, fan-out, low latency, and replay capability. The lakehouse provides persistent storage, historical analysis, SQL access, BI/ML consumption, data governance, and analytical data products.

Raw Topic

The raw topic is the first channel where the original event from the source is written.

raw.payment.transaction

At this stage, the data is stored as it comes from the source. The format may be incorrect, fields may be missing, or the message may not fully match the expected schema. The raw layer is valuable for replay, audit, and troubleshooting.

The raw topic can also be the source for the bronze layer in the lakehouse. This creates a link between the streaming event flow and the analytical raw data archive.

Validated Topic

The validated topic contains events that passed schema and basic data quality checks.

validated.payment.transaction

Typical checks at this stage include:

Are the required fields present?
Does the event match the schema?
Are date, number, and currency formats correct?
Is there an event ID?
Is the event time valid?
Is a duplicate check required?

A validation service consumes the raw topic and writes successful events to the validated topic. Failed events can be written to a DLQ topic.

raw.payment.transaction
        |
        v
validation-service
        |
        +--> validated.payment.transaction
        +--> dlq.payment.transaction.validation

The validated topic prevents downstream processes from working directly with raw and unreliable data.
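
The routing decision made by the validation service can be sketched as a pure function. This is a minimal illustration, not a Kafka client: the `REQUIRED_FIELDS` contract and the `validate` and `route` helpers are assumptions made for the example, and only a few of the checks listed above are implemented.

```python
from datetime import datetime

# Assumed contract for the payment event; a real one would come from the schema registry.
REQUIRED_FIELDS = ("transaction_id", "customer_id", "amount", "currency")

def validate(event: dict) -> list:
    """Return a list of validation errors; an empty list means the event is valid."""
    errors = [f"{name} is missing" for name in REQUIRED_FIELDS if name not in event]
    amount = event.get("amount")
    if amount is not None and (isinstance(amount, bool) or not isinstance(amount, (int, float))):
        errors.append("amount is not numeric")
    event_time = event.get("event_time")
    if event_time is not None:
        try:
            datetime.fromisoformat(event_time)
        except (TypeError, ValueError):
            errors.append("event_time is not a valid ISO 8601 timestamp")
    return errors

def route(event: dict) -> tuple:
    """Pick the target topic for an event and return it with the validation errors."""
    errors = validate(event)
    topic = "dlq.payment.transaction.validation" if errors else "validated.payment.transaction"
    return topic, errors


topic, errors = route({"transaction_id": "T1001", "amount": 950})
# topic == "dlq.payment.transaction.validation"
# errors == ["customer_id is missing", "currency is missing"]
```

Keeping validation free of I/O makes the rules easy to unit test; the service around it only has to consume from the raw topic and produce to whichever topic `route` returns.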

Enriched Topic

The enriched topic contains the validated event after additional information is added.

enriched.payment.transaction

For example, the raw event may contain only customer ID and transaction details:

{
  "transaction_id": "T1001",
  "customer_id": "C789",
  "merchant_id": "M456",
  "amount": 950,
  "currency": "TRY"
}

After enrichment, the event may include customer segment, merchant category, country, risk score, or location information:

{
  "transaction_id": "T1001",
  "customer_id": "C789",
  "customer_segment": "premium",
  "merchant_id": "M456",
  "merchant_category": "electronics",
  "country": "TR",
  "amount": 950,
  "currency": "TRY",
  "risk_score": 42
}

At this stage, the consumer is no longer only moving data. It may join with other sources, perform lookups, or match reference data.

From a lakehouse perspective, enriched events are often meaningful inputs for the silver layer. The data is no longer raw. It has passed certain quality checks and has become more useful for analytics.
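
The lookup step can be sketched with in-memory reference data. The `CUSTOMER_SEGMENTS` and `MERCHANTS` tables and the `enrich` function are hypothetical; in a real pipeline the reference data might live in a cache, a compacted topic, or a database, and the risk score would come from a separate model or service.

```python
# Hypothetical reference data, keyed the same way as the sample event above.
CUSTOMER_SEGMENTS = {"C789": "premium"}
MERCHANTS = {"M456": {"category": "electronics", "country": "TR"}}

def enrich(event: dict) -> dict:
    """Return a copy of the event with reference data attached; the input is not mutated."""
    merchant = MERCHANTS.get(event.get("merchant_id"), {})
    return {
        **event,
        "customer_segment": CUSTOMER_SEGMENTS.get(event.get("customer_id"), "unknown"),
        "merchant_category": merchant.get("category", "unknown"),
        "country": merchant.get("country", "unknown"),
    }


validated = {
    "transaction_id": "T1001", "customer_id": "C789",
    "merchant_id": "M456", "amount": 950, "currency": "TRY",
}
enriched = enrich(validated)
```

Falling back to "unknown" keeps the flow moving when a lookup misses. The alternative is to treat a failed lookup as an error and send the event to dlq.payment.transaction.enrichment; which one is right depends on whether downstream consumers can tolerate partially enriched events.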

Curated Topic

The curated topic contains trusted, normalized, and business-ready events for consumers.

curated.payment.transaction

The curated layer is the level where most downstream consumers should connect. BI, dashboards, data lake ingestion, machine learning feature pipelines, notification services, and operational analytics usually consume curated events.

On the lakehouse side, curated events can feed gold-layer data products. For example, operational dashboards, customer segment analysis, fraud KPIs, or payment success rate metrics can be created from curated events.

Alert Topic

The alert topic is the channel where action-oriented events are published, such as alarms, fraud alerts, threshold breaches, or anomalies.

alert.payment.fraud
alert.machine.temperature
alert.network.anomaly

The alert topic does not always have to be the final step of a linear pipeline. It can be produced by a separate stream processing job consuming from validated, enriched, or curated topics.

For example:

enriched.payment.transaction
        |
        v
fraud-detection-service
        |
        +--> alert.payment.fraud

Alert topics are designed for operational reaction. For this reason, retention, consumer SLA, retry behavior, and notification integration should be considered separately.
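
As an illustration of a job that turns enriched events into alerts, the sketch below applies a single threshold rule. The `RISK_THRESHOLD` value, the `detect_fraud` function, and the shape of the alert record are assumptions for the example; real fraud detection would combine many signals.

```python
from typing import Optional

RISK_THRESHOLD = 80  # assumed cut-off, for illustration only

def detect_fraud(event: dict) -> Optional[dict]:
    """Return an alert record for alert.payment.fraud, or None if no alert is needed."""
    if event.get("risk_score", 0) < RISK_THRESHOLD:
        return None
    return {
        "topic": "alert.payment.fraud",
        "alert_type": "HIGH_RISK_SCORE",
        "transaction_id": event["transaction_id"],
        "risk_score": event["risk_score"],
    }


detect_fraud({"transaction_id": "T1001", "risk_score": 42})  # None: below the threshold
detect_fraud({"transaction_id": "T1002", "risk_score": 91})  # alert record
```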

DLQ Topic

DLQ means Dead Letter Queue. It is the channel where failed or unprocessable events are sent.

dlq.payment.transaction.validation
dlq.payment.transaction.enrichment

A DLQ event should not only contain the failed payload. It should also contain the reason for the failure.

{
  "original_topic": "raw.payment.transaction",
  "original_partition": 3,
  "original_offset": 982133,
  "error_type": "VALIDATION_ERROR",
  "error_message": "customer_id is missing",
  "failed_at": "2026-05-13T12:30:00Z",
  "payload": {
    "transaction_id": "T1001",
    "amount": 950
  }
}

This approach prevents failed records from being lost and makes later correction, investigation, or replay possible.

In EDA projects without a DLQ design, failed events either disappear silently or create continuous retries on the consumer side, slowing down the whole pipeline.
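
A small helper can build the DLQ record shown above so that every failing stage emits the same envelope. The `build_dlq_record` function is a hypothetical sketch; it assumes `failed_at`, when given, is a timezone-aware datetime.

```python
from datetime import datetime, timezone

def build_dlq_record(topic, partition, offset, error_type, error_message, payload, failed_at=None):
    """Wrap a failed payload with its origin and failure reason, as in the sample above."""
    failed_at = failed_at or datetime.now(timezone.utc)
    return {
        "original_topic": topic,
        "original_partition": partition,
        "original_offset": offset,
        "error_type": error_type,
        "error_message": error_message,
        # Normalize to UTC and format like the sample record (trailing "Z").
        "failed_at": failed_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "payload": payload,
    }


record = build_dlq_record(
    "raw.payment.transaction", 3, 982133,
    "VALIDATION_ERROR", "customer_id is missing",
    {"transaction_id": "T1001", "amount": 950},
    failed_at=datetime(2026, 5, 13, 12, 30, tzinfo=timezone.utc),
)
```

Carrying the original topic, partition, and offset is what makes later investigation and targeted replay possible, because the failing record can be located in the raw topic or its archive.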

Does Every Stage Have to Be a Separate Topic?

No. Raw, validated, enriched, and curated stages do not always need to be physical Kafka topics.

There are three common approaches.

1. Stage-Based Pipeline

Each stage is modeled as a separate topic.

raw -> validated -> enriched -> curated

This model is strong for observability, replay, audit, and team responsibility separation. However, it increases the number of topics and operational complexity.

2. Single Processor, Multi-Output

A single stream processing job reads the raw topic, performs validation, enrichment, and curation, and writes only to result topics.

raw.payment.transaction
        |
        v
stream-processing-job
        |
        +--> curated.payment.transaction
        +--> alert.payment.fraud
        +--> dlq.payment.transaction

This model needs fewer topics and gives lower latency. However, debugging and replay can be harder because intermediate stages are not visible.
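The single-processor model can be sketched as one function that takes a raw event and returns the records to write, each paired with its target topic. This is an illustration only: the fraud threshold, the `CUSTOMER_SEGMENTS` reference data, and the `customer_segment` field are hypothetical, and a real job would run inside a stream processing framework rather than as a plain function.

```python
HIGH_AMOUNT_THRESHOLD = 10_000          # hypothetical fraud rule, for illustration only
CUSTOMER_SEGMENTS = {"C1": "retail"}    # hypothetical reference data used for enrichment


def process(event):
    """Validate, enrich, and curate one raw event; return (topic, record) outputs."""
    # Validation: events without a customer_id go straight to the DLQ.
    if "customer_id" not in event:
        failure = {"error_message": "customer_id is missing", "payload": event}
        return [("dlq.payment.transaction", failure)]

    # Enrichment and curation happen in memory; no intermediate topic is written.
    curated = dict(event)
    curated["customer_segment"] = CUSTOMER_SEGMENTS.get(event["customer_id"], "unknown")
    outputs = [("curated.payment.transaction", curated)]

    # Alerting is an extra output, not a replacement for the curated record.
    if event.get("amount", 0) > HIGH_AMOUNT_THRESHOLD:
        alert = {"transaction_id": event.get("transaction_id"), "reason": "high amount"}
        outputs.append(("alert.payment.fraud", alert))
    return outputs
```

Because validation and enrichment results never leave the process, a failure in the enrichment step can only be investigated from logs or the DLQ record. This is the visibility trade-off described above.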

3. Branching Pipeline

Multiple consumers read from the same topic for different purposes.

validated.payment.transaction
        |
        +--> enrichment-service
        +--> audit-sink-service
        +--> fraud-service
        +--> realtime-dashboard-service

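This is one of the greatest strengths of EDA. The same event can be used by multiple independent consumers for different purposes. In Kafka, each of these consumers reads with its own consumer group, so each one tracks its own offsets and progresses independently of the others.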

How Kafka and Lakehouse Work Together

In a well-designed EDA architecture, Kafka and the lakehouse are not alternatives to each other. They complement each other.

Kafka is used for:

Event transport.
Low-latency processing.
Consumer fan-out.
Operational reaction.
Replay.
Loose coupling between systems.

The lakehouse is used for:

Persistent storage.
Historical analysis.
SQL analytics.
BI and dashboards.
Machine learning feature generation.
Governance, lineage, and data products.

An example flow may look like this:

raw.payment.transaction
        |
        +--> bronze.payment_transaction_raw
        |
        v
validation-service
        |
        +--> validated.payment.transaction
        +--> dlq.payment.transaction
        |
        v
enrichment-service
        |
        +--> enriched.payment.transaction
        +--> silver.payment_transaction_enriched
        |
        v
curation-service
        |
        +--> curated.payment.transaction
        +--> gold.payment_transaction_analytics
        +--> alert.payment.fraud

In this model, Kafka manages data in motion, while the lakehouse manages data at rest.

The Same Scenario from a Pipeline Perspective: From Streaming to Archive and Analytics

In the first article, we looked at the anonymized energy distribution field example from a channel-based topic design perspective. Now let’s look at the same scenario from an event pipeline perspective.

In this architecture, XML-based meter and activity messages from different sources first arrived in a secure Kafka ingress layer. Because of security and operational isolation needs, data from external sources was not taken directly into the internal platform. Instead, a controlled ingress approach was preferred. In such architectures, separating the external or semi-external data landing layer from the internal event backbone provides important benefits for security and operational management.

In the next step, the stream processing layer read messages from Kafka, parsed the XML content, interpreted the messages, and routed them to target systems. For operational processes, data was written to a relational database. For raw and historical analysis needs, a separate big data/archive layer was fed. As a result, the same event flow served both daily operational needs and historical query and analytics scenarios.

This structure is a good field example of the relationship between EDA and lakehouse thinking. On the Kafka side, flowing events provided real-time transport, separation, and consumer independence. On the archive and analytics side, raw data storage, historical querying, anomaly detection, and forecasting scenarios were supported. If we use modern lakehouse terminology, raw meter messages correspond to a bronze-like archive layer, parsed and interpreted data corresponds to a silver-like analytics layer, and operational dashboards or reporting outputs correspond to gold-like consumption layers.

Kafka raw meter topics
        |
        +--> Raw archive / bronze-like layer
        |
        v
stream processing / parsing / enrichment
        |
        +--> Operational database
        +--> Search and historical query layer
        +--> Analytics and forecasting layer
        +--> Dashboard and alert mechanisms

In this scenario, Kafka retention was also a critical part of the architecture. When a target system slowed down, became temporarily unavailable, or went under maintenance, Kafka acted as a controlled durability layer to prevent data loss. As long as the outage stayed within the retention window, consumers could continue reading from their last committed offset when the system became healthy again. This shows why retention and replay design in EDA is not only a technical setting, but also a business continuity decision.

Operational visibility was as important as data transport. The platform tracked the number of messages received per distribution company and data type over windows ranging from the last few seconds to the last few hours, checked whether data arrived at the expected frequency, and raised alerts on flow interruptions. This made the platform not just functional, but observable.

The main lesson from this example is this: EDA is not only about systems writing events to Kafka. A real enterprise data flow architecture appears when secure data ingress, topic design, stream processing, retention, replay, operational monitoring, archiving, and analytics layers are designed together.

In such a flow, XML messages that cannot be parsed, missing meter information, unexpected format changes, or target system write errors should not stop the main pipeline. For this reason, DLQ or quarantine-style error flows should not be treated as later improvements. In large-scale architectures like this, they should be considered a natural part of the design.

Why Consumer Idempotency Matters

In Kafka-based systems, we should always assume that consumers may process the same event more than once. Duplicate processing can happen because of network issues, retries, offset commit problems, or application restarts.

For this reason, consumers should be designed to be idempotent.

This means that processing the same event twice should not change the final result.

For example, if a refund event arrives twice, the customer should not receive two refunds. To prevent this, the consumer should check for duplicates using event_id, transaction_id, or a business key before applying the effect.

Idempotency is especially critical for these consumers:

Consumers performing financial operations.
Services sending notifications.
Sinks writing to operational databases.
Systems producing alerts.
Jobs performing upsert or merge operations on lakehouse tables.
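A minimal sketch of the duplicate check described above, assuming each event carries an `event_id`. The `RefundHandler` class is hypothetical, and the in-memory set stands in for a durable store such as a database table with a unique constraint.

```python
class RefundHandler:
    """Applies each refund event at most once, keyed by event_id."""

    def __init__(self):
        self.processed_ids = set()   # in-memory stand-in for a durable store
        self.refunds = []            # stand-in for the real side effect

    def handle(self, event):
        """Return True if the refund was applied, False if it was a duplicate."""
        key = event["event_id"]
        if key in self.processed_ids:
            return False             # already handled: skip the side effect
        self.refunds.append((event["transaction_id"], event["amount"]))
        self.processed_ids.add(key)
        return True
```

In a real consumer, recording the processed key and applying the side effect should happen atomically, for example in the same database transaction. Otherwise a crash between the two steps can still produce a duplicate or a lost update.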

Replay Design

One of the strengths of EDA is that events can be retained in Kafka for a configured retention period and read again when needed.

Replay may be needed in these situations:

A new consumer wants to process past events from the beginning.
An incorrect enrichment logic is fixed and events must be processed again.
Missing events need to be reloaded into the data lake.
A machine learning feature pipeline must be rebuilt.
Records in the DLQ are corrected and sent back to the flow.

Replay is powerful, but it also carries risk. During replay, downstream systems must not create duplicate records, and irreversible actions such as financial transactions must not be triggered again.

Replay design should answer these questions from the beginning:

Is Kafka retention long enough?
Are raw events archived in the lakehouse bronze layer?
Are consumers idempotent?
Will replay be done with a separate consumer group?
Will replayed events trigger operational actions again?
Which topics are considered safe for replay?
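One way to address the questions about duplicates and repeated operational actions is to separate state updates, which are safe to repeat, from side effects, which are suppressed during replay. The sketch below assumes a hypothetical `replay` flag set by the replaying consumer, and uses a dict and a list as stand-ins for a lakehouse table and a notification service.

```python
def apply_event(event, table, notifications, replay=False):
    """Upsert the event into `table`; notify only when processing live traffic."""
    # Upsert keyed by the business key: applying the same event again
    # overwrites the existing row instead of adding a duplicate.
    table[event["transaction_id"]] = event

    # Side effects are skipped while replaying historical events.
    if not replay:
        notifications.append(f"payment {event['transaction_id']} processed")
```

With this split, a replay run can rebuild the table from the beginning of the topic without sending any notification a second time.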

Monitoring and Operations

Operational visibility must be part of EDA design.

Key metrics to monitor include:

Topic throughput.
Producer error rate.
Consumer lag.
Consumer processing time.
Failed event count.
DLQ event count.
Partition skew.
Broker disk usage.
Replication status.
End-to-end latency.

It is not enough to know that the Kafka cluster is running. We must also know how long it takes for an event to move from the source to the target consumer, and where it is waiting in the pipeline.

To achieve this level of visibility, monitoring should not depend only on infrastructure metrics. The event itself should also be traceable. For this reason, standard fields should be carried either in the event payload or in the event metadata.

The following fields become critical for tracking the journey of an event inside the platform:

event_id          -> Unique identifier of the event
event_type        -> Type of the event
event_time        -> Time when the event was created in the source system
source_system     -> Source system that produced the event
correlation_id    -> Identifier used to connect events from the same business flow
schema_version    -> Version of the event schema
processing_stage  -> Current stage of the event inside the pipeline

With these fields, we can track not only whether an event was written to a Kafka topic, but also which stages it passed through inside the platform.

For example, different timestamps can be captured for the same event:

event_time                -> Time when the event was created in the source system
ingestion_time            -> Time when the event was first written to Kafka
validation_time           -> Time when the event passed the validation stage
enrichment_time           -> Time when the event passed the enrichment stage
curation_time             -> Time when the event was written to the curated topic
downstream_delivery_time  -> Time when the event reached the target system

This approach makes it possible to measure end-to-end latency more accurately. It also helps identify whether the delay happened in the source system, Kafka topic, stream processing job, downstream sink, or lakehouse write layer.
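As a rough sketch, stage and end-to-end latency can be derived from the timestamp fields listed above. The code assumes ISO 8601 timestamps in UTC with a trailing `Z`. The function names are hypothetical, and stages missing from an event are simply skipped.

```python
from datetime import datetime

# Timestamp fields in the order an event passes through the pipeline.
STAGE_FIELDS = [
    "event_time",
    "ingestion_time",
    "validation_time",
    "enrichment_time",
    "curation_time",
    "downstream_delivery_time",
]


def parse_ts(value):
    """Parse an ISO 8601 timestamp such as 2026-05-13T12:30:00Z."""
    # Older Python versions of fromisoformat do not accept a trailing "Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def stage_latencies(event):
    """Return seconds spent between consecutive recorded stages, plus the total."""
    present = [(name, parse_ts(event[name])) for name in STAGE_FIELDS if name in event]
    result = {}
    for (prev_name, prev_ts), (name, ts) in zip(present, present[1:]):
        result[f"{prev_name} -> {name}"] = (ts - prev_ts).total_seconds()
    if len(present) >= 2:
        result["end_to_end"] = (present[-1][1] - present[0][1]).total_seconds()
    return result
```

Note that `event_time` comes from the source system clock, so the first interval is only as accurate as the clock synchronization between the source and the platform.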

In practice, monitoring design should be handled at three levels:

1. Platform level
   Broker health
   Disk usage
   Replication status
   Partition distribution

2. Flow level
   Topic throughput
   Consumer lag
   Failed event count
   DLQ count

3. Event lifecycle level
   End-to-end latency
   Stage latency
   Correlation ID based tracking
   Event freshness

Managing this level of visibility only with manual scripts, separate dashboards, or scattered alert rules can become difficult over time. Especially in enterprise environments, platforms with advanced monitoring, observability, alerting, and governance capabilities for Kafka operations can reduce this operational burden.

For example, enterprise Kafka distributions such as Cloudera Kafka, Confluent Platform, or similar Kafka platforms can provide a more centralized approach for topic-level monitoring, consumer lag tracking, cluster health checks, capacity planning, security integration, audit, and operational alert management.

The goal is not only to collect metrics, but to make Kafka-based event flows operationally manageable. In EDA, monitoring is not just about seeing a few charts on a dashboard. It is a continuous validation mechanism that confirms whether the data flow is healthy, secure, traceable, and aligned with SLAs.

Alert thresholds should also be defined from the beginning for every critical pipeline. For example, an automatic alert should be produced when data does not arrive from a specific source within the expected time, consumer lag exceeds a defined threshold, DLQ count rises above normal behavior, or end-to-end latency violates the SLA.
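The four alert conditions above can be expressed as a small rule check. This is a sketch under stated assumptions: the metric names, the `Thresholds` fields, and the alert labels are all hypothetical, and in practice the metrics would come from a monitoring system rather than a dict.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    max_silence_seconds: float      # longest acceptable gap without data from a source
    max_consumer_lag: int           # highest acceptable consumer lag, in messages
    max_dlq_per_minute: float       # DLQ rate considered normal behavior
    max_end_to_end_seconds: float   # latency SLA


def evaluate(metrics, thresholds):
    """Return the names of all alert conditions that the metrics currently violate."""
    alerts = []
    if metrics["seconds_since_last_event"] > thresholds.max_silence_seconds:
        alerts.append("source_silent")
    if metrics["consumer_lag"] > thresholds.max_consumer_lag:
        alerts.append("consumer_lag_high")
    if metrics["dlq_per_minute"] > thresholds.max_dlq_per_minute:
        alerts.append("dlq_rate_high")
    if metrics["end_to_end_seconds"] > thresholds.max_end_to_end_seconds:
        alerts.append("latency_sla_violated")
    return alerts
```

Defining the thresholds as data per pipeline, rather than hard-coding them in alert rules, keeps them reviewable alongside the rest of the pipeline design.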

For this reason, a good EDA monitoring design should answer not only the question “Is Kafka running?”, but also the following questions:

Did the event arrive at the expected time?
Did the event pass through the correct stage?
At which stage was it delayed?
Which consumer is falling behind?
Which topic has an increasing error rate?
Did the DLQ count move outside normal behavior?
Did the curated event reach the downstream consumer on time?

Especially in a stage-based pipeline, latency and error metrics should be monitored for each stage:

raw -> validated latency
validated -> enriched latency
enriched -> curated latency
curated -> downstream latency

Security and Governance

In enterprise EDA designs, security and governance should not be added later.

Important questions include:

Who can write to which topic?
Who can read from which topic?
Does the event contain sensitive data?
Is masking or tokenization required?
Is event lineage tracked?
Which consumer reads which data?
Are audit logs available?
Is retention aligned with regulations?
Are lakehouse table permissions consistent with topic permissions?

EDA is not only a technical streaming project. It is a platform approach that requires data governance, security, and operational discipline.

On the Kafka side, topic-level access control, schema governance, and audit are important. On the lakehouse side, table-level policies, lineage, data catalog, and data classification become critical.
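One of the questions above, whether lakehouse table permissions are consistent with topic permissions, lends itself to an automated review. The sketch below compares two hypothetical permission maps and lists principals whose read access differs between a topic and its mapped table. The input shapes and the function name are assumptions, and a difference is a candidate for review, not necessarily a violation.

```python
def find_permission_gaps(topic_readers, table_readers, topic_to_table):
    """List principals whose read access differs between a topic and its table."""
    gaps = []
    for topic, table in topic_to_table.items():
        on_topic = topic_readers.get(topic, set())
        on_table = table_readers.get(table, set())
        # Principals that can read the stream but not the stored copy.
        for principal in sorted(on_topic - on_table):
            gaps.append((principal, f"reads {topic} but not {table}"))
        # Principals that can read the stored copy but not the stream.
        for principal in sorted(on_table - on_topic):
            gaps.append((principal, f"reads {table} but not {topic}"))
    return gaps
```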

Where Should You Start with EDA?

When starting with EDA, the first goal should not be making the whole organization event-driven. A better approach is to choose a pilot use case with clear business value.

Good starting use cases include:

Fraud detection.
Real-time transaction monitoring.
IoT sensor monitoring.
Customer activity tracking.
Payment event streaming.
Security log analytics.
Real-time operational dashboard.

The first phase should produce these outputs:

Event domain model.
Topic naming standard.
Schema standard.
Producer and consumer contracts.
Raw, validated, enriched, curated approach.
DLQ and retry strategy.
Lakehouse bronze/silver/gold mapping principles.
Monitoring approach.
Security and access model.
Replay strategy.

Example Reference Architecture

A simple EDA + lakehouse reference architecture can look like this:

Source Systems
      |
      v
Source system adapters / Custom Producers
      |
      v
Kafka Event Channels
      |
      +--> Stream Processing
      |        |
      |        +--> Validated / Enriched / Curated Topics
      |        +--> Alert Topics
      |        +--> DLQ Topics
      |
      +--> Lakehouse Bronze / Silver / Gold
      +--> Operational Database
      +--> BI / Dashboard
      +--> Machine Learning Pipelines
      +--> Notification Services

Here, Kafka provides the event transport backbone. Source system adapters or custom producer applications move data into Kafka event channels. The stream processing layer validates, enriches, routes, and produces alerts from events. The lakehouse is positioned as the persistent analytical data layer.

Conclusion of the Second Article

In Event Driven Architecture, producing events is not enough. How events mature inside the platform, how they are validated, how they are enriched, how failed records are handled, and how they are safely provided to downstream consumers are as important as the event model itself.

The raw, validated, enriched, and curated approach allows streaming data to mature in a controlled way inside the platform. This structure shares a similar way of thinking with the Bronze, Silver, and Gold layers in modern lakehouse architectures. On the Kafka side, these stages appear as event channels. On the lakehouse side, they become persistent and queryable datasets.

A well-designed EDA + lakehouse architecture gives organizations two important capabilities at the same time: real-time action and reliable historical analytics.

If it is poorly designed, Kafka topics, stream processing jobs, lakehouse tables, DLQs, and consumers can become a complex system that is hard to trace.

For this reason, EDA projects should not start only by creating topics. They should start by designing the event lifecycle, the lakehouse relationship, the replay strategy, the monitoring approach, and the governance model together.

Part 1 - Event Driven Architecture (EDA) with Kafka: Designing Events and Channels the Right Way

Tayfun Yalcinkaya — Wed, 13 May 2026 16:07:42 +0000

This article is the first part of a series on Event Driven Architecture and Kafka-based event/channel design, written in part to address the need for Turkish-language technical resources.

Instead of leaving decisions to batch analytics that run hours or days later, enterprise organizations are moving toward architectures that can react the moment events occur. In areas such as fraud detection, real-time customer experience, IoT monitoring, payment systems, operational dashboards, and security analytics, Event Driven Architecture (EDA) is no longer only a modern integration approach; it is becoming one of the core building blocks of real-time data platforms.

The reason is simple: in many organizations, chronic batch analytics problems are not only technology problems; they are timing problems. Data is first collected, then moved, then processed, then reported. By then, the business decision may already be late. When designed correctly, EDA reduces this delay. It processes data as events happen, without holding it back, and provides lower latency, faster action, and a more flexible integration model.

At this point, it is more accurate to position EDA not as a direct alternative to batch processing, but as a different architectural reflex that complements it. Large historical data processing, periodic reporting, financial closing, and heavy bulk transformations are still areas where batch processing is strong. EDA makes the difference in scenarios where data flows continuously, decisions must be made without delay, and systems are expected to react to events immediately.

Poorly designed Event Driven Architecture approaches in data management platforms, however, can over time turn streaming flows and real-time analytics into a complete mess. Decisions that look simple at the start, such as “let’s create a Kafka topic and have systems write to it”, come back a few months later as uncontrolled topic growth, unclear event ownership, propagation of bad data, consumer dependencies, reprocessing problems, and data flows that cannot be traced.

Two questions form the backbone of this series: How should we design an event, and how should an event mature inside the platform? In this first article, we focus on the event itself: the event-command distinction, the Kafka topic/channel model, schema contracts, partition keys, the producer-consumer relationship, and common design mistakes. In the second article, we will look at the lifecycle of an event inside the platform: the pipeline from raw to curated events, DLQ and alert topics, replay, monitoring, governance, and the relationship with the Medallion approach in modern lakehouse architectures.

What Is Event Driven Architecture?

Event Driven Architecture is an architectural approach in which systems communicate through events that have occurred, rather than through direct, synchronous calls.

An event represents a meaningful business occurrence that has happened in the system:

A customer was created.
A payment was completed.
A card transaction failed.
The stock level dropped below the critical threshold.
A sensor temperature exceeded its limit.
A user logged in to the mobile application.

The important point is this: an event is not a request. It is information about something that has already happened.

For example:

PaymentCompleted
CustomerCreated
OrderShipped
MachineTemperatureExceeded

These are events, because they report something that happened in the past.

In contrast:

CreatePayment
UpdateCustomer
SendNotification

These are closer to commands. That is, they express the intent to make a system do something. Distinguishing events from commands correctly is critical in EDA design.

The Difference Between Classic Integration and EDA

In classic architectures, systems usually call each other directly.

Application A ---> Application B ---> Application C

This model looks simple at small scale. But as the number of systems grows, so do the dependencies. When one service slows down, fails, or changes, it can affect the other systems in the chain.

In EDA, instead of depending directly on each other, systems publish events, and the systems interested in those events consume them.

Application A ---> Event Channel ---> Application B
                               |----> Application C
                               |----> Application D

With this model, the producing system does not need to know who will consume the event. Adding a new consumer does not require changing the existing producer application.

Where Does Kafka Fit in EDA?

In EDA architectures, Kafka is usually positioned as the event backbone, event bus, or event streaming platform.

Kafka’s core concepts are:

Producer: The application that produces events.
Topic: The logical channel that events are written to.
Partition: The unit of parallel processing within a topic.
Consumer: The application that reads events.
Consumer Group: A set of consumers that share the same work.
Broker: The servers in a Kafka cluster.
Offset: The position that shows how far a consumer has read within a topic partition.
Retention: The time or size policy that determines how long events are kept in Kafka.

A simple Kafka-based EDA flow can be pictured like this:

Source System
     |
     v
Kafka Topic / Event Channel
     |
     +--> Real-time Analytics
     +--> Notification Service
     +--> Data Lake Ingestion
     +--> Fraud Detection
     +--> Monitoring Dashboard

Here, Kafka topics behave like channels that carry events between systems.

What Does Channel-Based Kafka Design Mean?

A channel-based structure usually means separating event types or business domains across Kafka topics.

For example, for a payment system:

payment.transaction.created
payment.transaction.authorized
payment.transaction.completed
payment.transaction.failed
payment.transaction.reversed

For the customer domain:

customer.created
customer.updated
customer.segment.changed
customer.status.changed

For an IoT or manufacturing scenario:

machine.telemetry.raw
machine.telemetry.enriched
machine.alert.temperature
machine.maintenance.predicted

Here, each topic is an event channel. The producer application writes events to the relevant channel. Consumer applications read the channels they are interested in and do their own work.

An Anonymized Field Example: Channel-Based Design for Energy Distribution Data

The following example is taken, in anonymized form, from a large-scale data platform project in the energy distribution sector. Without naming the organization or the project, it aims to make visible the architectural needs encountered in the field and the EDA design decisions that followed.

In this scenario, the core need was to centrally collect data from smart meter and lighting infrastructure across the country, transport it securely, process it, and make it available for analysis. At first glance this may look like an integration project. But given the data volume, the number of source systems, the 24/7 operating requirement, the security expectations, and the operational monitoring needs, the problem was much more than classic integration: a real-time, uninterrupted data flow architecture had to be designed.

One of the most critical decisions in the field was the Kafka topic design. Different types of meter data were received from multiple distribution companies: electricity consumption data, meter activity information, online/offline status, and operational signals. Writing all data to one large topic might have looked simpler at first, but that approach would have created serious complexity on the consumer side in terms of filtering, scaling, error handling, and monitoring.

For this reason, a channel approach based on source and data type was chosen. Dozens of Kafka topics were designed, broken down by distribution company and data type, so that each flow could be monitored separately, consumed separately, and scaled independently when needed.

As an example, this logic can be pictured as follows:

raw.energy.meter.reading.<source>
raw.energy.meter.status.<source>
raw.energy.lighting.consumption.<source>
raw.energy.device.activity.<source>

In this design, Kafka was not just a middle layer carrying messages. It took on the role of a central event backbone that separates high-volume data from different sources into well-organized channels. As a result, real-time monitoring services, consumers writing to the operational database, archiving processes, and analytics platforms could all be fed independently from the same data flow.

The main lesson from this example is this: in channel-based Kafka design, a growing number of topics is not a problem in itself. The real problem is uncertainty about which domain, which data type, which owner, and which consumption purpose each topic serves.

Event or Command?

One of the most common mistakes in EDA design is confusing events with commands.

An event expresses a business occurrence that has already happened:

PaymentCompleted
CustomerCreated
OrderShipped

A command expresses an action that a system is asked to perform:

CreatePayment
UpdateCustomer
SendNotification

Commands can also be carried over Kafka, but in that case concerns such as timeout, retry, correlation, response handling, and idempotency become more complex. For this reason, modeling “things that have happened” as far as possible is a healthier starting point for Kafka-based EDA design.

Business Event State Should Not Be Confused with Data Pipeline Stage

Another critical distinction in EDA design is between business state and data processing stage.

A business event lifecycle looks like this:

PaymentInitiated -> PaymentAuthorized -> PaymentCompleted -> PaymentSettled

A data pipeline stage looks like this:

raw -> validated -> enriched -> curated

The first describes the state changes of a business process. The second describes how mature the data is as it is processed inside the platform. Separating these two concepts is very important for correct topic design.

In this first article, we focus mainly on business events and channel design. In the second article, we will detail data pipeline stages such as raw, validated, enriched, and curated.

What to Consider in Topic Design

When designing EDA on Kafka, topic naming, partition strategy, and the ownership model are among the most critical decisions.

Example topic naming standard:

<domain>.<entity>.<event>

Examples:

payment.transaction.completed
customer.profile.updated
machine.temperature.exceeded

Some organizations also prefer to include the stage in the topic name:

raw.payment.transaction.created
validated.payment.transaction.created
enriched.payment.transaction.created
curated.payment.transaction.completed

What matters here is not a single correct naming standard, but having one consistent standard across the organization.

A good topic name should be able to answer these questions:

Which domain does it belong to?
Which entity or business object does it represent?
Which event does it carry?
Which team owns this topic?
Is this topic a permanent contract or a temporary processing topic?
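A naming standard is easier to keep consistent when it is checked automatically. The sketch below validates names against the `<domain>.<entity>.<event>` pattern with an optional stage prefix. The `parse_topic` function, the allowed stage set, and the lowercase segment rule are illustrative assumptions, not part of any Kafka API.

```python
import re

STAGES = {"raw", "validated", "enriched", "curated"}   # optional stage prefixes
SEGMENT = re.compile(r"^[a-z][a-z0-9_]*$")             # assumed rule: lowercase segments


def parse_topic(name):
    """Split a topic name into stage, domain, entity, and event, or raise ValueError."""
    parts = name.split(".")
    stage = None
    if parts[0] in STAGES:
        stage, parts = parts[0], parts[1:]
    if len(parts) != 3 or not all(SEGMENT.match(p) for p in parts):
        raise ValueError(f"topic name does not follow the standard: {name!r}")
    domain, entity, event = parts
    return {"stage": stage, "domain": domain, "entity": entity, "event": event}
```

A check like this could run wherever topics are created, so that names outside the agreed standard are rejected before they reach the cluster.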

Partition Key Selection

In Kafka, the choice of partition key affects both performance and ordering guarantees.

For example, if payment events for the same customer must be processed in order, customer_id can be chosen as the key.

Topic: payment.transaction
Key: customer_id

If transactions for the same card must be processed in order, card_id may be a better choice.

Topic: card.transaction
Key: card_id

A poor key choice can overload some partitions while leaving others idle. This leads to the hot partition problem.

These questions should be asked when choosing a partition key:

At what level do we need an ordering guarantee?
Which key gives a more balanced distribution?
How will consumer parallelism scale?
Should events for the same business entity stay in the same partition?
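The effect of a key on distribution can be estimated offline before a topic goes live. The sketch below maps keys to partitions with a stand-in hash and counts events per partition. It uses CRC32 only for illustration: Kafka’s default partitioner hashes the key bytes with murmur2, so real partition numbers will differ, although the skew pattern for a dominant key is the same.

```python
import zlib
from collections import Counter


def partition_for(key, num_partitions):
    """Map a key to a partition; the same key always lands on the same partition."""
    # Stand-in hash for illustration; Kafka's default partitioner uses murmur2.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_load(keys, num_partitions):
    """Count how many events each partition would receive for a list of keys."""
    return Counter(partition_for(k, num_partitions) for k in keys)
```

Feeding a sample of real keys into `partition_load` shows quickly whether one customer or one card dominates the traffic and would create a hot partition.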

Schema Management

The event contract is very important in EDA, because producers and consumers agree through the schema even if they do not know each other directly.

For this reason, every event type should have clear schema management.

Points to watch:

Event schemas should be versioned.
Backward compatibility rules should be defined.
Required and optional fields should be clear.
Metadata fields such as event timestamp, event_id, and source_system should be standardized.
If a breaking change is needed, a new version or new topic strategy should be defined.

Example metadata fields:

{
  "event_id": "evt-12345",
  "event_type": "PaymentCompleted",
  "event_version": "1.0",
  "event_time": "2026-05-13T10:15:00Z",
  "source_system": "payment-service",
  "correlation_id": "corr-98765"
}

If schema management is neglected, Kafka topics gradually stop being reliable event contracts, and the answer to “who writes what, and who reads it how” becomes unclear.
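As a minimal illustration of standardized metadata, the check below reports which of the example metadata fields are missing or empty in an event. The `missing_metadata` function is hypothetical, and in practice this kind of rule is usually enforced through a schema registry and a schema format rather than hand-written checks.

```python
# Metadata fields from the example above; treated here as required on every event.
REQUIRED_METADATA = (
    "event_id",
    "event_type",
    "event_version",
    "event_time",
    "source_system",
    "correlation_id",
)


def missing_metadata(event):
    """Return the required metadata fields that are absent or empty in the event."""
    return [name for name in REQUIRED_METADATA if event.get(name) in (None, "")]
```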

Producer and Consumer Relationship

One of the most important advantages of EDA is the loose coupling between producer and consumer.

The producer application publishes the event. It does not need to know how many consumers will read it.

payment.transaction.completed
        |
        +--> fraud-service
        +--> notification-service
        +--> data-lake-ingestion
        +--> realtime-dashboard
        +--> audit-service

This model allows new use cases to be added without changing the existing producer application. However, if topic ownership and the schema contract are not clear, this freedom can quickly turn into uncontrolled consumer dependency.

For this reason, the following should be clear for every critical topic:

Who is the topic owner?
Which application is the producer?
Which schema versions are supported?
Who is allowed to consume it?
What is the retention policy?
How is the breaking change process managed?

Common Design Mistakes in EDA

EDA projects usually fail not because Kafka falls short, but because architectural decisions are unclear.

Common mistakes include:

Creating topics for every need without control.
Confusing events with commands.
Not defining topic ownership.
Neglecting schema management.
Choosing the partition key arbitrarily.
Not setting the retention policy according to business needs.
Not documenting producer and consumer contracts.
Confusing business event state with data pipeline stage.
Treating monitoring, security, and governance requirements as an afterthought.

Conclusion

Event Driven Architecture is a powerful architectural approach for real-time data flows and analytics needs. However, success with EDA starts not with setting up a Kafka cluster, but with designing the right event model.

For a good EDA design, the answers to these questions should be clear:

Which events will be produced?
How will events be distinguished from commands?
What will the topic/channel standard be?
Who will own each topic?
How will schemas be managed?
How will the partition key be chosen?
How will producer and consumer contracts be protected?

In this article, we covered the foundations of EDA: event modeling, Kafka topic/channel design, schema contracts, and the producer-consumer relationship. In the next article, we will look at how these events mature inside the platform: the raw, validated, enriched, and curated flows, DLQ and alert topics, the replay strategy, monitoring, and the relationship with the Medallion approach in modern lakehouse architectures.

English version of this article is also available on my profile.

Part 1 - Event Driven Architecture (EDA) with Kafka: Designing Events and Channels the Right Way

Tayfun Yalcinkaya — Wed, 13 May 2026 15:53:37 +0000

Enterprise organizations around the world are moving away from relying only on batch analytics that run hours or days after an event has happened. Instead, they are adopting architectures that can react when events occur. In areas such as fraud detection, real-time customer experience, IoT monitoring, payment systems, operational dashboards, and security analytics, Event Driven Architecture is no longer only a modern integration pattern. It has become one of the key building blocks of real-time data platforms.

The reason is simple: in many organizations, chronic batch analytics problems are not only technology problems; they are timing problems. Data is collected, moved, processed, and reported. But by the time the business receives the result, the decision may already be late. When EDA is designed correctly, it reduces this delay. It allows data to be processed as events happen, providing lower latency, faster action, and a more flexible integration model.

At this point, it is more accurate to position EDA not as a direct replacement for batch processing, but as a different architectural response that complements it. Large historical data processing, periodic reporting, financial closing, and heavy bulk transformations are still strong areas for batch processing. EDA creates the most value when data is continuously flowing, decisions must be made quickly, and systems are expected to react to events almost immediately.

However, poorly designed Event Driven Architecture can turn streaming flows and real-time analytics into a serious mess inside data management platforms. At the beginning, a decision like “let’s create Kafka topics and let systems write there” may look simple. A few months later, it may turn into uncontrolled topic growth, unclear event ownership, unreliable data distribution, consumer dependency problems, reprocessing issues, and data flows that are difficult to trace.

This blog series is built around two main questions: How should we design an event, and how should an event mature inside the platform? In this first article, we will focus on the event itself: the difference between events and commands, the Kafka topic/channel model, schema contracts, partition keys, producer-consumer relationships, and common design mistakes. In the second article, we will look at the event lifecycle inside the platform: the pipeline from raw to curated events, DLQ (Dead Letter Queue, the channel where failed or unprocessable events are sent) and alert topics, replay, monitoring, governance, and the relationship with the Medallion approach used in modern lakehouse architectures.

What Is Event Driven Architecture?

Event Driven Architecture is an architectural approach where systems communicate through events instead of direct and synchronous calls.

An event represents a meaningful business fact that has already happened in the system:

A customer was created.
A payment was completed.
A card transaction failed.
Stock level dropped below a critical threshold.
A sensor temperature exceeded the limit.
A user logged in to a mobile application.

The important point is this: an event is not a request. It is information about something that has already happened.

For example:

PaymentCompleted
CustomerCreated
OrderShipped
MachineTemperatureExceeded

These are events because they describe something that happened in the past.

On the other hand:

CreatePayment
UpdateCustomer
SendNotification

These are closer to commands. They express an intention to make a system do something. In EDA design, separating events from commands is a critical decision.
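
The distinction can be sketched in a few lines of Python. This is only an illustration: the class fields and the `handle_create_payment` function are hypothetical names chosen for the example and do not come from any specific framework.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class PaymentCompleted:
    """Event: an immutable fact, named in the past tense."""
    payment_id: str
    amount: float
    occurred_at: datetime


@dataclass
class CreatePayment:
    """Command: a request that the receiver may accept or reject."""
    customer_id: str
    amount: float


def handle_create_payment(cmd: CreatePayment) -> PaymentCompleted:
    # A command can be rejected; an event cannot, because it already happened.
    if cmd.amount <= 0:
        raise ValueError("amount must be positive")
    return PaymentCompleted(
        payment_id=f"pay-{cmd.customer_id}",  # hypothetical id scheme
        amount=cmd.amount,
        occurred_at=datetime.now(timezone.utc),
    )
```

The event class is frozen because a fact that has already happened should not be edited by its consumers, while the command is an ordinary mutable request.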

The Difference Between Traditional Integration and EDA

In traditional architectures, systems usually call each other directly.

Application A ---> Application B ---> Application C

This model looks simple at a small scale. However, as the number of systems increases, dependencies grow. If one service becomes slow, fails, or changes, the other systems in the chain may also be affected.

In EDA, systems do not directly depend on each other. Instead, they publish events, and the systems interested in those events consume them.

Application A ---> Event Channel ---> Application B
                               |----> Application C
                               |----> Application D

With this model, the producer system does not need to know who will consume the event. A new consumer can be added without changing the existing producer application.
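
A minimal in-memory sketch can make this decoupling concrete. It is not Kafka; `EventChannel`, `subscribe`, and `publish` are hypothetical names used only to show that the publishing side never references its consumers.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class EventChannel:
    """Toy stand-in for an event backbone: topics map to subscriber callbacks."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> int:
        # The producer only names the topic; it never sees who is listening.
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


channel = EventChannel()
fraud_inbox: List[dict] = []
audit_inbox: List[dict] = []

channel.subscribe("payment.transaction.completed", fraud_inbox.append)
# Adding a second consumer later requires no change on the producer side.
channel.subscribe("payment.transaction.completed", audit_inbox.append)

delivered = channel.publish("payment.transaction.completed", {"payment_id": "p-1"})
```

Unlike this synchronous toy, a real event backbone stores the event and lets each consumer read it at its own pace, which is what removes the runtime dependency between systems.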

Where Does Kafka Fit in EDA?

In EDA architectures, Kafka is usually positioned as an event backbone, event bus, or event streaming platform.

The core Kafka concepts are:

Producer: The application that produces events.
Topic: The logical channel where events are written.
Partition: The unit that enables parallel processing within a topic.
Consumer: The application that reads events.
Consumer Group: A group of consumers that share the same work.
Broker: A server inside the Kafka cluster.
Offset: The position that shows how far a consumer has read in a topic.
Retention: The policy that defines how long events stay in Kafka.
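
To make offsets and consumer groups more tangible, here is a simplified in-memory model of a single partition. It is a sketch under assumptions: `PartitionLog` and its methods are hypothetical, and offsets are committed automatically on every poll, which a real consumer would control explicitly.

```python
from typing import Dict, List


class PartitionLog:
    """One partition: an append-only list where the list index is the offset."""

    def __init__(self) -> None:
        self._records: List[str] = []
        self._committed: Dict[str, int] = {}  # consumer group -> next offset to read

    def append(self, record: str) -> int:
        """Producer side: write a record and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, group: str, max_records: int = 10) -> List[str]:
        """Consumer side: read from the group's position and advance it."""
        start = self._committed.get(group, 0)
        batch = self._records[start:start + max_records]
        self._committed[group] = start + len(batch)
        return batch


log = PartitionLog()
for name in ("PaymentCompleted", "CustomerCreated", "OrderShipped"):
    log.append(name)

first = log.poll("fraud-service", max_records=2)   # reads offsets 0 and 1
rest = log.poll("fraud-service")                   # continues at offset 2
audit = log.poll("audit-service")                  # separate group, starts at 0
```

Each consumer group keeps its own position, so reading by one group does not remove records for another; records only disappear when the retention policy expires them.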

A simple Kafka-based EDA flow can be shown like this:

Source System
     |
     v
Kafka Topic / Event Channel
     |
     +--> Real-time Analytics
     +--> Notification Service
     +--> Data Lake Ingestion
     +--> Fraud Detection
     +--> Monitoring Dashboard

Here, Kafka topics act as event channels between systems.

What Does Channel-Based Kafka Design Mean?

A channel-based structure usually means separating event types or business domains through Kafka topics.

For example, for a payment system:

payment.transaction.created
payment.transaction.authorized
payment.transaction.completed
payment.transaction.failed
payment.transaction.reversed

For a customer domain:

customer.created
customer.updated
customer.segment.changed
customer.status.changed

For an IoT or manufacturing scenario:

machine.telemetry.raw
machine.telemetry.enriched
machine.alert.temperature
machine.maintenance.predicted

Each topic is an event channel. The producer application writes events to the related channel. Consumer applications read the channels they are interested in and perform their own work.

An Anonymized Field Example: Channel-Based Design for Energy Distribution Data

The following example is anonymized from a large-scale data platform project in the energy distribution domain. Without sharing the organization or project name, it aims to make the real architectural needs and EDA design decisions from the field more visible.

In this scenario, the main need was to centrally collect, securely transport, process, and analyze data coming from smart meters and lighting infrastructure across the country. At first glance, this may look like an integration project. However, when we consider the data volume, the number of source systems, the need for 24/7 operation, security expectations, and operational monitoring requirements, the problem becomes much bigger than classical integration. A real-time and always-on data flow architecture was needed.

One of the most critical decisions in the field was Kafka topic design. Different types of meter data were coming from multiple distribution companies: electricity consumption data, meter activity data, online/offline status information, and operational signals. Writing all data into one large topic could look simpler at first. But this would create serious complexity for consumers in terms of filtering, scaling, error handling, and monitoring.

For this reason, a source-based and data-type-based channel approach was preferred. Dozens of Kafka topics were designed based on distribution company and data type. This allowed each flow to be monitored separately, consumed separately, and scaled independently when needed.

This logic can be represented as follows:

raw.energy.meter.reading.<source>
raw.energy.meter.status.<source>
raw.energy.lighting.consumption.<source>
raw.energy.device.activity.<source>

In this design, Kafka was not only a messaging layer. It became the central event backbone that separated high-volume data from different sources into organized channels. As a result, real-time monitoring services, consumers writing to operational databases, archiving processes, and analytics platforms could all be fed independently from the same data flow.
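
The source-based and data-type-based routing described above could be sketched as follows. The mapping keys, the `topic_for` helper, and the source identifier `company_a` are hypothetical; only the topic prefixes come from the example.

```python
# Data type -> topic prefix, following the channel layout shown above.
ENERGY_CHANNELS = {
    "meter_reading": "raw.energy.meter.reading",
    "meter_status": "raw.energy.meter.status",
    "lighting_consumption": "raw.energy.lighting.consumption",
    "device_activity": "raw.energy.device.activity",
}


def topic_for(data_type: str, source: str) -> str:
    """Resolve the channel for one record from its data type and source."""
    try:
        prefix = ENERGY_CHANNELS[data_type]
    except KeyError:
        # Unknown data types are rejected instead of silently creating topics.
        raise ValueError(f"unknown data type: {data_type}") from None
    return f"{prefix}.{source}"


status_topic = topic_for("meter_status", "company_a")
```

Keeping the mapping in one place is one way to let the number of topics grow while the purpose of each topic stays explicit.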

The main lesson from this example is clear: in channel-based Kafka design, a growing number of topics is not a problem by itself. The real problem starts when it is not clear which domain, data type, ownership model, and consumption purpose each topic serves.

Event or Command?

One of the common mistakes in EDA design is mixing events and commands.

An event represents a business fact that has already happened:

PaymentCompleted
CustomerCreated
OrderShipped

A command represents an action that we want a system to perform:

CreatePayment
UpdateCustomer
SendNotification

Commands can also be carried over Kafka. However, this makes concerns such as timeout, retry, correlation, response handling, and idempotency more complex. For this reason, in Kafka-based EDA design, it is usually healthier to start by modeling things that have already happened.

Business Event State and Data Pipeline Stage Should Not Be Mixed

Another important distinction in EDA design is the difference between business state and data processing stage.

A business event lifecycle may look like this:

PaymentInitiated -> PaymentAuthorized -> PaymentCompleted -> PaymentSettled

A data pipeline stage may look like this:

raw -> validated -> enriched -> curated

The first one describes the state changes of a business process. The second one describes the maturity of data inside the platform. Separating these two concepts is very important for correct topic design.
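
One way to keep the two concepts apart is to model them as independent dimensions. The sketch below assumes the stage-prefixed naming style; the enum names and the `stage_topic` helper are hypothetical.

```python
from enum import Enum


class PaymentState(Enum):
    """Business dimension: what happened in the payment process."""
    INITIATED = "PaymentInitiated"
    AUTHORIZED = "PaymentAuthorized"
    COMPLETED = "PaymentCompleted"
    SETTLED = "PaymentSettled"


class PipelineStage(Enum):
    """Platform dimension: how mature the data is."""
    RAW = "raw"
    VALIDATED = "validated"
    ENRICHED = "enriched"
    CURATED = "curated"


def stage_topic(stage: PipelineStage, domain: str, entity: str, event: str) -> str:
    return f"{stage.value}.{domain}.{entity}.{event}"


# The same business event can exist at every pipeline stage.
completed_topics = [
    stage_topic(stage, "payment", "transaction", "completed")
    for stage in PipelineStage
]
```

Because the two enums never reference each other, a change in the business process does not force a change in the pipeline stages, and vice versa.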

In the first article, we focus more on business events and channel design. In the second article, we will explain data pipeline stages such as raw, validated, enriched, and curated in more detail.

What to Consider in Topic Design

When designing EDA on Kafka, topic naming, partition strategy, and ownership are among the most critical decisions.

An example topic naming standard can be:

<domain>.<entity>.<event>

Examples:

payment.transaction.completed
customer.profile.updated
machine.temperature.exceeded

Some organizations also prefer to include the stage information in the topic name:

raw.payment.transaction.created
validated.payment.transaction.created
enriched.payment.transaction.created
curated.payment.transaction.completed

The important point is not that there is only one correct naming standard. The important point is having a consistent standard across the organization.

A good topic name should answer these questions:

Which domain does it belong to?
Which entity or business object does it represent?
Which event does it carry?
Which team owns this topic?
Is this topic a stable contract, or is it a temporary processing topic?
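
A naming standard is easier to keep consistent when it is checked automatically. The following is a minimal sketch of such a check for the `<domain>.<entity>.<event>` pattern with an optional stage prefix. The lowercase-segment rule and the `parse_topic` helper are assumptions made for the example, not an established convention.

```python
import re
from typing import NamedTuple, Optional

STAGES = {"raw", "validated", "enriched", "curated"}
SEGMENT = re.compile(r"[a-z][a-z0-9_]*")  # assumed rule: lowercase segments


class TopicName(NamedTuple):
    stage: Optional[str]
    domain: str
    entity: str
    event: str


def parse_topic(name: str) -> Optional[TopicName]:
    """Return the parsed parts, or None if the name breaks the standard."""
    parts = name.split(".")
    # An optional leading stage is removed before checking the three core parts.
    stage = parts.pop(0) if parts[0] in STAGES else None
    if len(parts) != 3 or not all(SEGMENT.fullmatch(p) for p in parts):
        return None
    return TopicName(stage, *parts)


plain = parse_topic("payment.transaction.completed")
staged = parse_topic("raw.payment.transaction.created")
invalid = parse_topic("Payment.Transaction")
```

Ownership and contract stability cannot be read from the name alone in this sketch; those answers would have to live in a topic catalog next to the name.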

Partition Key Selection

In Kafka, partition key selection affects both performance and ordering guarantees.

For example, if events related to the same customer must be processed in order, customer_id can be selected as the key.

Topic: payment.transaction
Key: customer_id

If transactions related to the same card must be processed in order, card_id may be a better choice.

Topic: card.transaction
Key: card_id

A poor key choice can overload some partitions while leaving others almost empty. This causes a hot partition problem.

When selecting a partition key, these questions should be asked:

What level of ordering do we need?
Which key provides a more balanced distribution?
How will consumer parallelism scale?
Should events related to the same business entity stay in the same partition?
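
The effect of key choice on distribution can be illustrated with a small simulation. This sketch uses `zlib.crc32` as a stand-in hash. Kafka's default partitioner uses murmur2 for keyed records, so the partition numbers here will not match a real cluster. The helper names and the sample keys are hypothetical.

```python
import zlib
from collections import Counter
from typing import Iterable


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition. The same key always lands on the same partition."""
    # Stand-in hash for illustration only; not Kafka's murmur2.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_load(keys: Iterable[str], num_partitions: int) -> Counter:
    """Count how many events each partition would receive."""
    return Counter(partition_for(k, num_partitions) for k in keys)


# Skewed traffic: one merchant produces 900 of 1000 events.
skewed_keys = ["merchant-1"] * 900 + [f"merchant-{i}" for i in range(2, 102)]
skewed_load = partition_load(skewed_keys, num_partitions=6)

# Keying by a high-cardinality id spreads the same volume across partitions.
spread_load = partition_load((f"card-{i}" for i in range(1000)), num_partitions=6)
```

In the skewed case, at least 900 events share one partition regardless of how many partitions exist, which is the hot partition problem described above. Ordering per key is preserved in both cases because a key never changes partition while the partition count stays the same.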

Schema Management

In EDA, the event contract is very important. Even if producers and consumers do not know each other directly, they agree through the schema.

For this reason, each event type should have a clear schema management approach.

Key points to consider:

Event schemas should be versioned.
Backward compatibility rules should be defined.
Required and optional fields should be clear.
Metadata fields such as event timestamp, event_id, and source_system should be standardized.
If a breaking change is needed, a new version or a new topic strategy should be defined.

Example metadata fields:

{
  "event_id": "evt-12345",
  "event_type": "PaymentCompleted",
  "event_version": "1.0",
  "event_time": "2026-05-13T10:15:00Z",
  "source_system": "payment-service",
  "correlation_id": "corr-98765"
}

If schema management is ignored, Kafka topics slowly stop being reliable event contracts. Then the answer to “who writes what and who reads it how?” becomes unclear.
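
As a rough illustration, standardized metadata can be enforced with a small envelope check before an event is accepted. This is a sketch, not a replacement for a schema registry. The `validate_envelope` helper is hypothetical, and treating `correlation_id` as optional is an assumption.

```python
from datetime import datetime
from typing import List

# Assumed required metadata; correlation_id is treated as optional here.
REQUIRED_FIELDS = ("event_id", "event_type", "event_version", "event_time", "source_system")


def validate_envelope(event: dict) -> List[str]:
    """Return a list of contract violations; an empty list means the envelope is valid."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in event]
    if "event_time" in event:
        try:
            # Older Python versions do not parse a trailing "Z", so normalize it first.
            datetime.fromisoformat(str(event["event_time"]).replace("Z", "+00:00"))
        except ValueError:
            errors.append("event_time is not ISO 8601")
    return errors


sample = {
    "event_id": "evt-12345",
    "event_type": "PaymentCompleted",
    "event_version": "1.0",
    "event_time": "2026-05-13T10:15:00Z",
    "source_system": "payment-service",
    "correlation_id": "corr-98765",
}
violations = validate_envelope(sample)
```

Returning every violation at once, instead of failing on the first one, makes the result easier to report back to the producing team.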

Producer and Consumer Relationship

One of the most important advantages of EDA is loose coupling between producers and consumers.

The producer application publishes the event. It does not need to know how many consumers will read it.

payment.transaction.completed
        |
        +--> fraud-service
        +--> notification-service
        +--> data-lake-ingestion
        +--> realtime-dashboard
        +--> audit-service

This model allows new use cases to be added without changing the existing producer application. However, this flexibility can quickly turn into uncontrolled consumer dependency if topic ownership and schema contracts are not clear.

For every critical topic, the following information should be clear:

Who owns the topic?
Which application is the producer?
Which schema versions are supported?
Who is allowed to consume it?
What is the retention policy?
How are breaking changes managed?
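
These questions can be captured as a small contract record per topic, so that unanswered items become visible. The sketch below is one possible shape; `TopicContract`, its field names, and the sample values are hypothetical.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class TopicContract:
    name: str
    owner_team: str = ""
    producer: str = ""
    schema_versions: List[str] = field(default_factory=list)
    allowed_consumers: List[str] = field(default_factory=list)
    retention_days: int = 0
    breaking_change_policy: str = ""


def missing_answers(contract: TopicContract) -> List[str]:
    """List the contract fields that are still empty (empty string, empty list, or 0)."""
    return [f.name for f in fields(contract) if not getattr(contract, f.name)]


draft = TopicContract(name="payment.transaction.completed", owner_team="payments")
gaps = missing_answers(draft)
```

A check like this could run when a topic is requested, so that a critical topic is not created while its ownership or retention is still undefined.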

Common EDA Design Mistakes

EDA projects usually fail not because Kafka is weak, but because architectural decisions are not clear enough.

Common mistakes include:

Creating topics for every need without control.
Mixing events and commands.
Not defining topic ownership.
Ignoring schema management.
Choosing partition keys randomly.
Defining retention without business requirements.
Not documenting producer and consumer contracts.
Mixing business event state with data pipeline stages.
Thinking about monitoring, security, and governance too late.

Conclusion

Event Driven Architecture is a powerful approach for real-time data flows and analytics needs. However, the success of EDA does not start with installing a Kafka cluster. It starts with designing the right event model.

A good EDA design should clearly answer these questions:

Which events will be produced?
How will we separate events from commands?
What will the topic/channel standard be?
Who will own each topic?
How will schemas be managed?
How will partition keys be selected?
How will producer and consumer contracts be protected?

In this article, we covered event modeling, Kafka topic/channel design, schema contracts, and producer-consumer relationships. In the next article, we will look at how these events mature inside the platform: raw, validated, enriched, and curated flows, DLQ and alert topics, replay strategy, monitoring, and the relationship with the Medallion approach used in modern lakehouse architectures.

A Turkish version of this article is also available on my profile.

Why Apache Ozone is the Preferred Object Store for Big Data

Tayfun Yalcinkaya — Mon, 05 Jan 2026 21:42:00 +0000

The limitations of traditional HDFS architecture when facing billions of small files, combined with the search for S3-like flexibility in on-premise environments, drive us toward a modern solution: Apache Ozone.

From a technology perspective, the abundance of products and methods available for data storage requires serious expertise to navigate. If you need to store a wide variety of data, standard RDBMS technologies will eventually fall short. You need to turn to independent, cost-effective, yet efficient storage technologies that allow you to query data performantly regardless of its type.

The Shift to On-Premise Object Storage

If your data landscape includes structured, semi-structured, and unstructured data, and you aim for cost efficiency by avoiding separate silos, all paths lead to an object storage architecture, implemented through an on-premise object store. For organizations with requirements to keep data in-house, on-premise solutions are a necessity.

Unlike traditional object storage systems that prioritize API compatibility, Apache Ozone is designed as a storage system optimized for analytical engines rather than object semantics alone.

While the market offers several options like MinIO or Ceph, if you are utilizing big data engines such as Hive, Spark, Trino, or Impala, there is a particularly optimized solution: Apache Ozone.

(You can explore the technical architecture of Apache Ozone here).

Key Technical Advantages of Apache Ozone

Source: Cloudera Ozone Overview Documentation

Strong Consistency:
Ozone is designed to provide strong consistency via the Raft consensus protocol. This ensures that data is immediately visible once written, with guaranteed atomic write support. In contrast, S3-compatible interfaces in other systems may exhibit eventual consistency, leading to potential delays or conflicts during overwrite or list operations.
Native Ecosystem Integration:
Unlike basic S3-compatible stores that offer limited integration with tools like Hive and Impala, Ozone is built as a core part of the Hadoop ecosystem. This results in seamless, out-of-the-box support for major big data processing engines such as Hive, Spark, and Trino. For instance, you can check the detailed Hive Integration Documentation to see the level of optimization.
POSIX Compatibility & File System Behavior:
Through its OFS layer, Ozone offers POSIX-like behavior and a directory hierarchy. This allows for native atomic renames, which are crucial for the performance and reliability of Hadoop-based workloads.
Full Kerberos Support:
Leveraging its native Hadoop compatibility, Ozone offers full integration with Kerberos for enterprise-grade security, a feature often lacking in S3-only object stores.

Feature	Apache Ozone	S3-compatible Object Stores (MinIO, Ceph, etc.)
Performance	Optimized for large-scale data lakes	High throughput, limited metadata handling
Consistency Model	Strong Consistency (Raft-based)	Eventual Consistency (possible delays)
Hadoop/Spark/Trino	Native & Seamless Integration	Limited (especially for Hive/Impala)
POSIX / File System	POSIX-like (Native Atomic Rename)	None (Object-based only)
Kerberos Support	Fully Compatible (Native)	None

The Perfect Match for Modern Data Lakehouse (Apache Iceberg)
If you are moving toward a Data Lakehouse architecture using Apache Iceberg, Ozone stands out as the superior storage layer:

Atomic Commits:
Iceberg relies on atomic metadata updates to prevent data corruption during concurrent writes. Ozone supports this natively through its atomic rename functionality.
Native Locking:
It supports the locking mechanisms necessary to prevent metadata inconsistencies, whereas S3-compatible stores often require external services like Zookeeper to manage locks.
Snapshot Isolation:
Ozone’s architecture ensures that data is not considered committed until acknowledged by all replicas, preserving the consistent view that Iceberg’s immutable file model requires.
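
The role of an atomic publish step in a table commit can be illustrated with a local-filesystem analogy. This sketch does not use the Ozone or Iceberg APIs. The `commit_metadata` helper and the `v<N>.metadata.json` file naming are assumptions made for the example, and `os.link` stands in for an atomic publish operation that fails when the target already exists.

```python
import json
import os
import tempfile


def commit_metadata(table_dir: str, version: int, metadata: dict) -> bool:
    """Publish metadata for `version`; return False if another writer got there first."""
    target = os.path.join(table_dir, f"v{version}.metadata.json")

    # Step 1: write the full metadata to a private temp file in the same directory.
    fd, tmp_path = tempfile.mkstemp(dir=table_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(metadata, tmp)

    # Step 2: publish atomically. The link either appears completely or not at all,
    # and it fails if the target version already exists.
    try:
        os.link(tmp_path, target)
        return True
    except FileExistsError:
        return False
    finally:
        os.unlink(tmp_path)  # the temp name is never part of the table
```

Two writers racing for the same version cannot both succeed, and readers never observe a half-written metadata file. A store without such an atomic primitive has to obtain the same guarantee from a lock service or a catalog.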

Feature	Apache Ozone	S3-compatible Object Stores
Atomic Commits	Fully Supported (via OFS)	No native support (workarounds required)
Locking Mechanism	Native Support	Requires external tools (Zookeeper, etc.)
Snapshot Isolation	Guaranteed (Strong Consistency)	Very limited / Eventual consistency
Directory Structure	Native Support	Simulated (Prefix-based)

Conclusion
For organizations aiming to process unstructured and structured data effectively using Spark, Hive, or Trino, Apache Ozone is not just an alternative; it is a purpose-built on-premise object store for big data workloads. It bridges the gap between traditional file systems and modern object storage, making it the ideal choice for high-performance data lakehouse architectures.

What is your preferred storage layer for on-premise big data projects? How could Ozone’s advantages resolve bottlenecks in your current architecture?

Written by Tayfun Yalçınkaya, working on large-scale Big Data platforms and Lakehouse architectures.
Connect with me on LinkedIn