The Modern Travel Data Stack in 2025: How I'm Seeing Leading OTAs Architect Their Warehouse Layer
The travel industry has always been data-intensive, but the sheer volume and velocity of information we're managing in 2025 has fundamentally changed how I think about data infrastructure. After years of working with online travel agencies at various scales, I've watched the modern data stack evolve from a buzzword into a genuine architectural paradigm—one that's reshaping how we build, maintain and derive value from travel data warehouses.
Why the Traditional ETL Approach No Longer Works for Travel Data
I remember the days when building a travel data warehouse meant procuring expensive enterprise software, hiring specialised consultants, and waiting months for the first query to run. The traditional extract-transform-load pattern made sense in an era of batch processing and overnight data refreshes, but today's travellers expect real-time personalisation, dynamic pricing updates, and instant booking confirmations.
The fundamental problem I've observed is that legacy ETL tools were designed for a world where data moved slowly and transformation logic lived in opaque, difficult-to-test black boxes. When you're managing pricing feeds from hundreds of suppliers, tracking user behaviour across mobile apps and web properties, and reconciling bookings across multiple payment gateways, you need transparency and speed that traditional tools simply cannot provide.
What I've seen work consistently well is the ELT pattern—extract, load, then transform—where raw data lands in the warehouse first and transformations happen using the warehouse's computational power. This approach has become the foundation of what I consider the modern travel data stack.
The Core Components I'm Seeing in Production
The architecture I encounter most frequently among forward-thinking travel technology teams centres around three key layers: ingestion, storage, and transformation. Each layer has seen remarkable innovation in the past few years, and the integration between these components has become increasingly seamless.
Ingestion: Getting Data from Everywhere
Travel data comes from an overwhelming variety of sources. I'm talking about GDS feeds, supplier APIs, payment processors, customer service platforms, marketing automation tools, mobile analytics, and countless SaaS applications. The challenge isn't just volume—it's the sheer heterogeneity of formats, update frequencies, and reliability guarantees.
I've watched Airbyte emerge as a genuine game-changer in this space. What impresses me most isn't just the breadth of pre-built connectors—though having ready-made integrations for everything from Salesforce to Stripe certainly helps—it's the open-source foundation that allows teams to build custom connectors when needed. In travel, you invariably encounter proprietary supplier feeds or legacy systems that require bespoke integration work.
The shift toward declarative, configuration-driven ingestion has been profound. Instead of writing and maintaining thousands of lines of Python or Java to move data around, I'm seeing teams define their pipelines in YAML, version control the configurations, and let the ingestion platform handle the heavy lifting of incremental updates, error handling, and schema evolution.
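A configuration-driven pipeline of this kind might look like the following sketch. This is a simplified illustration, not the exact Airbyte schema — the field names, endpoint, and structure are assumptions chosen to show the declarative style:

```yaml
# Illustrative ingestion pipeline definition. Field names are simplified
# and the supplier endpoint is hypothetical; real Airbyte configurations
# follow the platform's own connector schema.
source:
  name: supplier_rates_api
  type: rest_api
  base_url: https://api.example-supplier.com/v2   # hypothetical endpoint
  streams:
    - name: room_rates
      path: /rates
      sync_mode: incremental      # only pull records changed since last sync
      cursor_field: updated_at
destination:
  name: snowflake_raw
  schema: RAW_SUPPLIERS
schedule:
  cron: "0 * * * *"               # hourly
```

The point is that the platform, not hand-written code, handles cursors, retries, and schema drift; the team only version-controls this declaration.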
Storage: The Warehouse as the Single Source of Truth
I've become convinced that the choice of data warehouse is one of the most consequential decisions a travel technology team can make. The warehouse isn't just a place to store data—it's the computational engine that powers analytics, the foundation for machine learning pipelines, and increasingly, the operational database that serves customer-facing applications.
Snowflake has become ubiquitous in the travel industry, and for good reason. The separation of storage and compute means I can run heavy transformation jobs without impacting the analysts querying for yesterday's booking metrics. The ability to spin up virtual warehouses on demand, size them appropriately for the workload, and shut them down when finished has fundamentally changed the economics of data warehousing.
What really matters in travel, though, is the ability to handle semi-structured data elegantly. Flight search results, hotel availability responses, and user clickstream events all arrive as JSON, and trying to force everything into rigid relational schemas creates more problems than it solves. I've seen teams maintain their agility by landing JSON directly in the warehouse and using SQL to parse it on-read, deferring schema decisions until the data's actual use case is clear.
The time-travel and zero-copy cloning features have become essential for my work. Being able to query historical states of tables is invaluable when investigating booking discrepancies or understanding how pricing logic evolved. Creating instant copies of production data for testing transformation changes has accelerated development cycles dramatically.
Transformation: Where dbt Changed Everything
If I had to point to a single tool that's transformed how travel data teams work, it would be dbt. The shift from imperative scripts to declarative SQL-based transformations has been nothing short of revolutionary in my experience.
The dbt Philosophy in Practice
What I love about dbt is that it treats analytics code like software engineering. Every model is a SELECT statement, version-controlled in Git, with dependencies explicitly declared. The DAG of transformations is automatically inferred, so I don't waste time managing execution order or worrying about circular dependencies.
In a travel context, I'm usually building staging models that clean and standardise raw data, intermediate models that implement business logic, and mart models that serve specific analytical use cases. A typical project might have staging models for raw bookings, cancellations, and modifications; intermediate models that calculate net revenue and apply refund logic; and mart models for executive dashboards, revenue operations, and finance reporting.
The testing framework is where dbt really shines for travel data quality. I can assert that booking amounts are always positive, that every transaction has a valid user ID, that currency codes conform to ISO standards, and that date ranges make logical sense. These tests run on every transformation, catching data quality issues before they propagate downstream.
Documentation generation has solved a persistent problem I've faced throughout my career—keeping data dictionaries current. With dbt, I write descriptions alongside the code, and the documentation site is automatically generated and stays synchronised with the actual models. When a new analyst joins the team and asks what "gross_booking_value" means, I can point them to living documentation rather than a stale wiki page.
Incremental Models and Travel Data Volumes
Travel datasets grow quickly. I'm routinely working with billions of search events, hundreds of millions of bookings, and terabytes of supplier availability data. Running full-refresh transformations on tables of that scale is neither practical nor necessary.
dbt's incremental materialisation strategy has become essential in my work. I can define logic that processes only new or changed records, appending them to existing tables or updating specific rows based on a unique key. For a bookings table, this might mean processing only bookings created or modified since the last run. For a user behaviour table, it might mean appending yesterday's clickstream events.
The balance I've learned to strike is between incremental efficiency and the need for occasional full refreshes. I typically run incremental models daily but schedule full refreshes weekly or monthly to catch any edge cases and ensure long-term data consistency.
Orchestration: Bringing It All Together
The modern data stack isn't just about individual tools—it's about how they work together. I've seen teams struggle when they treat ingestion, transformation, and analysis as separate concerns with different scheduling systems and monitoring tools.
The orchestration layer I'm most commonly seeing is a combination of Airbyte's scheduling for data ingestion and dbt Cloud for transformation runs, with everything monitored through a unified observability platform. Some teams have adopted Airflow for more complex workflows, especially when machine learning pipelines or operational data pushes are involved.
What matters most in my experience is having clear dependency management and intelligent retry logic. If a supplier API fails at three in the morning, I want the system to retry with exponential backoff, alert the on-call engineer if it continues failing, and gracefully skip downstream transformations that depend on that data without blocking unrelated work.
The Emerging Patterns I'm Tracking
As I look at how the modern travel data stack is evolving, several trends stand out to me as particularly significant.
The Shift Toward Real-Time
Batch processing isn't going away, but I'm seeing increasing demand for real-time or near-real-time data flows. Travellers expect to see booking confirmations instantly, and revenue teams want to monitor conversion rates as campaigns launch, not the next morning.
The tools are adapting. Airbyte now supports CDC-based replication for databases, Snowflake has introduced dynamic tables that continuously refresh as upstream data changes, and dbt is exploring incremental models that can run more frequently with micro-batch processing.
Reverse ETL and Operational Analytics
One of the most interesting developments I've witnessed is the rise of reverse ETL—taking data from the warehouse and pushing it back into operational systems. Instead of the warehouse being purely an analytical endpoint, it's becoming the source of truth that feeds personalisation engines, marketing automation platforms, and customer service tools.
I'm seeing travel teams build audience segments in their warehouse using dbt, then sync those segments to email marketing platforms, advertising networks, and CRM systems. This "warehouse-first" approach means business logic lives in one place, versioned and tested, rather than duplicated across multiple SaaS tools.
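An audience segment in this pattern is just another tested dbt model that the reverse ETL tool reads from. The model below is a sketch — the fact table, columns, and "high value" threshold are all assumptions:

```sql
-- Illustrative "high-value travellers" segment for reverse ETL sync;
-- names and the revenue threshold are hypothetical.
select
    user_id,
    email,
    count(*)             as bookings_last_year,
    sum(booking_amount)  as revenue_last_year
from {{ ref('fct_bookings') }}
where booked_at >= dateadd(year, -1, current_date)
group by user_id, email
having sum(booking_amount) >= 5000    -- segment entry threshold
```

The sync tool then pushes this table to the CRM or ad platform on a schedule, so the segment definition stays versioned and tested in one place.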
The Metrics Layer
Defining business metrics consistently has always been a challenge in my work. Different teams calculate "revenue" differently, apply varying filters, and produce reports that don't reconcile. The emergence of semantic layers and metrics stores is addressing this head-on.
Tools like dbt's metrics functionality allow me to define business metrics once—how to calculate them, what filters to apply, what dimensions they can be sliced by—and expose those definitions to downstream tools. When an analyst queries "total bookings" in a BI tool and an executive sees "total bookings" in a dashboard, I can be confident they're seeing the same number calculated the same way.
What I've Learned About Implementation
Having helped multiple travel technology teams adopt the modern data stack, I've developed strong opinions about what works and what doesn't.
Start with foundations. It's tempting to immediately build fancy dashboards and machine learning models, but if your raw data isn't reliably landing in the warehouse with good quality, everything built on top will be fragile. I always recommend getting ingestion solid first, then building a clean staging layer, before moving to business logic.
Invest in data quality early. The cost of bad data compounds over time. I've seen teams spend weeks tracking down why revenue numbers didn't match because a single transformation made an incorrect assumption about null handling. Building comprehensive tests from day one saves enormous pain later.
Documentation is not optional. I cannot stress this enough—if you don't document as you build, you'll never catch up later. The person who knows why that particular JOIN exists or what business rule that CASE statement implements will eventually leave, and without documentation, their knowledge leaves with them.
Empower analysts with engineering practices. The modern data stack has blurred the lines between data analysts and data engineers. I've seen analysts become far more effective when they adopt software engineering practices—Git for version control, pull requests for code review, CI/CD for automated testing. The tools support this workflow; organisations need to embrace it.
My View on Where This Is Heading
Looking forward, I believe the modern travel data stack will continue evolving toward greater real-time capabilities, tighter integration between analytical and operational systems, and more sophisticated approaches to data quality and governance.
The fundamental architecture—ELT with cloud warehouses as the computational core—feels durable to me. But the specific implementations will keep improving. I expect to see better support for streaming data, more powerful incremental processing strategies, and richer semantic layers that make business logic truly reusable across the organisation.
What excites me most is how these tools are democratising sophisticated data capabilities. A small travel startup can now build data infrastructure that would have required a team of dozens just five years ago. The barrier to entry has dropped dramatically, and that's creating a more competitive, innovative industry.
The travel businesses that will thrive in the coming years are those that treat data as a strategic asset, invest in modern infrastructure, and empower their teams with the right tools and practices. The modern data stack isn't just a technical choice—it's a competitive advantage that compounds over time.
About Martin Tuncaydin
Martin Tuncaydin is an AI and Data executive in the travel industry, with deep expertise spanning machine learning, data engineering, and the application of emerging AI technologies across travel platforms. Follow Martin Tuncaydin for more insights on data architecture and travel technology.