DEV Community

Sumedh Bala


Hotel Search System Design

1. Introduction

Key Technical Topics Covered (Interview Highlights)

  • Hybrid OLAP/OLTP architecture (Sections 1 & 2): Elasticsearch for multi-dimensional search + Redis for real-time availability (and why neither alone is sufficient).
  • Transactional Outbox + CDC (Section 2.2): Keeping Elasticsearch in sync with the primary DB without dual writes.
  • Elasticsearch modeling (Sections 3 & 4): Denormalized hotel/room documents, geo fields, amenity facets, script scoring, and index aliasing for zero-downtime reindex.
  • Redis availability design (Section 6): Per-night bitmaps / sorted sets, multi-night intersections, TTL/eviction strategies, and consistency with bookings.
  • Query pipeline (Section 7): Coordinated ES + Redis flow with fallbacks when availability or index data diverges.
  • Advanced filtering & ranking (Section 5): Amenity faceting, price buckets, popularity boosts, personalization hooks.
  • Scaling patterns (Sections 2 & 7): Multi-region clusters, hot/warm node tiers, cache layers, snapshot/alias-based reindexing. (Cross-region replication—e.g., keeping Redis regional while asynchronously syncing aggregates or using multi-region Elasticsearch—follows the same patterns but is out of scope for this doc.)

Prerequisite Reading (Shared Components)

This hotel search design builds on the same platform described in @sumedhbala’s ticketing series:

From those posts we inherit the core stack: PostgreSQL + transactional outbox, Debezium/Kafka pipelines, Elasticsearch clusters, Redis caches, and Stripe-style payment flows. This document focuses on how those building blocks are reconfigured for hotel-specific search, availability, and ranking concerns.

Hotel Search Challenges

Hotel search and booking differ sharply from event search. Two fundamental differences drive most of the architectural complexity:

1. Availability Integration in Search:

  • Real-time Availability Validation: Availability must be checked during the search phase, not after results are displayed
  • Date Range Queries: Users search for stays spanning multiple nights (check-in to check-out), requiring multi-date availability validation
  • Recurring Availability: Same rooms available daily, unlike one-time events
  • Real-time Updates: Availability changes constantly as bookings are made, requiring sub-millisecond response times
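Because check-out is exclusive, a stay maps to a list of nights, and each night has to be validated separately. Here is a minimal sketch of that expansion. The helper name `stay_nights` is hypothetical.

```python
from datetime import date, timedelta


def stay_nights(check_in: date, check_out: date) -> list[date]:
    """Return every night a stay occupies; the check-out day itself is not a night."""
    if check_out <= check_in:
        raise ValueError("check_out must be after check_in")
    num_nights = (check_out - check_in).days
    return [check_in + timedelta(days=i) for i in range(num_nights)]


nights = stay_nights(date(2024, 10, 26), date(2024, 10, 30))
print([d.isoformat() for d in nights])
# ['2024-10-26', '2024-10-27', '2024-10-28', '2024-10-29']
```

A room type is available for the stay only if every one of these nights has inventory. A single date lookup is therefore not enough.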

2. Extensive Filter Requirements:

  • Multiple Amenity Filters: Users expect to filter by numerous amenities (pool, spa, beachfront, gym, restaurant, concierge, parking, pet-friendly, etc.)
  • Room Type Complexity: Multiple room categories (Standard, Deluxe, Suite) with different capacities and room-level amenities
  • Complex Filter Combinations: Location, amenities, price range, room type, guest count, and date combinations all working together

Additional challenges include:

  • Fungible Inventory: Room 101 and Room 102 of the same type are interchangeable
  • Dynamic Pricing: Prices vary by date, season, demand, and length of stay
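Rooms of the same type are interchangeable, so availability can be tracked as a count per (hotel, room type, night) rather than per physical room. The sketch below is a minimal in-memory version under two assumptions: a plain dict stands in for Redis, and a lock stands in for the atomicity Redis would provide. `RoomInventory`, `book`, and the key layout are all hypothetical.

```python
import threading
from datetime import date, timedelta


class RoomInventory:
    """Sellable room counts per (hotel_id, room_type_id, night)."""

    def __init__(self) -> None:
        self._counts: dict[tuple[str, str, date], int] = {}
        self._lock = threading.Lock()  # stand-in for server-side atomicity in Redis

    def set_count(self, hotel_id: str, room_type_id: str, night: date, count: int) -> None:
        with self._lock:
            self._counts[(hotel_id, room_type_id, night)] = count

    def available(self, hotel_id: str, room_type_id: str, night: date) -> int:
        return self._counts.get((hotel_id, room_type_id, night), 0)

    def book(self, hotel_id: str, room_type_id: str, check_in: date, check_out: date) -> bool:
        """All-or-nothing: decrement every night of the stay, or none of them."""
        nights = [check_in + timedelta(days=i) for i in range((check_out - check_in).days)]
        keys = [(hotel_id, room_type_id, n) for n in nights]
        with self._lock:
            if not keys or any(self._counts.get(k, 0) < 1 for k in keys):
                return False  # empty stay, or at least one night is sold out
            for k in keys:
                self._counts[k] -= 1
            return True


inv = RoomInventory()
for day in range(26, 30):
    inv.set_count("hotel_123", "deluxe_001", date(2024, 10, day), 1)
inv.set_count("hotel_123", "deluxe_001", date(2024, 10, 28), 0)  # one night already sold out

print(inv.book("hotel_123", "deluxe_001", date(2024, 10, 26), date(2024, 10, 28)))  # True
print(inv.book("hotel_123", "deluxe_001", date(2024, 10, 27), date(2024, 10, 30)))  # False
```

The code checks every night before decrementing any of them, which keeps a multi-night booking from half-succeeding. In Redis, the same check-then-decrement would have to run atomically, for example inside a Lua script or with WATCH/MULTI/EXEC, rather than as separate client calls.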

Key Differences from Event Search

| Aspect | Event Search | Hotel Search |
|---|---|---|
| Inventory Model | Fixed seats, non-fungible | Room types, fungible inventory |
| Availability | One-time events | Recurring daily availability |
| Date Handling | Single event date | Date ranges (check-in to check-out) |
| Pricing | Dynamic per seat/section (demand-based) | Dynamic per room type and date (seasonal, demand-based) |
| Search Complexity | Simple (venue, date, category) | Complex (location, dates, room type, multiple amenity filters: pool, spa, beachfront, gym, etc.) |
| Availability in Search | Checked after displaying events | Must be part of search criteria |

Hybrid Architecture Overview: Mandatory OLAP/OLTP Separation

Terminology refresher: OLAP (Online Analytical Processing) handles read-heavy, multi-dimensional queries (e.g., Elasticsearch), while OLTP (Online Transaction Processing) powers write-heavy, transactional workloads (e.g., Redis + primary DB). The rest of this document uses these terms frequently.

Hotel search systems face two fundamentally contradictory workloads that cannot be solved by a single system:

1. OLAP Workload (Analytical Search):

  • Complex, read-intensive, multi-dimensional queries
  • Example: "Find all 4-star hotels in New York with pool and free breakfast for 2 adults and 2 children from 2024-10-26 to 2024-10-30, sorted by review score"
  • Optimized for: High-speed reads of large datasets, complex filtering, full-text search
  • Technology: Elasticsearch (built on Apache Lucene, designed for OLAP)

2. OLTP Workload (Transactional Availability):

  • High-frequency, simple, isolated transactions
  • Example: "Decrement room count for Hotel 123, room type 'King', on date 2024-10-26"
  • Optimized for: High-throughput, low-latency updates, absolute data integrity
  • Technology: Redis (in-memory database designed for OLTP)

The Architectural Mandate:

The dual-system architecture (Elasticsearch + Redis) is not optional—it is mandatory because:

  • OLTP optimizes for write efficiency (small, fast, atomic updates)
  • OLAP optimizes for read efficiency (complex, large-scale, multi-dimensional queries)
  • These optimization goals are directly contradictory

Attempting to use Elasticsearch for OLTP availability updates would fail catastrophically due to its "update penalty" (see Section 3.2). Conversely, a relational database cannot efficiently serve complex, multi-dimensional, full-text search queries.

Key Takeaway: The combination of OLAP (Elasticsearch) and OLTP (Redis) systems is not merely the "best" solution—it is the only viable one for a business requiring both robust transaction processing and powerful data analysis.

2. Baseline Functional Solution

Services

  • Search API Service – Handles user queries, coordinates Elasticsearch and Redis operations, returns ranked results with availability
  • Availability Service – Manages Redis availability data, handles booking updates, ensures data consistency
  • Hotel Management Service – Source of truth for hotel metadata (writes to primary database)

Note: Data synchronization from the primary database to Elasticsearch is handled by an event-driven pipeline (see Section 2.2: Data Synchronization Architecture)

Data Stores

  • Elasticsearch Cluster – Search index for hotels, room types, amenities, location data (includes built-in query caching)
  • Redis Cluster (or AWS ElastiCache) – Real-time availability data exclusively
  • Primary Database (PostgreSQL/MySQL) – Hotel metadata, room configurations, pricing rules

Note: Redis is used exclusively for availability data. Search result caching is handled by Elasticsearch's built-in query cache or application-level caching mechanisms. (This is explained in detail in Section 6.)

High-Level Architecture

The system follows a three-tier approach:

  1. Search Tier: Elasticsearch handles complex queries (location, amenities, text search)
  2. Availability Tier: Redis provides real-time availability validation
  3. Data Tier: Primary database maintains canonical hotel information

Soundbite for interviews: "Hotel search requires a hybrid architecture because Elasticsearch excels at complex filtering but struggles with real-time availability updates, while Redis provides sub-millisecond availability checks but lacks sophisticated search capabilities."

Data Synchronization Architecture: Transactional Outbox + Change Data Capture (CDC)

At a glance, data flows DB → Outbox → CDC/Debezium → Kafka → Elasticsearch. The subsections below break this down so readers can skim the summary and dive deeper only if needed.

The Dual-Write Anti-Pattern Problem:

A naive approach would use an "Indexing Service" that attempts to write to both the database and Elasticsearch synchronously:

```text
Write A: Update PostgreSQL (hotel name change)
Write B: Update Elasticsearch (hotel name change)
```

This is a well-known anti-pattern because it cannot guarantee atomic execution across two independent systems. Over time and at scale, data inconsistency stops being a risk and becomes a near certainty.

Failure Scenarios:

  1. Partial Failure (Stale Search Index): Database update succeeds, Elasticsearch update fails → Search index permanently stale
  2. Partial Failure (Phantom Data): Elasticsearch update succeeds, database update fails → Hotel searchable but doesn't exist in system of record
  3. Race Conditions: Concurrent updates are applied to the two systems in different orders → the database and the search index permanently disagree

The Solution: Transactional Outbox Pattern + CDC

The robust solution decouples writes using an asynchronous, event-driven architecture:

1. Transactional Outbox Pattern:

  • Create an Outbox_Events table in the primary PostgreSQL database
  • When updating hotel data, perform one atomic database transaction:
```sql
BEGIN TRANSACTION;
-- Business data write
UPDATE Hotels SET name = 'New Name' WHERE id = 123;
-- Event write (atomic within the same transaction)
INSERT INTO Outbox_Events (payload) VALUES ('{"id": 123, "name": "New Name"}');
COMMIT;
```
  • This guarantees atomicity: if UPDATE fails, INSERT is rolled back; if INSERT fails, UPDATE is rolled back
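Any transactional database can demonstrate the atomicity guarantee. For portability, this sketch uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, which is an assumption. The table names mirror the SQL above, and `rename_hotel` is a hypothetical helper. The `with conn:` block either commits both statements together or rolls both back.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Hotels (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE Outbox_Events (event_id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
)
conn.execute("INSERT INTO Hotels (id, name) VALUES (123, 'Old Name')")
conn.commit()


def rename_hotel(hotel_id: int, new_name: str) -> None:
    """Update the business row and append the outbox event in ONE transaction."""
    payload = json.dumps({"id": hotel_id, "name": new_name})
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute("UPDATE Hotels SET name = ? WHERE id = ?", (new_name, hotel_id))
        conn.execute("INSERT INTO Outbox_Events (payload) VALUES (?)", (payload,))


rename_hotel(123, "New Name")

# Simulate a failing event write: NULL violates NOT NULL, so the UPDATE is rolled back too.
try:
    with conn:
        conn.execute("UPDATE Hotels SET name = ? WHERE id = ?", ("Broken Name", 123))
        conn.execute("INSERT INTO Outbox_Events (payload) VALUES (?)", (None,))
except sqlite3.IntegrityError:
    pass

print(conn.execute("SELECT name FROM Hotels WHERE id = 123").fetchone()[0])  # New Name
print(conn.execute("SELECT COUNT(*) FROM Outbox_Events").fetchone()[0])      # 1
```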

2. Change Data Capture (CDC):

  • Use a log-based CDC tool (e.g., Debezium) that reads PostgreSQL's Write-Ahead Log (WAL)
  • Debezium sees the committed event in Outbox_Events and publishes it to Apache Kafka
  • This is non-intrusive and low-overhead (reads transaction log, doesn't query database)

3. Event-Driven Pipeline:

```text
PostgreSQL (System of Record)
  ↓ (WAL)
Debezium (CDC Source Connector)
  ↓ (Kafka message)
Apache Kafka (Event Bus - provides resilience and buffering)
  ↓ (Kafka message)
Kafka Connect (Elasticsearch Sink Connector)
  ↓ (index operation)
Elasticsearch (Search Index)
```

Key Benefits:

  • Data Integrity: Only committed transactions are ever published
  • Resilience: If Elasticsearch is down, events buffer in Kafka with no data loss
  • Decoupling: Database and Elasticsearch are fully independent
  • Scalability: Each component can scale independently
  • Eventual Consistency: Tunable lag (typically seconds to low minutes) is the correct trade-off for high-performance distributed systems
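Resilience and idempotency work together. The sink tracks how far it has consumed, and each event is applied as an upsert or delete keyed by hotel ID, so replaying events after a crash converges to the same index. This is a minimal in-memory sketch: a Python list stands in for one Kafka topic partition and a dict for the Elasticsearch index. `SearchIndexSink` is hypothetical and is not the Kafka Connect API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Event:
    key: str               # hotel ID; becomes the document _id
    value: Optional[dict]  # None = tombstone, meaning "delete this document"


@dataclass
class SearchIndexSink:
    index: dict = field(default_factory=dict)
    committed_offset: int = 0  # next offset to consume

    def poll(self, log: list[Event], es_available: bool = True) -> None:
        """Apply events from committed_offset onward; stop without losing data if ES is down."""
        while self.committed_offset < len(log):
            if not es_available:
                return  # events stay in the log until the next poll
            event = log[self.committed_offset]
            if event.value is None:
                self.index.pop(event.key, None)      # delete is idempotent
            else:
                self.index[event.key] = event.value  # upsert by key is idempotent
            self.committed_offset += 1


log = [
    Event("123", {"name": "Old Name"}),
    Event("123", {"name": "New Name"}),
    Event("456", {"name": "Closed Hotel"}),
    Event("456", None),
]

sink = SearchIndexSink()
sink.poll(log, es_available=False)  # Elasticsearch down: nothing applied, nothing lost
sink.poll(log)                      # catches up from offset 0
print(sink.index)                   # {'123': {'name': 'New Name'}}

# Crash before the offset was committed: replaying everything yields the same state.
sink.committed_offset = 0
sink.poll(log)
print(sink.index)                   # {'123': {'name': 'New Name'}}
```

The replay is only safe because all events for one hotel arrive in order. In Kafka, that holds when events are keyed by hotel ID, since records with the same key land in the same partition.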

Configuration Requirements:

  • PostgreSQL: Enable logical replication (wal_level = logical)
  • Debezium: Configure to read only from Outbox_Events table
  • Kafka: Long retention period (e.g., 7 days) for disaster recovery
  • Elasticsearch Sink: Configure for idempotency (key.ignore: false, so the Kafka record key becomes the document _id) and delete handling (behavior.on.null.values: delete)
  • Dead Letter Queue (DLQ): Route failed messages to DLQ to prevent pipeline halting
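The configuration requirements above might look roughly like the following Kafka Connect connector definitions. They are written as Python dicts that could be serialized to JSON for the Kafka Connect REST API. Treat this as a hedged sketch; several details are assumptions:

- the connector names, hosts, and topic names;
- the use of Debezium's outbox `EventRouter` transform, which by default expects extra columns such as an aggregate ID and type beyond the single `payload` column in the simplified SQL above;
- the converter choices.

Property names should be checked against the Debezium and Confluent versions in use.

```python
import json

# Hypothetical Debezium source: capture only the outbox table from PostgreSQL's WAL.
DEBEZIUM_SOURCE = {
    "name": "hotel-outbox-source",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "plugin.name": "pgoutput",                     # logical decoding; needs wal_level = logical
        "database.hostname": "postgres.internal",      # hypothetical host
        "database.port": "5432",
        "database.user": "debezium",                   # hypothetical user (password omitted)
        "database.dbname": "hotels",
        "topic.prefix": "hotels",                      # Debezium 2.x; 1.x used database.server.name
        "table.include.list": "public.outbox_events",  # unquoted Outbox_Events folds to lower case
        "transforms": "outbox",
        "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
    },
}

# Hypothetical Confluent Elasticsearch sink.
ELASTICSEARCH_SINK = {
    "name": "hotel-search-sink",
    "config": {
        "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
        "connection.url": "http://elasticsearch.internal:9200",
        "topics": "outbox.event.hotel",                # EventRouter default: outbox.event.<aggregatetype>
        "key.converter": "org.apache.kafka.connect.storage.StringConverter",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter.schemas.enable": "false",
        "schema.ignore": "true",                       # let Elasticsearch use the index mapping
        "key.ignore": "false",                         # record key -> document _id, so redelivery overwrites
        "behavior.on.null.values": "delete",           # tombstone records delete the document
        "errors.tolerance": "all",                     # don't halt the task on a bad record...
        "errors.deadletterqueue.topic.name": "hotel-search-dlq",  # ...route it to the DLQ instead
        "errors.deadletterqueue.context.headers.enable": "true",
    },
}

# Topic-level retention for replay and disaster recovery: 7 days in milliseconds.
OUTBOX_TOPIC_CONFIG = {"retention.ms": str(7 * 24 * 60 * 60 * 1000)}

for connector in (DEBEZIUM_SOURCE, ELASTICSEARCH_SINK):
    print(json.dumps(connector, indent=2))
print(OUTBOX_TOPIC_CONFIG)
```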

Key Takeaway: The Transactional Outbox + CDC pattern eliminates the dual-write anti-pattern, providing a production-grade, resilient architecture for data synchronization that guarantees data integrity and system availability.

3. Elasticsearch Document Structure

Availability Data Structure Options

Quick overview: We evaluate four availability strategies:

  1. Keep all availability in the primary database + Redis (Elasticsearch remains metadata-only).
  2. Embed detailed availability inside Elasticsearch documents.
  3. Hybrid flag in Elasticsearch plus full detail in Redis.
  4. Maintain a separate availability-focused Elasticsearch index.

Use this summary to choose which option details to read.

Option 1: Availability Only in Primary Database and Redis

  • Elasticsearch: No availability data stored
  • Redis: Real-time availability for all date ranges
  • Primary Database: Source of truth for availability

Pros:

  • Single Source of Truth: No availability data to synchronize into Elasticsearch
  • Always Accurate: No stale availability data in Elasticsearch
  • Simpler Architecture: Fewer moving parts, less to maintain
  • Real-time Guarantees: Redis provides sub-millisecond updates

Cons:

  • Post-Filtering Required: Must filter Elasticsearch results with Redis availability check
  • Two-Step Process: Elasticsearch search → Redis availability check
  • Slightly Higher Latency: Additional Redis lookup after search

Option 2: Nested Availability in Elasticsearch

  • Availability data embedded in room type documents
  • Good for simple queries, limited scalability for date ranges

Pros:

  • Filtering in Search Phase: Can filter by availability during Elasticsearch query
  • Single Query: All filtering happens in one place

Cons:

  • Data Synchronization: Must keep Elasticsearch in sync with DB
  • Stale Data Risk: Availability in Elasticsearch may be outdated
  • Update Complexity: Every booking requires Elasticsearch update
  • Index Size: Large index with date-based availability fields

Option 3: Hybrid Approach

  • Elasticsearch: Summary availability flag (e.g., "has_availability_next_30_days: true")
  • Redis: Detailed real-time availability for specific dates
  • Primary Database: Source of truth

Pros:

  • Initial Filtering: Elasticsearch can filter out hotels with no availability
  • Real-time Accuracy: Redis provides precise availability for date ranges
  • Reduced Load: Fewer hotels passed to Redis for availability check
  • Best of Both Worlds: Search performance + real-time accuracy

Cons:

  • Dual Maintenance: Must update both Elasticsearch summary and Redis details
  • Complexity: More components to manage

Option 4: Separate Availability Index in Elasticsearch

  • Dedicated index for availability data
  • Better for complex date range queries, but must be joined with the hotel index (typically in application code or via extra round trips)

Pros:

  • Complex Queries: Can handle sophisticated date range queries
  • Separation of Concerns: Availability data separate from hotel metadata

Cons:

  • Join Overhead: Requires joining availability index with hotel index
  • Synchronization Complexity: Multiple indexes to keep in sync
  • Performance: Joins are slower than single-index queries

Recommendation: Option 1 (Redis/DB Only) is Recommended

Why Option 1 is the Default Choice:

  • Availability changes frequently - Every booking updates availability in real-time
  • Redis is already optimized - Sub-millisecond lookups, perfect for real-time data
  • Simplicity wins - No sync complexity, no stale data risk
  • Performance is acceptable - An Elasticsearch query followed by a batched Redis availability check (per-hotel lookups issued in parallel) keeps latency low
  • Single source of truth - Database is the authority, Redis is a performance cache (see Section 6 for database authority details)

Why Elasticsearch Cannot Handle OLTP Availability Updates: The "Update Penalty"

The Fundamental Problem: Elasticsearch's Immutable Segment Architecture

Elasticsearch is built on Apache Lucene, which stores data in immutable segments on disk. This design has a critical consequence: Elasticsearch does not support in-place updates or deletes.

How "Updates" Actually Work in Elasticsearch:

When a document is "updated" (e.g., changing num_available_rooms from 10 to 9), Elasticsearch performs two operations:

  1. Soft Delete: Marks the old document (with 10 rooms) as "deleted"
  2. Re-index: Indexes a brand new document (with 9 rooms) into a new, small segment

This means a simple transactional counter change becomes a full document re-index, which is CPU-intensive and requires the entire document to be processed and analyzed again.

The Compounding Cost: Segment Merging

High-frequency updates create thousands of tiny segments and massive numbers of soft-deleted documents. This is catastrophic for search performance because:

  • Elasticsearch must open and query every tiny segment
  • CPU cycles are wasted filtering out soft-deleted documents
  • To combat this, Lucene runs a background segment merging process

Segment Merging Overhead:

  • Reads small segments
  • Filters out soft-deleted documents
  • Merges remaining "live" documents into new, larger segments
  • Extremely resource-intensive: Consumes substantial CPU, disk I/O, and memory
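A toy model makes the update-then-merge cycle concrete. It is a deliberate simplification, not Lucene's actual data structures or merge policy. Every update soft-deletes the old copy and writes the full new document into a fresh one-document segment. A merge then rewrites only the live documents. `ToyIndex` is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    docs: dict                                 # doc_id -> document (immutable once written)
    deleted: set = field(default_factory=set)  # soft-deleted doc_ids in this segment


class ToyIndex:
    def __init__(self) -> None:
        self.segments: list[Segment] = []

    def update(self, doc_id: str, doc: dict) -> None:
        # 1. Soft delete: mark any live copy as deleted; the segment itself is not rewritten.
        for seg in self.segments:
            if doc_id in seg.docs and doc_id not in seg.deleted:
                seg.deleted.add(doc_id)
        # 2. Re-index: the WHOLE new document goes into a brand-new small segment.
        self.segments.append(Segment(docs={doc_id: dict(doc)}))

    def deleted_count(self) -> int:
        return sum(len(s.deleted) for s in self.segments)

    def merge(self) -> None:
        # Read every segment, drop soft-deleted docs, write live docs into one new segment.
        live = {}
        for seg in self.segments:
            for doc_id, doc in seg.docs.items():
                if doc_id not in seg.deleted:
                    live[doc_id] = doc
        self.segments = [Segment(docs=live)]


idx = ToyIndex()
for rooms in range(10, 0, -1):  # ten bookings, each changing one counter
    idx.update("hotel_123", {"name": "Luxury Hotel", "num_available_rooms": rooms})

print(len(idx.segments), idx.deleted_count())  # 10 9
idx.merge()
print(len(idx.segments), idx.deleted_count())  # 1 0
```

Ten counter changes produced ten segments and nine dead copies of the full document. At booking-level update rates, that garbage and the merges needed to clean it up make up the update penalty.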

The "Update Penalty" Impact:

If Elasticsearch were used for the OLTP availability workload:

  • High-frequency updates (thousands per minute) would trigger constant, aggressive segment merging
  • Intense background CPU and I/O activity would compete directly with primary OLAP search queries
  • Result: Increased query latency and system instability

This is why using Elasticsearch for frequently updated, mutable data is a well-known architectural anti-pattern. The OLTP availability workload must be handled by a system designed for high-throughput, low-latency updates—Redis.

When to Consider Option 3 (Hybrid with Summary) - Edge Cases Only:

  • Very high query volume - When you need to reduce Redis load by pre-filtering (rare)
  • Large candidate sets - When Elasticsearch returns 10,000+ hotels and you want to reduce Redis checks (unusual)
  • Geographic filtering - When initial geographic filter yields too many results (can be handled with better Elasticsearch queries)

Note: For most hotel search systems, Option 1 (Redis/DB Only) is sufficient and preferred due to its simplicity and real-time accuracy.

Architecture Pattern (Option 1 - Recommended):

  1. Elasticsearch: Filters by location, amenities, room types, price (no availability)
  2. Redis: Validates availability for filtered hotels (date range check)
  3. Result: Only hotels with availability are returned
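The two steps can be sketched end to end. An in-memory list stands in for the Elasticsearch candidate query, and a dict of per-night counts stands in for Redis. In production, the per-hotel lookups would be batched (for example, with a Redis pipeline or MGET) rather than issued one by one. Field names follow the document schema in this section, while `search`, `HOTELS`, and `AVAILABILITY` are hypothetical.

```python
from datetime import date, timedelta

HOTELS = [  # stand-in for Elasticsearch documents (metadata only, no availability)
    {"hotel_id": "h1", "city": "Los Angeles", "amenities": {"pool", "spa"},
     "room_types": [{"room_type_id": "deluxe_001", "capacity": 2, "base_price": 299}]},
    {"hotel_id": "h2", "city": "Los Angeles", "amenities": {"pool", "spa", "gym"},
     "room_types": [{"room_type_id": "suite_ocean_001", "capacity": 4, "base_price": 799}]},
    {"hotel_id": "h3", "city": "Los Angeles", "amenities": {"gym"},
     "room_types": [{"room_type_id": "deluxe_002", "capacity": 2, "base_price": 199}]},
]

# Stand-in for Redis: rooms left per (hotel_id, room_type_id, night).
AVAILABILITY = {}
for day in range(26, 30):
    AVAILABILITY[("h1", "deluxe_001", date(2024, 10, day))] = 3
    AVAILABILITY[("h2", "suite_ocean_001", date(2024, 10, day))] = 0 if day == 28 else 1


def search(city, required_amenities, guests, check_in, check_out):
    # Step 1 (Elasticsearch): filter on metadata only.
    candidates = [
        (h["hotel_id"], rt["room_type_id"])
        for h in HOTELS
        if h["city"] == city and required_amenities <= h["amenities"]
        for rt in h["room_types"]
        if rt["capacity"] >= guests
    ]
    # Step 2 (Redis): keep a room type only if EVERY night of the stay has inventory.
    nights = [check_in + timedelta(days=i) for i in range((check_out - check_in).days)]
    return [
        (hotel_id, room_type_id)
        for hotel_id, room_type_id in candidates
        if all(AVAILABILITY.get((hotel_id, room_type_id, n), 0) > 0 for n in nights)
    ]


print(search("Los Angeles", {"pool", "spa"}, 2, date(2024, 10, 26), date(2024, 10, 30)))
# [('h1', 'deluxe_001')]
```

Step 2 consumes step 1's output, so the two latencies add up rather than overlap.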

Performance Impact:

  • Elasticsearch search: 25ms (filters by location, amenities, etc.)
  • Redis availability check: 10ms (checks availability for filtered hotels)
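  • Total: ~35ms (the Redis check needs the Elasticsearch candidates, so the two latencies add; only the per-hotel Redis lookups run in parallel)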

Key Takeaway: For most hotel search systems, keeping availability only in Redis and the primary database is the recommended approach. It provides simplicity, real-time accuracy, and acceptable performance. Only use Elasticsearch for availability if you have very high query volumes and need to reduce Redis load through pre-filtering.

Hotel Document Schema

Hotel documents in Elasticsearch use nested structures to represent the complex relationships between hotels and their room types:

Recommended Approach (Option 1 - Redis/DB Only):

```text
Hotel Document:
├── Basic Information (hotel_id, name, location, rating)
├── Amenities (pool, spa, gym, restaurant)
├── Location Data (geo_point, address, city, country)
└── Room Types (nested array)
    ├── Room Type 1 (room_type_id: "deluxe_001", type: "deluxe", capacity: 2, base_price: 299, amenities: ["wifi", "tv"])
    ├── Room Type 2 (room_type_id: "suite_garden_001", type: "suite", view: "garden", capacity: 4, base_price: 599, amenities: ["wifi", "tv", "balcony"])
    └── Room Type 3 (room_type_id: "suite_ocean_001", type: "suite", view: "ocean", capacity: 4, base_price: 799, amenities: ["wifi", "tv", "balcony", "ocean_view"])
```

Note: Room types can have variations (e.g., "suite with garden view" vs "suite with ocean view"). Each variation has a unique room_type_id used for availability tracking in Redis. (See "Important: Room Type ID in All Keys" in Section 6 for detailed explanation.)

Note: No availability data in Elasticsearch. Availability is checked in Redis after Elasticsearch filtering.

Alternative Approach (Option 3 - Hybrid with Summary):

```text
Hotel Document:
├── Basic Information (hotel_id, name, location, rating)
├── Amenities (pool, spa, gym, restaurant)
├── Location Data (geo_point, address, city, country)
├── Room Types (nested array)
│   ├── Room Type 1 (room_type_id: "deluxe_001", type: "deluxe", capacity: 2, base_price: 299, amenities: ["wifi", "tv"])
│   ├── Room Type 2 (room_type_id: "suite_garden_001", type: "suite", view: "garden", capacity: 4, base_price: 599, amenities: ["wifi", "tv", "balcony"])
│   └── Room Type 3 (room_type_id: "suite_ocean_001", type: "suite", view: "ocean", capacity: 4, base_price: 799, amenities: ["wifi", "tv", "balcony", "ocean_view"])
└── Availability Summary (hotel-level)
    ├── has_availability_next_30_days: true/false
    ├── price_range: {min: 199, max: 599}
    └── last_updated: timestamp
```

What is Availability Summary? (Only for Option 3 - Hybrid Approach)

If using the hybrid approach, the Availability Summary contains aggregated availability information used for initial filtering:

Fields:

  • has_availability_next_30_days: Boolean flag indicating if hotel has any availability in the next 30 days

    • Purpose: Quickly filter out hotels with no availability
    • Example: true means hotel has at least one room available in next 30 days
    • Updated: Periodically (e.g., every 15 minutes or when significant availability changes)
  • price_range: Minimum and maximum prices across all available rooms

    • Purpose: Pre-filter by price range before detailed Redis check
    • Example: {min: 199, max: 599} means cheapest room is $199, most expensive is $599
    • Updated: When room prices change significantly
  • last_updated: Timestamp of when summary was last refreshed

    • Purpose: Track data freshness, implement cache invalidation
    • Example: 2024-06-15T10:30:00Z

Important Notes:

  • Not Real-Time: Summary is updated periodically, not on every booking
  • Coarse Filtering: Used to eliminate hotels with no availability, not for final availability check
  • Redis Still Required: Detailed availability check still happens in Redis
  • Trade-off: Summary reduces Redis load but adds sync complexity

When Availability Summary is Updated:

  • Periodic refresh (every 15-30 minutes)
  • When availability drops to zero (no rooms available)
  • When availability becomes available after being zero
  • Manual refresh triggered by significant booking events
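The summary fields and refresh triggers above can be derived from the detailed per-night data. This minimal sketch assumes per-night room counts and nightly prices are available as plain dicts keyed by (room_type_id, night), which is a hypothetical shape. It also reduces "significant changes" to the zero-crossing rule listed above plus a periodic timer.

```python
from datetime import date, datetime, timedelta, timezone


def build_summary(counts: dict, prices: dict, today: date, now: datetime) -> dict:
    """counts/prices: {(room_type_id, night): value} for one hotel."""
    window = {today + timedelta(days=i) for i in range(30)}
    available = [key for key, rooms in counts.items() if key[1] in window and rooms > 0]
    available_prices = [prices[key] for key in available if key in prices]
    return {
        "has_availability_next_30_days": bool(available),
        "price_range": (
            {"min": min(available_prices), "max": max(available_prices)} if available_prices else None
        ),
        "last_updated": now.isoformat(),
    }


def needs_refresh(old: dict, new: dict, now: datetime, max_age: timedelta = timedelta(minutes=15)) -> bool:
    """Push to Elasticsearch on a zero-crossing, or when the indexed summary is older than max_age."""
    if old["has_availability_next_30_days"] != new["has_availability_next_30_days"]:
        return True
    return now - datetime.fromisoformat(old["last_updated"]) >= max_age


today = date(2024, 6, 15)
t0 = datetime(2024, 6, 15, 10, 30, tzinfo=timezone.utc)
counts = {("deluxe_001", today): 2, ("suite_ocean_001", today): 0}
prices = {("deluxe_001", today): 199, ("suite_ocean_001", today): 599}

old = build_summary(counts, prices, today, t0)
print(old["has_availability_next_30_days"], old["price_range"])  # True {'min': 199, 'max': 199}

counts[("deluxe_001", today)] = 0  # last room sold: availability drops to zero
new = build_summary(counts, prices, today, t0 + timedelta(minutes=2))
print(needs_refresh(old, new, t0 + timedelta(minutes=2)))  # True
```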

Understanding Nested Structures (Beginner-Friendly)

What is a Nested Structure?

Imagine you have a hotel document. Without nested structures, you might store room types like this:

```text
Hotel: "Luxury Hotel"
Room Types: ["deluxe", "suite", "standard"]
Room Prices: [299, 599, 199]
```

Problem: If you search for hotels with "deluxe" rooms under $250, Elasticsearch might match this hotel incorrectly. Why? It sees "deluxe" (✓) and "$199" (✓) in the arrays, but doesn't know that "$199" belongs to "standard", not "deluxe" (the deluxe room costs $299). It treats all array items as independent.

Solution - Nested Structures:

With nested structures, each room type (including variations) is stored as a separate "mini-document" inside the hotel document:

```text
Hotel: "Luxury Hotel"
Room Types (nested):
  - Room Type 1: {room_type_id: "deluxe_001", type: "deluxe", price: 299, capacity: 2}
  - Room Type 2: {room_type_id: "suite_garden_001", type: "suite", view: "garden", price: 599, capacity: 4}
  - Room Type 3: {room_type_id: "suite_ocean_001", type: "suite", view: "ocean", price: 799, capacity: 4}
```

Now Elasticsearch checks each room type as a complete unit. A search for "deluxe rooms under $250" no longer matches, because the only deluxe entry costs $299. A search for "deluxe rooms under $300" still matches, because "deluxe" and "$299" belong to the same entry.

Room Type Variations:
The same room type category (e.g., "suite") can have multiple variations with different amenities or views:

  • "suite with garden view" (room_type_id: "suite_garden_001")
  • "suite with ocean view" (room_type_id: "suite_ocean_001")

Each variation has its own unique room_type_id used for availability tracking in Redis, allowing separate availability management for each variation. (See Section 6 for how room_type_id is used in Redis keys.)

Real-World Analogy:
Think of a hotel document as a filing cabinet. Without nesting, all room information is in one big drawer mixed together. With nesting, each room type has its own folder in the drawer, keeping related information together.
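The difference between flattened arrays and nested objects can be reproduced in a few lines. This sketch mimics the matching semantics only; it is not how Lucene stores documents. The flattened check asks whether *some* type and *some* price satisfy the query independently. The nested check requires both conditions on the *same* room-type entry. The function names and the `standard_001` ID are hypothetical.

```python
ROOM_TYPES = [
    {"room_type_id": "deluxe_001", "type": "deluxe", "price": 299},
    {"room_type_id": "suite_garden_001", "type": "suite", "price": 599},
    {"room_type_id": "standard_001", "type": "standard", "price": 199},
]


def flattened_match(room_types: list[dict], wanted_type: str, max_price: int) -> bool:
    # Object arrays are flattened into independent value lists; the pairing is lost.
    types = [rt["type"] for rt in room_types]
    prices = [rt["price"] for rt in room_types]
    return wanted_type in types and any(p < max_price for p in prices)


def nested_match(room_types: list[dict], wanted_type: str, max_price: int) -> bool:
    # Each room type is evaluated as its own unit, like a nested sub-document.
    return any(rt["type"] == wanted_type and rt["price"] < max_price for rt in room_types)


print(flattened_match(ROOM_TYPES, "deluxe", 250))  # True  (false positive: $199 is the standard room)
print(nested_match(ROOM_TYPES, "deluxe", 250))     # False (the only deluxe room costs $299)
print(nested_match(ROOM_TYPES, "deluxe", 300))     # True
```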

How Does It Help?

  1. Accurate Filtering: When filtering by "deluxe rooms under $300", you get hotels whose deluxe room actually costs under $300. You do not get hotels that merely have a deluxe room somewhere plus some other, cheaper room type.

  2. Independent Queries: You can ask "show me hotels where the suite has ocean view but the standard room doesn't" - nested structures allow you to query each room type independently.

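  3. Fewer False Matches: Elasticsearch knows which fields belong together, so hotels that only match across different room types are excluded before ranking and before the Redis availability check. Nested queries themselves cost somewhat more than queries on flat fields, so the gain here is correctness rather than raw speed.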

Why Nested Model Over Parent-Child: Optimizing for OLAP Reads

Elasticsearch offers two options for modeling one-to-many relationships (hotel → room types): Nested and Parent-Child.

Nested Model (Selected):

  • Hotel and room types stored as a single document
  • Parent (hotel) and children (rooms) are co-located in the same Lucene block
  • Pros: Significantly faster queries, low memory overhead
  • Cons: High update cost—updating any field forces re-indexing of the entire document (hotel + all rooms)

Parent-Child Model (Rejected):

  • Hotel (parent) and rooms (children) indexed as separate documents
  • Pros: Low update cost—updating one child only re-indexes that child
  • Cons: Slower queries due to join overhead, and higher memory overhead (the join field relies on global ordinals held in heap memory, which are rebuilt as shards change)

The Decision: OLTP Workload Removed = No Compromise Needed

The Nested vs Parent-Child debate is a microcosm of the entire OLAP/OLTP problem:

  • Nested model optimizes for reads (OLAP)
  • Parent-Child model compromises on read performance to gain update efficiency (a step towards OLTP)

Because the high-frequency OLTP workload (availability) has been completely removed and placed in Redis, the system is no longer forced to compromise. The correct choice is to optimize the search index for its primary, read-heavy workload. Therefore, the Nested model is used.

Rationale:

  • Infrequent updates to static hotel data (handled by asynchronous CDC pipeline) will incur the higher re-indexing cost
  • This is a worthwhile trade-off for maximizing query performance for all users
  • The update penalty only affects static data changes (hotel name, amenities), not high-frequency availability updates

Index Refresh Configuration:

  • For this OLAP index, index.refresh_interval can be set to a high value (e.g., 30s or 60s) or disabled during bulk loads
  • A high-frequency OLTP workload would require a low interval (e.g., default 1s), triggering constant, costly refreshes
  • With OLTP removed, the index can be tuned for maximum indexing performance and stability
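In practice, this tuning is a pair of index-settings updates around a bulk load. The sketch below is hedged: it assumes an index named `hotels_v2` (hypothetical) and only builds the requests rather than sending them. `bulk_load_plan` is a hypothetical helper. A value of `-1` disables periodic refresh, and an explicit `_refresh` after the load makes the new documents searchable immediately.

```python
import json
from typing import Optional


def bulk_load_plan(index: str, steady_interval: str = "30s") -> list[tuple[str, str, Optional[dict]]]:
    """Return the (method, path, body) requests that wrap a bulk load."""
    return [
        ("PUT", f"/{index}/_settings", {"index": {"refresh_interval": "-1"}}),  # pause refreshes
        ("POST", f"/{index}/_bulk", None),  # bulk indexing requests go here (bodies omitted)
        ("PUT", f"/{index}/_settings", {"index": {"refresh_interval": steady_interval}}),
        ("POST", f"/{index}/_refresh", None),  # make the loaded documents visible now
    ]


for method, path, body in bulk_load_plan("hotels_v2"):
    print(method, path, json.dumps(body) if body else "")
```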

Key Takeaway: By moving the OLTP availability workload to Redis, Elasticsearch can be fully optimized for OLAP reads using the Nested model, without compromising on update performance for the primary search workload.

Field Mapping Strategy

What is Field Mapping?

Field mapping tells Elasticsearch how to store and index each field in your documents. Different field types enable different query capabilities and performance characteristics. Choosing the right field type is crucial for search performance and accuracy.

Field Type Breakdown:

  • hotel_id: keyword (exact matching, aggregations)

    • Why: Hotel IDs are unique identifiers that need exact matching only
    • Use Case: "Find hotel with ID 'hotel_123'" or "Count hotels by ID"
    • Performance: Very fast exact lookups, no text analysis overhead
  • name: text with keyword subfield (full-text search + exact matching)

    • Why: Hotel names need both fuzzy text search AND exact matching
    • text field: Analyzed for relevance scoring; with stemming, related forms such as "Luxury" and "Luxurious" can match, and typo tolerance comes from fuzzy matching at query time (e.g., fuzziness: "AUTO" on a match query)
    • keyword subfield: Exact matching for autocomplete, sorting, or filtering
    • Use Case: A fuzzy text search finds "Luxury Downtown Hotel" even if the user types "Luxry Downtown", while the keyword subfield enables exact filtering
  • location: geo_point (geospatial queries, distance calculations)

    • Why: Geographic coordinates need special handling for distance queries
    • Use Case: "Find hotels within 5km of coordinates [34.0522, -118.2437]"
    • Performance: Optimized spatial indexing for fast distance calculations
  • amenities: keyword array (filtering, faceted search)

    • Why: Amenities are standardized discrete values (e.g., ["pool", "spa", "gym"]) that need exact matching
    • Use Case: Filter by "pool AND spa" - must match exact amenity values, not partial text matches
    • Why not text:
    • Text analysis would tokenize values, breaking exact matching needed for filtering
    • Amenities are categorical data (not free-form text), so they should be matched exactly
    • Example: If amenities were a text field, a value like "pool table" (billiards) would be tokenized into "pool" and "table", so a "pool" filter could wrongly match a hotel without a swimming pool
    • Performance: Fast exact matching for multiple amenities simultaneously
  • room_types: nested object (complex room type queries)

    • Why: Room types have their own attributes (price, capacity, amenities) that need independent filtering
    • Use Case: "Find hotels with deluxe rooms under $300" - must check room type AND price together
    • Performance: Enables filtering within nested documents without false matches
  • rating: float (range queries, sorting)

    • Why: Ratings are numeric values that need range queries and sorting
    • Use Case: "Find hotels with rating >= 4.0" or "Sort by rating descending"
    • Performance: Optimized for numeric comparisons and sorting operations
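Putting the field types above together, an index-creation body might look like the sketch below. It is a Python dict that serializes to the JSON sent with `PUT /hotels`. Several choices are assumptions:

- the `english` analyzer, which stems and removes English stop words;
- the `name.raw` subfield name;
- `float` for prices, where `scaled_float` is a common alternative.

```python
import json

HOTEL_INDEX = {
    "settings": {"index": {"refresh_interval": "30s"}},  # OLAP index: infrequent refreshes
    "mappings": {
        "properties": {
            "hotel_id": {"type": "keyword"},
            "name": {
                "type": "text",
                "analyzer": "english",                   # stemming + stop-word removal
                "fields": {"raw": {"type": "keyword"}},  # exact match, sorting, autocomplete
            },
            "location": {"type": "geo_point"},
            "city": {"type": "keyword"},
            "country": {"type": "keyword"},
            "amenities": {"type": "keyword"},  # any field can hold an array; no array type needed
            "rating": {"type": "float"},
            "room_types": {
                "type": "nested",              # keep each room type's fields together
                "properties": {
                    "room_type_id": {"type": "keyword"},
                    "type": {"type": "keyword"},
                    "view": {"type": "keyword"},
                    "capacity": {"type": "integer"},
                    "base_price": {"type": "float"},
                    "amenities": {"type": "keyword"},
                },
            },
        }
    },
}

print(json.dumps(HOTEL_INDEX, indent=2))
```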

Right field type = Accurate results + Fast queries

Key Takeaway: Each field type is optimized for specific query patterns. Text fields handle fuzzy matching, keyword fields handle exact matching, geo_point handles distance calculations, and nested objects handle complex relationships. Choosing the right type ensures both query accuracy and performance.

4. Elasticsearch Index Design

Index Mapping Configuration

The hotel index uses a carefully designed mapping to optimize for different query patterns:

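  • Text Fields: A language analyzer such as the built-in english analyzer (or a custom analyzer built on the standard tokenizer) for full-text search with stemming and stop-word removal; the plain standard analyzer does neither by default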
    • Stemming: Reduces words to their root form (e.g., "swimming" → "swim", "pools" → "pool") so that variations of the same word match
    • Stop Words: Removes common words like "the", "and", "is" that don't add search value
  • Keyword Fields: Exact matching for filters and aggregations
  • Geo Fields: Geo-point mapping for distance-based queries
  • Nested Fields: The nested type for the hotel → room-type relationship (distinct from the parent-child join field rejected in Section 3)

Nested Document Configuration

From an index design perspective, nested documents provide:

  • Multi-level Filtering: Enables filtering by both hotel amenities (pool, spa) and room-level amenities (wifi, minibar) simultaneously
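  • Performance: Nested sub-documents are stored in the same Lucene block as their hotel and joined through cached parent BitSets. This is far cheaper than a parent-child join, though somewhat slower than querying flat fields.
  • Flexibility: New room types are just new nested entries (no mapping change); new room attributes only require adding fields to the mapping, which is allowed on a live index, though existing documents must be re-sent to populate them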

Key Takeaway: The index design balances search performance with query flexibility, using appropriate field types for different use cases. For the recommended approach (Option 1), availability data is not stored in Elasticsearch - it's managed entirely in Redis and the primary database for real-time accuracy.
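The multi-level filtering described in the nested document configuration above might translate into a query like the following. Hotel-level amenities are plain `term` filters. The room-level conditions sit inside a single `nested` clause, so they must all hold for the same room type. This is a sketch under assumptions: field names follow the hotel document schema in Section 3, and `build_query` is a hypothetical helper.

```python
import json


def build_query(hotel_amenities, room_type, room_amenities, max_price, guests):
    """Hotel-level filters plus one nested clause evaluated per room type."""
    room_filters = [
        {"term": {"room_types.type": room_type}},
        {"range": {"room_types.base_price": {"lte": max_price}}},
        {"range": {"room_types.capacity": {"gte": guests}}},
    ] + [{"term": {"room_types.amenities": a}} for a in room_amenities]  # one term per amenity = AND

    return {
        "query": {
            "bool": {
                "filter": [{"term": {"amenities": a}} for a in hotel_amenities]
                + [{"nested": {"path": "room_types", "query": {"bool": {"filter": room_filters}}}}]
            }
        }
    }


q = build_query(["pool", "spa"], "suite", ["balcony"], 800, 4)
print(json.dumps(q, indent=2))
```

Adding `inner_hits` to the nested clause would also return which room types matched. That can be useful when passing room_type_ids on to the Redis availability check.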

5. Query Patterns and Optimization

Filter-First Execution Strategy

Elasticsearch optimizes query performance by executing filters before expensive scoring operations:

Execution Order:

  1. Filter Context: Execute all filters, create BitSets (fast, cacheable)
  2. Query Context: Find documents matching the text search via the inverted index (matching is fast; query-context results and scores are not cached)
  3. BitSet Intersection: Combine filters with AND operations (very fast)
  4. Scoring Phase: Apply BM25 scoring only to filtered results (expensive but small set)
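To see why scoring is the expensive step, here is the classic Okapi BM25 formula as a small function. It is a sketch. Lucene's BM25 similarity uses the same ingredients: term frequency, document length, inverse document frequency, and defaults k1 = 1.2 and b = 0.75. It differs in constant factors and precomputes length norms. The point is that this work runs once per query term for every document being scored. The sample term frequencies and document lengths are made up.

```python
import math


def bm25_score(query_terms, doc_terms, doc_freq, num_docs, avg_doc_len, k1=1.2, b=0.75):
    """Okapi BM25 score of one document for a query (bag of words, no field boosts)."""
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:  # cost grows with (query terms) x (documents scored)
        tf = doc_terms.count(term)
        if tf == 0:
            continue
        n = doc_freq.get(term, 0)
        idf = math.log(1 + (num_docs - n + 0.5) / (n + 0.5))
        length_norm = tf + k1 * (1 - b + b * doc_len / avg_doc_len)
        score += idf * tf * (k1 + 1) / length_norm
    return score


doc_freq = {"luxury": 50_000, "pool": 5_000}
short_doc = ["luxury", "hotel", "pool"]
long_doc = ["luxury", "hotel", "with", "rooftop", "pool", "and", "city", "views"]
s_short = bm25_score(["luxury", "pool"], short_doc, doc_freq, 100_000, 5.0)
s_long = bm25_score(["luxury", "pool"], long_doc, doc_freq, 100_000, 5.0)
print(s_short > s_long)  # True: same matches, the shorter document scores higher
```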

Performance Impact:

  • Without Filter-First: Score 50,000 documents, then filter to 1,000
  • With Filter-First: Filter to 1,000 documents, then score only those
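  • Scoring Work: 50x fewer documents scored (50,000 → 1,000); the end-to-end speedup is smaller, because filtering and text matching still cost something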

Example: Search for "Luxury hotels with pool and spa in Los Angeles"

Query Components:

  • Text search: "Luxury" (query context - needs scoring)
  • Location filter: "Los Angeles" (filter context - no scoring)
  • Amenity filter: "pool AND spa" (filter context - no scoring)
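In query DSL terms, these components map onto a bool query: the scored text match goes in must, and the yes/no constraints go in filter, where they are not scored and are eligible for caching. The sketch below is a hedged illustration. The helper name, the keyword fields city and amenities, and the 3x name boost are assumptions, not a prescribed schema.

```python
def build_search_body(text, city, amenities, size=100):
    """Hypothetical helper: scored text match in `must`, cacheable constraints in `filter`."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Query context: contributes to the BM25 relevance score.
                "must": [
                    {
                        "multi_match": {
                            "query": text,
                            "fields": ["name^3", "description"],  # name weighted 3x
                            "type": "best_fields",
                        }
                    }
                ],
                # Filter context: yes/no matching, no scoring, eligible for caching.
                # One term clause per amenity gives "pool AND spa" semantics.
                "filter": [{"term": {"city": city}}]
                + [{"term": {"amenities": a}} for a in amenities],
            }
        },
    }

body = build_search_body("luxury", "Los Angeles", ["pool", "spa"])
```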

Without Filter-First (Inefficient):

  1. Text search finds 50,000 hotels with "luxury" in name/description
  2. BM25 scoring calculated for all 50,000 hotels (expensive!)
  3. Field boosting applied to all 50,000 hotels
  4. Results ranked: Hotel_A (score 0.95), Hotel_B (score 0.92), ...
  5. Location filter applied: 50,000 → 2,000 hotels in Los Angeles
  6. Amenity filter applied: 2,000 → 600 hotels with pool AND spa
  7. Total cost: 50,000 scoring operations + ranking + filtering

With Filter-First (Optimized):

  1. Location filter: Create BitSet A of hotels in "Los Angeles" → 2,000 hotels (fast, cached)
  2. Amenity filter: Create BitSet B of hotels with "pool AND spa" → 5,000 hotels (fast, cached)
  3. Text search: Create BitSet C of hotels containing "luxury" → 50,000 hotels (fast - from inverted index)
  4. BitSet intersection: A ∩ B ∩ C = 600 hotels (very fast - bitwise AND operation)
  5. BM25 scoring: Calculate scores for only 600 hotels (expensive but small set)
  6. Field boosting: Apply to only 600 hotels
  7. Results ranked: Final 600 hotels with scores
  8. Total cost: 600 scoring operations (about 83x fewer than the 50,000 without filter-first)
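The filter-first idea can be reproduced with a toy model in which Python integers act as BitSets. This illustrates the idea only. Lucene does not implement it this way: it iterates posting lists and uses its own bitset classes. The corpus, the match sets, and the stand-in scorer are all made up.

```python
def to_bitset(doc_ids):
    """Encode document ids as a Python int: bit i is set when doc i matches."""
    bits = 0
    for doc_id in doc_ids:
        bits |= 1 << doc_id
    return bits

def from_bitset(bits):
    """Decode an int bitset back into ascending document ids."""
    ids, doc_id = [], 0
    while bits:
        if bits & 1:
            ids.append(doc_id)
        bits >>= 1
        doc_id += 1
    return ids

def filter_first_search(text_bits, filter_bits, score):
    """Intersect all BitSets first, then score only the surviving documents."""
    candidates = text_bits
    for bits in filter_bits:
        candidates &= bits  # bitwise AND == set intersection
    scored = [(score(d), d) for d in from_bitset(candidates)]
    return sorted(scored, reverse=True)

# Toy corpus of 100 hotels (ids 0-99) with made-up match sets.
scoring_calls = []

def toy_score(doc_id):
    scoring_calls.append(doc_id)  # record each run of the "expensive" scorer
    return 1.0 / (doc_id + 1)     # stand-in for BM25

text = to_bitset(range(0, 50))               # 50 hotels mention "luxury"
in_city = to_bitset(range(40, 60))           # 20 hotels in the city
pool_and_spa = to_bitset(range(0, 100, 2))   # 50 hotels with both amenities

results = filter_first_search(text, [in_city, pool_and_spa], toy_score)
```

Only the 5 hotels in the intersection reach the scorer, instead of all 50 text matches.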

Key Insight:

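  • Text match lookup: Reads the full posting list for "luxury" (50,000 documents) from the inverted index; this is cheap because it only walks document ids, without scoring
  • Filter BitSet creation: Built from the index structures (term postings for keyword fields such as city and amenities, BKD trees for numeric and geo ranges) and cached for reuse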
  • BitSet intersection: Very fast - just bitwise AND operations
  • Scoring optimization: Only scores the 600 hotels in the intersection, not all 50,000
  • The magic: BitSet operations are fast even on large sets, but scoring is expensive - so we minimize scoring by intersecting BitSets first

Why This Works:

  • Filters are fast: BitSet operations are O(1) per document
  • Scoring is expensive: BM25 combines term frequency, inverse document frequency, and document length for every query term on every candidate document
  • Reduce scoring set: Filter first to minimize expensive operations

Interview Takeaway: BitSet Caching and Precomputation

  • Caching Strategy: BitSets are cached for repeated queries (85-90% cache hit rate for popular filters)
  • Precomputation Opportunity: Amenity-based BitSets can be precomputed since amenities don't change frequently
    • Common filters like "pool", "spa", "gym" can be precomputed and stored in memory
    • Only updated when hotel amenities are added/removed (rare event)
    • Provides instant filter execution without recalculating BitSets on every query
    • Key Insight: Identify filters that change infrequently (amenities, location) vs. frequently (availability, price) - precompute the stable ones

Real-World Impact:

  • Query latency: 200ms → 30ms (6.7x faster)
  • CPU usage: 50,000 scoring operations → 600 operations (83x reduction)
  • Memory: Cached BitSets reused for popular queries (85-90% cache hit rate)

Text Search with Multi-Match

Multi-match queries handle complex text search across multiple fields:

  • Field Boosting: Hotel name gets higher weight than description
  • Query Types: best_fields, most_fields, cross_fields for different matching strategies
  • Fuzzy Matching: Typo tolerance via the fuzziness parameter (e.g., "AUTO"); synonym expansion is configured separately, through a synonym filter in the analyzer
  • Phrase Matching: Exact phrase matching with proximity scoring

Nested Queries for Room Type Filtering

Nested queries enable complex filtering within room type documents:

  • Room Type Filtering: Find hotels with specific room types
  • Capacity Filtering: Filter by guest count requirements
  • Amenity Filtering: Room-level amenities (wifi, minibar, ocean_view)
  • Price Range Filtering: Filter by room type pricing
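The sketch below shows a nested clause for these filters. It assumes the room_types field names used elsewhere in this section, and the helper name is hypothetical. The clause would sit in the filter array of the outer bool query. inner_hits is one way to get back which room types matched, and therefore the room_type_id used later for the Redis lookup.

```python
def nested_room_filter(room_type, max_price, min_guests):
    """Hypothetical nested clause: every condition must hold for the SAME room type."""
    return {
        "nested": {
            "path": "room_types",
            "query": {
                "bool": {
                    "filter": [
                        {"term": {"room_types.type": room_type}},
                        {"range": {"room_types.price": {"lt": max_price}}},
                        {"range": {"room_types.capacity": {"gte": min_guests}}},
                    ]
                }
            },
            # inner_hits returns the matching room objects, including room_type_id.
            "inner_hits": {},
        }
    }

clause = nested_room_filter("deluxe", 300, 2)
```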

How Filtering Works (Inverted Index vs Filtering)

Understanding the Difference:

Text Search (Uses Inverted Index):

  • Inverted Index: Maps each word → list of documents containing that word
  • Example: "deluxe" → [Hotel_A, Hotel_B, Hotel_C]
  • Used for: Finding documents that contain specific words
  • Fast for: "Find hotels with 'deluxe' in the name"

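Filtering (Uses Exact-Value Index Structures and Doc Values):

  • Keyword Filters: Exact-match filters such as type = "deluxe" or amenities = "pool" also use the inverted index, looking up an unanalyzed term rather than analyzed text
  • Range Filters: Numeric and geo ranges such as price < 300 use BKD trees (points); Lucene can switch to doc values when another clause is already very selective
  • Doc Values: Columnar storage format - stores all values for a field together
  • Example: All room prices stored in one column: [299, 599, 199, 450, ...]
  • Used for: Sorting, aggregations, scripts, and some range filtering
  • Fast for: "Sort deluxe rooms by price" or "lowest room price per hotel"

Why Different Data Structures?

Inverted indexes are optimized for finding which documents contain a term, which covers both analyzed text and exact keyword values. They cannot efficiently answer numeric range questions or return a field's value for a given document. BKD trees handle ranges, and doc values store data in a columnar format that's optimized for sorting and aggregations.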

Comparison to Columnar Databases:
Elasticsearch's doc values use a similar storage strategy to columnar databases (e.g., Amazon Redshift, Google BigQuery, ClickHouse):

  • Columnar Storage: All values for a field are stored together in a column (e.g., all prices: [299, 599, 199, 450])
  • Optimized for Analytics: Fast filtering, sorting, and aggregations on numeric/categorical fields
  • Key Difference: Columnar databases store ALL data in columnar format, while Elasticsearch uses a hybrid approach:
    • Inverted indexes (term → list of documents) for text search and exact-term filters
    • BKD trees for numeric and geo ranges, and doc values (columnar) for sorting and aggregations
    • This hybrid approach gives Elasticsearch both fast text search AND fast filtering in one system
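A toy model of the two access patterns, using plain Python dicts and lists rather than Lucene's actual encodings. The inverted index answers "which documents contain this term". The column answers "what is this field's value for these documents", which is what sorting and aggregations need. The three hotels are made up.

```python
from collections import defaultdict

docs = [
    {"id": 0, "name": "Deluxe Beach Resort", "price": 299},
    {"id": 1, "name": "City Deluxe Inn", "price": 599},
    {"id": 2, "name": "Budget Stay", "price": 199},
]

# Inverted index: term -> ascending list of doc ids containing it.
inverted = defaultdict(list)
for doc in docs:
    for term in set(doc["name"].lower().split()):
        inverted[term].append(doc["id"])

# Doc values: one column per field, position = doc id.
price_column = [doc["price"] for doc in docs]

def docs_with(term):
    """Text lookup: jump straight to the posting list for a term."""
    return inverted.get(term, [])

def min_price(doc_ids):
    """Aggregation: read one column at the matched positions."""
    return min(price_column[i] for i in doc_ids)

deluxe = docs_with("deluxe")          # [0, 1]
cheapest_deluxe = min_price(deluxe)   # 299
by_price = sorted(range(len(docs)), key=price_column.__getitem__)  # [2, 0, 1]
```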

Example: "Deluxe Rooms Under $300" Query

Let's trace through how Elasticsearch handles this query:

Step 1: Understanding the Data Structure

Hotel Document:
  hotel_id: "hotel_123"
  name: "Luxury Hotel"
  room_types (nested):
    - {type: "deluxe", price: 299, capacity: 2}
    - {type: "suite", price: 599, capacity: 4}
    - {type: "standard", price: 199, capacity: 2}

Pricing Reality (Interview Tip): In production the price stored in Elasticsearch is usually a summary (min_price_next_30_days, max_price, or a bucketed price range). This lets Elasticsearch filter aggressively without forcing constant reindexing. The precise per-date price comes from the pricing service (and availability from Redis) during the availability step, ensuring tomorrow’s rate (e.g., $400) is accurate even if Elasticsearch still lists $300 as the minimum. The rule of thumb: use Elasticsearch for coarse filtering, and Redis plus the pricing service for rapidly-changing truth.

Step 2: Nested Query Execution (Using BitSet Operations)

  1. Access Nested Documents: Elasticsearch treats each nested room type as a separate "mini-document"

    • Room Type 1: {type: "deluxe", price: 299}
    • Room Type 2: {type: "suite", price: 599}
    • Room Type 3: {type: "standard", price: 199}
  2. Filter by Type (BitSet Creation): Using the keyword term index, create BitSet of nested documents with type = "deluxe"

    • Room Type 1: type = "deluxe" ✓ → BitSet bit 1 = true
    • Room Type 2: type = "suite" ✗ → BitSet bit 2 = false
    • Room Type 3: type = "standard" ✗ → BitSet bit 3 = false
    • Result: BitSet A = 1, 0, 0
  3. Filter by Price (BitSet Creation): Using the price field's range index (BKD points), create BitSet of nested documents with price < 300

    • Room Type 1: price = 299 < 300 ✓ → BitSet bit 1 = true
    • Room Type 2: price = 599 < 300 ✗ → BitSet bit 2 = false
    • Room Type 3: price = 199 < 300 ✓ → BitSet bit 3 = true
    • Result: BitSet B = 1, 0, 1
  4. BitSet Intersection: A ∩ B = [1, 0, 0] ∩ [1, 0, 1] = [1, 0, 0]

    • Only Room Type 1 matches both conditions (bitwise AND operation)
  5. Return Parent Document: If any nested document matches, the parent hotel document is included

    • Result: Hotel_123 is returned
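The trace above can be replayed in a few lines of Python, with lists of booleans standing in for the BitSets. The second function is included for contrast. It models how an ordinary (non-nested) object array behaves once its fields are flattened into per-field value lists. Both function names are hypothetical.

```python
rooms = [
    {"type": "deluxe", "price": 299},
    {"type": "suite", "price": 599},
    {"type": "standard", "price": 199},
]

def nested_match(rooms, room_type, max_price):
    """Nested semantics: one bit per room, AND per room, parent matches if any room does."""
    type_bits = [r["type"] == room_type for r in rooms]      # BitSet A
    price_bits = [r["price"] < max_price for r in rooms]     # BitSet B
    both = [t and p for t, p in zip(type_bits, price_bits)]  # A AND B
    return any(both)

def flattened_match(rooms, room_type, max_price):
    """Flattened-array semantics: the two conditions may be met by different rooms."""
    types = {r["type"] for r in rooms}
    prices = [r["price"] for r in rooms]
    return room_type in types and any(p < max_price for p in prices)
```

With these rooms, flattened_match wrongly reports a suite under $300 (the $199 belongs to the standard room), while nested_match correctly rejects it.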

Key Insight:

  • Index Structures: Term postings and BKD points produce the per-filter BitSets (fast, cacheable); doc values supply per-document values for sorting and aggregations
  • BitSet Operations: Filtering by type and price uses BitSet intersection (bitwise AND)
  • Nested Documents: Each room type is stored separately, allowing independent BitSet creation
  • Performance: BitSet operations are O(1) per document, much faster than scoring all documents
  • Combination: The query can use both - text search for hotel names AND BitSet filtering for room types

Performance Comparison:

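| Operation | Data Structure | Speed |
|---|---|---|
| Text search ("deluxe" in name) | Inverted index | Very fast |
| Filter (price < 300) | BKD points (or doc values when another clause is more selective) | Very fast |
| Filter (type = "deluxe") | Inverted index (keyword term) | Very fast |
| Nested filter (deluxe AND price < 300) | Same structures on nested docs, joined to the parent | Fast |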

Why This Matters:

  • Inverted indexes are great for "find documents containing words"
  • Exact-value structures (keyword terms, BKD points) are great for "find documents matching criteria", and doc values for sorting and aggregating the matches
  • Nested structures allow filtering on related data (room types) independently
  • Combining both gives you powerful search + accurate filtering

Query Execution Order and Performance

Optimal Query Structure (Option 1 - Recommended):

  1. Geographic Filters: city, radius (most selective)
  2. Amenity Filters: pool, spa, gym (moderately selective)
  3. Room Type Filters: nested queries (selective)
  4. Text Search: multi-match queries (expensive)
  5. Scoring: BM25 calculation (expensive but small set)

Note: Date range validation is handled by Redis after Elasticsearch filtering, not during the Elasticsearch query phase. This aligns with Option 1 (Redis/DB Only) where availability is not stored in Elasticsearch.

Key Takeaway: Filter-first execution with BitSet caching cuts expensive scoring down to the documents that pass all filters (50,000 → 600 scoring operations, about 6.7x lower latency, in the example above). Detailed BitSet mechanics and caching strategies are explained in the "Filter-First Execution Strategy" section above.

6. Redis Availability Architecture

Architecture Note: Redis (or AWS ElastiCache) is used exclusively for availability data. Search result caching and other data caching are handled by Elasticsearch's built-in caching mechanisms or application-level caching.

Redis Data Structures for Availability

Important: Room Type ID in All Keys

All Redis availability keys include room_type_id because:

  • Hotels have multiple room types and variations (e.g., "suite with garden view" vs "suite with ocean view") with separate availability
  • Each room type variation has a unique room_type_id (e.g., "suite_garden_001", "suite_ocean_001")
  • Elasticsearch filters by room type and returns room_type_id, which is used to build Redis keys
  • This allows separate availability tracking for each variation

Redis uses multiple data structures optimized for availability patterns:

Simple Key-Value Pattern:

  • Key Format: availability:hotel_id:room_type_id:check_in:check_out
  • Value: "1" (at least one room available) or the number of available rooms (e.g., "5" for 5 rooms available); an absent key means either unavailable or not cached, so a miss must fall back to the database
  • Use Case: Simple availability checks (not recommended due to update complexity)
  • Performance: O(1) lookup time

How the Key Gets Created:

  1. Booking Event: When a booking is made or cancelled for a specific room type variation
  2. Room Type ID from Elasticsearch: Elasticsearch nested query filters hotels by room type and returns room_type_id (e.g., "suite_garden_001", "suite_ocean_001")
  3. Key Generation: Combine hotel_id, room_type_id, check_in date, and check_out date
    • Example: availability:hotel_123:suite_garden_001:2024-06-15:2024-06-17
  4. Availability Check: Query database for availability of that room type variation across all dates in range
  5. Key Creation: If all dates are available for that room type variation, set key with value "1" (at least one available) or the number of available rooms (e.g., "5" for 5 rooms)
    • Redis command: SET availability:hotel_123:suite_garden_001:2024-06-15:2024-06-17 "1" EX 14400 (4 hour TTL) - for at least one room
    • Redis command: SET availability:hotel_123:suite_garden_001:2024-06-15:2024-06-17 "5" EX 14400 (4 hour TTL) - for 5 available rooms
  6. Key Deletion: If any date becomes unavailable for that room type variation, delete the key
    • Redis command: DEL availability:hotel_123:suite_garden_001:2024-06-15:2024-06-17
  7. Update Trigger: Keys are created/updated/deleted when:
    • Bookings are confirmed for specific room type variations
    • Bookings are cancelled for specific room type variations
    • New availability is added for specific room type variations
    • Periodic sync from database

Challenge: Finding Which Keys to Modify - Combinatorial Explosion

The Problem:
When a booking is made for a specific date and room type variation (e.g., 2024-06-16, suite with ocean view), you need to invalidate all keys that include that date for that room type variation.

Combinatorial Explosion Example:
For a 30-day booking window, there are (30 × 31) / 2 = 465 possible check-in/check-out combinations. A booking for a single night (e.g., June 16th) must invalidate every key whose stay covers that night. If the window runs from June 1 to July 1, that is 16 possible check-in dates (June 1-16) × 15 possible check-out dates (June 17 - July 1) = 240 keys, and nights near the middle of the window hit the most keys. All of them must be found and updated on every booking.

Example Keys to Invalidate:

  • availability:hotel_123:suite_ocean_001:2024-06-15:2024-06-17 (includes 2024-06-16)
  • availability:hotel_123:suite_ocean_001:2024-06-16:2024-06-18 (includes 2024-06-16)
  • availability:hotel_123:suite_ocean_001:2024-06-14:2024-06-17 (includes 2024-06-16)
  • ... and 237 more keys for a June 1 to July 1 window

But Redis doesn't provide an efficient way to find all keys matching a date pattern.
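The key counts can be checked by brute force. This sketch assumes a 30-night window running from June 1 to July 1, 2024, and allows a stay of any length inside it. Each (check_in, check_out) pair corresponds to one simple key.

```python
from datetime import date, timedelta

def all_stays(window_start, nights):
    """Every (check_in, check_out) pair in the window: one simple key per pair."""
    days = [window_start + timedelta(days=i) for i in range(nights + 1)]
    return [(ci, co) for i, ci in enumerate(days) for co in days[i + 1:]]

def keys_covering(night, stays):
    """Stays that occupy `night`, i.e. check_in <= night < check_out."""
    return [(ci, co) for ci, co in stays if ci <= night < co]

stays = all_stays(date(2024, 6, 1), 30)
affected = keys_covering(date(2024, 6, 16), stays)
```

The window yields 465 keys in total, and 240 of them cover the night of June 16. A per-date hash would touch exactly one field.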

Solution Options:

Option 1: Pattern Matching (Inefficient)

  • Use KEYS availability:hotel_123:* to find all keys for a hotel
  • Filter keys that include the affected date
  • Delete matching keys
  • Problem: KEYS is O(N) over the entire keyspace and blocks Redis while it runs, so it is not suitable for production; SCAN avoids blocking but still has to walk every key

Option 2: Secondary Index (Recommended)

  • Maintain a separate index tracking which keys exist for each date and room type variation
  • Use a set: availability_index:hotel_123:suite_ocean_001:2024-06-16 → Set of all keys containing this date for this room type variation
  • When booking affects 2024-06-16 for suite with ocean view:
    1. Get all keys from index: SMEMBERS availability_index:hotel_123:suite_ocean_001:2024-06-16
    2. Delete all those keys: DEL key1 key2 key3...
    3. Remove index entry: DEL availability_index:hotel_123:suite_ocean_001:2024-06-16
  • When creating a key: Add key to index for each date in range and room type variation
  • Performance: O(M) where M = number of keys containing that date for that room type variation
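Here is an in-memory sketch of the secondary-index bookkeeping. A dict and a set of strings stand in for Redis string keys and sets, and the comments name the Redis command each step would correspond to. The class and method names are hypothetical. One caveat the sketch makes visible: deleting a key does not remove it from the index sets of its other nights. Those sets accumulate stale entries, which are harmless for DEL but need periodic cleanup or a TTL.

```python
from collections import defaultdict
from datetime import date, timedelta

class SecondaryIndexCache:
    """Hypothetical in-memory model of simple keys plus per-night index sets."""

    def __init__(self):
        self.store = {}                # availability:... string keys
        self.index = defaultdict(set)  # availability_index:... sets

    def put(self, hotel_id, room_type_id, check_in, check_out, value):
        key = f"availability:{hotel_id}:{room_type_id}:{check_in}:{check_out}"
        self.store[key] = value        # SET key value EX 14400
        night = check_in
        while night < check_out:       # SADD once per night the stay covers
            self.index[f"availability_index:{hotel_id}:{room_type_id}:{night}"].add(key)
            night += timedelta(days=1)
        return key

    def invalidate_night(self, hotel_id, room_type_id, night):
        idx = f"availability_index:{hotel_id}:{room_type_id}:{night}"
        keys = self.index.pop(idx, set())  # SMEMBERS idx, then DEL idx
        for key in keys:                   # DEL key1 key2 ...
            self.store.pop(key, None)
        return len(keys)

cache = SecondaryIndexCache()
cache.put("hotel_123", "suite_ocean_001", date(2024, 6, 15), date(2024, 6, 17), "1")
```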

Option 3: Hash Structure (Better Alternative) - Solves Combinatorial Explosion

  • Use hash structure instead of simple keys (see Hash Structure section)
  • Key format: availability:hotel_id:room_type_id
  • Hash fields: Individual dates (2024-06-15, 2024-06-16, etc.)
  • Solves the Problem: Instead of 465 keys for a 30-day window, you have only 30 hash fields (one per day)
  • When booking affects 2024-06-16 for suite with garden view:
    • Simply update/delete the hash field: HDEL availability:hotel_123:suite_garden_001 2024-06-16
    • No need to find which keys to modify - directly update the specific date field
  • Performance: O(1) - directly access the date field for specific room type variation
  • Scalability: Linear growth (30 fields for 30 days) vs. quadratic growth (465 keys for 30 days)

Recommendation:
For simple availability checks, Hash Structure is preferred because:

  • No need to track which keys to invalidate
  • Direct date-based access (O(1))
  • Easier to update individual dates
  • More efficient for date range queries

Hash Structure:

  • Key Format: availability:hotel_id:room_type_id
  • Hash Fields: Individual dates representing nights (2024-06-15, 2024-06-16, 2024-06-17, etc.)
    • Each date field represents availability for that night
    • Example: Field "2024-06-15" = availability for the night of June 15th
  • Hash Values: JSON object with room_count and last_updated (price comes from Elasticsearch/database, not Redis)
  • Use Case: Detailed availability per room type variation
  • Performance: O(1) field access

Examples:

  • availability:hotel_123:deluxe_001 (for deluxe rooms)
  • availability:hotel_123:suite_garden_001 (for suite with garden view)
  • availability:hotel_123:suite_ocean_001 (for suite with ocean view)

Advantage: Direct Date-Based Updates
Unlike the simple key-value pattern, hash structure allows direct updates to specific dates without needing to find which keys to modify. When a booking affects 2024-06-16 for a specific room type variation (e.g., suite with garden view), you directly update that hash field - no pattern matching or secondary indexes needed.

How the Hash Gets Created:

  1. Initial Creation: When hotel availability data is loaded into Redis for a specific room type variation
  2. Hash Key: Create hash with key availability:hotel_id:room_type_id
    • Example: availability:hotel_123:suite_garden_001 (for suite with garden view)
    • Example: availability:hotel_123:suite_ocean_001 (for suite with ocean view)
  3. Hash Field Population: For each date with availability, add a field:
    • Redis command: HSET availability:hotel_123:suite_garden_001 2024-06-15 '{"room_count":5,"last_updated":"2024-06-15T10:30:00Z"}'
    • Redis command: HSET availability:hotel_123:suite_garden_001 2024-06-16 '{"room_count":3,"last_updated":"2024-06-15T10:30:00Z"}'
  4. Field Updates: When availability changes for that room type variation:
    • Update specific date field: HSET availability:hotel_123:suite_garden_001 2024-06-15 '{"room_count":4,"last_updated":"2024-06-15T14:20:00Z"}'
  5. Field Deletion: When date becomes unavailable for that room type variation:
    • Remove field: HDEL availability:hotel_123:suite_garden_001 2024-06-17
  6. TTL: Set expiration on entire hash: EXPIRE availability:hotel_123:suite_garden_001 14400 (4 hours)
  7. Update Triggers: Hash is updated when:
    • Bookings change availability for specific room type variations and dates
    • Periodic batch sync from database (availability data only)

Atomic Operations for OLTP Transactions: Redis Hash Design

The OLTP Transaction Pattern:

Updating the availability counter for one night is a single, atomic Redis command (the booking record itself is committed in the primary database, which remains the final authority):

HINCRBY availability:hotel_123:suite_garden_001 2024-06-15 -1

Why This Works for OLTP:

  • O(1) Performance: Constant time operation, designed for high-throughput scenarios
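  • Atomic: Concurrent increments and decrements never lose updates. HINCRBY does not refuse to go below zero, though, so preventing overselling needs a check-and-decrement (e.g., a Lua script)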
  • Single Operation: No multi-step process, no re-indexing, no segment merging
  • Conceptual Opposite of Elasticsearch: This is the exact opposite of Elasticsearch's multi-stage, resource-intensive re-indexing "update"

Hash Structure Design:

  • Redis Key: availability:hotel_id:room_type_id (e.g., availability:hotel_123:suite_garden_001)
  • Hash Fields: {date} (e.g., 2024-06-15, 2024-06-16)
  • Hash Values: Integer representing room count (e.g., 10, 5, 0)

Alternative: JSON Value with Room Count

The hash-creation example above stores JSON values with room_count and last_updated. HINCRBY only works on fields that hold integers (it returns an error for a JSON string), so counter operations need plain integer values:

  • Simple Integer: HSET availability:hotel_123:suite_garden_001 2024-06-15 10
  • Atomic Decrement: HINCRBY availability:hotel_123:suite_garden_001 2024-06-15 -1 → Result: 9
  • Atomic Increment: HINCRBY availability:hotel_123:suite_garden_001 2024-06-15 1 → Result: 10
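HINCRBY on its own will take a count from 0 to -1, and a multi-night stay has to decrement several fields together. One common way to get an all-or-nothing, check-then-decrement update is a Lua script, which Redis executes atomically. The sketch assumes the integer-valued hash just described. RESERVE_LUA is the script a Redis server would run (for example through redis-py's register_script). reserve() is a plain-Python model of the same logic, so the behavior can be checked without a server. Both are illustrations, not a tested production script.

```python
# Lua script for Redis to run atomically (e.g. via redis-py's register_script).
# KEYS[1] = availability hash, ARGV[1] = rooms to reserve, ARGV[2..] = nights.
# Shown for reference; it is not executed in this sketch.
RESERVE_LUA = """
local rooms = tonumber(ARGV[1])
for i = 2, #ARGV do
  local left = tonumber(redis.call('HGET', KEYS[1], ARGV[i]))
  if left == nil or left < rooms then
    return 0
  end
end
for i = 2, #ARGV do
  redis.call('HINCRBY', KEYS[1], ARGV[i], -rooms)
end
return 1
"""

def reserve(availability, nights, rooms=1):
    """Plain-Python model of RESERVE_LUA: decrement every night or none of them.

    availability maps ISO date -> integer room count (a missing date counts as 0).
    """
    if any(availability.get(night, 0) < rooms for night in nights):
        return False  # at least one night lacks capacity: change nothing
    for night in nights:
        availability[night] -= rooms
    return True
```

For the hash above, KEYS[1] would be availability:hotel_123:suite_garden_001, ARGV[1] the number of rooms, and the remaining arguments the nights. Under the database-as-authority rule in this section, the script mirrors or pre-checks a booking. It does not replace the database commit.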

Date Range Check (OLAP Query Pattern):

Checking availability for a 5-night stay is a single, efficient command:

HMGET availability:hotel_123:suite_garden_001 2024-06-15 2024-06-16 2024-06-17 2024-06-18 2024-06-19

Performance: O(N) where N = number of fields requested (i.e., number of nights), not the total number of documents in the database. This is extremely fast.

Key Takeaway: A Redis Hash with atomic HINCRBY operations (guarded against going below zero) provides the OLTP capabilities needed for high-frequency availability updates: simple, fast, atomic, and designed for this workload.

Date Range Validation with Hash Structure (Room Type Variation Aware):

Critical: Hotel Bookings are for Nights

  • Guest checks in on check-in date (stays that night)
  • Guest checks out on check-out date (stays the previous night, leaves on check-out date)
  • The room is occupied for the nights from the check-in date through (check-out - 1 day), and is free again for the night of the check-out date
  • Date range to check: check_in_date through check_out_date - 1 day, inclusive

Process:

  1. Room Type ID from Elasticsearch: Elasticsearch nested query filters hotels by room type and returns room_type_id (e.g., "suite_garden_001", "suite_ocean_001")
  2. Generate date range: Dates from check-in to (check-out - 1 day) inclusive
  3. Hash Key: Use availability:hotel_id:room_type_id (e.g., availability:hotel_123:suite_garden_001)
  4. Batch check: HMGET availability:hotel_id:room_type_id date1 date2... (all nights in range)
  5. Validation: If all dates return non-null values with room_count > 0, room type variation is available for entire stay
  6. Performance: O(N) where N = number of nights (check-out - check-in), not total available dates

Example: Check-in 2024-06-15, check-out 2024-06-17 → Check dates [2024-06-15, 2024-06-16]
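A small sketch of the night-list rule and the HMGET validation step. It assumes integer-valued hash fields (the HINCRBY variant). With JSON values, each entry would be parsed and its room_count read instead. The function names are hypothetical.

```python
from datetime import date, timedelta

def nights(check_in, check_out):
    """Nights to check: check_in through check_out - 1 day, as ISO date strings."""
    start, end = date.fromisoformat(check_in), date.fromisoformat(check_out)
    return [(start + timedelta(days=i)).isoformat() for i in range((end - start).days)]

def stay_available(hmget_reply, rooms_needed=1):
    """True when every night has a value >= rooms_needed.

    hmget_reply holds one entry per requested field, None when the field is
    missing; redis-py returns bytes or str values, and int() accepts both.
    """
    return all(v is not None and int(v) >= rooms_needed for v in hmget_reply)
```

A stay from 2024-06-15 to 2024-06-17 therefore checks exactly two fields, 2024-06-15 and 2024-06-16, and the date arithmetic handles month boundaries.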

Sorted Set Pattern:

  • Key Format: availability:hotel_id:room_type_id:dates
  • Members: date strings (2024-06-15, 2024-06-16)
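  • Scores: a constant 1 for every member (only available dates are stored); with equal scores, Redis orders members lexicographically, and ISO dates sort chronologically
  • Use Case: Date range queries per room type variation
  • Performance: O(log N + M) for range queries, where M = number of dates returned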

Examples:

  • availability:hotel_123:suite_garden_001:dates
  • availability:hotel_123:suite_ocean_001:dates

How the Sorted Set Gets Created:

  1. Initial Creation: When hotel availability data is loaded for a specific room type variation
  2. Sorted Set Key: Create sorted set with key availability:hotel_id:room_type_id:dates
    • Example: availability:hotel_123:suite_garden_001:dates (for suite with garden view)
    • Example: availability:hotel_123:suite_ocean_001:dates (for suite with ocean view)
  3. Member Addition: For each available date for that room type variation, add as member with score 1:
    • Redis command: ZADD availability:hotel_123:suite_garden_001:dates 1 "2024-06-15"
    • Redis command: ZADD availability:hotel_123:suite_garden_001:dates 1 "2024-06-16"
    • Redis command: ZADD availability:hotel_123:suite_garden_001:dates 1 "2024-06-17"
  4. Member Updates: When availability changes for that room type variation:
    • Mark available: ZADD availability:hotel_123:suite_garden_001:dates 1 "2024-06-17" (add date with score 1)
    • Mark unavailable: ZREM availability:hotel_123:suite_garden_001:dates "2024-06-17" (remove date from set)
  5. Clean Set: Only available dates are stored in the sorted set (no "unavailable" members with score 0)
  6. Range Queries: Query the available dates in a range for that room type variation:
    • Redis command: ZRANGEBYLEX availability:hotel_123:suite_garden_001:dates [2024-06-01 [2024-06-30 (or ZRANGE ... BYLEX on Redis 6.2+)
    • Because every member has score 1, this returns only the June dates, already sorted, without fetching the whole set
  7. TTL: Set expiration: EXPIRE availability:hotel_123:suite_garden_001:dates 14400 (4 hours)
  8. Update Triggers: Sorted set is updated when:
    • Bookings change availability
    • New availability periods are added
    • Periodic sync from database
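Every member of this sorted set has the same score (1), so Redis orders members lexicographically. A date window can therefore be read with ZRANGEBYLEX (or ZRANGE ... BYLEX on Redis 6.2+) instead of fetching the whole set and filtering client-side. The sketch below models that range read with bisect over a sorted Python list, which stands in for Redis's internal ordering. The helper name is hypothetical.

```python
from bisect import bisect_left, bisect_right

def range_by_lex(sorted_members, start, end):
    """Model of ZRANGEBYLEX key [start [end (inclusive bounds).

    Valid only when every member has the same score, as in the set above,
    so members are ordered by their bytes; ISO-8601 dates then sort by date.
    """
    lo = bisect_left(sorted_members, start)   # O(log N) to find the first match
    hi = bisect_right(sorted_members, end)
    return sorted_members[lo:hi]              # O(M) to read the M matches

available_dates = sorted(["2024-06-16", "2024-05-30", "2024-07-02", "2024-06-15"])
june = range_by_lex(available_dates, "2024-06-01", "2024-06-30")
```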

Hash Structure vs Sorted Set: When to Use Each

Hash Structure Advantages:

  • O(1) field access: Direct access to any date
  • Direct updates: Update specific dates without pattern matching
  • Simple date range validation: Check all dates in range using HMGET
  • Efficient storage: Only store dates that have availability
  • Better for: Checking if specific dates are available, updating individual dates

Example Use Case:

  • Query: "Is hotel_123 available for check-in 2024-06-15 to check-out 2024-06-17?"
  • Date range to check: 2024-06-15, 2024-06-16
  • Hash: HMGET availability:hotel_123:suite_garden_001 2024-06-15 2024-06-16
  • Check if all dates return non-null values
  • Performance: O(N) where N = number of nights (check-out - check-in)

Sorted Set Advantages:

  • Range queries: Efficiently find all available dates in a date range
  • Sorted order: Dates are automatically sorted, making range queries fast
  • Bulk operations: Get all available dates in a range with one query
  • Better for: Finding all available dates in a future period, date range discovery

Example Use Case:

  • Query: "Find all available dates for hotel_123 in June 2024"
  • Sorted Set: ZRANGEBYLEX availability:hotel_123:suite_garden_001:dates [2024-06-01 [2024-06-30 (available June dates, already sorted)
  • Performance: O(log N + M) where N = total dates, M = dates in range
  • Advantage: Clean set with only available dates, no need to filter out "unavailable" members

Comparison Table:

| Feature | Hash Structure | Sorted Set |
|---|---|---|
| Single Date Check | O(1) - HGET | O(1) - ZSCORE |
| Date Range Check (one stay) | O(k) - HMGET each night | O(log N + k) - ZRANGEBYLEX over the stay |
| Update Single Date | O(1) - HSET / HINCRBY / HDEL | O(log N) - ZADD / ZREM |
| Find Available Dates in a Period | O(N) - HGETALL, then filter | O(log N + M) - ZRANGEBYLEX |
| Memory Efficiency | Efficient (only available dates) | Efficient (only available dates) |
| Use Case | Check specific date ranges | Discover available dates in periods |

N = dates stored for the room type variation, k = nights in the stay, M = dates returned.

Recommendation for Hotel Search:

Use Hash Structure for the recommended approach because:

  • Most queries check specific date ranges (check-in to check-out)
  • Need to verify all dates in range are available
  • Hash structure provides O(1) access to specific dates
  • Simpler to update when bookings change
  • More intuitive for date range validation

Use Sorted Set if you need:

  • To find all available dates in a future period (e.g., "show me all available dates in July")
  • Efficient range queries for date discovery
  • Automatic sorting of dates

For Most Hotel Search Systems: Hash structure is the better choice because the primary use case is checking if a specific date range is available, not discovering all available dates.

Key Naming Conventions

Redis Key Patterns (Availability Only):

  • Hash Structure (Recommended): availability:hotel_id:room_type_id (hash fields are dates)
  • Simple Key-Value: availability:hotel_id:room_type_id:check_in:check_out
  • Sorted Set: availability:hotel_id:room_type_id:dates

TTL Strategies for Availability Data

Time-to-Live Configuration (Redis/ElastiCache):

  • Availability Keys: 1-4 hours (matches booking timeout, ensures fresh availability data)

Atomic Operations for Concurrent Booking

Important: Database is the Final Authority

The primary database is the source of truth for all availability data. Redis is a fast access layer:

  • Database: Authoritative data, transactional consistency, ACID guarantees
  • Redis: Performance optimization layer, real-time access, eventual consistency with database
  • Synchronization: Redis must be updated to reflect database state, not the other way around
  • Conflict Resolution: If Redis and database disagree, database state is correct

Race Condition Prevention:

  • SET with NX: Only set if key doesn't exist (prevent concurrent updates)
  • WATCH/MULTI/EXEC: Transactional updates for complex operations
  • Lua Scripts: Atomic multi-step operations

Booking Flow (Database as Authority):

  1. Check Redis: Fast availability check in Redis
  2. Reserve in Database: Create booking record in database (authoritative)
  3. Update Redis: Update Redis to reflect database state
  4. If Database Fails: Booking is not confirmed, Redis remains unchanged
  5. If Redis Fails: Database state is correct, Redis will be synced later

Real-Time Update Patterns

Update Flow: Database → Redis (One-Way Sync)

All updates originate from the database. Redis is updated to reflect database state:

Update Strategies:

  • Immediate Updates: After database transaction commits, immediately update Redis to reflect new availability
  • Batch Updates: Periodic sync from database to Redis to catch any missed updates
  • Event-Driven: Database triggers or change data capture (CDC) events update Redis in real-time
  • Fallback Sync: Periodic reconciliation with database to ensure Redis matches database state

Update Order (Critical):

  1. Database Transaction: Make changes in database first (authoritative)
  2. Transaction Commit: Wait for database commit to succeed
  3. Redis Update: Update Redis to match database state
  4. If Redis Update Fails: Database state is still correct, Redis will be synced later

Why This Matters:

  • Data Integrity: Database transactions ensure consistency
  • Redis is Cache: Redis can be rebuilt from database if needed
  • No Data Loss: If Redis fails, database has all the data
  • Eventual Consistency: Redis may be temporarily out of sync, but database is always correct

Key Takeaway: Redis provides sub-millisecond availability checks through optimized data structures and atomic operations, enabling real-time booking systems that would be hard to sustain at search-time fan-out with database-only approaches. The primary database remains the final authority (see "Important: Database is the Final Authority" above) - all updates flow from database to Redis, ensuring data integrity and consistency.

7. Hybrid Search Execution Flow

Sequential vs Parallel Execution

Sequential Execution (Simple but Slower):

  1. Elasticsearch search (25ms) → Get candidate hotels
  2. Redis availability check (10ms) → Filter by availability
  3. Total: 35ms

Parallel Execution (Complex but Faster):

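  1. Start Elasticsearch search (25ms) and Redis availability check (15ms) simultaneously; because Elasticsearch has not returned candidates yet, Redis must check a broader set known up front (e.g., every hotel in the city with the requested room type), which is why this check takes longer than in the sequential flow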
  2. Wait for both to complete (max of both = 25ms)
  3. Filter Elasticsearch results by Redis availability
  4. Total: 25ms (28% improvement)
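A minimal asyncio sketch of the parallel variant. es_search and redis_available are hypothetical stand-ins that sleep for the latencies quoted above and return made-up ids. A real implementation would call the Elasticsearch and Redis clients instead. The sketch shows the overall shape: start both calls, await both, then keep Elasticsearch's order while dropping hotels that Redis did not confirm.

```python
import asyncio

async def es_search(query, city):
    """Hypothetical stand-in for the Elasticsearch call; returns ranked hotel ids."""
    await asyncio.sleep(0.025)
    return ["hotel_123", "hotel_456", "hotel_789"]

async def redis_available(city, nights):
    """Hypothetical stand-in for a Redis check over every hotel in the city."""
    await asyncio.sleep(0.015)
    return {"hotel_123", "hotel_789", "hotel_999"}

async def parallel_search(query, city, nights):
    # Both calls run concurrently; the wait is roughly the slower of the two.
    ranked, available = await asyncio.gather(
        es_search(query, city), redis_available(city, nights)
    )
    # Keep Elasticsearch relevance order, drop hotels Redis did not confirm.
    return [hotel for hotel in ranked if hotel in available]

if __name__ == "__main__":
    print(asyncio.run(parallel_search("luxury", "Los Angeles", ["2024-06-15", "2024-06-16"])))
```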

Elasticsearch Candidate Generation

Search Phase:

  • Geographic Filtering: City, radius, location-based filtering
  • Amenity Filtering: Pool, spa, gym, restaurant availability
  • Text Search: Hotel name, description, location text matching
  • Room Type Filtering: Nested queries for room type requirements
  • Price Range Filtering: Room type pricing constraints

Result Set:

  • Candidate Hotels: 100-1000 hotels matching search criteria
  • Hotel Metadata: Name, location, amenities, room types
  • Search Scores: Relevance ranking for result ordering

Redis Availability Validation

How Elasticsearch and Redis Work Together:

Step 1: Elasticsearch Filters by Room Type

  • User search includes room type filter (e.g., "suite with ocean view")
  • Elasticsearch nested query filters hotels that have matching room type variations (e.g., suite with ocean view)
  • Elasticsearch returns: List of hotels with matching room types (hotel metadata + room type info including room_type_id)
  • Result: We know which hotels have matching room types and which room_type_id to check in Redis

Step 2: Redis Checks Availability for Specific Room Type Variation

  • For each hotel from Elasticsearch, we get the room_type_id (e.g., "suite_ocean_001", "deluxe_001")
  • Use hash key: availability:hotel_id:room_type_id (e.g., availability:hotel_123:suite_ocean_001)
  • Check availability for the specific room type variation across date range

Availability Check Process:

  1. Extract Room Type ID: From Elasticsearch results, get room_type_id (e.g., "suite_ocean_001", "deluxe_001")
  2. Generate Date List: Create list of dates from check-in to (check-out - 1 day) inclusive
    • Critical: Hotel bookings are for nights, not days (see Section 6 for detailed explanation)
    • Dates to check: [check_in_date, check_out_date - 1 day]
  3. Build Hash Key: availability:hotel_id:room_type_id
  4. HMGET Check: HMGET availability:hotel_id:room_type_id date1 date2... (all nights in range)
  5. Validate Results: Check if all dates have non-null values with room_count > 0

Example Flow:

User Query: "Suite with ocean view in Los Angeles, check-in: 2024-06-15, check-out: 2024-06-17"

Elasticsearch:
  - Filters: city=Los Angeles, room_type="suite", view="ocean"
  - Returns: [hotel_123 (has suite_ocean_001), hotel_456 (has suite_ocean_001), hotel_789 (has suite_ocean_001)]
  - Each result includes: hotel_id, room_type_id="suite_ocean_001"

Redis Availability Check:
  - Dates to check: [2024-06-15, 2024-06-16] (see Section 6 for explanation of date range logic)
  - For hotel_123: HMGET availability:hotel_123:suite_ocean_001 "2024-06-15" "2024-06-16"
  - For hotel_456: HMGET availability:hotel_456:suite_ocean_001 "2024-06-15" "2024-06-16"
  - For hotel_789: HMGET availability:hotel_789:suite_ocean_001 "2024-06-15" "2024-06-16"

Results:
  - hotel_123: [value, value] → Available (both nights have suite with ocean view)
  - hotel_456: [value, null] → Unavailable (2024-06-16 has no suite with ocean view available)
  - hotel_789: [value, value] → Available

Capacity Validation:

  • If user needs multiple rooms (e.g., 2 suites with ocean view):
    • Check room_count in hash values: {"room_count": 5, ...}
    • Ensure room_count >= required_rooms for all dates in range
    • Example: If user needs 2 rooms, but room_count is 1 → Unavailable
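The capacity rule can be sketched as a check over one HMGET reply. This sketch assumes each hash value is a JSON string such as `{"room_count": 5}`, which matches the value shape shown above. `has_capacity` is a hypothetical helper name. `json.loads` also accepts bytes, so the check works whether or not the Redis client decodes responses.

```python
import json
from typing import Optional, Sequence, Union

Raw = Optional[Union[str, bytes]]

def has_capacity(values: Sequence[Raw], required_rooms: int = 1) -> bool:
    """True only if every night has a value with room_count >= required_rooms.

    `values` is one HMGET reply in night order; None means the field is missing
    for that night, which is treated as unavailable.
    """
    if not values:
        return False  # no nights checked: never report a stay as bookable
    for raw in values:
        if raw is None:
            return False  # early exit on the first missing night
        if json.loads(raw).get("room_count", 0) < required_rooms:
            return False  # early exit on the first under-capacity night
    return True
```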

Price Validation:

  • Price comes from Elasticsearch room type data (Redis stores availability only; see Section 6)
  • Check price from Elasticsearch results against user's budget
  • Ensure price is within budget for the selected room type variation

Filtering Process:

  • Hash Key Pattern: availability:hotel_id:room_type_id
  • Batch Operations: Use Redis pipeline to batch multiple HMGET commands
  • Early Exit: When evaluating a hotel's HMGET reply, stop at the first null date (hotel unavailable)
  • Result Filtering: Remove hotels where room type is unavailable for any date in range
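Here is a minimal sketch of this filtering process. It assumes a client that behaves like redis-py's `Redis`, using `pipeline(transaction=False)`, `Pipeline.hmget(key, fields)` and `Pipeline.execute()`, and it reuses the JSON value shape assumed above. `filter_available` is a hypothetical name. The pipeline batches every HMGET into one network round trip. The early exit happens when each reply is evaluated.

```python
import json
from typing import Any, List, Sequence, Tuple

def filter_available(
    client: Any,
    candidates: Sequence[Tuple[str, str]],
    nights: Sequence[str],
    required_rooms: int = 1,
) -> List[Tuple[str, str]]:
    """Return the (hotel_id, room_type_id) pairs bookable for every night.

    Candidate order (the Elasticsearch ranking) is preserved.
    """
    pipe = client.pipeline(transaction=False)  # batching only; reads need no MULTI/EXEC
    for hotel_id, room_type_id in candidates:
        pipe.hmget(f"availability:{hotel_id}:{room_type_id}", list(nights))
    replies = pipe.execute()  # one round trip, one reply list per HMGET

    available = []
    for pair, values in zip(candidates, replies):
        # all() short-circuits at the first null or under-capacity night
        if values and all(
            v is not None and json.loads(v).get("room_count", 0) >= required_rooms
            for v in values
        ):
            available.append(pair)
    return available
```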

Result Merging and Ranking Strategy

Merging Process:

  1. Elasticsearch Results: Ranked by relevance score
  2. Redis Availability: Binary available/unavailable status
  3. Intersection: Keep only available hotels from Elasticsearch results
  4. Final Ranking: Maintain Elasticsearch relevance order for available hotels
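The intersection step can be sketched in a few lines. The hit fields `hotel_id` and `room_type_id` are assumed to be present on each Elasticsearch result, as described in Step 1. The key design choice is to iterate over the Elasticsearch hits, not over the availability set, so the relevance order carries through unchanged.

```python
from typing import Any, Dict, Iterable, List, Tuple

def merge_results(
    es_hits: List[Dict[str, Any]],
    available: Iterable[Tuple[str, str]],
) -> List[Dict[str, Any]]:
    """Keep only hits whose (hotel_id, room_type_id) passed the Redis check."""
    ok = set(available)  # O(1) membership tests
    # Walking es_hits in order preserves the Elasticsearch relevance ranking
    return [h for h in es_hits if (h["hotel_id"], h["room_type_id"]) in ok]
```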

Ranking Factors:

  • Search Relevance: BM25 score from Elasticsearch
  • Availability Status: Available hotels ranked higher (applies when unavailable hotels are shown rather than removed)
  • Price Competitiveness: Lower prices get slight boost
  • User Preferences: Historical booking patterns, loyalty status

Interview Tip: Streaming Results with Pagination

Once Elasticsearch returns ranked results, you don't need to wait for all availability checks before sending results to the UI. Instead, use a streaming/pagination approach:

Streaming Optimization:

  1. Elasticsearch Returns: Ranked list of hotels (e.g., 500 hotels)
  2. Batch Processing: Process hotels in batches of 10-20
  3. Redis Check: Check availability for first batch (e.g., first 10 hotels)
  4. Send to UI: Immediately send available hotels from first batch to UI
  5. Continue Processing: While UI displays first page, check availability for next batch
  6. Pagination: As user scrolls, send next batch of available hotels

Benefits:

  • Time to First Result (TTFR): User sees results faster (e.g., 30ms instead of 200ms)
  • Better UX: Progressive loading feels faster than waiting for all results
  • Reduced Latency: Don't check availability for hotels user might never see
  • Efficient Resource Usage: Only process what's needed for current page

Example Flow:

Elasticsearch: Returns 500 ranked hotels (25ms)
↓
Batch 1: Check availability for hotels 1-10 (5ms) → Send to UI
↓
Batch 2: Check availability for hotels 11-20 (5ms) → Cache for next page
↓
Batch 3: Check availability for hotels 21-30 (5ms) → Cache for next page
...
User scrolls → Load cached batch 2

Key Insight: Since results are already ranked by Elasticsearch, you can stream them in order. The UI only needs the first 10-20 results immediately, so there's no need to wait for all 500 hotels to be checked.
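The batching idea can be sketched as a Python generator. `stream_available` and the `is_available_batch` callback are hypothetical names. The callback stands in for one pipelined Redis check per batch. This version is lazy: batch n+1 is checked only when the caller asks for the next page. The prefetching behaviour described above, where the next batch is checked while the UI renders, would start that check in the background instead. Pages can hold fewer than `batch_size` hotels once unavailable ones are removed.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")

def stream_available(
    ranked: Sequence[T],
    is_available_batch: Callable[[Sequence[T]], List[bool]],
    batch_size: int = 10,
) -> Iterator[List[T]]:
    """Yield pages of available items in ranked order, checking one batch at a time."""
    for start in range(0, len(ranked), batch_size):
        batch = ranked[start:start + batch_size]
        flags = is_available_batch(batch)  # e.g. one pipelined HMGET round trip
        page = [item for item, ok in zip(batch, flags) if ok]
        if page:  # skip batches where nothing is available
            yield page
```

Usage: the first `next()` on the generator triggers only the first availability check. Hotels further down the 500-item list are never checked unless the user scrolls.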

Performance Comparison

| Execution Pattern | Elasticsearch | Redis | Total Time | Complexity |
| --- | --- | --- | --- | --- |
| Sequential | 25ms | 10ms | 35ms | Simple |
| Parallel | 25ms | 15ms | 25ms | Medium |
| Batch Redis | 25ms | 5ms | 30ms | Medium |
| Cached Results | 5ms | 2ms | 7ms | High |

Key Takeaway: Parallel execution cuts total latency by roughly 29% (35ms → 25ms), and batched Redis operations halve availability check time (10ms → 5ms). Both make the hybrid approach significantly faster than sequential processing.
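The totals in the table follow from simple latency arithmetic. This assumes sequential steps add up, while fully overlapped (parallel) steps cost only the slower of the two:

```latex
\begin{aligned}
T_{\text{seq}} &= T_{\text{ES}} + T_{\text{Redis}} = 25 + 10 = 35\,\text{ms} \\
T_{\text{par}} &= \max(T_{\text{ES}}, T_{\text{Redis}}) = \max(25, 15) = 25\,\text{ms} \\
\frac{T_{\text{seq}} - T_{\text{par}}}{T_{\text{seq}}} &= \frac{35 - 25}{35} \approx 28.6\% \\
T_{\text{batch}} &= 25 + 5 = 30\,\text{ms}, \qquad \frac{10 - 5}{10} = 50\% \text{ less Redis time}
\end{aligned}
```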

8. APIs and Query Flow

Search API Endpoint Structure

Primary Endpoint: GET /api/v1/hotels/search

Query Parameters:

  • location (required): City name or coordinates
  • check_in (required): Check-in date (YYYY-MM-DD)
  • check_out (required): Check-out date (YYYY-MM-DD)
  • guests (optional): Number of guests (default: 2)
  • room_type (optional): Specific room type (deluxe, suite)
  • amenities (optional): Array of amenities (pool, spa, gym)
  • price_min / price_max (optional): Price range constraints
  • radius (optional): Search radius in kilometers (default: 30)
  • sort (optional): Sort order (relevance, price, rating, distance)
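The parameter contract above can be sketched as a validating parser. This is a minimal illustration, not the service's actual code. It assumes raw query-string values arrive as strings and that `amenities` is sent comma-separated (for example `pool,spa,gym`); the real array encoding is not specified. `SearchParams` and `parse_params` are hypothetical names. The defaults (2 guests, 30 km radius, relevance sort) come from the list above.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Mapping, Optional

SORT_OPTIONS = {"relevance", "price", "rating", "distance"}

@dataclass
class SearchParams:
    location: str
    check_in: date
    check_out: date
    guests: int = 2            # documented default
    room_type: Optional[str] = None
    amenities: List[str] = field(default_factory=list)
    price_min: Optional[float] = None
    price_max: Optional[float] = None
    radius_km: float = 30.0    # documented default
    sort: str = "relevance"

def _opt_float(q: Mapping[str, str], name: str) -> Optional[float]:
    return float(q[name]) if q.get(name) else None

def parse_params(q: Mapping[str, str]) -> SearchParams:
    """Validate raw query parameters; raises ValueError naming the problem."""
    for name in ("location", "check_in", "check_out"):
        if not q.get(name):
            raise ValueError(f"missing required parameter: {name}")
    check_in = date.fromisoformat(q["check_in"])    # YYYY-MM-DD
    check_out = date.fromisoformat(q["check_out"])
    if check_out <= check_in:
        raise ValueError("check_out must be after check_in (at least one night)")
    sort = q.get("sort", "relevance")
    if sort not in SORT_OPTIONS:
        raise ValueError(f"unsupported sort: {sort}")
    price_min, price_max = _opt_float(q, "price_min"), _opt_float(q, "price_max")
    if price_min is not None and price_max is not None and price_min > price_max:
        raise ValueError("price_min must not exceed price_max")
    guests = int(q.get("guests", 2))
    if guests < 1:
        raise ValueError("guests must be at least 1")
    # Assumption: amenities are comma-separated, e.g. "pool,spa,gym"
    amenities = [a.strip() for a in q.get("amenities", "").split(",") if a.strip()]
    return SearchParams(
        location=q["location"].strip(),
        check_in=check_in,
        check_out=check_out,
        guests=guests,
        room_type=q.get("room_type") or None,
        amenities=amenities,
        price_min=price_min,
        price_max=price_max,
        radius_km=float(q.get("radius", 30)),
        sort=sort,
    )
```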

Step-by-Step Query Execution Flow

Phase 1: Query Processing (5ms)

  1. Input Validation: Validate dates, location, parameters
  2. Query Normalization: Standardize location names, date formats
  3. Parameter Extraction: Extract search criteria and filters
  4. Cache Check: Check for cached results
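Query normalization and the cache check work together. Equivalent searches should map to the same cache key, so normalization has to run before the lookup. The sketch below makes several assumptions, none of which the design specifies: the `search:` key prefix, the choice of SHA-256, and the specific rules used (collapse whitespace and lowercase the location, sort list values, drop absent optional parameters). `cache_key` is a hypothetical helper name.

```python
import hashlib
import json
from typing import Any, Mapping

def cache_key(params: Mapping[str, Any]) -> str:
    """Deterministic cache key for equivalent search requests."""
    normalized = {}
    for name, value in params.items():
        if value is None:
            continue  # an absent optional parameter should not change the key
        if name == "location":
            value = " ".join(str(value).split()).lower()  # "Los  Angeles" -> "los angeles"
        elif isinstance(value, (list, tuple, set)):
            value = sorted(value)  # amenity order must not matter
        normalized[name] = value
    # sort_keys makes the serialization independent of parameter order
    payload = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return "search:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()
```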

Phase 2: Elasticsearch Search (25ms)

  1. Geographic Filter: Filter by location and radius
  2. Amenity Filter: Filter by requested amenities
  3. Room Type Filter: Nested queries for room type requirements
  4. Text Search: Multi-match queries for hotel names and descriptions
  5. Price Filter: Range queries for price constraints
  6. Result Ranking: BM25 scoring and relevance ranking
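Phase 2 can be sketched as a builder for an Elasticsearch `bool` query in the standard query DSL. Everything about the mapping here is an assumption: `location` as a `geo_point`, `amenities` as a `keyword` field, and `room_types` as a `nested` field with `type`, `view` and `price_per_night` subfields. `build_hotel_query` is a hypothetical name.

The geographic, amenity, room type and price filters sit in filter context, so they do not affect scoring. Only the `multi_match` clause contributes the BM25 score. The empty `inner_hits` returns the matching room types, which is where the `room_type_id` for the Redis step would come from.

```python
from typing import Any, Dict, List, Optional

def build_hotel_query(
    lat: float, lon: float, radius_km: float, amenities: List[str], *,
    room_type: Optional[str] = None, view: Optional[str] = None,
    price_min: Optional[float] = None, price_max: Optional[float] = None,
    text: Optional[str] = None,
) -> Dict[str, Any]:
    filters: List[Dict[str, Any]] = [
        {"geo_distance": {"distance": f"{radius_km}km",
                          "location": {"lat": lat, "lon": lon}}},
    ]
    # One term filter per amenity: every requested amenity must be present (AND)
    filters += [{"term": {"amenities": a}} for a in amenities]

    room_filters: List[Dict[str, Any]] = []
    if room_type:
        room_filters.append({"term": {"room_types.type": room_type}})
    if view:
        room_filters.append({"term": {"room_types.view": view}})
    price_range = {}
    if price_min is not None:
        price_range["gte"] = price_min
    if price_max is not None:
        price_range["lte"] = price_max
    if price_range:
        room_filters.append({"range": {"room_types.price_per_night": price_range}})
    if room_filters:
        # Nested query: all room conditions must hold on the SAME room type object
        filters.append({"nested": {
            "path": "room_types",
            "query": {"bool": {"filter": room_filters}},
            "inner_hits": {},
        }})

    must = ([{"multi_match": {"query": text, "fields": ["name^2", "description"]}}]
            if text else [{"match_all": {}}])
    return {"query": {"bool": {"must": must, "filter": filters}}}
```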

Phase 3: Redis Availability Check (10ms)

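  1. Candidate Extraction: Get (hotel_id, room_type_id) pairs from Elasticsearch results
  2. Availability Keys: Build one hash key per pair (availability:hotel_id:room_type_id); the nights in the date range are the hash fields
  3. Batch Check: Pipeline one HMGET per key so all candidates are checked in a single round trip
  4. Availability Filtering: Remove hotels where any night is missing or has insufficient room_count
  5. Price Validation: Confirm the Elasticsearch-sourced price is within the user's constraints (Redis holds no price data)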

Phase 4: Result Assembly (5ms)

  1. Result Merging: Combine Elasticsearch and Redis results
  2. Final Ranking: Apply availability and pricing factors
  3. Response Formatting: Structure response with hotel details
  4. Cache Storage: Store results for future queries

Response Format with Availability and Pricing

Response Structure:

{
  "hotels": [
    {
      "hotel_id": "hotel_123",
      "name": "Luxury Downtown Hotel",
      "location": {
        "city": "Los Angeles",
        "coordinates": [34.0522, -118.2437],
        "address": "123 Main St"
      },
      "rating": 4.5,
      "amenities": ["pool", "spa", "gym"],
      "room_types": [
        {
          "type": "deluxe",
          "capacity": 2,
          "price_per_night": 299,
          "available": true,
          "rooms_left": 5
        }
      ],
      "total_price": 598,
      "search_score": 0.95
    }
  ],
  "total_results": 150,
  "search_time": 35,
  "cache_hit": false
}
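One way to assemble a single entry of this response is sketched below. It assumes a flat nightly rate, so `total_price` is `price_per_night` × nights, which matches 299 × 2 = 598 in the example. It also assumes `rooms_left` is the minimum `room_count` across the stay's nights, since a guest needs a room every night. `hotel_entry` is a hypothetical helper name.

```python
from datetime import date
from typing import Any, Dict, Sequence

def hotel_entry(hit: Dict[str, Any], room: Dict[str, Any], room_counts: Sequence[int],
                check_in: date, check_out: date) -> Dict[str, Any]:
    """Combine an Elasticsearch hit, its matched room type, and Redis counts."""
    num_nights = (check_out - check_in).days
    # Bottleneck night decides how many rooms can actually be booked
    rooms_left = min(room_counts) if room_counts else 0
    return {
        "hotel_id": hit["hotel_id"],
        "name": hit["name"],
        "location": hit["location"],
        "rating": hit["rating"],
        "amenities": hit["amenities"],
        "room_types": [{
            "type": room["type"],
            "capacity": room["capacity"],
            "price_per_night": room["price_per_night"],   # from Elasticsearch
            "available": rooms_left > 0,                  # from Redis
            "rooms_left": rooms_left,
        }],
        "total_price": room["price_per_night"] * num_nights,  # flat-rate assumption
        "search_score": hit["search_score"],
    }
```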

Error Handling and Fallback Strategies

Error Scenarios:

  • Elasticsearch Unavailable: Fallback to database with reduced functionality
  • Redis Unavailable: Skip availability filtering, show all results
  • Timeout Errors: Return partial results with warning
  • Invalid Parameters: Return error with parameter validation details

Fallback Strategies:

  • Database Fallback: Use PostgreSQL for basic search when Elasticsearch fails
  • Cached Results: Serve cached results during outages
  • Degraded Mode: Reduce functionality but maintain basic search
  • Graceful Degradation: Show maintenance message with trending hotels
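The first two fallbacks can be sketched with `asyncio`. A timed-out or failing Elasticsearch call falls back to a database search. A failing Redis check skips availability filtering. Warnings are collected so the response can flag partial results. The three callables and the timeout values are hypothetical placeholders for the real clients. The cached-results and maintenance-message fallbacks would add further layers and are not shown.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

Hits = List[Dict[str, Any]]

async def search_with_fallbacks(
    es_search: Callable[[], Awaitable[Hits]],
    db_search: Callable[[], Awaitable[Hits]],
    check_availability: Callable[[Hits], Awaitable[Hits]],
    es_timeout: float = 0.1,
    redis_timeout: float = 0.05,
) -> Dict[str, Any]:
    warnings: List[str] = []
    try:
        hits = await asyncio.wait_for(es_search(), timeout=es_timeout)
    except Exception:  # timeouts and connection errors alike
        hits = await db_search()  # reduced-functionality PostgreSQL search
        warnings.append("degraded: database fallback")
    try:
        hotels = await asyncio.wait_for(check_availability(hits), timeout=redis_timeout)
    except Exception:
        hotels = hits  # Redis unavailable: show results without availability filtering
        warnings.append("availability not verified")
    return {"hotels": hotels, "warnings": warnings}
```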

Monitoring and Alerting:

  • Latency Monitoring: Alert if response time > 100ms
  • Error Rate Monitoring: Alert if error rate > 5%
  • Cache Performance: Monitor cache hit rates and memory usage
  • Availability Monitoring: Track Elasticsearch and Redis health

Key Takeaway: The API design balances comprehensive search capabilities with performance optimization, using parallel execution and intelligent caching to maintain sub-100ms response times while providing rich search functionality and robust error handling.


Soundbite for interviews: "Hotel search requires a hybrid Elasticsearch + Redis architecture because no single system can efficiently handle both complex search/filtering and real-time availability updates. The key is using Elasticsearch for what it does best (search and filtering) and Redis for what it does best (real-time data), with parallel execution improving time to first results."

Next: Hotel Booking Schema Design

Top comments (1)

Sofia Petrova

This was super detailed and practical—especially the Redis hash design for OLTP-style availability. One thing I was curious about: how would you adapt this architecture for heavy personalization (e.g., user-specific ranking, loyalty-based pricing) without blowing up ES index complexity? It might be interesting if your next post dug into personalization and ranking strategies on top of this search stack.