1. Introduction
Key Technical Topics Covered (Interview Highlights)
- Hybrid OLAP/OLTP architecture (Sections 1 & 2): Elasticsearch for multi-dimensional search + Redis for real-time availability (and why neither alone is sufficient).
- Transactional Outbox + CDC (Section 2.2): Keeping Elasticsearch in sync with the primary DB without dual writes.
- Elasticsearch modeling (Sections 3 & 4): Denormalized hotel/room documents, geo fields, amenity facets, script scoring, and index aliasing for zero-downtime reindex.
- Redis availability design (Section 6): Per-night bitmaps / sorted sets, multi-night intersections, TTL/eviction strategies, and consistency with bookings.
- Query pipeline (Section 7): Coordinated ES + Redis flow with fallbacks when availability or index data diverges.
- Advanced filtering & ranking (Section 5): Amenity faceting, price buckets, popularity boosts, personalization hooks.
- Scaling patterns (Sections 2 & 7): Multi-region clusters, hot/warm node tiers, cache layers, snapshot/alias-based reindexing. (Cross-region replication—e.g., keeping Redis regional while asynchronously syncing aggregates or using multi-region Elasticsearch—follows the same patterns but is out of scope for this doc.)
Prerequisite Reading (Shared Components)
This hotel search design builds on the same platform described in @sumedhbala’s ticketing series:
- Designing a Large-Scale Ticketing System
- Part 2 – Event Discovery
- Part 3 – Seat Management
- Part 4 – Payments & Ticket Issuance
From those posts we inherit the core stack: PostgreSQL + transactional outbox, Debezium/Kafka pipelines, Elasticsearch clusters, Redis caches, and Stripe-style payment flows. This document focuses on how those building blocks are reconfigured for hotel-specific search, availability, and ranking concerns.
Hotel Search Challenges
Hotel search and booking differ sharply from event search. Two fundamental differences drive the architectural complexity:
1. Availability Integration in Search:
- Real-time Availability Validation: Availability must be checked during the search phase, not after results are displayed
- Date Range Queries: Users search for stays spanning multiple nights (check-in to check-out), requiring multi-date availability validation
- Recurring Availability: Same rooms available daily, unlike one-time events
- Real-time Updates: Availability changes constantly as bookings are made, requiring sub-millisecond response times
2. Extensive Filter Requirements:
- Multiple Amenity Filters: Users expect to filter by numerous amenities (pool, spa, beachfront, gym, restaurant, concierge, parking, pet-friendly, etc.)
- Room Type Complexity: Multiple room categories (Standard, Deluxe, Suite) with different capacities and room-level amenities
- Complex Filter Combinations: Location, amenities, price range, room type, guest count, and date combinations all working together
Additional challenges include:
- Fungible Inventory: Room 101 and Room 102 of the same type are interchangeable
- Dynamic Pricing: Prices vary by date, season, demand, and length of stay
Key Differences from Event Search
| Aspect | Event Search | Hotel Search |
|---|---|---|
| Inventory Model | Fixed seats, non-fungible | Room types, fungible inventory |
| Availability | One-time events | Recurring daily availability |
| Date Handling | Single event date | Date ranges (check-in to check-out) |
| Pricing | Dynamic per seat/section (demand-based) | Dynamic per room type and date (seasonal, demand-based) |
| Search Complexity | Simple (venue, date, category) | Complex (location, dates, room type, multiple amenity filters: pool, spa, beachfront, gym, etc.) |
| Availability in Search | Checked after displaying events | Must be part of search criteria |
Hybrid Architecture Overview: Mandatory OLAP/OLTP Separation
Terminology refresher: OLAP (Online Analytical Processing) handles read-heavy, multi-dimensional queries (e.g., Elasticsearch), while OLTP (Online Transaction Processing) powers write-heavy, transactional workloads (e.g., Redis + primary DB). The rest of this document uses these terms frequently.
Hotel search systems face two fundamentally contradictory workloads that cannot be solved by a single system:
1. OLAP Workload (Analytical Search):
- Complex, read-intensive, multi-dimensional queries
- Example: "Find all 4-star hotels in New York with pool and free breakfast for 2 adults and 2 children from 2024-10-26 to 2024-10-30, sorted by review score"
- Optimized for: High-speed reads of large datasets, complex filtering, full-text search
- Technology: Elasticsearch (built on Apache Lucene, designed for OLAP)
2. OLTP Workload (Transactional Availability):
- High-frequency, simple, isolated transactions
- Example: "Decrement room count for Hotel 123, room type 'King', on date 2024-10-26"
- Optimized for: High-throughput, low-latency updates, absolute data integrity
- Technology: Redis (in-memory database designed for OLTP)
The Architectural Mandate:
The dual-system architecture (Elasticsearch + Redis) is not optional—it is mandatory because:
- OLTP optimizes for write efficiency (small, fast, atomic updates)
- OLAP optimizes for read efficiency (complex, large-scale, multi-dimensional queries)
- These optimization goals are directly contradictory
Attempting to use Elasticsearch for OLTP availability updates would fail catastrophically due to its "update penalty" (see Section 3.2). Conversely, a relational database used on its own for OLAP search cannot efficiently execute complex, multi-dimensional, full-text queries.
Key Takeaway: The combination of OLAP (Elasticsearch) and OLTP (Redis) systems is not merely the "best" solution—it is the only viable one for a business requiring both robust transaction processing and powerful data analysis.
2. Baseline Functional Solution
Services
- Search API Service – Handles user queries, coordinates Elasticsearch and Redis operations, returns ranked results with availability
- Availability Service – Manages Redis availability data, handles booking updates, ensures data consistency
- Hotel Management Service – Source of truth for hotel metadata (writes to primary database)
Note: Data synchronization from the primary database to Elasticsearch is handled by an event-driven pipeline (see Section 2.2: Data Synchronization Architecture)
Data Stores
- Elasticsearch Cluster – Search index for hotels, room types, amenities, location data (includes built-in query caching)
- Redis Cluster (or AWS ElastiCache) – Real-time availability data exclusively
- Primary Database (PostgreSQL/MySQL) – Hotel metadata, room configurations, pricing rules
Note: Redis is used exclusively for availability data. Search result caching is handled by Elasticsearch's built-in query cache or application-level caching mechanisms. (This is explained in detail in Section 6.)
High-Level Architecture
The system follows a three-tier approach:
- Search Tier: Elasticsearch handles complex queries (location, amenities, text search)
- Availability Tier: Redis provides real-time availability validation
- Data Tier: Primary database maintains canonical hotel information
Soundbite for interviews: "Hotel search requires a hybrid architecture because Elasticsearch excels at complex filtering but struggles with real-time availability updates, while Redis provides sub-millisecond availability checks but lacks sophisticated search capabilities."
Data Synchronization Architecture: Transactional Outbox + Change Data Capture (CDC)
At a glance, data flows DB → Outbox → CDC/Debezium → Kafka → Elasticsearch. The subsections below break this down so readers can skim the summary and dive deeper only if needed.
The Dual-Write Anti-Pattern Problem:
A naive approach would use an "Indexing Service" that attempts to write to both the database and Elasticsearch synchronously:
Write A: Update PostgreSQL (hotel name change)
Write B: Update Elasticsearch (hotel name change)
This is a well-known anti-pattern because it cannot guarantee atomic execution across two independent systems. Data inconsistency is not merely a risk; over time and at scale it is a certainty.
Failure Scenarios:
- Partial Failure (Stale Search Index): Database update succeeds, Elasticsearch update fails → Search index permanently stale
- Partial Failure (Phantom Data): Elasticsearch update succeeds, database update fails → Hotel searchable but doesn't exist in system of record
- Race Conditions: Concurrent updates arrive out of order → Permanent data corruption
The Solution: Transactional Outbox Pattern + CDC
The robust solution decouples writes using an asynchronous, event-driven architecture:
1. Transactional Outbox Pattern:
- Create an Outbox_Events table in the primary PostgreSQL database
- When updating hotel data, perform one atomic database transaction:
BEGIN TRANSACTION;
-- Business data write
UPDATE Hotels SET name = 'New Name' WHERE id = 123;
-- Event write (atomic within same transaction)
INSERT INTO Outbox_Events (payload) VALUES ('{"id": 123, "name": "New Name"}');
COMMIT;
- This guarantees atomicity: if UPDATE fails, INSERT is rolled back; if INSERT fails, UPDATE is rolled back
2. Change Data Capture (CDC):
- Use a log-based CDC tool (e.g., Debezium) that reads PostgreSQL's Write-Ahead Log (WAL)
- Debezium sees the committed event in Outbox_Events and publishes it to Apache Kafka
- This is non-intrusive and low-overhead (reads transaction log, doesn't query database)
3. Event-Driven Pipeline:
PostgreSQL (System of Record)
↓ (WAL)
Debezium (CDC Source Connector)
↓ (Kafka message)
Apache Kafka (Event Bus - provides resilience and buffering)
↓ (Kafka message)
Kafka Connect (Elasticsearch Sink Connector)
↓ (index operation)
Elasticsearch (Search Index)
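The three steps above can be sketched end to end in a few lines. This is a minimal illustration, not the production pipeline: an in-memory SQLite database stands in for PostgreSQL, and the hypothetical `relay_outbox` function stands in for Debezium, Kafka, and the sink connector combined. The function names and the `fail_before_commit` flag are assumptions made for the example.

```python
import json
import sqlite3

# In-memory SQLite stands in for PostgreSQL; table names mirror the SQL example above.
conn = sqlite3.connect(":memory:", isolation_level=None)  # we issue BEGIN/COMMIT ourselves
conn.execute("CREATE TABLE hotels (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE outbox_events (seq INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
)
conn.execute("INSERT INTO hotels (id, name) VALUES (123, 'Old Name')")


def rename_hotel(conn, hotel_id, new_name, fail_before_commit=False):
    """Write the business row and the outbox event in one transaction."""
    conn.execute("BEGIN")
    try:
        conn.execute("UPDATE hotels SET name = ? WHERE id = ?", (new_name, hotel_id))
        payload = json.dumps({"id": hotel_id, "name": new_name})
        conn.execute("INSERT INTO outbox_events (payload) VALUES (?)", (payload,))
        if fail_before_commit:
            raise RuntimeError("simulated crash before COMMIT")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # neither the UPDATE nor the INSERT survives
        raise


def relay_outbox(conn, index, last_seq=0):
    """Hypothetical stand-in for CDC + Kafka + sink: replay events in commit order."""
    rows = conn.execute(
        "SELECT seq, payload FROM outbox_events WHERE seq > ? ORDER BY seq", (last_seq,)
    ).fetchall()
    for seq, payload in rows:
        event = json.loads(payload)
        index[event["id"]] = event  # keyed write, so a replay overwrites instead of duplicating
        last_seq = seq
    return last_seq


search_index = {}
rename_hotel(conn, 123, "New Name")
try:
    rename_hotel(conn, 123, "Lost Name", fail_before_commit=True)
except RuntimeError:
    pass  # the failed attempt left no trace in either table
relay_outbox(conn, search_index)
```

Because the event row only exists if the business write committed, the relay can never publish a change the system of record does not contain.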
Key Benefits:
- Data Integrity: Only 100% committed data is ever published
- Resilience: If Elasticsearch is down, events buffer in Kafka with no data loss
- Decoupling: Database and Elasticsearch are fully independent
- Scalability: Each component can scale independently
- Eventual Consistency: Tunable lag (typically seconds to low minutes) is the correct trade-off for high-performance distributed systems
Configuration Requirements:
- PostgreSQL: Enable logical replication (wal_level = logical)
- Debezium: Configure to read only from the Outbox_Events table
- Kafka: Long retention period (e.g., 7 days) for disaster recovery
- Elasticsearch Sink: Configure for idempotency (key.ignore: false, so the Kafka record key becomes the document ID and replays overwrite rather than duplicate) and delete handling (behavior.on.null.values: delete)
- Dead Letter Queue (DLQ): Route failed messages to DLQ to prevent pipeline halting
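The sink-side requirements (idempotent writes, delete handling, and a dead letter queue) can be illustrated with a small simulation. This is a sketch under stated assumptions: a plain dict stands in for the Elasticsearch index, a list stands in for the DLQ topic, and `apply_event` is a hypothetical helper, not a connector API.

```python
import json


def apply_event(index, dlq, key, value):
    """Apply one change event to a dict standing in for the search index.

    key   -> document ID (the record key in a real pipeline)
    value -> JSON string, or None as a tombstone meaning "delete"
    """
    if value is None:
        index.pop(key, None)  # deleting a missing document is a no-op
        return
    try:
        doc = json.loads(value)
        if not isinstance(doc, dict):
            raise ValueError("payload must be a JSON object")
    except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
        dlq.append((key, value, str(exc)))  # park the bad message and keep going
        return
    index[key] = doc  # upsert by key, so replaying an event is harmless


events = [
    ("hotel_123", '{"name": "Old Name"}'),
    ("hotel_123", '{"name": "New Name"}'),
    ("hotel_456", '{"name": "Closing Soon"}'),
    ("hotel_789", "{not valid json"),
    ("hotel_456", None),
]

index, dlq = {}, []
for key, value in events:
    apply_event(index, dlq, key, value)
first_pass = dict(index)

# At-least-once delivery: the whole batch is redelivered after a consumer restart.
for key, value in events:
    apply_event(index, dlq, key, value)
```

After the replay the index is unchanged, which is the property that lets the pipeline tolerate redelivery; the malformed message is parked twice in the DLQ rather than blocking the stream.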
Key Takeaway: The Transactional Outbox + CDC pattern eliminates the dual-write anti-pattern, providing a production-grade, resilient architecture for data synchronization that guarantees data integrity and system availability.
3. Elasticsearch Document Structure
Availability Data Structure Options
Quick overview: We evaluate four availability strategies:
- Keep all availability in the primary database + Redis (Elasticsearch remains metadata-only).
- Embed detailed availability inside Elasticsearch documents.
- Hybrid flag in Elasticsearch plus full detail in Redis.
- Maintain a separate availability-focused Elasticsearch index.
Use this summary to choose which option details to read.
Option 1: Availability Only in Primary Database and Redis
- Elasticsearch: No availability data stored
- Redis: Real-time availability for all date ranges
- Primary Database: Source of truth for availability
Pros:
- Single Source of Truth: No data synchronization complexity
- Always Accurate: No stale availability data in Elasticsearch
- Simpler Architecture: Fewer moving parts, less to maintain
- Real-time Guarantees: Redis provides sub-millisecond updates
Cons:
- Post-Filtering Required: Must filter Elasticsearch results with Redis availability check
- Two-Step Process: Elasticsearch search → Redis availability check
- Slightly Higher Latency: Additional Redis lookup after search
Option 2: Nested Availability in Elasticsearch
- Availability data embedded in room type documents
- Good for simple queries, limited scalability for date ranges
Pros:
- Filtering in Search Phase: Can filter by availability during Elasticsearch query
- Single Query: All filtering happens in one place
Cons:
- Data Synchronization: Must keep Elasticsearch in sync with DB
- Stale Data Risk: Availability in Elasticsearch may be outdated
- Update Complexity: Every booking requires Elasticsearch update
- Index Size: Large index with date-based availability fields
Option 3: Hybrid Approach
- Elasticsearch: Summary availability flag (e.g., "has_availability_next_30_days: true")
- Redis: Detailed real-time availability for specific dates
- Primary Database: Source of truth
Pros:
- Initial Filtering: Elasticsearch can filter out hotels with no availability
- Real-time Accuracy: Redis provides precise availability for date ranges
- Reduced Load: Fewer hotels passed to Redis for availability check
- Best of Both Worlds: Search performance + real-time accuracy
Cons:
- Dual Maintenance: Must update both Elasticsearch summary and Redis details
- Complexity: More components to manage
Option 4: Separate Availability Index in Elasticsearch
- Dedicated index for availability data
- Better for complex date range queries, requires joins
Pros:
- Complex Queries: Can handle sophisticated date range queries
- Separation of Concerns: Availability data separate from hotel metadata
Cons:
- Join Overhead: Requires joining availability index with hotel index
- Synchronization Complexity: Multiple indexes to keep in sync
- Performance: Joins are slower than single-index queries
Recommendation: Option 1 (Redis/DB Only) is Recommended
Why Option 1 is the Default Choice:
- Availability changes frequently - Every booking updates availability in real-time
- Redis is already optimized - Sub-millisecond lookups, perfect for real-time data
- Simplicity wins - No sync complexity, no stale data risk
- Performance is acceptable - The Redis check adds only a small, bounded cost after the Elasticsearch query, and the per-hotel checks can run in parallel
- Single source of truth - Database is the authority, Redis is a performance cache (see Section 6 for database authority details)
Why Elasticsearch Cannot Handle OLTP Availability Updates: The "Update Penalty"
The Fundamental Problem: Elasticsearch's Immutable Segment Architecture
Elasticsearch is built on Apache Lucene, which stores data in immutable segments on disk. This design has a critical consequence: Elasticsearch does not support in-place updates or deletes.
How "Updates" Actually Work in Elasticsearch:
When a document is "updated" (e.g., changing num_available_rooms from 10 to 9), Elasticsearch performs two operations:
- Soft Delete: Marks the old document (with 10 rooms) as "deleted"
- Re-index: Indexes a brand new document (with 9 rooms) into a new, small segment
This means a simple transactional counter change becomes a full document re-index, which is CPU-intensive and requires the entire document to be processed and analyzed again.
The Compounding Cost: Segment Merging
High-frequency updates create thousands of tiny segments and massive numbers of soft-deleted documents. This is catastrophic for search performance because:
- Elasticsearch must open and query every tiny segment
- CPU cycles are wasted filtering out soft-deleted documents
- To combat this, Lucene runs a background segment merging process
Segment Merging Overhead:
- Reads small segments
- Filters out soft-deleted documents
- Merges remaining "live" documents into new, larger segments
- Extremely resource-intensive: Consumes substantial CPU, disk I/O, and memory
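A toy model makes the cost visible. The class below is a deliberately simplified illustration of immutable segments with soft deletes and merging; it is not how Lucene is implemented, and `TinyIndex` and its methods are hypothetical names.

```python
class TinyIndex:
    """Toy model of immutable segments with soft deletes (illustrative only)."""

    def __init__(self):
        self.segments = []    # each segment is a list of (doc_id, doc), never edited in place
        self.deleted = set()  # (segment_no, position) pairs marked as soft-deleted
        self.live = {}        # doc_id -> (segment_no, position) of the current version

    def index(self, doc_id, doc):
        if doc_id in self.live:
            self.deleted.add(self.live[doc_id])  # soft delete of the old version
        self.segments.append([(doc_id, doc)])    # full re-index into a new small segment
        self.live[doc_id] = (len(self.segments) - 1, 0)

    def search(self, doc_id):
        visited, hit = 0, None
        for seg_no, segment in enumerate(self.segments):  # every segment is consulted
            for pos, (d_id, doc) in enumerate(segment):
                visited += 1
                if d_id == doc_id and (seg_no, pos) not in self.deleted:
                    hit = doc
        return hit, visited

    def merge(self):
        """Copy live documents into one new segment and drop the soft-deleted ones."""
        merged = [
            entry
            for seg_no, segment in enumerate(self.segments)
            for pos, entry in enumerate(segment)
            if (seg_no, pos) not in self.deleted
        ]
        self.segments = [merged]
        self.deleted = set()
        self.live = {d_id: (0, pos) for pos, (d_id, _) in enumerate(merged)}


idx = TinyIndex()
idx.index("hotel_123", {"name": "Luxury Hotel", "num_available_rooms": 10})
for rooms in range(9, -1, -1):  # ten bookings, each one a full document rewrite
    idx.index("hotel_123", {"name": "Luxury Hotel", "num_available_rooms": rooms})

doc_before, visited_before = idx.search("hotel_123")
segments_before = len(idx.segments)
idx.merge()
doc_after, visited_after = idx.search("hotel_123")
```

Ten counter decrements leave eleven segments and ten dead copies behind; the search has to step over all of them until a merge rewrites the data.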
The "Update Penalty" Impact:
If Elasticsearch were used for the OLTP availability workload:
- High-frequency updates (thousands per minute) would trigger constant, aggressive segment merging
- Intense background CPU and I/O activity would compete directly with primary OLAP search queries
- Result: Increased query latency and system instability
This is why using Elasticsearch for frequently updated, mutable data is a well-known architectural anti-pattern. The OLTP availability workload must be handled by a system designed for high-throughput, low-latency updates—Redis.
When to Consider Option 3 (Hybrid with Summary) - Edge Cases Only:
- Very high query volume - When you need to reduce Redis load by pre-filtering (rare)
- Large candidate sets - When Elasticsearch returns 10,000+ hotels and you want to reduce Redis checks (unusual)
- Geographic filtering - When initial geographic filter yields too many results (can be handled with better Elasticsearch queries)
Note: For most hotel search systems, Option 1 (Redis/DB Only) is sufficient and preferred due to its simplicity and real-time accuracy.
Architecture Pattern (Option 1 - Recommended):
- Elasticsearch: Filters by location, amenities, room types, price (no availability)
- Redis: Validates availability for filtered hotels (date range check)
- Result: Only hotels with availability are returned
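The pattern above can be sketched as a post-filter over the Elasticsearch candidates. This is a minimal sketch: a dict stands in for Redis, and its `(hotel_id, room_type_id, night)` key shape is an assumption made for illustration rather than the key layout used by the real availability store.

```python
from datetime import date, timedelta

# Hypothetical stand-in for Redis: (hotel_id, room_type_id, night) -> rooms left.
availability = {
    ("hotel_1", "deluxe_001", date(2024, 10, 26)): 3,
    ("hotel_1", "deluxe_001", date(2024, 10, 27)): 1,
    ("hotel_2", "deluxe_001", date(2024, 10, 26)): 2,
    ("hotel_2", "deluxe_001", date(2024, 10, 27)): 0,  # sold out on the second night
}


def nights(check_in, check_out):
    """Nights of a stay: check-in inclusive, check-out exclusive."""
    return [check_in + timedelta(days=i) for i in range((check_out - check_in).days)]


def rooms_for_stay(hotel_id, room_type_id, check_in, check_out):
    """A stay is bookable only if every night has inventory, so take the minimum."""
    return min(
        (
            availability.get((hotel_id, room_type_id, night), 0)
            for night in nights(check_in, check_out)
        ),
        default=0,
    )


def search(candidates, check_in, check_out):
    """candidates: (hotel_id, room_type_id) pairs already filtered by the search index."""
    return [
        (hotel_id, room_type_id)
        for hotel_id, room_type_id in candidates
        if rooms_for_stay(hotel_id, room_type_id, check_in, check_out) > 0
    ]


es_candidates = [
    ("hotel_1", "deluxe_001"),
    ("hotel_2", "deluxe_001"),
    ("hotel_3", "deluxe_001"),  # no availability data at all
]
results = search(es_candidates, date(2024, 10, 26), date(2024, 10, 28))
```

Taking the minimum across nights captures the multi-night rule: one sold-out night removes the hotel from the results even if every other night has rooms.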
Performance Impact:
- Elasticsearch search: 25ms (filters by location, amenities, etc.)
- Redis availability check: 10ms (checks availability for filtered hotels)
- Total: ~35ms end to end (the two steps run sequentially, because the Redis check needs the candidate list from Elasticsearch)
Key Takeaway: For most hotel search systems, keeping availability only in Redis and the primary database is the recommended approach. It provides simplicity, real-time accuracy, and acceptable performance. Only use Elasticsearch for availability if you have very high query volumes and need to reduce Redis load through pre-filtering.
Hotel Document Schema
Hotel documents in Elasticsearch use nested structures to represent the complex relationships between hotels and their room types:
Recommended Approach (Option 1 - Redis/DB Only):
Hotel Document:
├── Basic Information (hotel_id, name, location, rating)
├── Amenities (pool, spa, gym, restaurant)
├── Location Data (geo_point, address, city, country)
└── Room Types (nested array)
├── Room Type 1 (room_type_id: "deluxe_001", type: "deluxe", capacity: 2, base_price: 299, amenities: ["wifi", "tv"])
├── Room Type 2 (room_type_id: "suite_garden_001", type: "suite", view: "garden", capacity: 4, base_price: 599, amenities: ["wifi", "tv", "balcony"])
└── Room Type 3 (room_type_id: "suite_ocean_001", type: "suite", view: "ocean", capacity: 4, base_price: 799, amenities: ["wifi", "tv", "balcony", "ocean_view"])
Note: Room types can have variations (e.g., "suite with garden view" vs "suite with ocean view"). Each variation has a unique room_type_id used for availability tracking in Redis. (See "Important: Room Type ID in All Keys" in Section 6 for detailed explanation.)
Note: No availability data in Elasticsearch. Availability is checked in Redis after Elasticsearch filtering.
Alternative Approach (Option 3 - Hybrid with Summary):
Hotel Document:
├── Basic Information (hotel_id, name, location, rating)
├── Amenities (pool, spa, gym, restaurant)
├── Location Data (geo_point, address, city, country)
└── Room Types (nested array)
├── Room Type 1 (room_type_id: "deluxe_001", type: "deluxe", capacity: 2, base_price: 299, amenities: ["wifi", "tv"])
├── Room Type 2 (room_type_id: "suite_garden_001", type: "suite", view: "garden", capacity: 4, base_price: 599, amenities: ["wifi", "tv", "balcony"])
├── Room Type 3 (room_type_id: "suite_ocean_001", type: "suite", view: "ocean", capacity: 4, base_price: 799, amenities: ["wifi", "tv", "balcony", "ocean_view"])
└── Availability Summary
├── has_availability_next_30_days: true/false
├── price_range: {min: 199, max: 599}
└── last_updated: timestamp
What is Availability Summary? (Only for Option 3 - Hybrid Approach)
If using the hybrid approach, the Availability Summary contains aggregated availability information used for initial filtering:
Fields:
- has_availability_next_30_days: Boolean flag indicating if hotel has any availability in the next 30 days
  - Purpose: Quickly filter out hotels with no availability
  - Example: true means hotel has at least one room available in next 30 days
  - Updated: Periodically (e.g., every 15 minutes or when significant availability changes)
- price_range: Minimum and maximum prices across all available rooms
  - Purpose: Pre-filter by price range before detailed Redis check
  - Example: {min: 199, max: 599} means cheapest room is $199, most expensive is $599
  - Updated: When room prices change significantly
- last_updated: Timestamp of when summary was last refreshed
  - Purpose: Track data freshness, implement cache invalidation
  - Example: 2024-06-15T10:30:00Z
Important Notes:
- Not Real-Time: Summary is updated periodically, not on every booking
- Coarse Filtering: Used to eliminate hotels with no availability, not for final availability check
- Redis Still Required: Detailed availability check still happens in Redis
- Trade-off: Summary reduces Redis load but adds sync complexity
When Availability Summary is Updated:
- Periodic refresh (every 15-30 minutes)
- When availability drops to zero (no rooms available)
- When availability becomes available after being zero
- Manual refresh triggered by significant booking events
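One way the summary could be derived and refreshed is sketched below. The input shapes and the helper names `build_summary` and `needs_immediate_refresh` are assumptions for illustration; the sketch only encodes the rules listed above (a 30-day window, and an immediate push when availability flips to or from zero).

```python
from datetime import date, timedelta


def build_summary(rooms_by_night, price_by_room_type, today, now_iso):
    """Aggregate per-night inventory into the coarse summary fields.

    rooms_by_night:     {(room_type_id, night): rooms_left}  (assumed input shape)
    price_by_room_type: {room_type_id: price}
    """
    window_end = today + timedelta(days=30)
    available_types = {
        room_type_id
        for (room_type_id, night), left in rooms_by_night.items()
        if left > 0 and today <= night < window_end
    }
    prices = [price_by_room_type[rt] for rt in available_types if rt in price_by_room_type]
    return {
        "has_availability_next_30_days": bool(available_types),
        "price_range": {"min": min(prices), "max": max(prices)} if prices else None,
        "last_updated": now_iso,
    }


def needs_immediate_refresh(old_summary, new_summary):
    """Push right away only when the flag flips; otherwise wait for the periodic job."""
    return (
        old_summary["has_availability_next_30_days"]
        != new_summary["has_availability_next_30_days"]
    )


today = date(2024, 6, 15)
prices = {"deluxe_001": 299, "suite_ocean_001": 799}
before = build_summary(
    {("deluxe_001", date(2024, 6, 20)): 2, ("suite_ocean_001", date(2024, 6, 21)): 1},
    prices,
    today,
    "2024-06-15T10:30:00Z",
)
after = build_summary(
    {("deluxe_001", date(2024, 6, 20)): 0, ("suite_ocean_001", date(2024, 6, 21)): 0},
    prices,
    today,
    "2024-06-15T10:45:00Z",
)
```

Ordinary bookings that leave some inventory change nothing in the summary, which is what keeps the Elasticsearch update rate low in the hybrid option.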
Understanding Nested Structures (Beginner-Friendly)
What is a Nested Structure?
Imagine you have a hotel document. Without nested structures, you might store room types like this:
Hotel: "Luxury Hotel"
Room Types: ["deluxe", "suite", "standard"]
Room Prices: [299, 599, 199]
Problem: If you search for hotels with "deluxe" rooms under $250, Elasticsearch would match this hotel even though its deluxe room costs $299. Why? Because it sees "deluxe" (✓) and "$199" (✓) in the arrays, but doesn't know that "$199" belongs to "standard", not "deluxe". It treats all array items as independent.
Solution - Nested Structures:
With nested structures, each room type (including variations) is stored as a separate "mini-document" inside the hotel document:
Hotel: "Luxury Hotel"
Room Types (nested):
- Room Type 1: {room_type_id: "deluxe_001", type: "deluxe", price: 299, capacity: 2}
- Room Type 2: {room_type_id: "suite_garden_001", type: "suite", view: "garden", price: 599, capacity: 4}
- Room Type 3: {room_type_id: "suite_ocean_001", type: "suite", view: "ocean", price: 799, capacity: 4}
Now when you search for "deluxe rooms under $300", Elasticsearch checks each room type as a complete unit. It finds "deluxe" with price "$299" as a matched pair, which is correct.
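The difference can be reproduced in plain Python. The two functions below only mimic the matching semantics (independent arrays versus per-room-type matching); they are illustrative stand-ins, not Elasticsearch code, and the $250 cap is chosen so that the false match is visible.

```python
hotel = {
    "name": "Luxury Hotel",
    "room_types": [
        {"type": "deluxe", "price": 299},
        {"type": "suite", "price": 599},
        {"type": "standard", "price": 199},
    ],
}


def matches_flattened(hotel, room_type, max_price):
    """Mimics flattened arrays: type and price are checked independently."""
    types = [room["type"] for room in hotel["room_types"]]
    prices = [room["price"] for room in hotel["room_types"]]
    return room_type in types and any(price < max_price for price in prices)


def matches_nested(hotel, room_type, max_price):
    """Mimics nested matching: both conditions must hold on the same room type."""
    return any(
        room["type"] == room_type and room["price"] < max_price
        for room in hotel["room_types"]
    )


false_positive = matches_flattened(hotel, "deluxe", 250)  # "deluxe" exists, $199 exists
correct_reject = matches_nested(hotel, "deluxe", 250)     # no deluxe room under $250
```

With a $300 cap both functions agree, because the deluxe room really does cost $299; the two models only diverge when the conditions are satisfied by different room types.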
Room Type Variations:
The same room type category (e.g., "suite") can have multiple variations with different amenities or views:
- "suite with garden view" (room_type_id: "suite_garden_001")
- "suite with ocean view" (room_type_id: "suite_ocean_001")
Each variation has its own unique room_type_id used for availability tracking in Redis, allowing separate availability management for each variation. (See Section 6 for how room_type_id is used in Redis keys.)
Real-World Analogy:
Think of a hotel document as a filing cabinet. Without nesting, all room information is in one big drawer mixed together. With nesting, each room type has its own folder in the drawer, keeping related information together.
How Does It Help?
Accurate Filtering: When filtering by "deluxe rooms under $300", you get hotels that actually have deluxe rooms at that price, not hotels that have a deluxe room AND a cheaper room (but unrelated).
Independent Queries: You can ask "show me hotels where the suite has ocean view but the standard room doesn't" - nested structures allow you to query each room type independently.
Better Performance: Elasticsearch can optimize queries better because it knows which fields belong together, reducing false matches.
Why Nested Model Over Parent-Child: Optimizing for OLAP Reads
Elasticsearch offers two options for modeling one-to-many relationships (hotel → room types): Nested and Parent-Child.
Nested Model (Selected):
- Hotel and room types stored as a single document
- Parent (hotel) and children (rooms) are co-located in the same Lucene block
- Pros: Significantly faster queries, low memory overhead
- Cons: High update cost—updating any field forces re-indexing of the entire document (hotel + all rooms)
Parent-Child Model (Rejected):
- Hotel (parent) and rooms (children) indexed as separate documents
- Pros: Low update cost—updating one child only re-indexes that child
- Cons: Slower queries due to join overhead, higher memory overhead (requires in-memory "join list")
The Decision: OLTP Workload Removed = No Compromise Needed
The Nested vs Parent-Child debate is a microcosm of the entire OLAP/OLTP problem:
- Nested model optimizes for reads (OLAP)
- Parent-Child model compromises on read performance to gain update efficiency (a step towards OLTP)
Because the high-frequency OLTP workload (availability) has been completely removed and placed in Redis, the system is no longer forced to compromise. The correct choice is to optimize the search index for its primary, read-heavy workload. Therefore, the Nested model is used.
Rationale:
- Infrequent updates to static hotel data (handled by asynchronous CDC pipeline) will incur the higher re-indexing cost
- This is a worthwhile trade-off for maximizing query performance for all users
- The update penalty only affects static data changes (hotel name, amenities), not high-frequency availability updates
Index Refresh Configuration:
- For this OLAP index, index.refresh_interval can be set to a high value (e.g., 30s or 60s) or disabled during bulk loads
- A high-frequency OLTP workload would require a low interval (e.g., default 1s), triggering constant, costly refreshes
- With OLTP removed, the index can be tuned for maximum indexing performance and stability
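As a rough illustration, the two refresh configurations could be expressed as request bodies for the index settings API (`PUT /<index>/_settings`). The 30s value comes from the text above and "-1" is the value that disables refresh; the helper `settings_for` and its phase names are hypothetical.

```python
import json

# Request bodies for the index settings API.
SEARCH_TUNED = {"index": {"refresh_interval": "30s"}}  # steady state for a read-heavy index
BULK_LOAD = {"index": {"refresh_interval": "-1"}}      # refresh disabled while bulk indexing


def settings_for(phase):
    """Pick the settings body for a phase; the phase names are illustrative."""
    return BULK_LOAD if phase == "bulk_load" else SEARCH_TUNED


body = json.dumps(settings_for("bulk_load"))
```

After a bulk load finishes, the steady-state body is applied again so that new documents become searchable on the normal schedule.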
Key Takeaway: By moving the OLTP availability workload to Redis, Elasticsearch can be fully optimized for OLAP reads using the Nested model; its higher update cost is paid only on infrequent hotel metadata changes.
Field Mapping Strategy
What is Field Mapping?
Field mapping tells Elasticsearch how to store and index each field in your documents. Different field types enable different query capabilities and performance characteristics. Choosing the right field type is crucial for search performance and accuracy.
Field Type Breakdown:
- hotel_id: keyword (exact matching, aggregations)
  - Why: Hotel IDs are unique identifiers that need exact matching only
  - Use Case: "Find hotel with ID 'hotel_123'" or "Count hotels by ID"
  - Performance: Very fast exact lookups, no text analysis overhead
- name: text with keyword subfield (full-text search + exact matching)
  - Why: Hotel names need both fuzzy text search AND exact matching
  - text field: Analyzed for full-text search and relevance scoring; with fuzziness enabled at query time it also tolerates typos (e.g., "Luxry" matches "Luxury")
  - keyword subfield: Exact matching for autocomplete, sorting, or filtering
  - Use Case: Text search finds "Luxury Downtown Hotel" even if user types "Luxry Downtown", while keyword enables exact filtering
- location: geo_point (geospatial queries, distance calculations)
  - Why: Geographic coordinates need special handling for distance queries
  - Use Case: "Find hotels within 5km of latitude 34.0522, longitude -118.2437"
  - Performance: Optimized spatial indexing for fast distance calculations
- amenities: keyword array (filtering, faceted search)
  - Why: Amenities are standardized discrete values (e.g., ["pool", "spa", "gym"]) that need exact matching
  - Use Case: Filter by "pool AND spa" - must match exact amenity values, not partial text matches
  - Why not text:
    - Text analysis would tokenize values, breaking exact matching needed for filtering
    - Amenities are categorical data (not free-form text), so they should be matched exactly
    - Example: With a text mapping, the amenity "pool table" (billiards) would be tokenized into "pool" and "table" and would wrongly satisfy a "pool" (swimming pool) filter
  - Performance: Fast exact matching for multiple amenities simultaneously
- room_types: nested object (complex room type queries)
  - Why: Room types have their own attributes (price, capacity, amenities) that need independent filtering
  - Use Case: "Find hotels with deluxe rooms under $300" - must check room type AND price together
  - Performance: Enables filtering within nested documents without false matches
- rating: float (range queries, sorting)
  - Why: Ratings are numeric values that need range queries and sorting
  - Use Case: "Find hotels with rating >= 4.0" or "Sort by rating descending"
  - Performance: Optimized for numeric comparisons and sorting operations
Right field type = Accurate results + Fast queries
Key Takeaway: Each field type is optimized for specific query patterns. Text fields handle fuzzy matching, keyword fields handle exact matching, geo_point handles distance calculations, and nested objects handle complex relationships. Choosing the right type ensures both query accuracy and performance.
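The field types discussed in this section could be written as a mapping body along the following lines, shown here as Python dicts that serialize to the JSON sent to Elasticsearch. This is a sketch, not the author's actual mapping: the subfield name `raw`, the nested field types for `capacity` and `base_price`, and the index name in the comment are assumptions.

```python
import json

# Mapping body for index creation (e.g., PUT /hotels); field names follow the breakdown above.
HOTEL_MAPPING = {
    "mappings": {
        "properties": {
            "hotel_id": {"type": "keyword"},
            "name": {
                "type": "text",
                "fields": {"raw": {"type": "keyword"}},  # "raw" subfield name is an assumption
            },
            "location": {"type": "geo_point"},
            "amenities": {"type": "keyword"},  # arrays need no special mapping
            "rating": {"type": "float"},
            "room_types": {
                "type": "nested",
                "properties": {
                    "room_type_id": {"type": "keyword"},
                    "type": {"type": "keyword"},
                    "capacity": {"type": "integer"},
                    "base_price": {"type": "float"},
                    "amenities": {"type": "keyword"},
                },
            },
        }
    }
}

# "Hotels with deluxe rooms under $300": both conditions must match the same room type.
DELUXE_UNDER_300 = {
    "query": {
        "nested": {
            "path": "room_types",
            "query": {
                "bool": {
                    "filter": [
                        {"term": {"room_types.type": "deluxe"}},
                        {"range": {"room_types.base_price": {"lt": 300}}},
                    ]
                }
            },
        }
    }
}

mapping_json = json.dumps(HOTEL_MAPPING, indent=2)
```

The nested query wraps both conditions under one `path`, which is what ties the type and price checks to the same room type.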
4. Elasticsearch Index Design
Index Mapping Configuration
The hotel index uses a carefully designed mapping to optimize for different query patterns:
- Text Fields: Analyzed for full-text search; a language analyzer (e.g., english) provides stemming and stop-word removal, which the default standard analyzer does not apply
  - Stemming: Reduces words to their root form (e.g., "swimming" → "swim", "pools" → "pool") so that variations of the same word match
  - Stop Words: Removes common words like "the", "and", "is" that don't add search value
- Keyword Fields: Exact matching for filters and aggregations
- Geo Fields: Geo-point mapping for distance-based queries
- Nested Fields: Nested mapping for the one-to-many hotel → room type relationship
Nested Document Configuration
From an index design perspective, nested documents provide:
- Multi-level Filtering: Enables filtering by both hotel amenities (pool, spa) and room-level amenities (wifi, minibar) simultaneously
- Performance: Nested queries match each hotel against its own room types efficiently, using BitSet operations to link room types back to their hotel
- Flexibility: Easy to add new room types or room attributes without schema changes
Key Takeaway: The index design balances search performance with query flexibility, using appropriate field types for different use cases. For the recommended approach (Option 1), availability data is not stored in Elasticsearch - it's managed entirely in Redis and the primary database for real-time accuracy.
5. Query Patterns and Optimization
Filter-First Execution Strategy
Elasticsearch optimizes query performance by executing filters before expensive scoring operations:
Execution Order:
- Filter Context: Execute all filters, create BitSets (fast, cached)
- Query Context: Execute text search, create BitSets (fast, cached)
- BitSet Intersection: Combine filters with AND operations (very fast)
- Scoring Phase: Apply BM25 scoring only to filtered results (expensive but small set)
Performance Impact:
- Without Filter-First: Score 50,000 documents, then filter to 1,000
- With Filter-First: Filter to 1,000 documents, then score only those
- Performance Improvement: 25-50x faster execution
Example: Search for "Luxury hotels with pool and spa in Los Angeles"
Query Components:
- Text search: "Luxury" (query context - needs scoring)
- Location filter: "Los Angeles" (filter context - no scoring)
- Amenity filter: "pool AND spa" (filter context - no scoring)
Without Filter-First (Inefficient):
- Text search finds 50,000 hotels with "luxury" in name/description
- BM25 scoring calculated for all 50,000 hotels (expensive!)
- Field boosting applied to all 50,000 hotels
- Results ranked: Hotel_A (score 0.95), Hotel_B (score 0.92), ...
- Location filter applied: 50,000 → 2,000 hotels in Los Angeles
- Amenity filter applied: 2,000 → 600 hotels with pool AND spa
- Total cost: 50,000 scoring operations + ranking + filtering
With Filter-First (Optimized):
- Location filter: Create BitSet A of hotels in "Los Angeles" → 2,000 hotels (fast, cached)
- Amenity filter: Create BitSet B of hotels with "pool AND spa" → 5,000 hotels (fast, cached)
- Text search: Create BitSet C of hotels containing "luxury" → 50,000 hotels (fast - from inverted index)
- BitSet intersection: A ∩ B ∩ C = 600 hotels (very fast - bitwise AND operation)
- BM25 scoring: Calculate scores for only 600 hotels (expensive but small set)
- Field boosting: Apply to only 600 hotels
- Results ranked: Final 600 hotels with scores
- Total cost: 600 scoring operations (about 83x fewer than the 50,000 scored without filter-first)
Key Insight:
- Text search BitSet creation: Still processes ALL documents (50,000) from inverted index - this is fast
- Filter BitSet creation: Processes ALL documents (fast, uses doc values)
- BitSet intersection: Very fast - just bitwise AND operations
- Scoring optimization: Only scores the 600 hotels in the intersection, not all 50,000
- The magic: BitSet operations are fast even on large sets, but scoring is expensive - so we minimize scoring by intersecting BitSets first
Why This Works:
- Filters are fast: BitSet operations are O(1) per document
- Scoring is expensive: BM25 calculation is O(term_frequency) per document
- Reduce scoring set: Filter first to minimize expensive operations
Interview Takeaway: BitSet Caching and Precomputation
- Caching Strategy: BitSets are cached for repeated queries (85-90% cache hit rate for popular filters)
- Precomputation Opportunity: Amenity-based BitSets can be precomputed since amenities don't change frequently
- Common filters like "pool", "spa", "gym" can be precomputed and stored in memory
- Only updated when hotel amenities are added/removed (rare event)
- Provides instant filter execution without recalculating BitSets on every query
- Key Insight: Identify filters that change infrequently (amenities, location) vs. frequently (availability, price) - precompute the stable ones
Real-World Impact:
- Query latency: 200ms → 30ms (6.7x faster)
- CPU usage: 50,000 scoring operations → 600 operations (83x reduction)
- Memory: Cached BitSets reused for popular queries (85-90% cache hit rate)
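The filter-first idea can be sketched in a few lines of Python. This is a toy model, not how Lucene is implemented: plain integers stand in for BitSets, the document IDs and the `toy_scores` table are made up, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List


def to_bitset(doc_ids: Iterable[int]) -> int:
    """Pack document IDs into one Python int, one bit per document."""
    bits = 0
    for doc_id in doc_ids:
        bits |= 1 << doc_id
    return bits


def matching_docs(bits: int) -> List[int]:
    """Return the document IDs whose bit is set, in ascending order."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


def filter_then_score(filters: List[int], text_matches: int,
                      scores: Dict[int, float]) -> List[int]:
    """Intersect all bitsets first, then rank only the survivors."""
    survivors = text_matches
    for filter_bits in filters:
        survivors &= filter_bits  # bitwise AND = set intersection
    # Scores are looked up only for documents that passed every filter.
    return sorted(matching_docs(survivors), key=lambda d: scores[d], reverse=True)


# Made-up data: 10 documents, IDs 0..9.
in_los_angeles = to_bitset([1, 2, 3, 5, 8])
has_pool_and_spa = to_bitset([2, 3, 4, 8, 9])
mentions_luxury = to_bitset([0, 2, 3, 7, 8])
toy_scores = {doc_id: 1.0 / (doc_id + 1) for doc_id in range(10)}

ranked = filter_then_score([in_los_angeles, has_pool_and_spa],
                           mentions_luxury, toy_scores)
```

Here only three of the ten documents reach the scoring step, which is the same effect as scoring 600 hotels instead of 50,000.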
Text Search with Multi-Match
Multi-match queries handle complex text search across multiple fields:
- Field Boosting: Hotel name gets higher weight than description
- Query Types: best_fields, most_fields, cross_fields for different matching strategies
- Fuzzy Matching: Optional typo tolerance via the fuzziness parameter; synonym expansion requires a synonym filter in the field's analyzer
- Phrase Matching: Exact phrase matching with proximity scoring
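A multi-match clause with field boosting might look like the sketch below, written as a Python dict in Elasticsearch query DSL form. The field names `name` and `description` and the boost of 3 are assumptions for illustration; the real mapping may differ.

```python
from typing import Any, Dict


def build_multi_match(text: str) -> Dict[str, Any]:
    """Build a multi_match clause that weights the hotel name above the description."""
    return {
        "multi_match": {
            "query": text,
            # "name^3" boosts matches in the name field by a factor of 3 (assumed value).
            "fields": ["name^3", "description"],
            # best_fields scores a document by its single best-matching field.
            "type": "best_fields",
            # Typo tolerance is opt-in; AUTO scales the edit distance with term length.
            "fuzziness": "AUTO",
        }
    }


luxury_clause = build_multi_match("luxury")
```

Note that `fuzziness` cannot be combined with the `cross_fields` type, so switching matching strategy may mean dropping typo tolerance.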
Nested Queries for Room Type Filtering
Nested queries enable complex filtering within room type documents:
- Room Type Filtering: Find hotels with specific room types
- Capacity Filtering: Filter by guest count requirements
- Amenity Filtering: Room-level amenities (wifi, minibar, ocean_view)
- Price Range Filtering: Filter by room type pricing
How Filtering Works (Inverted Index vs Filtering)
Understanding the Difference:
Text Search (Uses Inverted Index):
- Inverted Index: Maps each word → list of documents containing that word
- Example: "deluxe" → [Hotel_A, Hotel_B, Hotel_C]
- Used for: Finding documents that contain specific words
- Fast for: "Find hotels with 'deluxe' in the name"
Filtering (Uses Doc Values):
- Doc Values: Columnar storage format - stores all values for a field together
- Example: All room prices stored in one column: [299, 599, 199, 450, ...]
- Used for: Exact matching, range queries, aggregations
- Fast for: "Find hotels with deluxe rooms under $300"
Why Different Data Structures?
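Inverted indexes are optimized for text search (finding which documents contain terms), while doc values store data in a columnar format that's optimized for sorting, aggregations, and per-document value access. This section uses "doc values" as a simplified model of the filtering path. Strictly speaking, Elasticsearch answers exact-match filters on keyword fields from the inverted index and numeric or date range filters from BKD-tree point indexes, with doc values serving mainly sorting, aggregations, and scripts.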
Comparison to Columnar Databases:
Elasticsearch's doc values use a similar storage strategy to columnar databases (e.g., Amazon Redshift, Google BigQuery, ClickHouse):
- Columnar Storage: All values for a field are stored together in a column (e.g., all prices: [299, 599, 199, 450])
- Optimized for Analytics: Fast filtering, sorting, and aggregations on numeric/categorical fields
- Key Difference: Columnar databases store ALL data in columnar format, while Elasticsearch uses a hybrid approach:
- Inverted indexes (term-oriented: each term maps to the documents containing it) for text search
- Doc values (columnar) for filtering/aggregations
- This hybrid approach gives Elasticsearch both fast text search AND fast filtering in one system
Example: "Deluxe Rooms Under $300" Query
Let's trace through how Elasticsearch handles this query:
Step 1: Understanding the Data Structure
Hotel Document:
hotel_id: "hotel_123"
name: "Luxury Hotel"
room_types (nested):
- {type: "deluxe", price: 299, capacity: 2}
- {type: "suite", price: 599, capacity: 4}
- {type: "standard", price: 199, capacity: 2}
Pricing Reality (Interview Tip): In production the price stored in Elasticsearch is usually a summary (min_price_next_30_days, max_price, or a bucketed price range). This lets Elasticsearch filter aggressively without forcing constant reindexing. The precise per-date price (and availability) comes from Redis/the pricing service during the availability step, ensuring tomorrow’s rate (e.g., $400) is accurate even if Elasticsearch still lists $300 as the minimum. The rule of thumb: use Elasticsearch for coarse filtering, Redis for rapidly-changing truth.
Step 2: Nested Query Execution (Using BitSet Operations)
- Access Nested Documents: Elasticsearch treats each nested room type as a separate "mini-document"
- Room Type 1: {type: "deluxe", price: 299}
- Room Type 2: {type: "suite", price: 599}
- Room Type 3: {type: "standard", price: 199}
- Filter by Type (BitSet Creation): Using doc values, create BitSet of nested documents with type = "deluxe"
- Room Type 1: type = "deluxe" ✓ → BitSet bit 1 = true
- Room Type 2: type = "suite" ✗ → BitSet bit 2 = false
- Room Type 3: type = "standard" ✗ → BitSet bit 3 = false
- Result: BitSet A = [1, 0, 0]
- Filter by Price (BitSet Creation): Using doc values, create BitSet of nested documents with price < 300
- Room Type 1: price = 299 < 300 ✓ → BitSet bit 1 = true
- Room Type 2: price = 599 < 300 ✗ → BitSet bit 2 = false
- Room Type 3: price = 199 < 300 ✓ → BitSet bit 3 = true
- Result: BitSet B = [1, 0, 1]
- BitSet Intersection: A ∩ B = [1, 0, 0] ∩ [1, 0, 1] = [1, 0, 0]
- Only Room Type 1 matches both conditions (bitwise AND operation)
- Return Parent Document: If any nested document matches, the parent hotel document is included
- Result: Hotel_123 is returned
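The walkthrough above corresponds to a nested query in the Elasticsearch DSL. The sketch below builds that clause as a Python dict and adds a small pure-Python helper that mirrors its semantics. The field path `room_types` and its `type` and `price` sub-fields follow the example document; the helper `hotel_matches` is hypothetical and exists only to make the "same nested document" rule testable.

```python
from typing import Any, Dict


def build_room_filter(room_type: str, max_price: float) -> Dict[str, Any]:
    """Nested clause: both conditions must hold on the SAME room type entry."""
    return {
        "nested": {
            "path": "room_types",
            "query": {
                "bool": {
                    "filter": [
                        {"term": {"room_types.type": room_type}},
                        {"range": {"room_types.price": {"lt": max_price}}},
                    ]
                }
            },
            # inner_hits asks Elasticsearch to return which nested entries matched.
            "inner_hits": {},
        }
    }


def hotel_matches(hotel: Dict[str, Any], room_type: str, max_price: float) -> bool:
    """Pure-Python mirror of the nested semantics, for illustration only."""
    return any(
        room["type"] == room_type and room["price"] < max_price
        for room in hotel["room_types"]
    )


hotel_123 = {
    "hotel_id": "hotel_123",
    "name": "Luxury Hotel",
    "room_types": [
        {"type": "deluxe", "price": 299, "capacity": 2},
        {"type": "suite", "price": 599, "capacity": 4},
        {"type": "standard", "price": 199, "capacity": 2},
    ],
}
deluxe_under_300 = build_room_filter("deluxe", 300)
```

The suite costs 599 and the standard room costs 199, so a query for a suite under $300 must not match this hotel even though "suite" and "a price under 300" both appear somewhere in the document. That is the reason for the nested mapping.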
Key Insight:
- Doc Values: Used to create BitSets for filtering operations (fast, cached)
- BitSet Operations: Filtering by type and price uses BitSet intersection (bitwise AND)
- Nested Documents: Each room type is stored separately, allowing independent BitSet creation
- Performance: BitSet operations are O(1) per document, much faster than scoring all documents
- Combination: The query can use both - text search for hotel names AND BitSet filtering for room types
Performance Comparison:
| Operation | Data Structure | Speed |
|---|---|---|
| Text search ("deluxe" in name) | Inverted Index | Very Fast |
| Filter (price < 300) | Doc Values | Very Fast |
| Filter (type = "deluxe") | Doc Values | Very Fast |
| Nested filter (deluxe AND price < 300) | Doc Values + Nested | Fast |
Why This Matters:
- Inverted indexes are great for "find documents containing words"
- Doc values are great for "find documents matching criteria"
- Nested structures allow filtering on related data (room types) independently
- Combining both gives you powerful search + accurate filtering
Query Execution Order and Performance
Optimal Query Structure (Option 1 - Recommended):
- Geographic Filters: city, radius (most selective)
- Amenity Filters: pool, spa, gym (moderately selective)
- Room Type Filters: nested queries (selective)
- Text Search: multi-match queries (expensive)
- Scoring: BM25 calculation (expensive but small set)
Note: Date range validation is handled by Redis after Elasticsearch filtering, not during the Elasticsearch query phase. This aligns with Option 1 (Redis/DB Only) where availability is not stored in Elasticsearch.
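Put together, the structure above could be expressed as a single bool query, sketched here as a Python dict. The field names (`city`, `amenities`, `room_types.*`, `name`, `description`) are assumptions about the mapping, and the clause order in the list is for human readers: Elasticsearch is free to pick its own execution order internally.

```python
from typing import Any, Dict, List


def build_hotel_search(text: str, city: str, amenities: List[str],
                       room_type: str, max_price: float) -> Dict[str, Any]:
    """Filters go in filter context (no scoring); only the text clause is scored."""
    filters: List[Dict[str, Any]] = [{"term": {"city": city}}]
    # One term clause per amenity gives AND semantics ("pool AND spa").
    filters += [{"term": {"amenities": amenity}} for amenity in amenities]
    filters.append({
        "nested": {
            "path": "room_types",
            "query": {"bool": {"filter": [
                {"term": {"room_types.type": room_type}},
                {"range": {"room_types.price": {"lt": max_price}}},
            ]}},
        }
    })
    return {
        "query": {
            "bool": {
                "filter": filters,
                "must": [{"multi_match": {
                    "query": text,
                    "fields": ["name^3", "description"],
                }}],
            }
        }
    }


search_body = build_hotel_search("luxury", "Los Angeles", ["pool", "spa"], "deluxe", 300)
```

No date or availability clause appears in the body, matching the note above that date validation happens in Redis after this query returns.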
Key Takeaway: Filter-first execution with BitSet caching provides 25-50x performance improvement by reducing expensive scoring operations to only the documents that pass all filters. Detailed BitSet mechanics and caching strategies are explained in the "Filter-First Execution Strategy" section above.
6. Redis Availability Architecture
Architecture Note: Redis (or AWS ElastiCache) is used exclusively for availability data. Search result caching and other data caching are handled by Elasticsearch's built-in caching mechanisms or application-level caching.
Redis Data Structures for Availability
Important: Room Type ID in All Keys
All Redis availability keys include room_type_id because:
- Hotels have multiple room types and variations (e.g., "suite with garden view" vs "suite with ocean view") with separate availability
- Each room type variation has a unique room_type_id (e.g., "suite_garden_001", "suite_ocean_001")
- Elasticsearch filters by room type and returns room_type_id, which is used to build Redis keys
- This allows separate availability tracking for each variation
Redis uses multiple data structures optimized for availability patterns:
Simple Key-Value Pattern:
- Key Format: availability:hotel_id:room_type_id:check_in:check_out
- Value: "1" (at least one room available) or number of available rooms (e.g., "5" for 5 rooms available) or null (unavailable)
- Use Case: Simple availability checks (not recommended due to update complexity)
- Performance: O(1) lookup time
How the Key Gets Created:
- Booking Event: When a booking is made or cancelled for a specific room type variation
- Room Type ID from Elasticsearch: Elasticsearch nested query filters hotels by room type and returns room_type_id (e.g., "suite_garden_001", "suite_ocean_001")
- Key Generation: Combine hotel_id, room_type_id, check_in date, and check_out date
- Example: availability:hotel_123:suite_garden_001:2024-06-15:2024-06-17
- Availability Check: Query database for availability of that room type variation across all dates in range
- Key Creation: If all dates are available for that room type variation, set key with value "1" (at least one available) or the number of available rooms (e.g., "5" for 5 rooms)
- Redis command: SET availability:hotel_123:suite_garden_001:2024-06-15:2024-06-17 "1" EX 14400 (4 hour TTL) - for at least one room
- Redis command: SET availability:hotel_123:suite_garden_001:2024-06-15:2024-06-17 "5" EX 14400 (4 hour TTL) - for 5 available rooms
- Key Deletion: If any date becomes unavailable for that room type variation, delete the key
- Redis command: DEL availability:hotel_123:suite_garden_001:2024-06-15:2024-06-17
- Update Trigger: Keys are created/updated/deleted when:
- Bookings are confirmed for specific room type variations
- Bookings are cancelled for specific room type variations
- New availability is added for specific room type variations
- Periodic sync from database
Challenge: Finding Which Keys to Modify - Combinatorial Explosion
The Problem:
When a booking is made for a specific date and room type variation (e.g., 2024-06-16, suite with ocean view), you need to invalidate all keys that include that date for that room type variation.
Combinatorial Explosion Example:
For a 30-day booking window, there are (30 × 31) / 2 = 465 possible check-in/check-out combinations. A single booking on one night (e.g., June 16th) requires finding and invalidating every key whose stay includes that night. For the k-th night of the window that is k × (31 − k) keys: 30 at the edges of the window and up to 240 in the middle. The number of keys that must be tracked and updated grows quadratically with the window length.
Example Keys to Invalidate:
- availability:hotel_123:suite_ocean_001:2024-06-15:2024-06-17 (includes 2024-06-16)
- availability:hotel_123:suite_ocean_001:2024-06-16:2024-06-18 (includes 2024-06-16)
- availability:hotel_123:suite_ocean_001:2024-06-14:2024-06-17 (includes 2024-06-16)
- ... and every other key in the window whose range includes 2024-06-16 (up to 240 in total)
But Redis doesn't provide an efficient way to find all keys matching a date pattern.
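The key counts can be checked by brute force. The sketch below models the booking window as a run of consecutive nights numbered from 1; the function names are made up for illustration.

```python
def stays_in_window(window_nights: int) -> int:
    """Number of distinct contiguous stays that fit in the window."""
    return window_nights * (window_nights + 1) // 2


def stays_covering_night(window_nights: int, night: int) -> int:
    """Count stays (first_night..last_night) that include the given 1-indexed night."""
    count = 0
    for first_night in range(1, window_nights + 1):
        for last_night in range(first_night, window_nights + 1):
            if first_night <= night <= last_night:
                count += 1
    return count


total_keys = stays_in_window(30)
keys_for_middle_night = stays_covering_night(30, 16)
keys_for_first_night = stays_covering_night(30, 1)
```

A stay covers night k when it starts on one of the k nights up to k and ends on one of the (31 − k) nights from k onward, which gives the closed form k × (31 − k) for a 30-night window.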
Solution Options:
Option 1: Pattern Matching (Inefficient)
- Use KEYS availability:hotel_123:* to find all keys for a hotel
- Filter keys that include the affected date
- Delete matching keys
- Problem: KEYS command is O(N) and blocks Redis, not suitable for production
Option 2: Secondary Index (Recommended)
- Maintain a separate index tracking which keys exist for each date and room type variation
- Use a Redis set: availability_index:hotel_123:suite_ocean_001:2024-06-16 → Set of all keys containing this date for this room type variation
- When booking affects 2024-06-16 for suite with ocean view:
- Get all keys from index: SMEMBERS availability_index:hotel_123:suite_ocean_001:2024-06-16
- Delete all those keys: DEL key1 key2 key3...
- Remove index entry: DEL availability_index:hotel_123:suite_ocean_001:2024-06-16
- When creating a key: Add key to index for each date in range and room type variation
- Performance: O(M) where M = number of keys containing that date for that room type variation
Option 3: Hash Structure (Better Alternative) - Solves Combinatorial Explosion
- Use hash structure instead of simple keys (see Hash Structure section)
- Key format: availability:hotel_id:room_type_id
- Hash fields: Individual dates (2024-06-15, 2024-06-16, etc.)
- Solves the Problem: Instead of 465 keys for a 30-day window, you have only 30 hash fields (one per day)
- When booking affects 2024-06-16 for suite with garden view:
- Simply update/delete the hash field: HDEL availability:hotel_123:suite_garden_001 2024-06-16
- No need to find which keys to modify - directly update the specific date field
- Performance: O(1) - directly access the date field for specific room type variation
- Scalability: Linear growth (30 fields for 30 days) vs. quadratic growth (465 keys for 30 days)
Recommendation:
For simple availability checks, Hash Structure is preferred because:
- No need to track which keys to invalidate
- Direct date-based access (O(1))
- Easier to update individual dates
- More efficient for date range queries
Hash Structure:
- Key Format: availability:hotel_id:room_type_id
- Hash Fields: Individual dates representing nights (2024-06-15, 2024-06-16, 2024-06-17, etc.)
- Each date field represents availability for that night
- Example: Field "2024-06-15" = availability for the night of June 15th
- Hash Values: JSON object with room_count and last_updated (price comes from Elasticsearch/database, not Redis)
- Use Case: Detailed availability per room type variation
- Performance: O(1) field access
Examples:
- availability:hotel_123:deluxe_001 (for deluxe rooms)
- availability:hotel_123:suite_garden_001 (for suite with garden view)
- availability:hotel_123:suite_ocean_001 (for suite with ocean view)
Advantage: Direct Date-Based Updates
Unlike the simple key-value pattern, hash structure allows direct updates to specific dates without needing to find which keys to modify. When a booking affects 2024-06-16 for a specific room type variation (e.g., suite with garden view), you directly update that hash field - no pattern matching or secondary indexes needed.
How the Hash Gets Created:
- Initial Creation: When hotel availability data is loaded into Redis for a specific room type variation
- Hash Key: Create hash with key availability:hotel_id:room_type_id
- Example: availability:hotel_123:suite_garden_001 (for suite with garden view)
- Example: availability:hotel_123:suite_ocean_001 (for suite with ocean view)
- Hash Field Population: For each date with availability, add a field:
- Redis command: HSET availability:hotel_123:suite_garden_001 2024-06-15 '{"room_count":5,"last_updated":"2024-06-15T10:30:00Z"}'
- Redis command: HSET availability:hotel_123:suite_garden_001 2024-06-16 '{"room_count":3,"last_updated":"2024-06-15T10:30:00Z"}'
- Field Updates: When availability changes for that room type variation:
- Update specific date field: HSET availability:hotel_123:suite_garden_001 2024-06-15 '{"room_count":4,"last_updated":"2024-06-15T14:20:00Z"}'
- Field Deletion: When date becomes unavailable for that room type variation:
- Remove field: HDEL availability:hotel_123:suite_garden_001 2024-06-17
- TTL: Set expiration on entire hash: EXPIRE availability:hotel_123:suite_garden_001 14400 (4 hours)
- Update Triggers: Hash is updated when:
- Bookings change availability for specific room type variations and dates
- Periodic batch sync from database (availability data only)
Atomic Operations for OLTP Transactions: Redis Hash Design
The OLTP Transaction Pattern:
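For a single night, decrementing the room count is one atomic Redis command (a multi-night stay, or a guard against the count dropping below zero, needs a Lua script or WATCH/MULTI/EXEC as listed under "Race Condition Prevention"):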
HINCRBY availability:hotel_123:suite_garden_001 2024-06-15 -1
Why This Works for OLTP:
- O(1) Performance: Constant time operation, designed for high-throughput scenarios
- Atomic: The increment/decrement operation is atomic—no race conditions
- Single Operation: No multi-step process, no re-indexing, no segment merging
- Conceptual Opposite of Elasticsearch: This is the exact opposite of Elasticsearch's multi-stage, resource-intensive re-indexing "update"
Hash Structure Design:
- Redis Key: availability:hotel_id:room_type_id (e.g., availability:hotel_123:suite_garden_001)
- Hash Fields: {date} (e.g., 2024-06-15, 2024-06-16)
- Hash Values: Integer representing room count (e.g., 10, 5, 0)
Alternative: JSON Value with Room Count
The earlier examples store JSON values with room_count and last_updated. HINCRBY only works on fields that hold plain integers (Redis returns an error for any other value), so the atomic counter pattern requires simple integer values:
- Simple Integer: HSET availability:hotel_123:suite_garden_001 2024-06-15 10
- Atomic Decrement: HINCRBY availability:hotel_123:suite_garden_001 2024-06-15 -1 → Result: 9
- Atomic Increment: HINCRBY availability:hotel_123:suite_garden_001 2024-06-15 1 → Result: 10
Date Range Check (OLAP Query Pattern):
Checking availability for a 5-night stay is a single, efficient command:
HMGET availability:hotel_123:suite_garden_001 2024-06-15 2024-06-16 2024-06-17 2024-06-18 2024-06-19
Performance: O(N) where N = number of fields requested (i.e., number of nights), not the total number of documents in the database. This is extremely fast.
Key Takeaway: Redis Hash structure with atomic HINCRBY operations provides the exact OLTP capabilities needed for high-frequency availability updates—simple, fast, atomic, and designed for this exact workload.
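One caveat is worth spelling out: HINCRBY on its own will decrement past zero, and a multi-night stay touches several fields. The check and the decrements therefore need to run as one atomic unit, for example inside a Lua script. The sketch below is an in-memory Python model of the logic such a script would run, not Redis code; the plain dict stands in for one availability hash with integer values, and `reserve_nights` is a hypothetical name.

```python
from typing import Dict, List


def reserve_nights(counts: Dict[str, int], nights: List[str], rooms: int = 1) -> bool:
    """All-or-nothing reservation: decrement every night, or change nothing.

    `counts` models one availability hash (date field -> room count).
    A missing field is treated as zero rooms available.
    """
    # Phase 1: verify every night before touching any counter.
    if any(counts.get(night, 0) < rooms for night in nights):
        return False
    # Phase 2: all nights have capacity, so apply the decrements.
    for night in nights:
        counts[night] -= rooms
    return True


suite_garden = {"2024-06-15": 2, "2024-06-16": 1}
first_booking = reserve_nights(suite_garden, ["2024-06-15", "2024-06-16"])
second_booking = reserve_nights(suite_garden, ["2024-06-15", "2024-06-16"])
```

The second booking is rejected because June 16 is sold out, and June 15 keeps its remaining room instead of being decremented halfway through a failed reservation.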
Date Range Validation with Hash Structure (Room Type Variation Aware):
Critical: Hotel Bookings are for Nights
- Guest checks in on check-in date (stays that night)
- Guest leaves on check-out date (the last night stayed is the night before check-out)
- Room is occupied every night from check-in date through (check-out - 1 day), and is available again on check-out date
- Date range to check: [check_in_date, check_out_date - 1 day], inclusive
Process:
- Room Type ID from Elasticsearch: Elasticsearch nested query filters hotels by room type and returns room_type_id (e.g., "suite_garden_001", "suite_ocean_001")
- Generate date range: Dates from check-in to (check-out - 1 day) inclusive
- Hash Key: Use availability:hotel_id:room_type_id (e.g., availability:hotel_123:suite_garden_001)
- Batch check: HMGET availability:hotel_id:room_type_id date1 date2... (all nights in range)
- Validation: If all dates return non-null values with room_count > 0, room type variation is available for entire stay
- Performance: O(N) where N = number of nights (check-out - check-in), not total available dates
Example: Check-in 2024-06-15, check-out 2024-06-17 → Check dates [2024-06-15, 2024-06-16]
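The night arithmetic is easy to get off by one, so here is a small sketch of it. It assumes the simple integer variant of the hash values; `stay_nights` and `all_nights_available` are hypothetical helper names, and the `values` argument stands for the list an HMGET call would return (None for a missing field).

```python
from datetime import date, timedelta
from typing import List, Optional, Sequence


def stay_nights(check_in: date, check_out: date) -> List[str]:
    """Hash fields to check: check-in through the night before check-out."""
    if check_out <= check_in:
        raise ValueError("check_out must be after check_in")
    night_count = (check_out - check_in).days
    return [(check_in + timedelta(days=i)).isoformat() for i in range(night_count)]


def all_nights_available(values: Sequence[Optional[str]], rooms_needed: int = 1) -> bool:
    """True only if every night has a value and enough rooms."""
    return all(v is not None and int(v) >= rooms_needed for v in values)


nights = stay_nights(date(2024, 6, 15), date(2024, 6, 17))
```

The check-out date itself never appears in the list, because the room is free again that night.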
Sorted Set Pattern:
- Key Format: availability:hotel_id:room_type_id:dates
- Members: date strings (2024-06-15, 2024-06-16)
- Scores: 1 for every member (only available dates are stored; unavailable dates are removed from the set)
- Use Case: Date range queries with ranking per room type variation
- Performance: O(log N) for range queries
Examples:
- availability:hotel_123:suite_garden_001:dates
- availability:hotel_123:suite_ocean_001:dates
How the Sorted Set Gets Created:
- Initial Creation: When hotel availability data is loaded for a specific room type variation
- Sorted Set Key: Create sorted set with key availability:hotel_id:room_type_id:dates
- Example: availability:hotel_123:suite_garden_001:dates (for suite with garden view)
- Example: availability:hotel_123:suite_ocean_001:dates (for suite with ocean view)
- Member Addition: For each available date for that room type variation, add as member with score 1:
- Redis command: ZADD availability:hotel_123:suite_garden_001:dates 1 "2024-06-15"
- Redis command: ZADD availability:hotel_123:suite_garden_001:dates 1 "2024-06-16"
- Redis command: ZADD availability:hotel_123:suite_garden_001:dates 1 "2024-06-17"
- Member Updates: When availability changes for that room type variation:
- Mark available: ZADD availability:hotel_123:suite_garden_001:dates 1 "2024-06-17" (add date with score 1)
- Mark unavailable: ZREM availability:hotel_123:suite_garden_001:dates "2024-06-17" (remove date from set)
- Clean Set: Only available dates are stored in the sorted set (no "unavailable" members with score 0)
- Range Queries: Query all available dates for that room type variation:
- Redis command: ZRANGE availability:hotel_123:suite_garden_001:dates 0 -1 (get all available dates, sorted)
- Filter dates in desired range (e.g., June 2024)
- TTL: Set expiration: EXPIRE availability:hotel_123:suite_garden_001:dates 14400 (4 hours)
Update Triggers: Sorted set is updated when:
- Bookings change availability
- New availability periods are added
- Periodic sync from database
Hash Structure vs Sorted Set: When to Use Each
Hash Structure Advantages:
- O(1) field access: Direct access to any date
- Direct updates: Update specific dates without pattern matching
- Simple date range validation: Check all dates in range using HMGET
- Efficient storage: Only store dates that have availability
- Better for: Checking if specific dates are available, updating individual dates
Example Use Case:
- Query: "Is hotel_123 available for check-in 2024-06-15 to check-out 2024-06-17?"
- Date range to check: 2024-06-15, 2024-06-16
- Hash: HMGET availability:hotel_123:suite_garden_001 2024-06-15 2024-06-16
- Check if all dates return non-null values
- Performance: O(N) where N = number of nights (check-out - check-in)
Sorted Set Advantages:
- Range queries: Efficiently find all available dates in a date range
- Sorted order: Dates are automatically sorted, making range queries fast
- Bulk operations: Get all available dates in a range with one query
- Better for: Finding all available dates in a future period, date range discovery
Example Use Case:
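- Query: "Find all available dates for hotel_123 in June 2024"
- Sorted Set: ZRANGE availability:hotel_123:suite_garden_001:dates 0 -1 (get all available dates, sorted), then filter to the June 2024 range on the client
- Performance: ZRANGE 0 -1 returns the whole set, so this form is O(N). Because every member has the same score, members sort lexicographically (which is chronological for ISO dates), and ZRANGEBYLEX with bounds [2024-06-01 and [2024-06-30 returns only the June dates in O(log N + M), where N = total dates and M = dates in range
- Advantage: Clean set with only available dates, no need to filter out "unavailable" members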
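The O(log N + M) figure comes from binary-searching into an ordered collection and then reading M neighbours. The sketch below models that with a sorted Python list and `bisect`; it is an illustration of the access pattern, not Redis code. It relies on ISO-8601 date strings sorting chronologically when compared as text, which is also what a lexicographic range command such as ZRANGEBYLEX depends on when all members share one score.

```python
from bisect import bisect_left, bisect_right
from typing import List


def dates_in_range(sorted_dates: List[str], start: str, end: str) -> List[str]:
    """Return the available dates within [start, end], both bounds inclusive."""
    lo = bisect_left(sorted_dates, start)   # first index with date >= start
    hi = bisect_right(sorted_dates, end)    # first index with date > end
    return sorted_dates[lo:hi]


available_dates = ["2024-05-30", "2024-06-02", "2024-06-15", "2024-06-30", "2024-07-01"]
june_dates = dates_in_range(available_dates, "2024-06-01", "2024-06-30")
```

Only the three June entries are touched after the two binary searches, however many other dates the collection holds.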
Comparison Table:
| Feature | Hash Structure | Sorted Set |
|---|---|---|
| Single Date Check | O(1) | O(log N) |
| Date Range Check | O(N) - check each date | O(log N + M) - efficient range query |
| Update Single Date | O(1) - direct update | O(log N) - need to find member |
| Find All Available Dates | O(N) - scan all fields | O(log N + M) - efficient range query |
| Memory Efficiency | Efficient (only available dates) | Efficient (only available dates) |
| Use Case | Check specific date ranges | Discover available dates in periods |
Recommendation for Hotel Search:
Use Hash Structure for the recommended approach because:
- Most queries check specific date ranges (check-in to check-out)
- Need to verify all dates in range are available
- Hash structure provides O(1) access to specific dates
- Simpler to update when bookings change
- More intuitive for date range validation
Use Sorted Set if you need:
- To find all available dates in a future period (e.g., "show me all available dates in July")
- Efficient range queries for date discovery
- Automatic sorting of dates
For Most Hotel Search Systems: Hash structure is the better choice because the primary use case is checking if a specific date range is available, not discovering all available dates.
Key Naming Conventions
Redis Key Patterns (Availability Only):
- Hash Structure (Recommended): availability:hotel_id:room_type_id (hash fields are dates)
- Simple Key-Value: availability:hotel_id:room_type_id:check_in:check_out
- Sorted Set: availability:hotel_id:room_type_id:dates
TTL Strategies for Availability Data
Time-to-Live Configuration (Redis/ElastiCache):
- Availability Keys: 1-4 hours (matches booking timeout, ensures fresh availability data)
Atomic Operations for Concurrent Booking
Important: Database is the Final Authority
The primary database is the source of truth for all availability data. Redis is a fast access layer:
- Database: Authoritative data, transactional consistency, ACID guarantees
- Redis: Performance optimization layer, real-time access, eventual consistency with database
- Synchronization: Redis must be updated to reflect database state, not the other way around
- Conflict Resolution: If Redis and database disagree, database state is correct
Race Condition Prevention:
- SET with NX: Only set if key doesn't exist (prevent concurrent updates)
- WATCH/MULTI/EXEC: Transactional updates for complex operations
- Lua Scripts: Atomic multi-step operations
Booking Flow (Database as Authority):
- Check Redis: Fast availability check in Redis
- Reserve in Database: Create booking record in database (authoritative)
- Update Redis: Update Redis to reflect database state
- If Database Fails: Booking is not confirmed, Redis remains unchanged
- If Redis Fails: Database state is correct, Redis will be synced later
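The database-first ordering can be sketched with SQLite standing in for the primary database and a plain dict standing in for Redis. Everything here is an assumption made for illustration: the `inventory` table, its columns, and the function names are hypothetical, and a production system would use PostgreSQL with the outbox/CDC path rather than a direct cache write.

```python
import sqlite3
from typing import Dict, List


def make_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory inventory table (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE inventory (room_key TEXT, night TEXT, available INTEGER, "
               "PRIMARY KEY (room_key, night))")
    with db:
        db.executemany("INSERT INTO inventory VALUES (?, ?, ?)", [
            ("hotel_123:suite_garden_001", "2024-06-15", 1),
            ("hotel_123:suite_garden_001", "2024-06-16", 0),
        ])
    return db


def book_stay(db: sqlite3.Connection, cache: Dict[str, int],
              room_key: str, nights: List[str]) -> bool:
    """Reserve in the database first; mirror into the cache only after commit."""
    try:
        with db:  # commits on success, rolls back if an exception escapes
            for night in nights:
                cur = db.execute(
                    "UPDATE inventory SET available = available - 1 "
                    "WHERE room_key = ? AND night = ? AND available >= 1",
                    (room_key, night),
                )
                if cur.rowcount != 1:  # no row had a free room for this night
                    raise LookupError(f"no room left on {night}")
    except LookupError:
        return False  # database rolled back, cache untouched
    for night in nights:  # database is committed: copy its state into the cache
        row = db.execute("SELECT available FROM inventory WHERE room_key = ? AND night = ?",
                         (room_key, night)).fetchone()
        cache[night] = row[0]
    return True
```

The conditional UPDATE is what prevents overselling: the guard `available >= 1` is evaluated by the database inside the transaction, so a stale "available" answer from the cache can cost a failed booking attempt but not a double booking.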
Real-Time Update Patterns
Update Flow: Database → Redis (One-Way Sync)
All updates originate from the database. Redis is updated to reflect database state:
Update Strategies:
- Immediate Updates: After database transaction commits, immediately update Redis to reflect new availability
- Batch Updates: Periodic sync from database to Redis to catch any missed updates
- Event-Driven: Database triggers or change data capture (CDC) events update Redis in real-time
- Fallback Sync: Periodic reconciliation with database to ensure Redis matches database state
Update Order (Critical):
- Database Transaction: Make changes in database first (authoritative)
- Transaction Commit: Wait for database commit to succeed
- Redis Update: Update Redis to match database state
- If Redis Update Fails: Database state is still correct, Redis will be synced later
Why This Matters:
- Data Integrity: Database transactions ensure consistency
- Redis is Cache: Redis can be rebuilt from database if needed
- No Data Loss: If Redis fails, database has all the data
- Eventual Consistency: Redis may be temporarily out of sync, but database is always correct
Key Takeaway: Redis provides sub-millisecond availability checks through optimized data structures and atomic operations, enabling real-time booking systems that would be impossible with database-only approaches. The primary database remains the final authority (see "Important: Database is the Final Authority" above) - all updates flow from database to Redis, ensuring data integrity and consistency.
7. Hybrid Search Execution Flow
Sequential vs Parallel Execution
Sequential Execution (Simple but Slower):
- Elasticsearch search (25ms) → Get candidate hotels
- Redis availability check (10ms) → Filter by availability
- Total: 35ms
Parallel Execution (Complex but Faster):
- Start Elasticsearch search (25ms) and Redis availability check (15ms) simultaneously
- Wait for both to complete (max of both = 25ms)
- Filter Elasticsearch results by Redis availability
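- Total: 25ms (about 29% faster than the 35ms sequential path)
- Trade-off: The Redis check starts before Elasticsearch has returned its candidates, so it has to cover a broader set of hotels (for example, every hotel in the target city), which is presumably why it is modelled at 15ms rather than 10ms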
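The parallel variant can be sketched with asyncio. Both backend calls below are stubs that sleep for the latencies quoted above and return made-up data; `search_candidates` and `availability_for_city` are hypothetical names, not real client APIs.

```python
import asyncio
from typing import Dict, List


async def search_candidates(city: str) -> List[str]:
    """Stub for the Elasticsearch query (simulated 25ms), ranked by relevance."""
    await asyncio.sleep(0.025)
    return ["hotel_123", "hotel_456", "hotel_789"]


async def availability_for_city(city: str) -> Dict[str, bool]:
    """Stub for a city-wide Redis availability check (simulated 15ms)."""
    await asyncio.sleep(0.015)
    return {"hotel_123": True, "hotel_456": False, "hotel_789": True, "hotel_999": True}


async def parallel_search(city: str) -> List[str]:
    """Run both lookups concurrently, then keep available candidates in ranked order."""
    candidates, available = await asyncio.gather(
        search_candidates(city),
        availability_for_city(city),
    )
    return [hotel for hotel in candidates if available.get(hotel, False)]


results = asyncio.run(parallel_search("Los Angeles"))
```

The availability stub returns an entry for hotel_999 that Elasticsearch never asked about, which illustrates the cost of this pattern: the Redis side does extra work because it cannot see the candidate list.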
Elasticsearch Candidate Generation
Search Phase:
- Geographic Filtering: City, radius, location-based filtering
- Amenity Filtering: Pool, spa, gym, restaurant availability
- Text Search: Hotel name, description, location text matching
- Room Type Filtering: Nested queries for room type requirements
- Price Range Filtering: Room type pricing constraints
Result Set:
- Candidate Hotels: 100-1000 hotels matching search criteria
- Hotel Metadata: Name, location, amenities, room types
- Search Scores: Relevance ranking for result ordering
Redis Availability Validation
How Elasticsearch and Redis Work Together:
Step 1: Elasticsearch Filters by Room Type
- User search includes room type filter (e.g., "suite with ocean view")
- Elasticsearch nested query filters hotels that have matching room type variations (e.g., suite with ocean view)
- Elasticsearch returns: List of hotels with matching room types (hotel metadata + room type info including room_type_id)
- Result: We know which hotels have matching room types and which room_type_id to check in Redis
Step 2: Redis Checks Availability for Specific Room Type Variation
- For each hotel from Elasticsearch, we get the room_type_id (e.g., "suite_ocean_001", "deluxe_001")
- Use hash key: availability:hotel_id:room_type_id (e.g., availability:hotel_123:suite_ocean_001)
- Check availability for the specific room type variation across date range
Availability Check Process:
- Extract Room Type ID: From Elasticsearch results, get room_type_id (e.g., "suite_ocean_001", "deluxe_001")
- Generate Date List: Create list of dates from check-in to (check-out - 1 day) inclusive
- Critical: Hotel bookings are for nights, not days (see Section 6 for detailed explanation)
- Dates to check: [check_in_date, check_out_date - 1 day]
- Build Hash Key: availability:hotel_id:room_type_id
- HMGET Check: HMGET availability:hotel_id:room_type_id date1 date2... (all nights in range)
- Validate Results: Check if all dates have non-null values with room_count > 0
Example Flow:
User Query: "Suite with ocean view in Los Angeles, check-in: 2024-06-15, check-out: 2024-06-17"
Elasticsearch:
- Filters: city=Los Angeles, room_type="suite", view="ocean"
- Returns: [hotel_123 (has suite_ocean_001), hotel_456 (has suite_ocean_001), hotel_789 (has suite_ocean_001)]
- Each result includes: hotel_id, room_type_id="suite_ocean_001"
Redis Availability Check:
- Dates to check: [2024-06-15, 2024-06-16] (see Section 6 for explanation of date range logic)
- For hotel_123: HMGET availability:hotel_123:suite_ocean_001 "2024-06-15" "2024-06-16"
- For hotel_456: HMGET availability:hotel_456:suite_ocean_001 "2024-06-15" "2024-06-16"
- For hotel_789: HMGET availability:hotel_789:suite_ocean_001 "2024-06-15" "2024-06-16"
Results:
- hotel_123: [value, value] → Available (both nights have suite with ocean view)
- hotel_456: [value, null] → Unavailable (2024-06-16 has no suite with ocean view available)
- hotel_789: [value, value] → Available
Capacity Validation:
- If user needs multiple rooms (e.g., 2 suites with ocean view):
- Check room_count in hash values: {"room_count": 5, ...}
- Ensure room_count >= required_rooms for all dates in range
- Example: If user needs 2 rooms, but room_count is 1 → Unavailable
Price Validation:
- Price comes from Elasticsearch room type data (Redis is exclusively for availability - see Section 6)
- Check price from Elasticsearch results against user's budget
- Ensure price is within budget for the selected room type variation
Filtering Process:
- Hash Key Pattern: availability:hotel_id:room_type_id
- Batch Operations: Use Redis pipeline to batch multiple HMGET commands
- Early Exit: Stop checking if any date returns null (hotel unavailable)
- Result Filtering: Remove hotels where room type is unavailable for any date in range
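A pipelined version of this filtering step might look like the sketch below. It assumes a redis-py style client (`pipeline()`, `hmget()`, `execute()`) and the simple integer hash values. So that the example runs without a Redis server, it includes a small in-memory stand-in; `InMemoryClient` and `InMemoryPipeline` are test doubles written for this sketch, not part of any library.

```python
from typing import Any, Dict, List, Optional, Sequence


def filter_available(client: Any, candidates: Sequence[Dict[str, str]],
                     nights: List[str], rooms_needed: int = 1) -> List[str]:
    """Queue one HMGET per candidate, send them in one round trip, keep available hotels."""
    pipe = client.pipeline(transaction=False)
    for c in candidates:
        pipe.hmget(f"availability:{c['hotel_id']}:{c['room_type_id']}", nights)
    replies = pipe.execute()  # one reply list per queued HMGET, in order
    return [
        c["hotel_id"]
        for c, values in zip(candidates, replies)
        # all() stops at the first missing or insufficient night (early exit)
        if all(v is not None and int(v) >= rooms_needed for v in values)
    ]


class InMemoryPipeline:
    """Test double that records HMGET calls and answers them from a dict."""
    def __init__(self, hashes: Dict[str, Dict[str, str]]) -> None:
        self._hashes = hashes
        self._queued: List[tuple] = []

    def hmget(self, name: str, keys: Sequence[str]) -> "InMemoryPipeline":
        self._queued.append((name, list(keys)))
        return self

    def execute(self) -> List[List[Optional[str]]]:
        return [[self._hashes.get(name, {}).get(k) for k in keys]
                for name, keys in self._queued]


class InMemoryClient:
    """Test double exposing only the pipeline() method used above."""
    def __init__(self, hashes: Dict[str, Dict[str, str]]) -> None:
        self._hashes = hashes

    def pipeline(self, transaction: bool = True) -> InMemoryPipeline:
        return InMemoryPipeline(self._hashes)


demo_client = InMemoryClient({
    "availability:hotel_123:suite_ocean_001": {"2024-06-15": "5", "2024-06-16": "3"},
    "availability:hotel_456:suite_ocean_001": {"2024-06-15": "2"},
    "availability:hotel_789:suite_ocean_001": {"2024-06-15": "1", "2024-06-16": "1"},
})
demo_candidates = [{"hotel_id": h, "room_type_id": "suite_ocean_001"}
                   for h in ("hotel_123", "hotel_456", "hotel_789")]
available_hotels = filter_available(demo_client, demo_candidates, ["2024-06-15", "2024-06-16"])
```

The output preserves the input order, so the Elasticsearch relevance ranking survives the availability filter unchanged.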
Result Merging and Ranking Strategy
Merging Process:
- Elasticsearch Results: Ranked by relevance score
- Redis Availability: Binary available/unavailable status
- Intersection: Keep only available hotels from Elasticsearch results
- Final Ranking: Maintain Elasticsearch relevance order for available hotels
Ranking Factors:
- Search Relevance: BM25 score from Elasticsearch
- Availability Status: Available hotels ranked higher
- Price Competitiveness: Lower prices get slight boost
- User Preferences: Historical booking patterns, loyalty status
Interview Tip: Streaming Results with Pagination
Once Elasticsearch returns ranked results, you don't need to wait for all availability checks before sending results to the UI. Instead, use a streaming/pagination approach:
Streaming Optimization:
- Elasticsearch Returns: Ranked list of hotels (e.g., 500 hotels)
- Batch Processing: Process hotels in batches of 10-20
- Redis Check: Check availability for first batch (e.g., first 10 hotels)
- Send to UI: Immediately send available hotels from first batch to UI
- Continue Processing: While UI displays first page, check availability for next batch
- Pagination: As user scrolls, send next batch of available hotels
Benefits:
- Time to First Result (TTFR): User sees results faster (e.g., 30ms instead of 200ms)
- Better UX: Progressive loading feels faster than waiting for all results
- Reduced Latency: Don't check availability for hotels user might never see
- Efficient Resource Usage: Only process what's needed for current page
Example Flow:
Elasticsearch: Returns 500 ranked hotels (25ms)
↓
Batch 1: Check availability for hotels 1-10 (5ms) → Send to UI
↓
Batch 2: Check availability for hotels 11-20 (5ms) → Cache for next page
↓
Batch 3: Check availability for hotels 21-30 (5ms) → Cache for next page
...
User scrolls → Load cached batch 2
Key Insight: Since results are already ranked by Elasticsearch, you can stream them in order. The UI only needs the first 10-20 results immediately, so there's no need to wait for all 500 hotels to be checked.
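One way to express this is a Python generator that only checks the next batch when another page is requested. This is a simplified sketch: `check_batch` is a hypothetical callable that takes a list of hotel IDs and returns the set that is available, and background prefetching of the next batch is left out.

```python
from typing import Callable, Iterator


def stream_available(
    ranked_ids: list[str],
    check_batch: Callable[[list[str]], set[str]],
    batch_size: int = 10,
    page_size: int = 10,
) -> Iterator[list[str]]:
    """Yield pages of available hotels in ranked order, checking batches lazily."""
    page: list[str] = []
    for start in range(0, len(ranked_ids), batch_size):
        batch = ranked_ids[start:start + batch_size]
        available = check_batch(batch)  # one Redis round trip per batch
        for hotel_id in batch:
            if hotel_id in available:
                page.append(hotel_id)
                if len(page) == page_size:
                    yield page  # execution pauses here until the next page is requested
                    page = []
    if page:
        yield page  # final, possibly short, page
```

Calling `next()` on the generator returns the first page after checking only as many batches as needed to fill it. The remaining hotels are never checked unless the user keeps scrolling.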
Performance Comparison
| Execution Pattern | Elasticsearch | Redis | Total Time | Complexity |
|---|---|---|---|---|
| Sequential | 25ms | 10ms | 35ms | Simple |
| Parallel | 25ms | 15ms | 25ms | Medium |
| Batch Redis | 25ms | 5ms | 30ms | Medium |
| Cached Results | 5ms | 2ms | 7ms | High |
Key Takeaway: Parallel execution cuts total time by about 29% (35ms → 25ms), while batch Redis operations halve the availability check time (10ms → 5ms), making the hybrid approach significantly faster than sequential processing.
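The percentages follow directly from the table. A quick check of the arithmetic, where `improvement_pct` is just a helper name used for this illustration:

```python
def improvement_pct(baseline_ms: float, new_ms: float) -> float:
    """Relative reduction in latency, as a percentage of the baseline."""
    return (baseline_ms - new_ms) / baseline_ms * 100


# Sequential (35ms) vs. parallel (25ms) total time.
print(round(improvement_pct(35, 25), 1))  # 28.6
# Redis check in the sequential row (10ms) vs. the batch row (5ms).
print(round(improvement_pct(10, 5), 1))   # 50.0
```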
8. APIs and Query Flow
Search API Endpoint Structure
Primary Endpoint: GET /api/v1/hotels/search
Query Parameters:
- location (required): City name or coordinates
- check_in (required): Check-in date (YYYY-MM-DD)
- check_out (required): Check-out date (YYYY-MM-DD)
- guests (optional): Number of guests (default: 2)
- room_type (optional): Specific room type (deluxe, suite)
- amenities (optional): Array of amenities (pool, spa, gym)
- price_min/max (optional): Price range constraints
- radius (optional): Search radius in kilometers (default: 30)
- sort (optional): Sort order (relevance, price, rating, distance)
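As a rough sketch, the parameters above could be validated like this. The `SearchRequest` type and `parse_search_params` function are hypothetical names, and defaulting `sort` to `relevance` is an assumption, since the list above does not state a default for it.

```python
from dataclasses import dataclass
from datetime import date

ALLOWED_SORTS = {"relevance", "price", "rating", "distance"}


@dataclass(frozen=True)
class SearchRequest:
    location: str
    check_in: date
    check_out: date
    guests: int = 2
    radius_km: float = 30.0
    sort: str = "relevance"  # assumption: default sort is not specified above
    amenities: tuple = ()


def parse_search_params(params: dict) -> SearchRequest:
    """Validate raw query parameters and apply the documented defaults."""
    missing = [k for k in ("location", "check_in", "check_out") if not params.get(k)]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))

    check_in = date.fromisoformat(params["check_in"])    # expects YYYY-MM-DD
    check_out = date.fromisoformat(params["check_out"])
    if check_out <= check_in:
        raise ValueError("check_out must be after check_in")

    sort = params.get("sort", "relevance")
    if sort not in ALLOWED_SORTS:
        raise ValueError(f"unsupported sort: {sort}")

    guests = int(params.get("guests", 2))
    if guests < 1:
        raise ValueError("guests must be at least 1")

    return SearchRequest(
        location=params["location"].strip(),
        check_in=check_in,
        check_out=check_out,
        guests=guests,
        radius_km=float(params.get("radius", 30)),
        sort=sort,
        amenities=tuple(params.get("amenities", ())),
    )
```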
Step-by-Step Query Execution Flow
Phase 1: Query Processing (5ms)
- Input Validation: Validate dates, location, parameters
- Query Normalization: Standardize location names, date formats
- Parameter Extraction: Extract search criteria and filters
- Cache Check: Check for cached results
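For the cache check to be effective, equivalent queries need to map to the same key. One possible approach, sketched here with a hypothetical `search:` prefix and `search_cache_key` helper, is to normalize the parameters and hash them:

```python
import hashlib
import json


def search_cache_key(params: dict) -> str:
    """Derive a stable cache key from normalized search parameters."""
    normalized = {}
    for name, value in params.items():
        if value is None:
            continue  # omitted and null parameters hash the same way
        if isinstance(value, str):
            value = value.strip().lower()
        elif isinstance(value, (list, tuple, set)):
            value = sorted(value)  # amenity order should not matter
        normalized[name] = value
    payload = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return "search:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Two requests that differ only in parameter order, casing of the location, or amenity order then share one cache entry, while any change to the dates produces a different key.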
Phase 2: Elasticsearch Search (25ms)
- Geographic Filter: Filter by location and radius
- Amenity Filter: Filter by requested amenities
- Room Type Filter: Nested queries for room type requirements
- Text Search: Multi-match queries for hotel names and descriptions
- Price Filter: Range queries for price constraints
- Result Ranking: BM25 scoring and relevance ranking
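The Phase 2 filters might translate into a query body along these lines. The query clauses (`geo_distance`, `term`, `nested`, `range`, `multi_match`) are standard Elasticsearch Query DSL, but every field name here (`location`, `amenities`, `room_types`, `name`, `description`) is an assumption about the index mapping, as is the `build_hotel_query` helper itself.

```python
def build_hotel_query(lat, lon, radius_km=30, amenities=(), room_type=None,
                      text=None, price_min=None, price_max=None):
    """Assemble a bool query: scored text match plus non-scoring filters."""
    filters = [{
        "geo_distance": {
            "distance": f"{radius_km}km",
            "location": {"lat": lat, "lon": lon},
        }
    }]
    # One term filter per amenity so that every requested amenity is required.
    filters += [{"term": {"amenities": amenity}} for amenity in amenities]

    room_filters = []
    if room_type:
        room_filters.append({"term": {"room_types.type": room_type}})
    price_range = {}
    if price_min is not None:
        price_range["gte"] = price_min
    if price_max is not None:
        price_range["lte"] = price_max
    if price_range:
        room_filters.append({"range": {"room_types.price_per_night": price_range}})
    if room_filters:
        # Nested query so type and price must match on the same room type entry.
        filters.append({
            "nested": {"path": "room_types",
                       "query": {"bool": {"filter": room_filters}}}
        })

    if text:
        must = [{"multi_match": {"query": text, "fields": ["name^2", "description"]}}]
    else:
        must = [{"match_all": {}}]
    return {"query": {"bool": {"must": must, "filter": filters}}}
```

Keeping geo, amenity, and price conditions in `filter` context means they do not affect the BM25 score and can be cached by Elasticsearch. Only the text match contributes to relevance.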
Phase 3: Redis Availability Check (10ms)
- Hotel ID Extraction: Get hotel IDs from Elasticsearch results
- Availability Keys: Generate Redis hash keys (one per hotel and room type) and the date fields for the range
- Batch Check: Pipelined HMGET operations for multiple hotels
- Availability Filtering: Remove unavailable hotels
- Price Validation: Verify pricing within constraints
Phase 4: Result Assembly (5ms)
- Result Merging: Combine Elasticsearch and Redis results
- Final Ranking: Apply availability and pricing factors
- Response Formatting: Structure response with hotel details
- Cache Storage: Store results for future queries
Response Format with Availability and Pricing
Response Structure:
{
  "hotels": [
    {
      "hotel_id": "hotel_123",
      "name": "Luxury Downtown Hotel",
      "location": {
        "city": "Los Angeles",
        "coordinates": [34.0522, -118.2437],
        "address": "123 Main St"
      },
      "rating": 4.5,
      "amenities": ["pool", "spa", "gym"],
      "room_types": [
        {
          "type": "deluxe",
          "capacity": 2,
          "price_per_night": 299,
          "available": true,
          "rooms_left": 5
        }
      ],
      "total_price": 598,
      "search_score": 0.95
    }
  ],
  "total_results": 150,
  "search_time": 35,
  "cache_hit": false
}
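A sketch of how one hotel entry in this response could be assembled. It assumes `total_price` is the nightly price multiplied by the number of nights, which is consistent with 299 × 2 = 598 in the example but is not stated explicitly. The input shapes and the name `build_hotel_result` are likewise assumptions.

```python
def build_hotel_result(hit: dict, room: dict, nights: int, rooms_left: int) -> dict:
    """Combine an Elasticsearch hit with Redis availability into one response entry."""
    return {
        "hotel_id": hit["hotel_id"],
        "name": hit["name"],
        "rating": hit["rating"],
        "amenities": hit["amenities"],
        "room_types": [{
            "type": room["type"],
            "capacity": room["capacity"],
            "price_per_night": room["price_per_night"],  # price comes from Elasticsearch
            "available": rooms_left > 0,                 # availability comes from Redis
            "rooms_left": rooms_left,
        }],
        # Assumption: total is nightly price times number of nights.
        "total_price": room["price_per_night"] * nights,
        "search_score": hit["score"],
    }
```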
Error Handling and Fallback Strategies
Error Scenarios:
- Elasticsearch Unavailable: Fallback to database with reduced functionality
- Redis Unavailable: Skip availability filtering, show all results
- Timeout Errors: Return partial results with warning
- Invalid Parameters: Return error with parameter validation details
Fallback Strategies:
- Database Fallback: Use PostgreSQL for basic search when Elasticsearch fails
- Cached Results: Serve cached results during outages
- Degraded Mode: Reduce functionality but maintain basic search
- Graceful Degradation: Show maintenance message with trending hotels
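The fallback chain can be sketched as a small wrapper. The three callables (`es_search`, `db_search`, `check_availability`) are hypothetical hooks into the real systems, and catching a broad `Exception` is a simplification. Real code would catch the specific client errors and timeouts.

```python
def search_with_fallbacks(es_search, db_search, check_availability, query):
    """Run the search, degrading step by step instead of failing the request."""
    warnings = []

    try:
        hits = es_search(query)
    except Exception:
        # Elasticsearch unavailable: fall back to basic database search.
        hits = db_search(query)
        warnings.append("elasticsearch_unavailable: served basic database search")

    try:
        available = check_availability(hits)
        hits = [hotel_id for hotel_id in hits if hotel_id in available]
    except Exception:
        # Redis unavailable: skip availability filtering and show all results.
        warnings.append("redis_unavailable: availability not verified")

    return {"hotels": hits, "warnings": warnings}
```

Returning the warnings alongside the results lets the UI tell users that availability has not been confirmed, rather than silently showing hotels that may be sold out.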
Monitoring and Alerting:
- Latency Monitoring: Alert if response time > 100ms
- Error Rate Monitoring: Alert if error rate > 5%
- Cache Performance: Monitor cache hit rates and memory usage
- Availability Monitoring: Track Elasticsearch and Redis health
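The two numeric thresholds above reduce to a simple check. This is only an illustration: the function name is hypothetical, and which latency aggregate to feed in (average, p95, p99) is left open because the text does not specify one.

```python
LATENCY_THRESHOLD_MS = 100   # alert if response time > 100ms
ERROR_RATE_THRESHOLD = 0.05  # alert if error rate > 5%


def check_alerts(latency_ms: float, error_count: int, request_count: int) -> list[str]:
    """Return the names of the alerts that should fire for the given window."""
    alerts = []
    if latency_ms > LATENCY_THRESHOLD_MS:
        alerts.append("latency")
    if request_count > 0 and error_count / request_count > ERROR_RATE_THRESHOLD:
        alerts.append("error_rate")
    return alerts
```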
Key Takeaway: The API design balances comprehensive search capabilities with performance optimization, using parallel execution and intelligent caching to maintain sub-100ms response times while providing rich search functionality and robust error handling.
Soundbite for interviews: "Hotel search requires a hybrid Elasticsearch + Redis architecture because no single system can efficiently handle both complex search/filtering and real-time availability updates. The key is using Elasticsearch for what it does best (search and filtering) and Redis for what it does best (real-time data), with parallel execution improving time to first results."