Sumedh Bala
Part 2: Event Discovery

Search & Indexing

Event discovery is the foundation of any ticketing system. The goal is to let users quickly find concerts, sports matches, or theater events by name, location, or venue. While it looks simple from the outside, the design must handle millions of queries, abbreviations (LA vs. Los Angeles), nearby cities (Pasadena), and fuzzy matches (typos like San Franciso).

This breakdown covers the baseline design first, followed by a deep dive into enhanced search capabilities.

1. Baseline Functional Solution

Services

  • Search API Service – Handles user queries, normalizes input, executes searches, and returns ranked results.
  • Indexing Service – Updates the search index whenever new events are created or updated.
  • Event Management Service (EMS) – Manages canonical event metadata and feeds it to the search index.

Databases & Caches

  • Primary Database (PostgreSQL / MySQL / DynamoDB) – Stores event metadata.
  • Search Index (Elasticsearch / OpenSearch / Solr) – Provides full-text and geospatial search.
  • Cache (Redis / Memcached) – Stores trending searches and frequently accessed results.

Database Schema (Events Table)

event_id  UUID or BIGINT (primary key)
event_name  Event title (e.g., "Taylor Swift Eras Tour")
venue_name  Venue (e.g., "SoFi Stadium")
city_normalized  Canonical city name (e.g., "Los Angeles")
state  State or province (e.g., "CA")
country  Country (e.g., "USA")
location_lat  Venue latitude
location_lng  Venue longitude
event_start_time  Start datetime
event_end_time  End datetime
search_synonyms  JSON/array of synonyms (e.g., ["LA", "L.A.", "Los Angeles"])

2. Why Indexes Differ by Column

  • event_id → B-Tree index, fast lookup after search returns candidate IDs.
  • event_name, venue_name, search_synonyms → Full-text or trigram indexes (support partial matches/typos).
  • city_normalized, state, country → B-Tree index (exact match queries).
  • location_lat, location_lng → GiST or SP-GiST index (PostGIS) for geo queries.
  • event_start_time, event_end_time → B-Tree index for range queries.
  • Composite Indexes → (city_normalized, event_start_time) for common multi-column queries like "finding all upcoming events in Los Angeles, ordered by date."

Key Takeaway: Use the right index type per query pattern: B-Trees for equality/range, GIN/trigrams for text, GiST for geo, and composite for multi-column queries. PostgreSQL handles the system of record; Elasticsearch handles fuzzy candidate generation.

3. APIs

Search API (Basic)

Endpoint: GET /search

Query Parameters:

  • q (string) → user query ("LA concerts")
  • location (optional, lat/lng) → geo filtering
  • radius (optional, default 30km)
  • date_range (optional, start + end)

Response: JSON object with an events array, each including:
event_id, event_name, venue, city, state, country, event_start_time, location (lat/lng)

4. Example SQL Queries

Find events in a city:

SELECT * FROM events WHERE city_normalized = 'Los Angeles' AND event_start_time > NOW();

Find events within 30 km radius:
Use a spherical distance function on the lat/lng coordinates, e.g., PostGIS `ST_DWithin` on a geography column (or a Haversine expression in plain SQL).

Find events by name (text search):

SELECT * FROM events WHERE event_name ILIKE '%Taylor Swift%';

5. Problems Using PostgreSQL Alone

  • Free-text search is slow at scale.
  • Fuzzy/typo handling is limited.
  • Synonyms/aliases require hacks.
  • Ranking, autocomplete, and aggregations are inefficient.
  • High-QPS text search is not distributed.

Why Elasticsearch/OpenSearch is Needed:

  • Inverted indices → sub-linear text lookup
  • Built-in typo tolerance, synonyms, stemming
  • BM25 ranking & boosting
  • Efficient geo + text + time queries
  • Autocomplete and faceted counts
  • Horizontally scalable and fault-tolerant

Soundbite for interviews: "Postgres is the source of truth, but breaks down for fuzzy search, synonyms, relevance, autocomplete, or scale. Production systems pair it with Elasticsearch/OpenSearch."

6. Enhancing User Experience

  • Synonym Expansion: Map "LA" → "Los Angeles" and expand queries accordingly.
  • Geolocation Queries: Lat/lng filtering ensures nearby cities (e.g., Pasadena) appear.
  • Similarity Search: Fuzzy matching using Elasticsearch analyzers or trigram indexes.

User Query Flow Example:

  1. User types "LA concerts."
  2. Query normalized (lowercase, punctuation removed).
  3. Synonym expansion (LA → Los Angeles).
  4. Geo coordinates retrieved.
  5. Search index queried (text + geo filter).
  6. Results ranked (relevance, time, popularity, distance).
  7. Results cached if trending.
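The flow above can be sketched in Python. The synonym map, the city coordinates, and the Elasticsearch query body below are illustrative assumptions for this design, not a fixed schema:

```python
# Sketch of the query flow above. SYNONYMS, CITY_COORDS, and the
# Elasticsearch body shape are illustrative assumptions.
import re

SYNONYMS = {"la": "los angeles", "sf": "san francisco"}   # assumed synonym map
CITY_COORDS = {"los angeles": (34.05, -118.24)}           # assumed geocoder data

def normalize(query: str) -> str:
    # Step 2: lowercase and strip punctuation
    return re.sub(r"[^\w\s]", "", query.lower()).strip()

def expand_synonyms(query: str) -> str:
    # Step 3: replace known abbreviations with canonical city names
    return " ".join(SYNONYMS.get(t, t) for t in query.split())

def build_search_request(raw_query: str, radius_km: int = 30) -> dict:
    q = expand_synonyms(normalize(raw_query))
    body = {"query": {"bool": {"must": [{"match": {"event_name": q}}]}}}
    # Steps 4-5: attach a geo filter when the query names a known city
    for city, (lat, lng) in CITY_COORDS.items():
        if city in q:
            body["query"]["bool"]["filter"] = [{
                "geo_distance": {"distance": f"{radius_km}km",
                                 "location": {"lat": lat, "lon": lng}}}]
    return body
```

For example, `build_search_request("LA concerts!")` produces a body that matches on "los angeles concerts" with a 30 km `geo_distance` filter centered on Los Angeles.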

7. Non-Functional Requirements

  • Latency: Redis caching → sub-100ms for popular searches
  • Scalability: Elasticsearch distributed indexing/sharding
  • Availability: Replicated DB + multi-AZ Elasticsearch
  • Consistency: Event Management Service is source of truth; eventual consistency to search index

8. Deeper Dive: Fuzzy Search

Scenario:

  • Dictionary: all cities (~200k–500k entries)
  • User input: "San Franciso" (typo of "San Francisco")
  • Goal: find the closest matching city efficiently, despite typos

8.1 Naive Levenshtein Distance

How it works:

  • Compare input string with every city in the dictionary
  • Compute edit distance (insertions, deletions, substitutions) for each city

Example:

  • Input: "San Franciso"
  • "San Francisco" → edit distance = 1 → match
  • "Los Angeles" → edit distance = 10 → ignored

Data storage: Simple array or list of city names (no preprocessing)

Time complexity:

  • Let n = number of cities, m = average city name length
  • Computing edit distance per city: O(m²) (see detailed explanation at end of document)
  • Total: O(n * m²) → very slow for large n

Pros: Simple to implement
Cons: Not scalable for real-time search with hundreds of thousands of entries
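A minimal sketch of this naive scan (the `edit_distance` recurrence is the one derived in the appendix):

```python
def edit_distance(a: str, b: str) -> int:
    # Classic O(len(a) * len(b)) dynamic program, row by row (see appendix)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def naive_fuzzy_lookup(query: str, cities: list[str], max_dist: int = 2) -> list[str]:
    # O(n * m^2): compares the query against every city in the dictionary
    return [c for c in cities if edit_distance(query, c) <= max_dist]
```

`naive_fuzzy_lookup("San Franciso", ["San Francisco", "Los Angeles"])` returns `["San Francisco"]`: the typo is 1 edit away, while "Los Angeles" is far beyond the threshold.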

8.2 BK-Tree (Burkhard-Keller Tree)

Purpose: BK-Trees are designed for approximate string matching under a metric (commonly Levenshtein distance). They allow you to find all strings within a given "distance" efficiently, without scanning the entire dataset.

Structure:

  • Each node = a string (city name)
  • Each edge = edit distance between parent and child
  • If a child exists at that edge, insertion continues down that edge

Root Choice: The first inserted word becomes the root. Root choice doesn't affect correctness, but a "central" root can reduce tree depth.

Insertion Example (Detailed):
Insertion order: Los Angeles → Pasadena → Glendale → San Jose → San Mateo → San Francisco → Fresno. (The edge distances below are illustrative; the insertion mechanics, not the exact Levenshtein values, are the point.)

  1. Insert Los Angeles → root
  2. Insert Pasadena → dist(Pasadena, Los Angeles) = 8 → add under Los Angeles
  3. Insert Glendale → dist(Glendale, Los Angeles) = 8 → edge exists → recurse into Pasadena → dist(Glendale, Pasadena) = 7 → add under Pasadena
  4. Insert San Jose → dist(San Jose, Los Angeles) = 11 → add under Los Angeles
  5. Insert San Mateo → dist(San Mateo, Los Angeles) = 11 → recurse into San Jose → dist(San Mateo, San Jose) = 4 → add under San Jose
  6. Insert San Francisco → dist(San Francisco, Los Angeles) = 10 → add under Los Angeles
  7. Insert Fresno → dist(Fresno, Los Angeles) = 10 → recurse into San Francisco → dist(Fresno, San Francisco) = 9 → add under San Francisco

Final Tree:

Los Angeles ──(8)──▶  Pasadena       ──(7)──▶ Glendale
            ──(10)──▶ San Francisco  ──(9)──▶ Fresno
            ──(11)──▶ San Jose       ──(4)──▶ San Mateo

Search Logic (Fuzzy Search):
Query: "San Jsoe" (typo for San Jose), max edit distance = 2

  1. Start at Los Angeles → dist = 11 → allowed child edges in [dist - k, dist + k] = [9, 13]
  2. Children within range: San Francisco (10), San Jose (11); Pasadena's edge (8) falls outside, so that branch is never visited
  3. Check San Francisco → dist = 6 → no match; its child range [4, 8] excludes Fresno (edge 9) → prune
  4. Check San Jose → dist = 2 → ✅ Match found (within k = 2)
  5. San Jose's child range [0, 4] includes San Mateo (edge 4) → dist = 4 → no match

Key Observations:

  • Branch pruning via [d-k, d+k] makes search efficient
  • Chains form naturally when cities share distances
  • Insertion order affects tree shape but not correctness

BK-Tree Complexity:

  • Distance computation per node: O(m²)
  • Average case: O(log n) nodes visited
  • Worst case: O(n) nodes visited
  • Total search time: O(v * m²), v = nodes actually visited

Intuition: BK-Tree is like a binary tree where only relevant branches are explored, but each node comparison costs O(m²).
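A minimal BK-tree sketch of the insert and pruning logic above (plain Python; `edit_distance` is the standard dynamic-programming Levenshtein):

```python
def edit_distance(a: str, b: str) -> int:
    # Standard dynamic-programming Levenshtein distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

class BKTree:
    def __init__(self, words):
        it = iter(words)
        self.root = (next(it), {})        # first word becomes the root
        for w in it:
            self._insert(self.root, w)

    def _insert(self, node, word):
        label, children = node
        d = edit_distance(word, label)
        if d in children:                 # edge exists -> recurse down it
            self._insert(children[d], word)
        else:
            children[d] = (word, {})

    def search(self, query, k):
        matches, stack = [], [self.root]
        while stack:
            label, children = stack.pop()
            d = edit_distance(query, label)
            if d <= k:
                matches.append((label, d))
            # triangle inequality: only edges in [d-k, d+k] can hold matches
            for edge, child in children.items():
                if d - k <= edge <= d + k:
                    stack.append(child)
        return matches
```

Building the tree from the example cities and calling `search("San Jsoe", 2)` returns `("San Jose", 2)` without computing the distance to every city.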

8.3 Trie + Levenshtein Automaton

How it works:

  • Store all city names in a Trie (prefix tree)
  • Example: "San Francisco" → S → a → n → _ → F → r → …
  • Use Levenshtein Automaton to traverse the Trie:
    • Track allowed edits at each step
    • Prune paths exceeding threshold

Time Complexity:

  • Worst case: every trie node is visited → O(N) row updates of O(m) each, where N is the number of trie nodes
  • Average case with pruning: far fewer nodes visited → efficient for large dictionaries

Pros: Scales well, precise, dynamic typo handling
Cons: Higher implementation complexity than BK-Tree

Fuzzy Search with Trie + Levenshtein Automaton (Beginner-Friendly)

Suppose you have a list of city names:

["San Francisco", "San Jose", "San Mateo", "Los Angeles", "Fresno"]

You want to search for "San Jsoe" (typo) and still find "San Jose".

We can do this efficiently using two things together:

  1. A Trie (prefix tree) to store city names.
  2. Levenshtein distance to measure "how different" two strings are.

1. Trie Basics

A Trie is a tree where each node is a letter. Words are formed by following a path from the root to a leaf.

Example:

Root
 |
 S
 |
 a
 |
 n
 |
(space)
 ├── F → "San Francisco"
 ├── J → "San Jose"
 └── M → "San Mateo"
  • Every path from root → leaf = a city name.
  • We can quickly follow letters to see which words match a prefix.

2. Levenshtein Distance Basics

Levenshtein distance = minimum edits to turn one string into another

  • Edit = insert, delete, or change a letter
  • Example: "San Jsoe" → "San Jose" → 2 edits (substitute 's'→'o' and 'o'→'s'). A transposition counts as a single edit only in the Damerau-Levenshtein variant.

We allow a maximum distance (e.g., 2) so we match words even with typos.


3. How the Search Works

Instead of comparing "San Jsoe" with every city (slow), we combine Trie + Levenshtein Automaton.

  • At each Trie node, we maintain a distance row representing edits needed to match the query so far.
  • Row length = query length + 1 = 8 + 1 = 9.

Step 0: Start at empty string

Query = "San Jsoe" (8 chars)
Trie path = ""

Distance row from empty string:

[0, 1, 2, 3, 4, 5, 6, 7, 8]
  • Index = number of query characters considered
  • Value = edits needed to turn empty string → query prefix

Step 1: Add first letter 'S' → Trie path "S"

Compute new row using formula:

new_row[j] = min(
    previous_row[j] + 1,      # deletion
    new_row[j-1] + 1,         # insertion
    previous_row[j-1] + cost  # substitution (0 if match, 1 if not)
)

Example table:

| j | Query prefix | Compare 'S' vs query[j-1] | Cost | Computation | new_row[j] |
|---|--------------|---------------------------|------|-------------|------------|
| 0 | ""           |                           |      | 0 + 1 = 1   | 1 |
| 1 | "S"          | S vs S                    | 0    | min(1+1, 1+1, 0+0) = 0 | 0 |
| 2 | "Sa"         | S vs a                    | 1    | min(2+1, 0+1, 1+1) = 1 | 1 |
| 3 | "San"        | S vs n                    | 1    | min(3+1, 1+1, 2+1) = 2 | 2 |
| 4 | "San "       | S vs (space)              | 1    | min(4+1, 2+1, 3+1) = 3 | 3 |
| 5 | "San J"      | S vs J                    | 1    | min(5+1, 3+1, 4+1) = 4 | 4 |
| 6 | "San Js"     | S vs s                    | 1    | min(6+1, 4+1, 5+1) = 5 | 5 |
| 7 | "San Jso"    | S vs o                    | 1    | min(7+1, 5+1, 6+1) = 6 | 6 |
| 8 | "San Jsoe"   | S vs e                    | 1    | min(8+1, 6+1, 7+1) = 7 | 7 |

Resulting row: [1, 0, 1, 2, 3, 4, 5, 6, 7]


Step 2: Add 'a' → Trie path "Sa"

Example table:

| j | Query prefix | Compare 'a' vs query[j-1] | Cost | Computation | new_row[j] |
|---|--------------|---------------------------|------|-------------|------------|
| 0 | ""           |                           |      | 1 + 1 = 2   | 2 |
| 1 | "S"          | a vs S                    | 1    | min(0+1, 2+1, 1+1) = 1 | 1 |
| 2 | "Sa"         | a vs a                    | 0    | min(1+1, 1+1, 0+0) = 0 | 0 |
| 3 | "San"        | a vs n                    | 1    | min(2+1, 0+1, 1+1) = 1 | 1 |
| 4 | "San "       | a vs (space)              | 1    | min(3+1, 1+1, 2+1) = 2 | 2 |
| 5 | "San J"      | a vs J                    | 1    | min(4+1, 2+1, 3+1) = 3 | 3 |
| 6 | "San Js"     | a vs s                    | 1    | min(5+1, 3+1, 4+1) = 4 | 4 |
| 7 | "San Jso"    | a vs o                    | 1    | min(6+1, 4+1, 5+1) = 5 | 5 |
| 8 | "San Jsoe"   | a vs e                    | 1    | min(7+1, 5+1, 6+1) = 6 | 6 |

Resulting row: [2, 1, 0, 1, 2, 3, 4, 5, 6]


Step 3: Add 'n' → Trie path "San"

Distance row:

[3, 2, 1, 0, 1, 2, 3, 4, 5]
  • Minimum distance = 0 → path "San" matches query "San" perfectly so far.

Step 4: Add space ' ' → Trie path "San "

Distance row:

[4, 3, 2, 1, 0, 1, 2, 3, 4]
  • Minimum distance = 0 → "San " matches query so far

Step 5: Continue down "San J""San Jose"

  • Add 'J' → distance row: [5,4,3,2,1,0,1,2,3]

  • Add 'o' → [6,5,4,3,2,1,1,1,2]

  • Add 's' → [7,6,5,4,3,2,1,2,2]

  • Add 'e' → [8,7,6,5,4,3,2,2,2]

  • Final distance = 2 → within the max allowed distance of 2 → MATCH FOUND "San Jose"


Step 6: Early Pruning

  • "San Francisco" → distance eventually > 2 → stop exploring
  • "San Jose" → distance ≤ 2 → match
  • "San Mateo" → distance > 2 → stop

Step 7: Intuition

  1. Walk down the Trie (letter by letter).
  2. Track minimum edits to match the query at each step (distance row).
  3. Stop exploring paths where edits exceed max allowed distance.
  4. If a complete word reaches end node within max edits → success!

This method powers fuzzy search, autocomplete, and spell correction in real systems.
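The walk above as a sketch: a small trie whose fuzzy search carries the distance row down each path and prunes once the row's minimum exceeds the threshold:

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.word = None                  # set when a complete word ends here

class FuzzyTrie:
    def __init__(self, words):
        self.root = TrieNode()
        for w in words:
            node = self.root
            for ch in w:
                node = node.children.setdefault(ch, TrieNode())
            node.word = w

    def search(self, query, max_dist):
        # Step 0: row for the empty prefix is [0, 1, ..., len(query)]
        row = list(range(len(query) + 1))
        results = []
        for ch, child in self.root.children.items():
            self._walk(child, ch, query, row, max_dist, results)
        return results

    def _walk(self, node, ch, query, prev_row, max_dist, results):
        # Incrementally compute the next distance row: O(m) per character
        row = [prev_row[0] + 1]
        for j in range(1, len(query) + 1):
            cost = 0 if query[j - 1] == ch else 1
            row.append(min(prev_row[j] + 1,          # deletion
                           row[j - 1] + 1,           # insertion
                           prev_row[j - 1] + cost))  # substitution
        if node.word is not None and row[-1] <= max_dist:
            results.append((node.word, row[-1]))
        if min(row) <= max_dist:                     # early pruning
            for nxt_ch, child in node.children.items():
                self._walk(child, nxt_ch, query, row, max_dist, results)
```

With the five example cities, `search("San Jsoe", 2)` returns `("San Jose", 2)`; the "San Francisco" and "San Mateo" branches share the "San " prefix rows and are pruned once their minimum exceeds 2.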

4. Key Advantages

  • Early Pruning: Stop exploring paths when distance exceeds threshold
  • Shared Prefixes: Multiple cities with same prefix share computation
  • Incremental Updates: Distance matrix updates are O(m) per character
  • Memory Efficient: Only store one distance matrix per search path

5. Performance Comparison

| Method | Time Complexity | Space | Implementation |
|--------|-----------------|-------|----------------|
| Naive Levenshtein | O(n × m²) | O(m²) | Simple |
| BK-Tree | O(log n × m²) | O(n) | Medium |
| Trie + Levenshtein Automaton | O(n × m) | O(n + m) | Complex |

6. When to Use

  • Large dictionaries (100k+ entries)
  • Frequent fuzzy searches
  • Memory constraints (trie is more compact than BK-tree)
  • Need for prefix-based filtering

The Trie + Levenshtein Automaton approach is the most sophisticated but also the most efficient for large-scale fuzzy string matching in production systems.

8.4 Trigram Index / Elasticsearch Approach

How it works:

  1. Create Trigrams: Break each city name into 3-character overlapping sequences
    • Example: "San Francisco" → ["San", "an ", "n F", " Fr", "Fra", "ran", "anc", "nci", "cis", "isc", "sco"]
  2. Build Inverted Index: Each trigram → list of city IDs
    • Example: "Fra" → [San Francisco, Frankfurt]
  3. Querying: Input "San Franciso" → generate trigrams → retrieve all cities containing at least one trigram → compute similarity score → return top matches
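A toy version of the trigram pipeline (pure Python; Elasticsearch and PostgreSQL's pg_trgm implement the same idea natively). Jaccard similarity is one simple choice of scoring function, used here for illustration:

```python
from collections import defaultdict

def trigrams(text: str) -> set[str]:
    # Overlapping 3-character sequences of the lowercased string
    t = text.lower()
    return {t[i:i + 3] for i in range(len(t) - 2)}

class TrigramIndex:
    def __init__(self, names):
        self.names = names
        self.index = defaultdict(set)            # trigram -> set of name ids
        for i, name in enumerate(names):
            for g in trigrams(name):
                self.index[g].add(i)

    def search(self, query, limit=3):
        q = trigrams(query)
        # Candidates share at least one trigram; score each by Jaccard overlap
        candidates = set().union(*(self.index.get(g, set()) for g in q))
        scored = []
        for i in candidates:
            c = trigrams(self.names[i])
            scored.append((self.names[i], len(q & c) / len(q | c)))
        scored.sort(key=lambda x: -x[1])
        return scored[:limit]
```

`TrigramIndex(["San Francisco", "San Jose", "Fresno"]).search("San Franciso")` ranks "San Francisco" first: the typo still shares 9 of its trigrams.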

Pros:

  • Scales to millions of entries
  • Handles typos, partial matches, multi-word queries
  • Produces ranked results for UX

Cons: Requires search engine infrastructure, slightly more memory overhead

9. How Elasticsearch Ranks Results: BM25 + Boosting

BM25 (Best Match 25)

Statistical ranking algorithm (improvement on TF-IDF)

Key ingredients:

  • Term Frequency (TF) → frequency of term in field, diminishing returns
  • Inverse Document Frequency (IDF) → rare terms weigh more
  • Field Length Normalization → shorter fields get higher weight
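A sketch of the single-term BM25 score built from those three ingredients, with the common defaults k1 = 1.2 and b = 0.75 (Elasticsearch's implementation differs in details):

```python
import math

def bm25_score(tf, df, doc_len, avg_len, n_docs, k1=1.2, b=0.75):
    # IDF: rare terms (small df) weigh more
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    # Length normalization: longer-than-average fields are penalized
    norm = k1 * (1 - b + b * doc_len / avg_len)
    # TF with diminishing returns: score saturates as tf grows
    return idf * tf * (k1 + 1) / (tf + norm)
```

Note the behaviors the bullets describe: doubling the term frequency less than doubles the score, and the same term scores higher in a shorter field.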

Boosting

Manually tweak ranking

  • Field Boosting: event_name^3, artist_name^2, city^1
  • Query Boosting: recency, proximity
  • Business Logic Boosting: trending events, sponsored events
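As an Elasticsearch query body, the boosts above look roughly like this; the `trending` flag and the decay parameters are assumptions for illustration, not part of the article's schema:

```python
# Hedged sketch of an Elasticsearch body combining field, recency,
# and business-logic boosts. Field names follow this article's schema.
search_body = {
    "query": {
        "function_score": {
            "query": {
                "multi_match": {
                    "query": "LA concerts",
                    # Field boosting: event_name counts 3x, artist_name 2x
                    "fields": ["event_name^3", "artist_name^2", "city^1"],
                    "fuzziness": "AUTO",   # typo tolerance
                }
            },
            "functions": [
                # Business-logic boost: assumed `trending` flag doubles the score
                {"filter": {"term": {"trending": True}}, "weight": 2.0},
                # Recency boost: events further from "now" decay in score
                {"gauss": {"event_start_time": {"origin": "now", "scale": "30d"}}},
            ],
            "score_mode": "multiply",
        }
    }
}
```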

Interview Takeaway: BM25 + Boosting ensures results are relevant (BM25) and aligned with business/user intent (boosting).

10. Indexing Every PostgreSQL Field: Pros & Cons

When designing a large-scale ticketing system, deciding which fields to index in PostgreSQL is crucial, especially when pairing it with a specialized search engine like Elasticsearch.

Pros of PostgreSQL Indexing:

  • Backup System: PostgreSQL can act as a reliable fallback if Elasticsearch experiences an outage, ensuring continuous service.
  • Direct Access for Specific Queries: For certain precise queries or filters (e.g., direct lookups by event_id), direct access through PostgreSQL indexes can be significantly faster.
  • Strong Consistency & Data Integrity: Indexes on crucial fields (like event_id, often as a primary key) are paramount for enforcing strong consistency and data integrity. While the guarantee of integrity comes from database constraints (e.g., PRIMARY KEY, UNIQUE) and transactional properties, indexes are essential for the efficient and scalable enforcement of these guarantees. They enable the database to quickly check for uniqueness and relationships, preventing issues like duplicate events, even when the events table contains millions of records. This is a core requirement that search engines are not designed to fulfill.
  • Simplified Read-Your-Own-Writes: Immediate query results for newly created or updated events are possible directly from PostgreSQL, providing consistent results for the user who made the change.

Cons of PostgreSQL Indexing:

  • Increased Write Overhead: More indexes generally lead to slower insert and update operations, as each index needs to be maintained.
  • Higher Storage Costs: Each index consumes additional disk space, increasing overall storage expenses.
  • Inefficient for Complex Queries: PostgreSQL's built-in text and fuzzy search capabilities are less efficient at scale compared to specialized search engines like Elasticsearch.
  • Maintenance Complexity: Managing and tuning a large number of indexes can add to operational overhead.

Recommendation for Ticketing Systems: Strategic PostgreSQL Indexing

While it's not necessary to index every field in PostgreSQL, it plays a vital role. The primary responsibility for high-volume, complex search queries is offloaded to Elasticsearch, which is horizontally scalable and optimized for such tasks.

PostgreSQL indexes serve two critical, complementary roles in this architecture:

  1. Ensuring Data Integrity and Transactional Consistency: Indexes on crucial fields like event_id (typically as a primary key) are paramount. They enable the database to efficiently enforce UNIQUE and PRIMARY KEY constraints, ensuring that event data is always correct and preventing issues like duplicate event entries. This is a core requirement that search engines cannot guarantee.

  2. Providing a Failsafe: Event creation and updates are low-volume operations compared to search queries, so the write overhead of maintaining additional PostgreSQL indexes is acceptable. That investment buys a robust, consistent backup of critical event data, a crucial safety net if the search engine suffers an outage.

Soundbite for Interview: "Mentioning PostgreSQL indexing demonstrates you've considered distributed system failure modes and have a plan B. Downsides are outweighed by reliability, consistency, and data integrity."

11. Indexing Service & Event Management Service

These services ensure search data is accurate, timely, and scalable, maintaining a strong connection between canonical event data in PostgreSQL and the search index in Elasticsearch/OpenSearch.

11.1 Event Management Service (EMS)

Purpose: EMS is the source of truth for all event metadata. Responsible for creating, updating, and managing canonical event records.

Responsibilities:

  • Manage all event-related data:
    • Event name, venue, city, state, country
    • Start and end times
    • Event capacity and ticket types
    • Synonyms for search queries (e.g., LA → Los Angeles)
  • Validate incoming event data for consistency and correctness
  • Track changes via event versioning or timestamps
  • Emit notifications for downstream systems when events are created or updated

Key Considerations for High Scale:

  • Use atomic transactions in PostgreSQL to ensure consistency
  • Implement soft deletes to prevent broken references in search index
  • Emit event change logs to message broker (Kafka, RabbitMQ) for asynchronous processing by Indexing Service
  • Ensure idempotency in updates to avoid duplicate indexing

Example Flow:

  1. Event created/updated in PostgreSQL
  2. EMS validates the record
  3. EMS publishes event notification (message queue / Kafka topic)
  4. Indexing Service consumes the message to update the search index
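The notification in step 3 might be built like this; the message shape and idempotency-key scheme are illustrative assumptions, not a fixed contract:

```python
import hashlib
from datetime import datetime, timezone

def build_event_message(event: dict, version: int) -> dict:
    """Change message for the Indexing Service (shape is illustrative)."""
    payload = {
        "type": "event_updated",
        "event_id": event["event_id"],
        "version": version,   # lets consumers drop stale, out-of-order updates
        "published_at": datetime.now(timezone.utc).isoformat(),
        "data": event,
    }
    # Idempotency key: the same event_id + version always hashes the same,
    # so a redelivered message can be deduplicated downstream
    key = hashlib.sha256(f'{event["event_id"]}:{version}'.encode()).hexdigest()
    payload["idempotency_key"] = key
    return payload

# e.g., published to a Kafka topic such as "event-changes"
```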

11.2 Indexing Service

Purpose: Keep the search index up-to-date with canonical data for fast, accurate results

Responsibilities:

  • Consume event change notifications from EMS
  • Transform data into search-friendly format:
    • Text fields → tokenized
    • Locations → geo-points
    • Dates → sortable timestamps
  • Update/delete documents in Elasticsearch/OpenSearch
  • Handle batch updates and real-time streaming
  • Maintain versioning to detect stale updates
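The transform and versioning responsibilities as a sketch; the field names mirror the events-table schema from section 1, and `is_stale` shows the check used to drop out-of-order updates:

```python
def to_search_document(row: dict) -> dict:
    """Convert a canonical Postgres row into an Elasticsearch document.
    Field names follow the events-table schema in this article."""
    return {
        "event_id": row["event_id"],
        "event_name": row["event_name"],
        "venue_name": row["venue_name"],
        "city": row["city_normalized"],
        # Geo-point shape Elasticsearch expects for geo_distance queries
        "location": {"lat": row["location_lat"], "lon": row["location_lng"]},
        # ISO-8601 timestamps sort and range-filter naturally
        "event_start_time": row["event_start_time"],
        "search_synonyms": row.get("search_synonyms", []),
    }

def is_stale(incoming_version: int, indexed_version) -> bool:
    # Skip updates older than (or equal to) what is already indexed
    return indexed_version is not None and incoming_version <= indexed_version
```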

Key Implementation Details:

  • Message Queue Consumption: Kafka, RabbitMQ, or Kinesis
  • Idempotent Updates: Each event has a unique ID
  • Bulk Indexing: For new events or large updates
  • Error Handling: Failed updates retried, dead-letter queue stores permanently failed updates

Consistency Model:

  • Elasticsearch is eventually consistent
  • EMS remains the source of truth

Example Flow:

  1. Receive "event_created" message from EMS
  2. Transform event record (normalize names, add synonyms, compute geo coordinates)
  3. Push to Elasticsearch via bulk API
  4. Confirm update or retry if it fails

11.3 Architecture & Reliability

  • Decoupling: EMS and Indexing Service communicate via message queues → prevents blocking writes in Postgres during indexing
  • Resilience: Retry mechanisms and dead-letter queues for failures
  • Scalability: Multiple consumers can scale horizontally for high volumes
  • Observability: Metrics (throughput, error rates, lag), logging full trace (event_id, timestamp, status)

11.4 Interview Takeaways

  • EMS ensures canonical, consistent event data
  • Indexing Service keeps search index current and performant
  • Decoupled, message-driven architecture → scalable & fault-tolerant
  • Idempotent + bulk operations → reliability under high load
  • Plan B for indexing failures demonstrates awareness of real-world distributed system design

12. Performance Benchmarks

Search Performance Comparison

| Category | Method | P95 Latency | QPS | Memory | Notes |
|----------|--------|-------------|-----|--------|-------|
| DB-native | PostgreSQL ILIKE | 200ms | 1,000 | 2GB | Exact text matching |
| DB-native | PostgreSQL Trigram | 500ms | 500 | 4GB | Fuzzy text search |
| Search Engine | Elasticsearch (Fuzzy) | 50ms | 10,000+ | 8GB | Scalable fuzzy search |
| Search Engine | Elasticsearch (Regex) | 80ms | 8,000+ | 10GB | Regex/wildcard queries |
| Search Engine | OpenSearch | 60ms | 9,000+ | 8GB | AWS-led fork of Elasticsearch |
| Search Engine | Solr | 70ms | 8,500+ | 9GB | Enterprise search platform |
| Algorithm | BK-Tree | 20ms | 5,000 | 16GB | Specialized fuzzy matching |
| Algorithm | Trie + Levenshtein | 10ms | 15,000+ | 6GB | Efficient fuzzy matching |
| Cache | Redis Cached | 5ms | 50,000+ | 4GB | Hot queries only |

Why Trigram Indexes Excel at Complex Queries

While Trie + Levenshtein Automaton is faster for simple fuzzy matching, Trigram indexes are superior for complex queries because they handle multiple search patterns simultaneously. Trigram indexes break text into overlapping 3-character sequences, allowing them to efficiently process:

  • Multi-word queries: "Taylor Swift concert" matches documents containing "Taylor", "Swift", and "concert" even with typos
  • Partial matches: "San Fran*" matches "San Francisco", "San Fernando", "San Francisco Bay"
  • Regex patterns: Complex regular expressions like [A-Z][a-z]+ [A-Z][a-z]+ for proper names
  • Wildcard queries: "conc*" matches "concert", "conference", "conclusion"
  • Phrase proximity: Finding documents where "Taylor" and "Swift" appear within 5 words of each other

Performance advantage: Trigram indexes pre-compute all 3-character sequences, turning complex pattern matching into near-constant-time inverted-index lookups instead of an O(n×m²) scan per query. This is why Elasticsearch uses trigram/n-gram techniques for regex and wildcard queries despite the higher memory overhead: the cost of pattern matching makes the trade-off worthwhile.

Real-world example: A query like "LA concerts this weekend" with typos becomes much more efficient with trigram indexing because it can simultaneously match "LA" (city), "concert*" (event type), and "weekend" (time) across multiple fields and handle variations in each term.

Production Metrics

  • Elasticsearch Cluster: 3-node, 16GB RAM, 500GB index
  • Query Latency: P50: 15ms, P95: 45ms, P99: 100ms
  • Throughput: 8,000 QPS sustained, 15,000 QPS burst
  • Cache Hit Rate: 85% for trending searches

13. Caching Strategy

Multi-Layer Architecture

  1. CDN Cache (CloudFlare) - Static content, 24h TTL
  2. Application Cache (Redis) - Query results, 5-15min TTL
  3. Elasticsearch Query Cache - Search index cache

Cache Layers

  • Query Results: Popular searches cached for 15 minutes
  • Autocomplete: City/venue suggestions cached for 1 hour
  • Geo Data: Location lookups cached for 24 hours
  • Trending Searches: Top queries cached for 1 hour

Cache Invalidation

  • Event Updates: Invalidate related searches by city/venue
  • Time-based: Different TTLs based on data volatility
  • Cache Warming: Pre-populate trending searches during off-peak
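A sketch of the key scheme and per-layer TTLs above; the key format and helper names are assumptions for illustration:

```python
import hashlib

# TTLs per cache layer, matching the values listed above (seconds)
TTLS = {"query": 15 * 60, "autocomplete": 60 * 60,
        "geo": 24 * 60 * 60, "trending": 60 * 60}

def cache_key(kind: str, query: str, city: str = "") -> str:
    # Deterministic key: the same normalized query always hits the same entry
    raw = f"{kind}:{query.strip().lower()}:{city.strip().lower()}"
    return hashlib.md5(raw.encode()).hexdigest()

def ttl_for(kind: str) -> int:
    return TTLS[kind]

def invalidation_keys(city: str, venue: str, cached_queries: list[str]) -> list[str]:
    # On an event update, drop cached searches that mention its city or venue
    return [cache_key("query", q, city) for q in cached_queries
            if city.lower() in q.lower() or venue.lower() in q.lower()]
```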

14. Monitoring & Observability

Key Metrics

  • Search Performance: Latency (P50/P95/P99), success rate, error rate
  • System Health: Elasticsearch cluster status, node health, shard status
  • Business Metrics: Search volume, popular queries, user engagement
  • Cache Performance: Hit rate, memory usage, eviction rate

Alerting Thresholds

  • Critical: Search latency > 100ms, error rate > 5%, cluster health = Red
  • Warning: Search latency > 50ms, error rate > 1%, cluster health = Yellow
  • Channels: PagerDuty (critical), Slack (warnings), Email (reports)

Logging

  • Search Queries: Query, user, latency, results count, cache hit status
  • Errors: Timeout, connection failures, retry attempts
  • Performance: Elasticsearch response times, cache hit rates

15. Disaster Recovery

Failure Scenarios

  • Single Node: 0 downtime, automatic shard rebalancing
  • Cluster Failure: 15-30 min downtime, restore from snapshots
  • Data Corruption: 2-4 hours downtime, full reindex from PostgreSQL

Recovery Procedures

  • Cluster Health Check: Verify cluster status and shard allocation
  • Snapshot Restore: Restore from daily snapshots stored in S3
  • Full Rebuild: Reindex entire dataset from PostgreSQL source
  • Incremental Sync: Catch up on missed updates using timestamps

Fallback Strategies

  • PostgreSQL Fallback: Use ILIKE queries with reduced functionality
  • Read-Only Mode: Serve cached results only during outages
  • Graceful Degradation: Show maintenance message with trending events

Backup Strategy

  • Daily Snapshots: Automated daily backups to S3
  • Cross-Region Replication: Real-time replication to backup cluster
  • RTO/RPO: 15-30 min recovery time, 15 min data loss maximum

Appendix: Why Edit Distance is O(m²)

The Levenshtein distance (edit distance) has O(m²) time complexity because it uses a dynamic programming approach with a 2D table where m is the length of the strings being compared.

The Algorithm Structure

For two strings of length m and n, the algorithm creates an (m+1) × (n+1) table (roughly m × m when the lengths are similar) and fills it using the following recurrence relation:

def levenshtein_distance(s1, s2):
    m, n = len(s1), len(s2)

    # Create (m+1) × (n+1) table
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    # Base cases
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j

    # Fill the table
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s1[i-1] == s2[j-1]:
                dp[i][j] = dp[i-1][j-1]  # No operation needed
            else:
                dp[i][j] = 1 + min(
                    dp[i-1][j],      # Deletion
                    dp[i][j-1],      # Insertion  
                    dp[i-1][j-1]     # Substitution
                )

    return dp[m][n]

Understanding the Base Cases

The base cases initialize the first row and first column of the DP table:

# Base cases
for i in range(m + 1):
    dp[i][0] = i
for j in range(n + 1):
    dp[0][j] = j

What these base cases represent:

  1. dp[i][0] = i (First column):

    • Converting string1[0...i-1] to an empty string
    • Requires i deletions (delete all i characters)
    • Example: "San Francisco" → "" needs 13 deletions
  2. dp[0][j] = j (First row):

    • Converting empty string to string2[0...j-1]
    • Requires j insertions (insert all j characters)
    • Example: "" → "San Franciso" needs 12 insertions

Visual representation:

     ""  S  a  n     F  r  a  n  c  i  s  o
""  [0][1][2][3][4][5][6][7][8][9][10][11][12]
S   [1][?][?][?][?][?][?][?][?][?][?][?][?]
a   [2][?][?][?][?][?][?][?][?][?][?][?][?]
n   [3][?][?][?][?][?][?][?][?][?][?][?][?]
    [4][?][?][?][?][?][?][?][?][?][?][?][?]
F   [5][?][?][?][?][?][?][?][?][?][?][?][?]
r   [6][?][?][?][?][?][?][?][?][?][?][?][?]
a   [7][?][?][?][?][?][?][?][?][?][?][?][?]
n   [8][?][?][?][?][?][?][?][?][?][?][?][?]
c   [9][?][?][?][?][?][?][?][?][?][?][?][?]
i   [10][?][?][?][?][?][?][?][?][?][?][?][?]
s   [11][?][?][?][?][?][?][?][?][?][?][?][?]
c   [12][?][?][?][?][?][?][?][?][?][?][?][?]
o   [13][?][?][?][?][?][?][?][?][?][?][?][?]

Why these base cases make sense:

  • Row 0, Column 0: dp[0][0] = 0 (empty string to empty string = 0 operations)
  • Row 0, Column j: dp[0][j] = j (empty string to j-character string = j insertions)
  • Row i, Column 0: dp[i][0] = i (i-character string to empty string = i deletions)

These base cases provide the foundation for the dynamic programming recurrence relation that fills the rest of the table.

Why O(m²)

  1. Table Size: We create a table of size (m+1) × (n+1), which is approximately m × m when strings are similar length
  2. Nested Loops: We have two nested loops:
    • Outer loop: for i in range(1, m + 1)O(m) iterations
    • Inner loop: for j in range(1, n + 1)O(m) iterations (when n ≈ m)
  3. Total Operations: O(m) × O(m) = O(m²)

Visual Example

For strings "San Francisco" (m=13) and "San Franciso" (n=12):

     S  a  n     F  r  a  n  c  i  s  o
   [0][1][2][3][4][5][6][7][8][9][10][11][12]
S  [1][0][1][2][3][4][5][6][7][8][9][10][11]
a  [2][1][0][1][2][3][4][5][6][7][8][9][10]
n  [3][2][1][0][1][2][3][4][5][6][7][8][9]
   [4][3][2][1][0][1][2][3][4][5][6][7][8]
F  [5][4][3][2][1][0][1][2][3][4][5][6][7]
r  [6][5][4][3][2][1][0][1][2][3][4][5][6]
a  [7][6][5][4][3][2][1][0][1][2][3][4][5]
n  [8][7][6][5][4][3][2][1][0][1][2][3][4]
c  [9][8][7][6][5][4][3][2][1][0][1][2][3]
i  [10][9][8][7][6][5][4][3][2][1][0][1][2]
s  [11][10][9][8][7][6][5][4][3][2][1][0][1]
c  [12][11][10][9][8][7][6][5][4][3][2][1][1]
o  [13][12][11][10][9][8][7][6][5][4][3][2][1]

Each cell requires constant time to compute, but we need to fill 13 × 12 = 156 cells, which is O(m²).

Why This Matters in the Ticketing System

In the context of the ticketing system:

  1. City Names: "San Francisco" vs "San Franciso" (typo)
  2. Scale Problem: With 200k-500k cities, computing edit distance for each one would be:
    • Per city: O(m²) where m ≈ 10-15 characters
    • Total: O(n × m²) = O(500,000 × 15²) = O(112.5 million operations)
  3. Performance Impact: This is why the document mentions it's "very slow for large n"

Alternative Solutions

This is exactly why the document suggests:

  • BK-Trees: O(log n) nodes visited, but each comparison still costs O(m²)
  • Trie + Levenshtein Automaton: More complex but can be optimized
  • Trigram Indexing: near-constant-time inverted-index lookups for approximate candidate retrieval

The O(m²) complexity is inherent to the edit distance algorithm itself: it is the mathematical cost of computing the minimum number of operations needed to transform one string into another.
