DEV Community

Gowtham Potureddi


Agoda Data Engineering Interview Questions: Full Prep Guide

Agoda data engineering interview questions skew toward high-volume marketplace telemetry: streams of searches, impressions, and bookings that must stay correct under reordering, cheap to aggregate, and safe to join against slowly changing inventory dimensions. Panels reward engineers who articulate event-time vs processing-time, idempotent transforms, and Python implementations that respect memory ceilings before debating warehouse vendors.

Python screens lean on arrays, sorting, and streaming reductions—skills that mirror latency-sensitive ranking jobs—while SQL loops still probe whether you can narrate grain, join cardinality, and PARTITION BY windows once conversations pivot to warehouse pipelines.



Top topics tied to the indexed Agoda PipeCode snapshot

PipeCode currently lists four Agoda entry points worth memorizing before you widen elsewhere: the company hub, the Python lane, plus Agoda · array and Agoda · sorting. Treat everything else—especially pure SQL company hubs—as global topic lanes until sitemap listings expand.

| # | Indexed pillar | Why interviewers care |
|---|---|---|
| 1 | Hub + Python lane | Proves you can ship deterministic batch transforms that mirror online pricing jobs without blowing heap limits. |
| 2 | Array-driven reps | Mirrors session buffers, rolling counters, and vectorized precomputations extracted from click streams. |
| 3 | Sorting & ordering contracts | Validates whether ranking APIs stay stable when ties collide, which is critical when duplicate fares fight for the same slot. |
| 4 | SQL widen path | Interviewers still expect additive metrics, join-safe grains, and replayable windows once you leave Python. |
| 5 | Narrow-tag study tactics | Keeps difficulty honest: exhaust indexed slices, then sprint array/python + sorting/python volume before returning to aggregation SQL drills. |

Agoda-flavor framing rule: say aloud ordering guarantees, duplicate handling, heap vs deque trade-offs, and SQL grain locks before optimizing constants.


1. Agoda data engineering interview snapshot & indexed PipeCode routes

Figure: Agoda practice routes on PipeCode (company hub, Python lane, array topic, sorting topic) shown as connected chips.

Travel-marketplace loops recruiters emphasize

Detailed explanation. Expect pairing sessions that blend storytelling about messy feeds, timed Python, and SQL sketches describing nightly booking aggregates. Hiring managers listen for ownership narratives: how you backfilled partitions after upstream delays and how you proved metric parity between streaming and batch paths.

What the sitemap-listed routes imply today

Detailed explanation. The Agoda hub anchors brand-filtered cards; Agoda · Python isolates interpreter-heavy tasks; Agoda · array and Agoda · sorting highlight the two algorithmic pillars surfaced for this brand filter today. When interviewers pivot to SQL-only prompts, route through joins/sql or window-functions/sql rather than inventing company URLs.

Metric narration before IDE time

Detailed explanation. Panels reward stating duplicate policies (“latest wins”, “earliest wins”) and timezone normalization before typing loops—mirroring how finance reconciles bookings worldwide.
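Here is a minimal sketch of why normalization comes first. It assumes each record carries a local wall-clock time and a fixed UTC offset. The records and offsets are hypothetical. The offsets are hard-coded only to keep the example self-contained; a real job would resolve IANA zones (for example with `zoneinfo`) so daylight-saving changes are handled.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical records: local wall-clock time plus a fixed UTC offset in hours.
# Offsets are hard-coded for illustration only.
bookings = [
    {"booking_id": "A", "local_ts": "2026-05-01 08:30", "utc_offset_h": 7},  # e.g. Bangkok
    {"booking_id": "B", "local_ts": "2026-05-01 03:00", "utc_offset_h": 1},  # e.g. London in summer
]

def to_utc(local_ts: str, utc_offset_h: int) -> datetime:
    """Attach the fixed offset to the naive local time, then convert to UTC."""
    naive = datetime.strptime(local_ts, "%Y-%m-%d %H:%M")
    aware = naive.replace(tzinfo=timezone(timedelta(hours=utc_offset_h)))
    return aware.astimezone(timezone.utc)

# Comparing raw local strings says B happened first; UTC says A did.
by_local = sorted(bookings, key=lambda b: b["local_ts"])
by_utc = sorted(bookings, key=lambda b: to_utc(b["local_ts"], b["utc_offset_h"]))

print([b["booking_id"] for b in by_local])  # ['B', 'A']
print([b["booking_id"] for b in by_utc])    # ['A', 'B']
```

A "latest wins" or "earliest wins" rule is only meaningful once both sides of the comparison share one clock.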

Question.

List three latency-aware choices you’d mention before rewriting a hot Python reducer that merges nightly booking shards.

Input.

You inherit a reducer that loads entire shards into lists before sorting.

Code.

streaming iterators • bounded heaps • pre-sort keys in extract

Step-by-step explanation.

  1. Streaming iterators avoid materializing multi-gigabyte shards when partitions spill to disk.
  2. Bounded heaps keep top-K fare diagnostics cheap without full sorts.
  3. Pre-sort keys in extract align upstream ordering with downstream joins to minimize shuffle chatter.
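The first two choices can be sketched together. This assumes each shard is an iterable of `(booking_id, fare)` pairs. `read_shard` and the sample shards are hypothetical stand-ins for real file readers. The point is that no shard is ever materialized as a list, and the heap never holds more than `k` entries.

```python
import heapq
from typing import Iterable, Iterator

def read_shard(rows: Iterable[tuple[str, float]]) -> Iterator[tuple[str, float]]:
    """Hypothetical reader: yields rows one at a time instead of building a list."""
    for row in rows:
        yield row

def top_k_fares(shards: Iterable[Iterable[tuple[str, float]]], k: int) -> list[tuple[float, str]]:
    """Keep the k highest fares across all shards using a min-heap of size <= k."""
    heap: list[tuple[float, str]] = []  # min-heap of (fare, booking_id)
    for shard in shards:
        for booking_id, fare in read_shard(shard):
            if len(heap) < k:
                heapq.heappush(heap, (fare, booking_id))
            elif fare > heap[0][0]:
                # New fare beats the smallest kept fare: pop it and push the new one.
                heapq.heapreplace(heap, (fare, booking_id))
    return sorted(heap, reverse=True)  # highest fare first

shards = [
    [("b1", 120.0), ("b2", 95.0)],
    [("b3", 310.0), ("b4", 80.0), ("b5", 150.0)],
]
print(top_k_fares(shards, k=2))  # [(310.0, 'b3'), (150.0, 'b5')]
```

Memory stays O(k) regardless of shard size, and each row costs at most O(log k).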

Output.

A spoken checklist that proves you’re thinking resource envelopes, not micro-optimizations.

Common beginner mistakes

  • Claiming extra Agoda-specific SQL hubs exist without verifying routes on explore/practice/company/agoda paths.
  • Ignoring timezone-normalized grains when comparing booking timestamps across regions.

Practice: indexed hub + lanes first

COMPANY
Agoda hub
Agoda data engineering practice

Practice →

PYTHON
Agoda · Python lane
Company-filtered Python reps

Practice →

TOPIC
Agoda · array
Array problems — Agoda slice

Practice →


2. Array patterns in Python for Agoda-tagged data screens

Figure: brute-force array scans vs sliding-window throughput.

Bounded windows over impression buffers

Detailed explanation. Marketplace telemetry often arrives as time-ordered arrays representing impressions or clicks per session. Interviewers expect you to articulate fixed-size sliding windows—maintaining running aggregates while the window shifts one index at a time—instead of recomputing sums with nested loops whenever prompts mention rolling counts, last-K events, or moving averages.

Say aloud what the window owns: indices [i-k+1, …, i] for length k, what leaves when i increments (left edge), and what enters (right edge). That narration proves you will not double-count rows when the same exercise shows up later as ROWS BETWEEN … frames in SQL.

Prefix sums vs incremental deltas

Detailed explanation. Prefix sums precompute cumulative totals so any contiguous range sum becomes subtraction of two prefix values—ideal when the dataset is bounded, fits in memory, and interviewers ask many random [l,r] range queries on the same array.

Incremental deltas (sliding add/remove) fit streaming or single-pass constraints where building a full prefix table feels wasteful or impossible. Rule of thumb: if the problem only ever needs windows of fixed length k sliding left-to-right, prefer O(n) sliding maintenance; if it needs arbitrary intervals offline, prefix sums often win.
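Here is a minimal sketch of the offline case. Build the prefix table once, then answer each inclusive `[l, r]` query by subtracting two entries. The function names are illustrative. The sample array reuses the values from the max-average exercise later in this section.

```python
from itertools import accumulate

def build_prefix(nums: list[int]) -> list[int]:
    """prefix[j] = sum(nums[:j]); the leading 0 makes ranges starting at index 0 work."""
    return list(accumulate(nums, initial=0))

def range_sum(prefix: list[int], l: int, r: int) -> int:
    """Sum of nums[l..r] inclusive, answered in O(1) by subtracting two prefixes."""
    return prefix[r + 1] - prefix[l]

nums = [1, 12, -5, -6, 50]
prefix = build_prefix(nums)      # [0, 1, 13, 8, 2, 52]
print(range_sum(prefix, 1, 3))   # 12 + (-5) + (-6) = 1
print(range_sum(prefix, 0, 4))   # 52
```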

Question.

When would you refuse a prefix-sum approach even though range sums are requested?

Input.

The prompt forbids a second pass and arrays arrive from an iterator with unknown length.

Code.

Streaming constraint → maintain rolling window state; prefix needs full materialization unless chunked carefully.

Step-by-step explanation.

  1. Prefix arrays normally require knowing len(nums) or buffering everything first.
  2. Sliding accumulation only stores current window aggregate + boundaries.
  3. Mention numeric stability if sums involve floats—small detail senior loops appreciate.

Output.

A spoken reason: memory / pass budget, not “prefix is wrong.”
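A sketch of the streaming alternative follows, assuming the input arrives as any Python iterator whose length is unknown. `rolling_sums` is a hypothetical helper. A `collections.deque` holds only the current window, so memory is O(k) and each value is read exactly once.

```python
from collections import deque
from typing import Iterable, Iterator

def rolling_sums(stream: Iterable[int], k: int) -> Iterator[int]:
    """Yield the sum of the last k values once k values have arrived.

    Single pass, O(k) memory, no prefix table. Because this is a generator,
    an invalid k raises on the first next() call, not at call time.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    window = deque()
    total = 0
    for x in stream:
        window.append(x)               # right edge enters
        total += x
        if len(window) > k:
            total -= window.popleft()  # left edge leaves
        if len(window) == k:
            yield total

impressions = iter([1, 12, -5, -6, 50])       # iterator: length unknown up front
print(list(rolling_sums(impressions, k=4)))   # [2, 51]
```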

Two-pointer discipline on sorted merges

Detailed explanation. When two sorted arrays represent complementary feeds (e.g., eligible fares vs inventory snapshots), two pointers advance independently. Each step compares the two head elements and emits the smaller one, so the combined array never needs re-sorting. Complexity stays Θ(n + m) rather than Θ((n+m) log(n+m)).

Call out stability: if keys tie, which side you consume first encodes business priority—mirror that later with ORDER BY tie-break columns in SQL.
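Here is a minimal sketch, assuming both inputs are lists of `(key, label)` tuples already sorted by key. The feed names and values are hypothetical. The `<=` comparison is the tie-break decision worth saying aloud: on equal keys, the left feed is emitted first.

```python
def merge_sorted(left: list[tuple[int, str]], right: list[tuple[int, str]]) -> list[tuple[int, str]]:
    """Merge two key-sorted lists in Θ(n + m); on equal keys the left element goes first."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i][0] <= right[j][0]:  # '<=' hands ties to the left feed
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])   # at most one of these tails is non-empty
    out.extend(right[j:])
    return out

fares = [(100, "fare-a"), (150, "fare-b")]
inventory = [(100, "inv-x"), (120, "inv-y")]
print(merge_sorted(fares, inventory))
# [(100, 'fare-a'), (100, 'inv-x'), (120, 'inv-y'), (150, 'fare-b')]
```

Switching `<=` to `<` would hand ties to the right feed instead. That is a business decision, not a style choice.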

Off-by-one guards interviewers listen for

Detailed explanation. Sliding-window bugs cluster on empty arrays, k = 0, k > len(nums), and inclusive vs exclusive bounds. Before coding, state tests for len(nums) == 0, k == len(nums) (single window), and negative values inside the window that flip max-average intuition.
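One way to make those tests concrete is a brute-force reference that a faster solution can be checked against. This is a hypothetical helper, not part of the original exercise. It recomputes every window on purpose and rejects invalid `k` up front.

```python
def max_average_reference(nums: list[int], k: int) -> float:
    """Brute-force O(n*k) oracle: average every length-k window explicitly."""
    if not 1 <= k <= len(nums):
        # Covers empty arrays, k == 0, and k > len(nums).
        raise ValueError("k must satisfy 1 <= k <= len(nums)")
    return max(sum(nums[i:i + k]) / k for i in range(len(nums) - k + 1))

# k == len(nums): exactly one window.
print(max_average_reference([3, -1, 4], 3))    # 2.0
# All-negative input: the best average is negative, so never seed "best" with 0.
print(max_average_reference([-1, -2, -3], 2))  # -1.5
```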

Python Interview Question on sliding throughput

Question.

Given integers nums and integer k, return the maximum average of any contiguous subarray of length k.

Input.

| Index | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| nums[i] | 1 | 12 | -5 | -6 | 50 |

Let k = 4.

Solution Using sliding window accumulation

Code.

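def max_average(nums, k):
    # Guard the edge cases interviewers probe: the window must fit inside nums.
    if not 1 <= k <= len(nums):
        raise ValueError("k must satisfy 1 <= k <= len(nums)")
    window = sum(nums[:k])  # sum of the first window, indices 0..k-1
    best = window
    for i in range(k, len(nums)):
        # nums[i] enters on the right while nums[i - k] leaves on the left
        window += nums[i] - nums[i - k]
        best = max(best, window)
    return best / k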

Step-by-step trace

(Input table above; k = 4.)

  1. Initial window covers indices 0..3 → sum = 1 + 12 + (-5) + (-6) = 2.
  2. Slide by subtracting nums[0] and adding nums[4] → window sum = 2 - 1 + 50 = 51.
  3. Track max raw sum 51; divide by k → average 12.75.

Output.

| Metric | Value |
|---|---|
| max average | 12.75 |

Why this works — concept by concept:

  • Sliding invariant — each step adjusts the window by one removal and one insertion only.
  • Linear scan — single left-to-right pass after initialization Θ(n).
  • Cost — memory Θ(1) beyond the input array.

Common beginner mistakes

  • Recomputing full slice sums inside nested loops—easy to regress to quadratic work in n and k.
  • Forgetting to guard empty inputs or k > len(nums) during storytelling.
  • Treating prefix sums as free when the prompt implies streaming or unknown length.
  • Mis-stating window bounds (off-by-one) under time pressure—always echo inclusive indices aloud.

PYTHON
Topic — array · Python
Array Python lane

Practice →

PYTHON
Topic — array · medium
Array medium lane

Practice →


3. Sorting, ordering keys, and complexity narratives in interviews

Figure: Python tuple sorting with stable comparator keys mapped to SQL ORDER BY tie-break columns.

Comparator tuples mirroring SQL ORDER BY

Detailed explanation. Python’s sorted and list.sort accept key= functions that return sort keys. Returning a tuple creates lexicographic ordering: compare first tuple elements; on ties, compare second elements, and so on—exactly how ORDER BY col_a, col_b, col_c behaves when keys are comparable.

In interviews, spell the parallel: ORDER BY promo_flag DESC, price ASC, viewed_at ASC corresponds to a Python key like (-promo_flag, price, viewed_at) when using ascending tuple ordering with a negated first column.

When mergesort behavior matters

Detailed explanation. Python’s sorted is stable: rows that compare equal on the entire key keep their original relative order. That matters when ties represent UI ties or batch arrivals and stakeholders expect yesterday’s ordering preserved within equal fares.

Contrast with heaps or heapsorts—often unstable—when an interviewer asks for top-K without caring about tie preservation.
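Here is a small sketch contrasting the two, using hypothetical rows. `sorted` keeps arrival order for equal fares. A raw `heapq` heap compares whole entries, so ties would fall through to the label. Pushing an `itertools.count` sequence number is the tie-breaking pattern described in the `heapq` documentation's priority-queue notes.

```python
import heapq
from itertools import count

# Hypothetical rows in arrival order: (label, fare). Two pairs of fares tie.
rows = [("yesterday-1", 150), ("yesterday-2", 120), ("today-1", 150), ("today-2", 120)]

# sorted() is stable: rows with equal fares keep their arrival order.
by_fare = sorted(rows, key=lambda r: r[1])
print(by_fare)
# [('yesterday-2', 120), ('today-2', 120), ('yesterday-1', 150), ('today-1', 150)]

# heapq compares whole entries, so ties on fare would fall through to the label.
# An explicit sequence number pins ties to arrival order instead.
seq = count()
heap = []
for label, fare in rows:
    heapq.heappush(heap, (fare, next(seq), label))
popped = [heapq.heappop(heap)[2] for _ in range(len(heap))]
print(popped)
# ['yesterday-2', 'today-2', 'yesterday-1', 'today-1']
```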

Complexity storytelling under duplicates

Detailed explanation. Comparison-based sorting needs Θ(n log n) comparisons in the worst case. Heavy duplicates do not lower that bound unless you exploit structure, e.g., counting sort when keys are small integers, radix buckets for fixed-width codes, or bucketizing prices into discrete tiers.
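Here is a minimal counting-sort sketch, assuming prices have already been bucketed into a small number of integer tiers. The tier values are hypothetical. This version sorts bare keys. Carrying whole rows along would need the stable variant that uses prefix offsets.

```python
def counting_sort_tiers(tiers: list[int], num_tiers: int) -> list[int]:
    """Sort small non-negative integer keys in O(n + num_tiers) without comparisons."""
    counts = [0] * num_tiers
    for t in tiers:
        counts[t] += 1            # tally each tier
    out = []
    for tier, c in enumerate(counts):
        out.extend([tier] * c)    # emit each tier as many times as it appeared
    return out

print(counting_sort_tiers([3, 0, 2, 3, 1, 0, 3], num_tiers=4))  # [0, 0, 1, 2, 3, 3, 3]
```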

If n is warehouse-scale, describe external sort or shuffle-heavy Spark sort rather than in-memory sorted.
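To make "external sort" concrete, here is a single-machine sketch. It sorts fixed-size chunks, spills each one as a sorted run to a temporary file, then streams a k-way merge with `heapq.merge`. It assumes integer keys written one per line. A Spark sort does the distributed analogue through a shuffle, which this sketch does not model.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator

def external_sort(values: Iterable[int], chunk_size: int) -> Iterator[int]:
    """Sorted runs on disk, then a lazy k-way merge holding one value per run."""
    run_paths = []
    chunk = []

    def flush():
        # Sort the in-memory chunk and write it to its own temporary run file.
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "w") as out:
            out.writelines(f"{v}\n" for v in sorted(chunk))
        run_paths.append(path)
        chunk.clear()

    for v in values:
        chunk.append(v)
        if len(chunk) >= chunk_size:
            flush()
    if chunk:
        flush()

    files = [open(p) for p in run_paths]
    try:
        runs = [(int(line) for line in f) for f in files]
        yield from heapq.merge(*runs)  # each run is already sorted
    finally:
        for f, p in zip(files, run_paths):
            f.close()
            os.remove(p)

print(list(external_sort([5, 3, 9, 1, 7, 2, 8], chunk_size=3)))  # [1, 2, 3, 5, 7, 8, 9]
```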

Descending columns without reversing the whole table

Detailed explanation. SQL uses DESC per column; Python often uses unary negation on numeric columns (-price) or reverse=True on single-key sorts. For mixed directions across tuple keys, prefer per-field inversion so lexicographic tuple ordering stays predictable, and avoid sorting twice unless the prompt demands it.
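Per-field negation only works for numeric keys. When a descending column is a string, one documented fallback is the multi-pass pattern from Python's Sorting HOW TO: sort twice and rely on stability. `reverse=True` keeps equal keys in their prior order. This is the case where a second sort is justified. The rows below are hypothetical.

```python
# Hypothetical rows: (city, hotel_name). Goal: city DESC, then hotel_name ASC.
# Strings cannot be negated, so rely on stability instead:
# sort by the secondary key first, then by the primary key with reverse=True.
rows = [("Bangkok", "Riverside"), ("Tokyo", "Akasaka"), ("Bangkok", "Asok"), ("Tokyo", "Shibuya")]

by_name = sorted(rows, key=lambda r: r[1])                  # secondary key, ascending
result = sorted(by_name, key=lambda r: r[0], reverse=True)  # primary key, descending
print(result)
# [('Tokyo', 'Akasaka'), ('Tokyo', 'Shibuya'), ('Bangkok', 'Asok'), ('Bangkok', 'Riverside')]
```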

Translating tie-break rules into grain guarantees

Detailed explanation. When Python ordering decides which duplicate booking survives, your SQL dedupe window must use the same ORDER BY keys—otherwise nightly pipelines disagree with offline notebooks. Say “deterministic tie-break” aloud and list surrogate columns (ingest_ts, event_id) even if the toy dataset omits them.
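Here is a minimal plain-Python sketch of keeping one survivor per key with an explicit tie-break. The column names and rows are hypothetical. It assumes ISO-8601 strings in a single normalized timezone, so string order matches time order. `(ingest_ts, event_id)` is the ordering a matching `ROW_NUMBER() OVER (PARTITION BY booking_id ORDER BY ingest_ts, event_id)` would need so the notebook and the pipeline agree.

```python
# Hypothetical extract rows: (booking_id, ingest_ts, event_id, rate).
# Policy: earliest ingest_ts wins; event_id breaks exact timestamp ties.
rows = [
    ("B10", "2026-05-01T00:05", 7, 120),
    ("B10", "2026-05-01T00:01", 9, 118),
    ("B10", "2026-05-01T00:01", 4, 121),  # same ingest_ts as the row above
    ("B11", "2026-05-01T00:02", 1, 95),
]

def tie_break_key(row):
    _, ingest_ts, event_id, _ = row
    return (ingest_ts, event_id)  # same keys the SQL ORDER BY should use

survivors = {}
for row in rows:
    booking_id = row[0]
    if booking_id not in survivors or tie_break_key(row) < tie_break_key(survivors[booking_id]):
        survivors[booking_id] = row

print(sorted(survivors.values()))
# [('B10', '2026-05-01T00:01', 4, 121), ('B11', '2026-05-01T00:02', 1, 95)]
```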

Python Interview Question on multi-key ordering

Question.

Given rows (promo_flag, price, viewed_at) as tuples, return indices sorted so promo_flag descending, then price ascending, then viewed_at ascending.

Input.

| idx | promo_flag | price | viewed_at |
|---|---|---|---|
| 0 | 1 | 200 | 10 |
| 1 | 1 | 150 | 12 |
| 2 | 0 | 150 | 9 |

Solution Using key tuple inversion

Code.

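rows = [
    (1, 200, 10),  # idx 0: (promo_flag, price, viewed_at)
    (1, 150, 12),  # idx 1
    (0, 150, 9),   # idx 2
]
# Negate promo_flag for DESC; price and viewed_at stay ascending.
order = sorted(range(len(rows)), key=lambda i: (-rows[i][0], rows[i][1], rows[i][2]))
print(order)  # [1, 0, 2]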

Step-by-step trace

(Input table above.)

  1. Negate promo_flag inside keys so 1 sorts ahead of 0.
  2. Sort ascending on price when promo ties—(1,150,12) beats (1,200,10).
  3. Final tie on viewed_at orders identical promo/price pairs chronologically.

Output.

Sorted indices: [1, 0, 2]

Why this works — concept by concept:

  • Lexicographic tuples encode multi-column ordering compactly.
  • Sign inversion replaces unavailable DESC syntax inside Python keys.
  • Cost — sorting indices Θ(n log n) comparisons.

Common beginner mistakes

  • Sorting mutated copies without proving immutability of tie-break fields.
  • Omitting timezone normalization when translating this pattern to SQL timestamps.
  • Assuming duplicates sort themselves safely without naming stable-sort expectations or surrogate tie-breakers.
  • Mixing reverse=True with multi-key tuples until directionality becomes unreadable—prefer per-column negation for clarity.

PYTHON
Topic — sorting · Python
Sorting Python lane

Practice →

COMPANY
Agoda · sorting topic
Sorting — Agoda slice

Practice →


4. SQL pipelines & dimensional grain when loops leave Python

Figure: global SQL practice lanes: aggregation, joins, and window functions.

Why loops yield to set-based SQL in pipeline interviews

Detailed explanation. Python excels at prototypes, unit tests, and bounded arrays; warehouse SQL excels at declaring grain, joining dimensions at scale, and letting the optimizer choose plans. Interviewers shift languages when they want to hear integrity constraints—uniqueness, foreign keys, additive closures—not clever imperative loops.

Your bridge narrative: “I prototyped dedupe logic in Python to validate semantics; production expresses the same contract as ROW_NUMBER/QUALIFY so cardinality stays auditable.”

Grain sentences before SELECT clauses

Detailed explanation. A grain sentence locks semantics before syntax: “One row equals one booking-night for one property on one calendar date in reporting currency.” Attach business keys (booking_id, stay_date) and surrogate keys (property_sk) so downstream JOIN and GROUP BY arguments inherit that contract.

Without grain, SUM(nightly_rate_usd) might multiply nights incorrectly after accidental fan-out.
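The grain sentence can be turned into an executable check before any SUM runs. In this minimal sketch with hypothetical rows, any repeated grain-key tuple means the grain sentence is false and downstream totals cannot be trusted. Keying on `booking_id` alone shows how a coarser, wrong grain surfaces as duplicates for multi-night stays.

```python
from collections import Counter

# Hypothetical fact rows at the stated grain: one row per (booking_id, stay_date).
fact_rows = [
    {"booking_id": "B1", "stay_date": "2026-05-01", "property_sk": 7, "nightly_rate_usd": 120},
    {"booking_id": "B1", "stay_date": "2026-05-02", "property_sk": 7, "nightly_rate_usd": 120},
    {"booking_id": "B2", "stay_date": "2026-05-01", "property_sk": 9, "nightly_rate_usd": 95},
]

def grain_violations(rows, grain_keys):
    """Return grain-key tuples that appear more than once (empty list = grain holds)."""
    counts = Counter(tuple(r[k] for k in grain_keys) for r in rows)
    return [key for key, n in counts.items() if n > 1]

print(grain_violations(fact_rows, ("booking_id", "stay_date")))  # []
print(grain_violations(fact_rows, ("booking_id",)))              # [('B1',)]
```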

Facts vs dimensions in interview shorthand

Detailed explanation. Facts hold measurements you aggregate (rates, counts, margin). Dimensions hold descriptive context (region, chain, room type) used for filters and roll-ups. When sketching, draw facts central with dimensions orbiting—then narrate which dimensions conform across facts so reconciliation joins stay legal.

Additive metrics vs conditional aggregates

Detailed explanation. Additive metrics survive SUM across grains (booking revenue). Semi-additive metrics need constraints (inventory snapshots: sum across stores only after picking one time slice). Say which bucket your metric lives in before defending GROUP BY.
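Here is a small illustration of the semi-additive case, using hypothetical room-inventory snapshots per property. Summing across snapshot dates counts the same rooms repeatedly. Summing across properties within one snapshot date is valid.

```python
# Hypothetical daily inventory snapshots: (snapshot_date, property_id, rooms_available).
snapshots = [
    ("2026-05-01", "P1", 10), ("2026-05-01", "P2", 4),
    ("2026-05-02", "P1", 8),  ("2026-05-02", "P2", 5),
]

# Wrong: summing across dates double-counts the same rooms.
naive_total = sum(rooms for _, _, rooms in snapshots)  # 27

# Right: pick one time slice (the latest date), then sum across properties.
latest = max(date for date, _, _ in snapshots)
rooms_on_latest = sum(rooms for date, _, rooms in snapshots if date == latest)  # 8 + 5 = 13

print(naive_total, rooms_on_latest)  # 27 13
```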

Declaring many-to-one joins before SUM

Detailed explanation. State which side is unique: “Every fact booking_sk maps to at most one current dimension row.” If uniqueness fails, JOIN multiplies rows and SUM inflates—your spoken guard is dedupe first, aggregate first, or semi-join filters before aggregation depending on prompt geometry.
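Here is a minimal plain-Python sketch of the fan-out and the "dedupe first" guard. The fact and dimension rows are hypothetical. `dict.fromkeys` drops only exact duplicate rows. If duplicate keys carried different attributes, you would need an explicit keep rule instead.

```python
from collections import Counter

# Hypothetical rows: facts are (booking_sk, property_id, revenue).
facts = [("bk1", "P1", 100.0), ("bk2", "P2", 80.0)]
# Dimension is (property_id, city), with P1 duplicated by mistake.
dim_property = [("P1", "Bangkok"), ("P1", "Bangkok"), ("P2", "Tokyo")]

def join_revenue(facts, dim):
    """Inner join facts to dim on property_id, then SUM(revenue)."""
    total = 0.0
    for _, prop, revenue in facts:
        for dim_prop, _ in dim:
            if dim_prop == prop:
                total += revenue  # one addition per matching dimension row
    return total

def assert_unique_keys(dim):
    """Guard: every property_id must appear at most once before joining."""
    dupes = [k for k, n in Counter(k for k, _ in dim).items() if n > 1]
    if dupes:
        raise ValueError(f"dimension not unique on key: {dupes}")

print(join_revenue(facts, dim_property))     # 280.0, inflated (true total is 180.0)
deduped = list(dict.fromkeys(dim_property))  # drop exact duplicate rows, keep first
assert_unique_keys(deduped)
print(join_revenue(facts, deduped))          # 180.0
```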

Join cardinality after Python reshaping

Detailed explanation. pandas.merge, list comprehensions, and nested dict merges can silently duplicate keys—especially after explode, cross join sketches, or typo’d merge(on=). In SQL, mirror safeguards:

  • ROW_NUMBER / QUALIFY when keep earliest/latest rules apply.
  • DISTINCT only when duplicates are truly interchangeable—dangerous if measures diverge.
  • GROUP BY grain locked to the same keys you asserted aloud.

Explain why each rewrite preserves row counts relative to the grain sentence.

GROUP BY and HAVING as aggregation fences

Detailed explanation. GROUP BY collapses rows after joins execute—exactly where fan-out hurts. Place filters that remove duplicates or narrow keys before aggregation when possible; use HAVING for predicates on aggregated results (SUM(x) > 100).

Relate to Python: a groupby after merge is just as sensitive to operation order (merge first, then aggregate); SQL simply makes the fan-out visible in COUNT(*) checks.
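Here is a runnable sketch of the WHERE versus HAVING split, using Python's built-in `sqlite3` module and an in-memory table. The table and values are hypothetical. The cancelled row is removed before aggregation, and the threshold applies to the grouped total.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE bookings (property_id TEXT, status TEXT, revenue REAL)")
con.executemany(
    "INSERT INTO bookings VALUES (?, ?, ?)",
    [("P1", "confirmed", 60), ("P1", "confirmed", 70), ("P1", "cancelled", 500),
     ("P2", "confirmed", 40), ("P2", "confirmed", 30)],
)

# WHERE filters detail rows before grouping; HAVING filters groups after SUM.
query = """
    SELECT property_id, SUM(revenue) AS total
    FROM bookings
    WHERE status = 'confirmed'   -- row-level predicate
    GROUP BY property_id
    HAVING SUM(revenue) > 100    -- aggregate-level predicate
"""
print(con.execute(query).fetchall())  # [('P1', 130.0)]
```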

Window frames vs grouped aggregates

Detailed explanation. GROUP BY removes detail rows; windows (SUM(...) OVER (PARTITION BY … ORDER BY …)) keep rows while attaching running metrics. Choose windows when the interviewer mentions rank, dedupe, running totals, or replay-safe ordering—tie back to §2’s sliding-window intuition.
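Here is a runnable contrast using Python's standard-library `sqlite3`. Window functions require SQLite 3.25 or newer, and the `clicks` table is hypothetical. The `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` frame is the SQL cousin of the running aggregates from §2.

```python
import sqlite3  # window functions need SQLite 3.25+

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE clicks (session_id TEXT, seq INTEGER, clicks INTEGER)")
con.executemany(
    "INSERT INTO clicks VALUES (?, ?, ?)",
    [("s1", 1, 2), ("s1", 2, 3), ("s1", 3, 1), ("s2", 1, 5)],
)

# GROUP BY: one row per session; detail rows disappear.
grouped = con.execute(
    "SELECT session_id, SUM(clicks) FROM clicks GROUP BY session_id ORDER BY session_id"
).fetchall()

# Window: every detail row survives, with a running total attached.
running = con.execute("""
    SELECT session_id, seq, clicks,
           SUM(clicks) OVER (
               PARTITION BY session_id ORDER BY seq
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_clicks
    FROM clicks
    ORDER BY session_id, seq
""").fetchall()

print(grouped)  # [('s1', 6), ('s2', 5)]
print(running)  # [('s1', 1, 2, 2), ('s1', 2, 3, 5), ('s1', 3, 1, 6), ('s2', 1, 5, 5)]
```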

SQL Interview Question on deduped nightly bookings

Question.

Table stg_bookings(booking_sk, ingest_ts, nightly_rate_usd) contains duplicate extracts per booking. Keep the earliest ingest per booking_sk and return booking_sk, nightly_rate_usd.

Input.

| booking_sk | ingest_ts | nightly_rate_usd |
|---|---|---|
| 10 | 2026-05-01 00:01 | 120 |
| 10 | 2026-05-01 00:05 | 120 |
| 11 | 2026-05-01 00:02 | 95 |

Solution Using ROW_NUMBER dedupe

Code.

WITH ranked AS (
    SELECT booking_sk,
           nightly_rate_usd,
           ROW_NUMBER() OVER (
               PARTITION BY booking_sk
               ORDER BY ingest_ts ASC
           ) AS rn
    FROM stg_bookings
)
SELECT booking_sk,
       nightly_rate_usd
FROM ranked
WHERE rn = 1;

Step-by-step trace

(Input table above.)

  1. PARTITION BY booking_sk scopes dedupe per reservation surrogate.
  2. ORDER BY ingest_ts promotes earliest warehouse arrival.
  3. Filtering rn = 1 keeps exactly one surviving row per partition.

Output.

| booking_sk | nightly_rate_usd |
|---|---|
| 10 | 120 |
| 11 | 95 |
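To check the trace locally, you can run the query against the sample rows with Python's built-in `sqlite3`; `ROW_NUMBER` needs SQLite 3.25+. Timestamps are stored as ISO-style text here. That text sorts chronologically only because every value shares one format and timezone, which is an assumption of this sketch. A final `ORDER BY` is added so the printed order is stable.

```python
import sqlite3  # ROW_NUMBER() needs SQLite 3.25+

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE stg_bookings (booking_sk INTEGER, ingest_ts TEXT, nightly_rate_usd INTEGER)"
)
con.executemany(
    "INSERT INTO stg_bookings VALUES (?, ?, ?)",
    [(10, "2026-05-01 00:01", 120), (10, "2026-05-01 00:05", 120), (11, "2026-05-01 00:02", 95)],
)

# Same dedupe query as the solution, plus ORDER BY booking_sk for a stable print.
rows = con.execute("""
    WITH ranked AS (
        SELECT booking_sk,
               nightly_rate_usd,
               ROW_NUMBER() OVER (
                   PARTITION BY booking_sk
                   ORDER BY ingest_ts ASC
               ) AS rn
        FROM stg_bookings
    )
    SELECT booking_sk, nightly_rate_usd
    FROM ranked
    WHERE rn = 1
    ORDER BY booking_sk
""").fetchall()
print(rows)  # [(10, 120), (11, 95)]
```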

Why this works — concept by concept:

  • Deterministic ordering: earliest ingest_ts wins; the choice stays deterministic only while ingest_ts is unique per booking_sk.
  • Cardinality restoration — output grain returns to one row per booking.
  • Cost — sort per partition Θ(n log n) worst-case.

Common beginner mistakes

  • Using DISTINCT without clarifying which column survives when rates diverge.
  • Dropping ingest_ts from ordering keys when duplicates share timestamps—add surrogate ids when available.
  • Claiming GROUP BY fixes fan-out when duplicates originate before the aggregation grain—you must shrink cardinality earlier or prove joins stay many-to-one.
  • Confusing window dedupe with GROUP BY collapse—windows keep detail rows; grouped aggregates often hide them.

SQL
Topic — joins · SQL
Joins lane

Practice →

SQL
Topic — window functions · SQL
Window SQL lane

Practice →


5. Study plan when Agoda company filters stay narrow

Weekly cadence tied to indexed slices

Detailed explanation. Monday: Agoda hub breadth. Tuesday: Agoda · Python depth. Wednesday: Agoda · array sprint. Thursday: Agoda · sorting sprint. Friday: global aggregation/sql + joins/sql resets so SQL instincts stay warm.

Telemetry journaling for behavioral rounds

Detailed explanation. Keep a one-page postmortem template: symptom, blast radius, rollback, metric parity checks—consulting loops love structured ownership stories.

Practice: widen SQL after Python reps

SQL
Topic — aggregation · SQL
Aggregation SQL lane

Practice →

COMPANY
Agoda hub
Return to company breadth

Practice →


Tips to crack Agoda data engineering interviews

| Tip | Why it lands |
|---|---|
| Anchor every answer with grain + ordering | Mirrors the nightly booking reconciliations that finance teams audit. |
| Pair Python windows with SQL dedupe windows | Shows you can traverse stack boundaries without inventing new semantics. |
| Narrate indexed routes honestly | Signals maturity: cite hub, python, array, sorting before speculative URLs. |
| Log latency envelopes aloud | Travel workloads punish quadratic surprises, so state budgets early. |

Frequently asked questions

Do Agoda interviews skip Python if I only list SQL on my resume?

Unlikely for marketplace pipelines—expect array/sorting drills alongside SQL storytelling. Keep Agoda · Python warm even if your title skews warehouse-only.

Why does PipeCode list array and sorting slices separately?

They expose distinct algorithm contracts: contiguous structure manipulation vs ordering semantics. Alternating days keeps both instincts fresh without collapsing practice into generic “coding.”

How do I practice SQL if no Agoda-only SQL hub appears in the sitemap?

Use global lanes such as joins/sql and window-functions/sql—the interview loop still expects additive metrics and replay-safe windows.

What’s the fastest way to narrate duplicate booking feeds?

State latest vs earliest wins, cite ROW_NUMBER or QUALIFY, and mention timezone normalization before typing—panels reward clarity over cleverness.

Should I mention travel-domain trivia to impress hiring managers?

Only when it sharpens metric definitions (cross-border taxes, multi-night stays). Avoid tourism anecdotes that skip data contracts.

Where can I drill Agoda-tagged problems next?

Start at Agoda hub, branch into Python, array, and sorting, then widen SQL via aggregation/sql reps.

Start practicing Agoda data engineering problems

PipeCode pairs company-filtered marketplace reps with feedback loops so you graduate from reading solutions to shipping your own bounded implementations.

Pipecode.ai is Leetcode for Data Engineering

Browse Agoda practice →

Agoda Python lane →
