Bloomberg data engineering interview questions sit at the intersection of three narrow, production-grade patterns. First, Python two-pointer and string manipulation: reversing the words in a sentence with `s.split()` plus `[::-1]` plus `' '.join(...)`. Second, production-quality OOP with abstract classes: subclassing an ABC base with `@abstractmethod` `load`, `transform`, and `write` to stream a chunked CSV into line-delimited JSON without loading the whole file into memory. Third, SQL window functions for time-series and overlap analysis: `SUM(volume) OVER (PARTITION BY symbol ORDER BY trade_date)` for rolling totals, and `EXISTS` subqueries with interval crosschecks for subscription-overlap detection. The schema you reason over feels like Bloomberg's own product (ticks, subscriptions, provider_feeds, corporate_actions), and the bar is fluency with two-pointer index arithmetic, `from abc import ABC, abstractmethod`, and `PARTITION BY ... ORDER BY ...` under tie-break and overlap edge cases.
This guide walks through the four topic clusters Bloomberg actually tests. Each comes with a detailed topic explanation, per-sub-topic explanations with worked examples and solutions, and an interview-style problem with a full solution that explains why it works. The mix matches the curated 2-problem Bloomberg set (1 easy, 1 hard) plus one supplementary SQL section anchored on patterns Bloomberg's external interview reports surface heavily — a Python-and-SQL loop where stating the invariant out loud is half the score and typing production-quality code on the first try is the other half. Strong data engineer interview prep for Bloomberg is less about contest difficulty (the pass rate sits at 8% across recent samples) and more about clean class boundaries, deterministic ordering, streaming-friendly file I/O, and the kind of code review you would survive on day three of the job.
Top Bloomberg data engineering interview topics
From the Bloomberg data engineering practice set, the four numbered sections below follow this topic map (one row per H2):
| # | Topic (sections 1–4) | Why it shows up at Bloomberg |
|---|---|---|
| 1 | The Bloomberg data engineering interview process | Four-stage funnel — phone screen (45-60 min) → online assessment (some roles) → onsite of 3-5 rounds → final / fit. ~30 days end to end. |
| 2 | Python two-pointer and string manipulation | Reverse Words in String (EASY) — s.split() + [::-1] + ' '.join(...), or two-pointer in-place reversal of a character array. |
| 3 | Abstract classes, OOP, and file I/O in Python | Chunked CSV to Line-Delimited JSON Processor (HARD) — subclass an abstract RecordProcessor, implement load, transform, write, and stream csv.DictReader rows into json.dumps(...) + '\n' without loading the whole file. |
| 4 | SQL window functions for time-series and overlap | Rolling-sum aggregates and Subscription Overlap — SUM(volume) OVER (PARTITION BY symbol ORDER BY trade_date) for rolling totals, plus EXISTS interval crosscheck for overlap detection. |
Bloomberg-flavor framing rule: Bloomberg's prompts model the company's own product — market-data ticks, subscription feeds, provider data with corporate actions, regulatory-grade entitlements. The interviewer is grading whether you map each business framing to the right primitive: reverse-and-rejoin → `split` + `[::-1]` + `join`; processor with pluggable transforms → `ABC` with `@abstractmethod`; rolling aggregates → `SUM OVER (PARTITION BY symbol ORDER BY ts)`; interval overlap → `EXISTS` with `<` and `>` bound checks. State the mapping out loud, then type production-quality code.
1. The Bloomberg Data Engineering Interview Process
The Bloomberg DE interview funnel from phone screen to final round
The Bloomberg data engineer interview process is a four-stage funnel that takes about thirty days end to end: a 45-60 minute technical phone screen (longer than the 20-30 minute non-engineering equivalent), an optional online assessment for some roles, a virtual onsite of three to five rounds covering coding plus system design plus behavioral plus fit, and a final decision call. Recent samples report an 8% pass rate across thirteen Bloomberg DE candidates — making it the most selective DE loop of any company in this prep series.
The two stages most candidates misread are the technical phone screen (45-60 minutes is only enough for one strong problem plus follow-ups, so move fast on the boilerplate) and the behavioral round (interviewers drill deep into past project technical details, not just culture-fit small talk).
Phone screen — 45 to 60 minutes of coding fluency
Detailed explanation. Bloomberg's engineering phone screen is conducted in a CoderPad-style live editor and runs 45-60 minutes for technical roles. The format converges on one medium-difficulty algorithmic problem (trees, graphs, two-pointer, sliding window, dynamic programming) plus follow-ups on edge cases and complexity. Bloomberg specifically values production-quality code — type hints, edge-case guards, and a few inline tests — over a clever-but-fragile one-liner. State your assumptions before typing: input bounds, null/empty handling, in-place vs return-new.
Worked example. When the interviewer hands you "reverse the order of words in a sentence", a clean opener is "I'll assume the input is a non-null string with words separated by single spaces, no leading or trailing whitespace; if the prompt allows multiple spaces I'll collapse them with split() since split() with no args treats any run of whitespace as one delimiter." That single sentence earns the round.
Rule of thumb: state assumptions, then type. Two minutes of clarification is worth twenty minutes of rework.
Online assessment and the take-home variant
Detailed explanation. Some Bloomberg DE roles route through an online assessment (Byteboard-style or HackerRank-style) instead of, or in addition to, the live phone screen. Format varies by team: a 60-minute timed assessment with two SQL questions and one Python question, or a 90-minute take-home with a small data-pipeline implementation graded on code quality and testability. The signal is the same as the live screen — production-quality code, edge-case awareness, deterministic output.
Worked example. A take-home variant might be: "Read events.csv, normalize the timestamps to UTC, deduplicate on (user_id, event_id) keeping the latest, and write the result as line-delimited JSON to events.ndjson." Strong candidates ship a typed function with a __main__ guard, an inline unit test, and a README.md with assumptions — even when the prompt does not explicitly ask for them.
Rule of thumb: on take-homes, ship a typed function plus one unit test plus a one-paragraph README of assumptions. Bloomberg grades on production-readiness, not just correctness.
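A hedged sketch of the dedup step from that take-home shape — the `dedupe_latest` helper, the field names, and the sample rows are illustrative, not a known Bloomberg reference solution:

```python
from datetime import datetime, timezone

def dedupe_latest(rows):
    """Keep the latest row per (user_id, event_id), comparing timestamps in UTC.

    Illustrative sketch only; field names mirror the hypothetical prompt above.
    """
    latest = {}
    for row in rows:
        key = (row['user_id'], row['event_id'])
        ts = datetime.fromisoformat(row['ts']).astimezone(timezone.utc)
        if key not in latest or ts > latest[key][0]:
            latest[key] = (ts, row)
    return [row for _, row in latest.values()]

rows = [
    {'user_id': '1', 'event_id': 'a', 'ts': '2026-04-01T09:00:00+02:00'},  # 07:00 UTC
    {'user_id': '1', 'event_id': 'a', 'ts': '2026-04-01T08:00:00+00:00'},  # 08:00 UTC — kept
    {'user_id': '2', 'event_id': 'b', 'ts': '2026-04-01T10:00:00+00:00'},
]
deduped = dedupe_latest(rows)
assert len(deduped) == 2
```

Normalizing to UTC before comparing is what makes the "keep the latest" rule deterministic across mixed-offset inputs.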
Onsite — coding, design, behavioral, fit (3-5 rounds)
Detailed explanation. The onsite is a virtual loop of three to five rounds totaling four to five hours: a coding round (harder than the screen, often involving sliding-window aggregators or class-hierarchy refactors), a system design round (real-time market-data pipelines, dedup with sequence numbers, schema evolution, entitlements at read time), a behavioral round, and a fit round. Bloomberg's distinctive design moves include asking about exactly-once semantics under sink failures, late-arriving data with watermarks, and schema evolution across pipeline stages without breaking downstream consumers.
Worked example. "Design a pipeline that ingests trades from twenty venues, dedupes, enriches with reference data, and serves to downstream analytics consumers with a freshness SLA of 30 seconds." The clean answer routes through Kafka with idempotent producers, a Flink stateful operator keyed on (venue_id, sequence_number) for dedup, a side-input join against a slowly-changing reference table, and a CloudWatch alert on consumer lag. Naming idempotency, watermarks, and the SLA up front is what differentiates strong candidates.
Rule of thumb: every Bloomberg system-design answer should mention idempotency once, watermarks once, and the SLA / SLO once. Those three phrases earn the round.
Behavioral round — STAR plus deep-dive into past project tech
Detailed explanation. Bloomberg's behavioral round is conventional STAR — Situation, Task, Action, Result — but interviewers go technical-deep on the projects you mention. Bring two real projects you owned end-to-end with crisp numbers: rows per day, peak QPS, latency p99, cost per query, post-incident retro. Generic "I led a migration" stories without numbers will not pass. Recent first-person reports (Taro, October 2024) explicitly note: "The interviewer asked past project-related questions and went into depth about the technical aspects related to those projects."
Worked example. When asked "tell me about a time you handled a data corruption incident," a strong answer names the source (a downstream consumer reported NaN totals on the daily revenue dashboard), the diagnosis (a corrupt parquet file from a vendor partition that passed the schema check but had a sentinel -1 instead of NULL in a numeric column), the fix (added a Great Expectations contract test on the column's value distribution, plus a postmortem doc shared with the vendor), and the result (zero similar incidents over the next six months, two related vendors signed up for the contract).
Common beginner mistakes
- Treating the phone screen as a sprint — Bloomberg grades production-quality code, not the cleverest hack.
- Skipping `README.md` and inline tests on take-homes.
- Naming idempotency without explaining the sink — "my Kafka producer is idempotent" is not a complete answer; the sink decides exactly-once.
- Bringing one STAR story to the behavioral round (interviewers double-click and the story runs out fast).
- Generic past-project answers without numbers (rows / day, p99, cost).
Practice: drill the Bloomberg DE panel before the live screen
- COMPANY · Bloomberg — all DE problems: Bloomberg data engineering practice set
- PYTHON · Bloomberg — Python only: Bloomberg Python practice
2. Python Two-Pointer and String Manipulation
Two-pointer iteration, string reversal, and split-rejoin in Python for data engineering
The first canonical Bloomberg Python pattern is string-level manipulation with two-pointer or split-rejoin primitives. The headline interview problem on the Bloomberg practice set, Reverse Words in String, is the textbook split-rejoin question: reverse the order of words in a sentence using only built-in string methods. The canonical answer is a single line — ' '.join(s.split()[::-1]) — but the interviewer is reading whether you can articulate the three primitives behind it (split, [::-1], join) and reach for the right tool when the constraints flip (immutable input, no built-ins, in-place character array).
Pro tip: Python's `s.split()` (with no argument) is not the same as `s.split(' ')`. The no-arg form collapses runs of whitespace and ignores leading or trailing whitespace; the explicit `' '` form preserves empty strings between consecutive spaces. For "reverse words" prompts you almost always want the no-arg form because it handles `'  hello   world  '` cleanly.
Splitting on whitespace — collapsing multiple spaces
The split invariant: s.split() with no argument splits on any run of whitespace and discards empty leading or trailing tokens; s.split(' ') with an explicit single-space delimiter preserves every empty token between consecutive spaces.
- `'a b c'.split()` → `['a', 'b', 'c']`.
- `'  a  b  '.split()` → `['a', 'b']` (leading and trailing whitespace dropped, runs collapsed).
- `'  a  b  '.split(' ')` → `['', '', 'a', '', 'b', '', '']` (every run preserved).
Worked example. Three quick splits on the same input.
| input | call | output |
|---|---|---|
| `'  hello  world  '` | `.split()` | `['hello', 'world']` |
| `'  hello  world  '` | `.split(' ')` | `['', '', 'hello', '', 'world', '', '']` |
| `'hello world'` | `.split(' ')` | `['hello', 'world']` |
Worked-example solution.
```python
def words_collapsed(s: str) -> list[str]:
    return s.split()
```
Rule of thumb: default to s.split() (no arg) for natural-language input. Reach for s.split(' ') only when the prompt explicitly says preserve empty tokens.
Reversing a list with [::-1] vs reversed()
The reversal invariant: [::-1] is a slice that returns a new reversed list; reversed(seq) returns a lazy iterator; list.reverse() mutates in place and returns None.
- `words[::-1]` — new list, original unchanged. O(n) time, O(n) extra memory.
- `list(reversed(words))` — new list via iterator, same cost.
- `words.reverse()` — in-place mutation, returns `None`. O(n) time, O(1) extra memory.
Worked example. Three ways to reverse ['a', 'b', 'c'].
| call | result | mutates original |
|---|---|---|
| `words[::-1]` | `['c', 'b', 'a']` | no |
| `list(reversed(words))` | `['c', 'b', 'a']` | no |
| `words.reverse()` (then read `words`) | `['c', 'b', 'a']` | yes |
Worked-example solution.
```python
def reverse_new(words: list[str]) -> list[str]:
    return words[::-1]

def reverse_in_place(words: list[str]) -> None:
    words.reverse()
```
Rule of thumb: [::-1] for "return a new reversed copy"; list.reverse() for "minimize memory and the prompt allows mutation."
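A quick runnable contrast of the three idioms — including the `list.reverse()`-returns-`None` pitfall — assuming nothing beyond the standard library:

```python
# The three reversal idioms side by side, plus the classic pitfall:
# list.reverse() mutates in place and returns None.
words = ['a', 'b', 'c']

copy_reversed = words[::-1]            # new list; original untouched
assert copy_reversed == ['c', 'b', 'a']
assert words == ['a', 'b', 'c']

assert list(reversed(words)) == ['c', 'b', 'a']  # lazy iterator, materialized

result = words.reverse()               # in-place; the "result" is None
assert result is None
assert words == ['c', 'b', 'a']
```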
Two-pointer in-place reversal of a character array
The two-pointer invariant: swap arr[left] with arr[right], then move left += 1 and right -= 1, until left >= right. This runs in O(n) time and O(1) extra memory and is the canonical follow-up when the interviewer says "now do it without [::-1] or reversed()".
- Initialization — `left = 0`, `right = len(arr) - 1`.
- Swap — `arr[left], arr[right] = arr[right], arr[left]` (Python's tuple assignment makes this one line).
- Termination — `left < right` keeps you safe at the middle; `left == right` is a no-op on odd-length arrays.
Worked example. Reverse ['a', 'b', 'c', 'd'] in place.
| step | left | right | array |
|---|---|---|---|
| start | 0 | 3 | ['a','b','c','d'] |
| swap, advance | 1 | 2 | ['d','b','c','a'] |
| swap, advance | 2 | 1 | ['d','c','b','a'] |
| stop (`left >= right`) | — | — | `['d','c','b','a']` |
Worked-example solution.
```python
def reverse_in_place_chars(arr: list[str]) -> None:
    left, right = 0, len(arr) - 1
    while left < right:
        arr[left], arr[right] = arr[right], arr[left]
        left += 1
        right -= 1
```
Rule of thumb: `left < right` (strictly less than) is the idiomatic termination — `left <= right` does a redundant self-swap at the midpoint of odd-length arrays and signals a muddy grasp of the invariant.
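The full in-place follow-up ("reverse the words of a character array without `[::-1]`") composes this same swap loop twice: reverse the whole array, then reverse each word back. A sketch, assuming single-space word separators:

```python
def reverse_words_in_place(arr: list[str]) -> None:
    """Reverse word order in a char array in O(1) extra memory.

    Sketch assuming single-space separators; not a known Bloomberg reference answer.
    """
    def reverse_range(left: int, right: int) -> None:
        while left < right:
            arr[left], arr[right] = arr[right], arr[left]
            left += 1
            right -= 1

    reverse_range(0, len(arr) - 1)      # pass 1: reverse the whole array
    start = 0
    for i in range(len(arr) + 1):       # pass 2: re-reverse each word
        if i == len(arr) or arr[i] == ' ':
            reverse_range(start, i - 1)
            start = i + 1

arr = list('the quick fox')
reverse_words_in_place(arr)
assert ''.join(arr) == 'fox quick the'
```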
Common beginner mistakes
- Using `s.split(' ')` on real-world input and getting empty-string tokens that confuse downstream logic.
- Calling `list.reverse()` and then trying to use the return value (it returns `None`).
- Writing the two-pointer loop with `left <= right` — the extra iteration is a redundant self-swap at the midpoint and signals muddy invariant thinking.
- Reaching for `re.split` when `str.split` is enough (no-libraries phone-screen variants reject `re`).
PipeCode's Python for data engineering interviews course drills these primitives across forty-plus problems, including the two-pointer and string-manipulation variants Bloomberg's screen reaches for.
Python Interview Question on Reverse Words in String
Write a function reverse_words(s: str) -> str that returns s with the order of words reversed. Words are separated by one or more spaces; treat any run of whitespace as a single delimiter. Leading and trailing whitespace in the input must not appear in the output, and the output must have exactly one space between words.
Solution Using split-reverse-join
```python
def reverse_words(s: str) -> str:
    return ' '.join(s.split()[::-1])
```
Step-by-step trace (input `s = '  the quick   brown fox  '`):
| step | expression | value |
|---|---|---|
| 1 | `s.split()` | `['the', 'quick', 'brown', 'fox']` |
| 2 | `s.split()[::-1]` | `['fox', 'brown', 'quick', 'the']` |
| 3 | `' '.join(s.split()[::-1])` | `'fox brown quick the'` |
- `s.split()` with no argument — splits on any run of whitespace, discards empty leading/trailing tokens. The two leading spaces, the three between `quick` and `brown`, and the two trailing spaces all collapse cleanly.
- `[::-1]` slice — returns a new list with the words in reverse order; the original `s` is unchanged.
- `' '.join(...)` — concatenates the reversed list into a single string with exactly one space between adjacent elements; returns `'fox brown quick the'`.
- Return — the function returns the joined string directly; no intermediate variable is needed.
Output:
| input | result |
|---|---|
| `'the quick brown fox'` | `'fox brown quick the'` |
| `'  hello   world  '` | `'world hello'` |
| `'one'` | `'one'` |
| `''` | `''` (empty input → empty output) |
| `'   '` | `''` (whitespace-only input → empty output) |
Why this works — concept by concept:
- No-arg split collapses whitespace — `s.split()` (no argument) splits on any run of whitespace and silently drops empty leading or trailing tokens; this single call handles the "multiple spaces" and "leading / trailing whitespace" edge cases without any conditional logic.
- Slice reversal returns a new list — `[::-1]` builds a new list in O(n) time without mutating the source; the function stays referentially transparent.
- Join with a single space — `' '.join(...)` concatenates with exactly one delimiter between elements and zero delimiters at the ends; the output cannot have leading or trailing spaces by construction.
- Empty input is naturally handled — `''.split()` returns `[]`, `[][::-1]` returns `[]`, and `' '.join([])` returns `''`; the empty-input case requires no special branch.
- Three primitives, one line — chaining `split`, slice, and `join` is the canonical Pythonic answer; reaching for `re.split` or manual character iteration is over-engineering.
- Cost — O(n) time where n is the input length, plus O(n) extra memory for the new list and joined string; n is constrained by the line length in any real-world prompt.
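Because the guide stresses inline tests, a minimal self-check harness for the one-liner might look like this (the assertions mirror the output table above):

```python
def reverse_words(s: str) -> str:
    return ' '.join(s.split()[::-1])

# Inline edge-case checks of the kind the phone screen rewards.
assert reverse_words('the quick brown fox') == 'fox brown quick the'
assert reverse_words('  hello   world  ') == 'world hello'
assert reverse_words('one') == 'one'
assert reverse_words('') == ''
assert reverse_words('   ') == ''
```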
- PYTHON · Bloomberg — two pointers / string: Reverse Words in String (Bloomberg)
- PYTHON · Topic — two pointers: Two-pointer Python problems
3. Abstract Classes, OOP, and File I/O in Python
ABC, abstract methods, and chunked file processing in Python for data engineering
The second canonical Bloomberg Python pattern is production-quality OOP with abstract base classes and streaming file I/O. The headline interview problem on the Bloomberg practice set, Chunked CSV to Line-Delimited JSON Processor, is exactly this shape: subclass an abstract RecordProcessor base, implement the load, transform, and write abstract methods, and stream a CSV file row-by-row into a line-delimited JSON output without loading the whole file into memory. This pattern shows up in roughly half of recent Bloomberg DE coding rounds — confirmed by external interview reports flagging "abstract DataProcessor framework" and "RotatingFileSink class hierarchy" prompts.
Pro tip: Python's `abc` module enforces the abstract contract at instantiation time, not at class definition time. `class Foo(ABC):` with an `@abstractmethod def bar(self): ...` lets you define `Foo` without implementing `bar`, but `Foo()` raises `TypeError: Can't instantiate abstract class Foo with abstract method bar`. Subclasses that miss any abstract method fail the same way. This is the language feature Bloomberg interviewers are reading you for.
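A minimal demonstration of that instantiation-time enforcement (`Foo` and `ConcreteFoo` are throwaway names for the demo):

```python
from abc import ABC, abstractmethod

class Foo(ABC):
    @abstractmethod
    def bar(self) -> None: ...

# Defining the abstract class is legal; instantiating it is what fails.
try:
    Foo()
    instantiated = True
except TypeError:
    instantiated = False   # "Can't instantiate abstract class Foo ..."
assert instantiated is False

class ConcreteFoo(Foo):
    def bar(self) -> None:
        pass

concrete = ConcreteFoo()   # every abstract method implemented — fine
```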
from abc import ABC, abstractmethod — defining the contract
The abstract-class invariant: a class that inherits from ABC and decorates one or more methods with @abstractmethod cannot be instantiated directly; only subclasses that implement every abstract method can be instantiated.
- `from abc import ABC, abstractmethod` — the standard import. `ABC` is a helper base class with `ABCMeta` as its metaclass.
- `@abstractmethod` — decorator that marks a method as abstract. Place it on the method directly inside the class body.
- Subclassing — `class Concrete(Abstract):` inherits all concrete methods and must override every `@abstractmethod`.
Worked example. Defining an abstract RecordProcessor with three abstract methods.
| method | purpose | signature |
|---|---|---|
| `load(path)` | read the input | returns an iterable of records |
| `transform(record)` | reshape one record | returns the transformed record |
| `write(record, out)` | write one record | returns `None` (side effect) |
Worked-example solution.
```python
from abc import ABC, abstractmethod
from typing import IO, Iterable

class RecordProcessor(ABC):
    @abstractmethod
    def load(self, path: str) -> Iterable[dict]: ...

    @abstractmethod
    def transform(self, record: dict) -> dict: ...

    @abstractmethod
    def write(self, record: dict, out: IO[str]) -> None: ...
```
Rule of thumb: keep the abstract base small — three to five methods is the sweet spot. Abstract bases with twenty methods are a code smell.
Subclassing and implementing every abstract method
The subclass invariant: a concrete subclass must implement every method decorated with @abstractmethod in the parent (and every grandparent, transitively); missing even one keeps the subclass abstract.
- `class CsvToLdjsonProcessor(RecordProcessor):` — declare inheritance.
- Implement each abstract — `load`, `transform`, `write` all need concrete bodies (not `pass` or `...`).
- Concrete-only methods inherit unchanged — non-abstract methods on the parent are usable directly.
Worked example. A CsvToLdjsonProcessor that reads CSV and writes line-delimited JSON.
| abstract method | concrete implementation |
|---|---|
| `load` | `csv.DictReader(open(path))` |
| `transform` | identity (return the row dict unchanged) |
| `write` | `out.write(json.dumps(record) + '\n')` |
Worked-example solution.
```python
import csv
import json
from typing import IO, Iterable

class CsvToLdjsonProcessor(RecordProcessor):
    def load(self, path: str) -> Iterable[dict]:
        with open(path, newline='') as f:
            yield from csv.DictReader(f)

    def transform(self, record: dict) -> dict:
        return record

    def write(self, record: dict, out: IO[str]) -> None:
        out.write(json.dumps(record) + '\n')
```
Rule of thumb: if your transform is the identity, leave it as the identity — interviewers reward not over-engineering as much as they reward implementing correctly.
Chunked CSV reading with csv.DictReader and line-delimited JSON output
The streaming invariant: csv.DictReader is a lazy iterator; iterating it yields one row at a time without loading the whole file. Pair it with a for loop that writes each transformed row to the output stream, and the program runs in O(1) extra memory regardless of input size.
- `csv.DictReader(f)` — yields one `dict` per CSV row, using the first row as header keys.
- `yield from` — the `load` method delegates iteration to `DictReader` without buffering.
- `for record in self.load(path):` — the orchestrator iterates lazily; `transform` and `write` run per row.
- Line-delimited JSON — `json.dumps(record) + '\n'` per row produces a file where each line is independently parseable.
Worked example. Streaming a 10-million-row CSV through the processor uses constant memory, not 10M-row memory.
| stage | memory profile |
|---|---|
| `load` | one row at a time |
| `transform` | one row at a time |
| `write` | one row at a time |
| total extra memory | O(1) |
Worked-example solution.
```python
def run(processor: RecordProcessor, in_path: str, out_path: str) -> int:
    count = 0
    with open(out_path, 'w') as out:
        for record in processor.load(in_path):
            transformed = processor.transform(record)
            processor.write(transformed, out)
            count += 1
    return count
```
Rule of thumb: never for record in list(processor.load(path)) — wrapping a generator in list() materializes the whole file and defeats the streaming guarantee.
Common beginner mistakes
- Using `pandas.read_csv(path)` instead of `csv.DictReader` — loads the whole file, defeats the streaming invariant.
- Forgetting `newline=''` on `open()` — Python's csv module needs that to handle quoted multi-line fields correctly.
- Implementing `load` to return a `list` instead of yielding — same issue as the `pandas` mistake, scaled.
- Writing JSON without the trailing `'\n'` — produces concatenated objects on one line, not line-delimited.
- Skipping the `with` block on the output file — the file may not flush on early termination.
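A small illustration of the quoted multi-line-field case behind the `newline=''` rule — here via an in-memory `io.StringIO`, so no file handling is needed; for real files, pass `newline=''` to `open()`:

```python
import csv
import io

# One logical CSV record whose quoted field contains an embedded newline.
raw = 'name,note\nalice,"line one\nline two"\n'

rows = list(csv.DictReader(io.StringIO(raw)))
assert len(rows) == 1                           # one record, not two
assert rows[0]['note'] == 'line one\nline two'  # newline survives inside the field
```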
Python Interview Question on Chunked CSV to LDJSON Processor
Implement a class hierarchy that reads a CSV file row-by-row and writes the rows as line-delimited JSON to an output file, without loading the whole CSV into memory. The base class RecordProcessor must be abstract with three @abstractmethod methods: load(path) (returns an iterable of dict), transform(record) (returns a dict), and write(record, out) (writes one record and returns None). Implement a concrete subclass CsvToLdjsonProcessor that satisfies the contract. Provide a top-level run(processor, in_path, out_path) function that orchestrates the streaming pipeline and returns the row count.
Solution Using a Concrete Subclass of an Abstract RecordProcessor
```python
from __future__ import annotations

import csv
import json
from abc import ABC, abstractmethod
from typing import IO, Iterable

class RecordProcessor(ABC):
    @abstractmethod
    def load(self, path: str) -> Iterable[dict]: ...

    @abstractmethod
    def transform(self, record: dict) -> dict: ...

    @abstractmethod
    def write(self, record: dict, out: IO[str]) -> None: ...

class CsvToLdjsonProcessor(RecordProcessor):
    def load(self, path: str) -> Iterable[dict]:
        with open(path, newline='') as f:
            yield from csv.DictReader(f)

    def transform(self, record: dict) -> dict:
        return record

    def write(self, record: dict, out: IO[str]) -> None:
        out.write(json.dumps(record) + '\n')

def run(processor: RecordProcessor, in_path: str, out_path: str) -> int:
    count = 0
    with open(out_path, 'w') as out:
        for record in processor.load(in_path):
            transformed = processor.transform(record)
            processor.write(transformed, out)
            count += 1
    return count
```
Step-by-step trace (input events.csv with 3 rows):
```
name,age,city
alice,30,NYC
bob,25,SF
carol,40,LON
```
| step | action | output side effect |
|---|---|---|
| 1 | `processor.load('events.csv')` | yields `{'name':'alice','age':'30','city':'NYC'}` |
| 2 | `transform(record)` | returns the same dict (identity) |
| 3 | `write(record, out)` | writes `{"name":"alice","age":"30","city":"NYC"}\n` |
| 4 | next iteration | yields `{'name':'bob','age':'25','city':'SF'}` |
| 5 | transform + write | writes `{"name":"bob","age":"25","city":"SF"}\n` |
| 6 | next iteration | yields `{'name':'carol','age':'40','city':'LON'}` |
| 7 | transform + write | writes `{"name":"carol","age":"40","city":"LON"}\n` |
| 8 | iterator exhausted | `count = 3` returned |
- Open the output file inside `with` — guarantees flush and close even on exceptions.
- `processor.load(in_path)` — returns a generator; iterating it lazily reads one row at a time from the CSV.
- `processor.transform(record)` — applies the per-record transformation (identity here).
- `processor.write(transformed, out)` — writes one JSON line per record.
- Increment `count` — after each successful write.
- Return `count` — at end of iteration.
Output:
```
{"name": "alice", "age": "30", "city": "NYC"}
{"name": "bob", "age": "25", "city": "SF"}
{"name": "carol", "age": "40", "city": "LON"}
```
Plus return value 3.
Why this works — concept by concept:
- ABC + @abstractmethod enforces the contract — `RecordProcessor` cannot be instantiated directly; any subclass missing `load`, `transform`, or `write` raises `TypeError` at construction. The interface is enforced by the language, not by convention.
- csv.DictReader yields lazily — it reads one row at a time; pairing it with `yield from` keeps the entire pipeline streaming.
- Generator-based load — `load` returns an iterable, not a `list`; the caller iterates with `for` and never materializes the file.
- Line-delimited JSON — `json.dumps(record) + '\n'` per row produces a file where each line is independently parseable, which is the canonical format for log streaming and incremental ingestion.
- with-block on the output — the output handle flushes and closes deterministically; an exception mid-stream does not leave a partial file open.
- Cost — O(1) extra memory regardless of input size (the generator holds at most one row), O(n) time where n is the input row count; no O(n) memory spike at any stage.
- PYTHON · Bloomberg — OOP / abstract: Chunked CSV to LDJSON Processor (Bloomberg)
- PYTHON · Topic — OOP: OOP Python problems
4. SQL Window Functions for Time-Series and Subscription Overlap
PARTITION BY, LAG, and rolling aggregates in SQL for time-series and overlap analysis
The third canonical Bloomberg interview pattern is SQL window functions on time-series data plus interval-overlap detection. Bloomberg's external interview reports converge here — a recent DataLemur compilation lists eight Bloomberg SQL questions covering rolling aggregates, monthly per-product averages, and ranking-tie semantics, and Interview Query flags Subscription Overlap as a flagship Bloomberg SQL question. The canonical primitives are SUM(volume) OVER (PARTITION BY symbol ORDER BY trade_date) for rolling totals, LAG/LEAD for adjacent-row comparisons, and EXISTS subqueries with interval crosschecks for overlap detection. PipeCode does not yet host a Bloomberg-tagged SQL problem, so this section is anchored on the topic-level SQL window functions practice page.
Pro tip: window functions evaluate after `WHERE` and before `ORDER BY` in the SQL pipeline. That is why you cannot filter `WHERE rolling_total > 1000` in the same `SELECT` as the window — wrap the window in a CTE first, then filter in the outer query. This trips up most candidates on the first onsite SQL prompt.
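A runnable sketch of the CTE-then-filter pattern, using Python's bundled `sqlite3` as a stand-in engine (any engine with window-function support behaves the same; the threshold 300 and the toy rows are arbitrary):

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.executescript("""
CREATE TABLE ticks (symbol TEXT, trade_date TEXT, volume INT);
INSERT INTO ticks VALUES
  ('AAPL', '2026-04-01', 100),
  ('AAPL', '2026-04-02', 150),
  ('AAPL', '2026-04-03', 200);
""")

# WHERE cannot see the window alias, so compute the window in a CTE,
# then filter in the outer query.
rows = con.execute("""
WITH running AS (
  SELECT symbol, trade_date,
         SUM(volume) OVER (PARTITION BY symbol ORDER BY trade_date)
           AS running_volume
  FROM ticks
)
SELECT trade_date, running_volume
FROM running
WHERE running_volume > 300
ORDER BY trade_date
""").fetchall()

assert rows == [('2026-04-03', 450)]
```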
SUM / AVG OVER (PARTITION BY ... ORDER BY ...) for rolling totals
The rolling-aggregate invariant: SUM(metric) OVER (PARTITION BY entity ORDER BY ts) returns a per-row running total scoped to each entity, ordered by time, including all rows up to and including the current row by default.
- `PARTITION BY symbol` — resets the running aggregate at each symbol boundary.
- `ORDER BY trade_date` — defines the time direction of the cumulation.
- Default frame — `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`. Override with `ROWS BETWEEN N PRECEDING AND CURRENT ROW` for a fixed-window rolling sum.
Worked example. A ticks(symbol, trade_date, volume) table.
| symbol | trade_date | volume | running_volume |
|---|---|---|---|
| AAPL | 2026-04-01 | 100 | 100 |
| AAPL | 2026-04-02 | 150 | 250 |
| AAPL | 2026-04-03 | 200 | 450 |
| MSFT | 2026-04-01 | 80 | 80 |
| MSFT | 2026-04-02 | 120 | 200 |
Worked-example solution.
```sql
SELECT symbol, trade_date, volume,
       SUM(volume) OVER (
         PARTITION BY symbol
         ORDER BY trade_date
       ) AS running_volume
FROM ticks;
```
Rule of thumb: if the prompt says "running" / "cumulative" / "year-to-date" / "month-to-date" per entity, reach for SUM OVER (PARTITION BY entity ORDER BY ts) before anything else.
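Overriding the default frame turns the running total into a fixed-width rolling sum. A sketch of a 3-row rolling window against a toy `ticks` table, again with `sqlite3` as a stand-in engine:

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.executescript("""
CREATE TABLE ticks (symbol TEXT, trade_date TEXT, volume INT);
INSERT INTO ticks VALUES
  ('AAPL', '2026-04-01', 100),
  ('AAPL', '2026-04-02', 150),
  ('AAPL', '2026-04-03', 200),
  ('AAPL', '2026-04-04', 50);
""")

rows = con.execute("""
SELECT trade_date,
       SUM(volume) OVER (
         PARTITION BY symbol
         ORDER BY trade_date
         ROWS BETWEEN 2 PRECEDING AND CURRENT ROW  -- last 3 rows, not all history
       ) AS rolling_3
FROM ticks
ORDER BY trade_date
""").fetchall()

# An unbounded running total would give 100, 250, 450, 500;
# the 3-row frame drops old rows, so the last value is 150+200+50.
assert rows == [('2026-04-01', 100), ('2026-04-02', 250),
                ('2026-04-03', 450), ('2026-04-04', 400)]
```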
LAG / LEAD for adjacent-row comparison
The lag-lead invariant: LAG(col, n) OVER (PARTITION BY entity ORDER BY ts) returns the value of col from n rows earlier within the same partition, ordered by ts. LEAD is the symmetric "n rows later" variant.
- `LAG(volume) OVER (...)` — previous row's volume, or `NULL` for the first row in the partition.
- `LAG(volume, 2)` — two rows back.
- `LAG(volume, 1, 0)` — default `0` instead of `NULL` for missing rows.
Worked example. Day-over-day volume change per symbol.
| symbol | trade_date | volume | prev_volume | dod_change |
|---|---|---|---|---|
| AAPL | 2026-04-01 | 100 | NULL | NULL |
| AAPL | 2026-04-02 | 150 | 100 | 50 |
| AAPL | 2026-04-03 | 200 | 150 | 50 |
Worked-example solution.
```sql
SELECT symbol, trade_date, volume,
       LAG(volume) OVER (PARTITION BY symbol ORDER BY trade_date) AS prev_volume,
       volume - LAG(volume) OVER (PARTITION BY symbol ORDER BY trade_date) AS dod_change
FROM ticks;
```
Rule of thumb: day-over-day deltas, week-over-week resets, and "did this row break a streak?" all map to LAG. Use LEAD for "is the next event within X minutes?".
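A sketch of the `LEAD` direction ("is the next event within X minutes?"), with `sqlite3` as a stand-in engine and `julianday` doing the minute arithmetic; the `events` table and 5-minute threshold are made up for the demo:

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.executescript("""
CREATE TABLE events (user_id INT, ts TEXT);
INSERT INTO events VALUES
  (1, '2026-04-01 09:00:00'),
  (1, '2026-04-01 09:03:00'),
  (1, '2026-04-01 10:00:00');
""")

rows = con.execute("""
WITH nxt AS (
  SELECT user_id, ts,
         LEAD(ts) OVER (PARTITION BY user_id ORDER BY ts) AS next_ts
  FROM events
)
SELECT ts,
       CASE WHEN (julianday(next_ts) - julianday(ts)) * 1440.0 <= 5
            THEN 1 ELSE 0 END AS next_within_5m
FROM nxt
ORDER BY ts
""").fetchall()

# Only the 09:00 event has a next event (09:03) within five minutes;
# the last row's LEAD is NULL, so the comparison is NULL and falls to ELSE 0.
assert [flag for _, flag in rows] == [1, 0, 0]
```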
Subscription overlap via interval crosscheck (EXISTS subquery)
The overlap invariant: two intervals (start_a, end_a) and (start_b, end_b) overlap if and only if start_a <= end_b and start_b <= end_a. This is the canonical interval-overlap formula; everything else (partial overlap, full containment, identical intervals) collapses to this two-condition check.
- `EXISTS` subquery — for each row, check if any other row from the same table satisfies the overlap condition.
- `s1.user_id <> s2.user_id` — exclude the row from comparing with itself.
- End-date `IS NOT NULL` — only completed subscriptions count for overlap detection in Bloomberg's framing.
Worked example. A subscriptions(user_id, start_date, end_date) table with four users.
| user_id | start_date | end_date | overlaps? |
|---|---|---|---|
| 1 | 2026-01-01 | 2026-01-31 | ✓ (with user 2 and user 3) |
| 2 | 2026-01-15 | 2026-01-17 | ✓ (with user 1) |
| 3 | 2026-01-29 | 2026-02-04 | ✓ (with user 1) |
| 4 | 2026-02-05 | 2026-02-10 | ✗ |
Worked-example solution.
SELECT s1.user_id,
CASE WHEN EXISTS (
SELECT 1 FROM subscriptions s2
WHERE s2.user_id <> s1.user_id
AND s2.end_date IS NOT NULL
AND s1.start_date <= s2.end_date
AND s2.start_date <= s1.end_date
) THEN 1 ELSE 0 END AS overlap
FROM subscriptions s1
WHERE s1.end_date IS NOT NULL;
Rule of thumb: the overlap formula is start_a <= end_b AND start_b <= end_a. Memorize it cold — every Bloomberg interval-detection prompt collapses to it.
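For drilling the formula itself, a minimal Python sketch helps; `intervals_overlap` is an illustrative helper name, not part of any prompt:

```python
from datetime import date

def intervals_overlap(start_a: date, end_a: date,
                      start_b: date, end_b: date) -> bool:
    """Canonical check: closed intervals overlap iff each starts before the other ends."""
    return start_a <= end_b and start_b <= end_a

# Partial overlap.
assert intervals_overlap(date(2026, 1, 1), date(2026, 1, 31),
                         date(2026, 1, 15), date(2026, 1, 17))
# Touching boundaries still count because the comparison is <=, not <.
assert intervals_overlap(date(2026, 1, 1), date(2026, 1, 31),
                         date(2026, 1, 31), date(2026, 2, 4))
# Disjoint intervals fail the first condition.
assert not intervals_overlap(date(2026, 1, 1), date(2026, 1, 31),
                             date(2026, 2, 5), date(2026, 2, 10))
```

Note the shared-boundary case: switching either `<=` to `<` silently changes the answer for same-day handoffs, which is exactly the off-by-one the mistakes list below warns about.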
Common beginner mistakes
- Filtering `WHERE running_volume > 1000` in the same `SELECT` as the window — window functions evaluate after `WHERE`; wrap the window in a CTE first.
- Forgetting `PARTITION BY` and getting one global running total across all symbols.
- Using `LAG(volume)` without an `ORDER BY` inside the window — undefined ordering, undefined result.
- Writing the overlap check as `start_a < end_b AND start_b < end_a` (strict) when the prompt allows touching boundaries — off-by-one on identical-day overlaps.
- Forgetting `s1.user_id <> s2.user_id` — every row "overlaps with itself" and the answer is always `1`.
SQL Interview Question on Detecting Subscription Overlap
Given a table subscriptions(user_id INT, start_date DATE, end_date DATE) where end_date IS NULL indicates an active subscription, return one row per user with overlap = 1 if the user's completed subscription window overlaps any other user's completed subscription, and overlap = 0 otherwise. Consider only rows where end_date IS NOT NULL. Use the canonical interval-overlap formula.
Solution Using EXISTS with Interval Crosscheck
SELECT
s1.user_id,
CASE WHEN EXISTS (
SELECT 1
FROM subscriptions s2
WHERE s2.user_id <> s1.user_id
AND s2.end_date IS NOT NULL
AND s1.start_date <= s2.end_date
AND s2.start_date <= s1.end_date
) THEN 1 ELSE 0 END AS overlap
FROM subscriptions s1
WHERE s1.end_date IS NOT NULL
ORDER BY s1.user_id;
Step-by-step trace (input subscriptions):
| user_id | start_date | end_date |
|---|---|---|
| 1 | 2026-01-01 | 2026-01-31 |
| 2 | 2026-01-15 | 2026-01-17 |
| 3 | 2026-01-29 | 2026-02-04 |
| 4 | 2026-02-05 | 2026-02-10 |
- Filter to completed subscriptions — `WHERE s1.end_date IS NOT NULL` keeps all four rows here (every row has an `end_date`).
- For user 1 (`2026-01-01` to `2026-01-31`) — check `EXISTS`. Compare against user 2 (`2026-01-15` to `2026-01-17`): `2026-01-01 <= 2026-01-17` ✓ AND `2026-01-15 <= 2026-01-31` ✓ → overlap. `EXISTS` returns true, `overlap = 1`.
- For user 2 (`2026-01-15` to `2026-01-17`) — compare against user 1: `2026-01-15 <= 2026-01-31` ✓ AND `2026-01-01 <= 2026-01-17` ✓ → overlap. `overlap = 1`.
- For user 3 (`2026-01-29` to `2026-02-04`) — compare against user 1: `2026-01-29 <= 2026-01-31` ✓ AND `2026-01-01 <= 2026-02-04` ✓ → overlap. `overlap = 1`.
- For user 4 (`2026-02-05` to `2026-02-10`) — compare against every other user. User 1: `2026-02-05 <= 2026-01-31` ✗ (Feb 5 is after Jan 31). User 2: `2026-02-05 <= 2026-01-17` ✗. User 3: `2026-02-05 <= 2026-02-04` ✗. No overlap. `overlap = 0`.
- `ORDER BY s1.user_id` for stability.
Output:
| user_id | overlap |
|---|---|
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 0 |
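The trace can be replayed in plain Python as a quick cross-check of the `EXISTS` logic; `subs` and `has_overlap` are illustrative names, and ISO-format date strings are compared lexicographically, which is order-correct:

```python
# The four sample rows from the trace above, as (user_id, start, end) tuples.
subs = [
    (1, "2026-01-01", "2026-01-31"),
    (2, "2026-01-15", "2026-01-17"),
    (3, "2026-01-29", "2026-02-04"),
    (4, "2026-02-05", "2026-02-10"),
]

def has_overlap(row, others):
    uid, start, end = row
    # EXISTS semantics: any() stops at the first other row satisfying
    # the canonical formula start_a <= end_b AND start_b <= end_a.
    return any(start <= e2 and s2 <= end
               for (u2, s2, e2) in others if u2 != uid)

flags = [(uid, int(has_overlap((uid, s, e), subs))) for (uid, s, e) in subs]
print(flags)  # users 1-3 overlap, user 4 does not, matching the output table
```

`any()` over a generator mirrors the short-circuit behavior of `EXISTS`: the scan stops at the first matching peer rather than evaluating every pair.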
Why this works — concept by concept:
- Interval overlap formula — two intervals `(a_start, a_end)` and `(b_start, b_end)` overlap iff `a_start <= b_end AND b_start <= a_end`; this two-condition check captures partial overlap, full containment, and shared boundaries in one expression.
- `EXISTS` short-circuits — the subquery returns true the moment one matching row is found; for users with many overlapping peers, the engine does not evaluate every pair.
- `s1.user_id <> s2.user_id` excludes self-overlap — without this guard, every row trivially overlaps itself and the answer is uniformly `1`.
- `end_date IS NOT NULL` filters incomplete rows — Bloomberg's framing only counts completed subscriptions; active subscriptions (open intervals) are excluded by the prompt.
- `CASE WHEN EXISTS ... THEN 1 ELSE 0` — produces a `0`/`1` flag column directly from the boolean predicate, with no separate aggregation needed.
- Cost — `O(n²)` worst case (every row compared against every other) without an index; a B-tree index on `(start_date, end_date)` lets the planner prune rows that cannot satisfy the overlap predicate, which together with `EXISTS` short-circuiting keeps the average case well below quadratic for typical interval distributions.
Tips to crack Bloomberg data engineering interviews
These are habits that move the needle in real Bloomberg DE loops — not a re-statement of the topics above.
Practice with Bloomberg's data shape
Bloomberg's interview prompts model market data and subscription feeds: ticks, quotes, trades, subscriptions, provider_feeds, corporate_actions. Drilling on order-line ecommerce schemas wastes prep time. Stick to event-shaped tables with a per-symbol-per-day grain, and pull problems from the window-functions topic page for shapes that match.
Master from abc import ABC, abstractmethod cold
Half of Bloomberg DE coding rounds reach for an abstract base class — confirmed by PipeCode #508 plus two of four recent external Bloomberg DE problems (DataProcessor, RotatingFileSink). Type the boilerplate from memory until it is muscle memory: class Base(ABC): @abstractmethod def method(self): ... plus a concrete subclass with every abstract method implemented. Drill on the OOP topic page.
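A minimal sketch of that boilerplate, assuming the load/transform/write interface described earlier in this guide; the class names echo the reported `DataProcessor` framing, but the bodies here are illustrative stand-ins, not the actual interview spec:

```python
from __future__ import annotations

from abc import ABC, abstractmethod

class DataProcessor(ABC):
    """Abstract base: subclasses must implement all three stages."""

    @abstractmethod
    def load(self, path: str) -> list[dict]: ...

    @abstractmethod
    def transform(self, rows: list[dict]) -> list[dict]: ...

    @abstractmethod
    def write(self, rows: list[dict], path: str) -> None: ...

class UppercaseProcessor(DataProcessor):
    """Concrete subclass: every abstract method implemented."""

    def load(self, path: str) -> list[dict]:
        return [{"symbol": "aapl"}]  # stand-in for real file I/O

    def transform(self, rows: list[dict]) -> list[dict]:
        return [{k: v.upper() for k, v in row.items()} for row in rows]

    def write(self, rows: list[dict], path: str) -> None:
        pass  # stand-in for real file I/O

# Instantiating DataProcessor itself raises TypeError; the subclass works.
proc = UppercaseProcessor()
print(proc.transform(proc.load("ignored.csv")))  # [{'symbol': 'AAPL'}]
```

The key reflex to build: `ABC` plus `@abstractmethod` on every interface method, and a concrete subclass that overrides all of them, since missing even one keeps the subclass abstract and uninstantiable.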
Type your Python with from __future__ import annotations
Bloomberg grades production-quality code, and that includes type hints. Add from __future__ import annotations at the top, use list[dict] and Iterable[dict] rather than List[Dict], and annotate every function signature. Interviewers notice — and the type-checker catches the off-by-one before the interviewer does.
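A small sketch of that annotation style; `total_volume` is an illustrative function, not from the interview set:

```python
from __future__ import annotations

from typing import Iterable

# With the __future__ import, built-in generics like dict work in
# annotations on any supported Python version.
def total_volume(rows: Iterable[dict]) -> int:
    """Sum the 'volume' field across a stream of tick dicts."""
    return sum(row["volume"] for row in rows)

print(total_volume([{"volume": 100}, {"volume": 150}]))  # 250
```

Taking `Iterable[dict]` rather than `list[dict]` also signals streaming-friendliness: the function works on a generator without materializing the whole feed.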
Drill window functions
PARTITION BY plus ORDER BY plus optional LAG/LEAD plus the ROWS BETWEEN N PRECEDING AND CURRENT ROW frame override is the Bloomberg SQL toolkit. Practice with SQL window-function problems until the syntax is reflex. PipeCode's SQL for data engineering interviews course drills these primitives across forty-plus problems.
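The frame override named above can be drilled locally through `sqlite3`, a minimal sketch assuming SQLite 3.25 or newer; the sample rows are invented for illustration:

```python
import sqlite3

# 3-day rolling volume per symbol via ROWS BETWEEN 2 PRECEDING AND CURRENT ROW.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticks (symbol TEXT, trade_date TEXT, volume INT)")
conn.executemany("INSERT INTO ticks VALUES (?, ?, ?)", [
    ("AAPL", "2026-04-01", 100),
    ("AAPL", "2026-04-02", 150),
    ("AAPL", "2026-04-03", 200),
    ("AAPL", "2026-04-04", 50),
])

rolling = conn.execute("""
    SELECT trade_date,
           SUM(volume) OVER (PARTITION BY symbol ORDER BY trade_date
                             ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS rolling_3d
    FROM ticks
""").fetchall()
print(rolling)
# Each row sums itself plus up to two preceding rows, so the window
# "warms up" over the first two dates and then slides.
```

Without the `ROWS BETWEEN` clause the default frame is unbounded-preceding-to-current-row, i.e. a running total; the explicit frame is what turns it into a fixed-width rolling window.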
Bring two STAR stories per behavioral round
Bloomberg's behavioral round goes technical-deep — interviewers drill past project details, not just culture-fit. Pick two real projects with crisp numbers (rows per day, p99, cost per query, post-incident retro) and rehearse the deep-dive Q&A. Generic teamwork stories will not pass.
Where to practice on PipeCode
| Skill lane | Practice path |
|---|---|
| Curated Bloomberg practice set | /explore/practice/company/bloomberg |
| Bloomberg Python practice | /explore/practice/company/bloomberg/python |
| Two-pointer + string in Python | /explore/practice/topic/two-pointers/python |
| String-manipulation in Python | /explore/practice/topic/string-manipulation/python |
| OOP + abstract classes | /explore/practice/topic/abstract-classes |
| File I/O + CSV parsing | /explore/practice/topic/file-io/python |
| SQL window functions | /explore/practice/topic/window-functions/sql |
| Intervals / overlap | /explore/practice/topic/intervals |
| All practice topics | /explore/practice/topics |
| Interview courses | /explore/courses |
Communication under time pressure
State assumptions before typing: "I'll assume the input CSV has a header row, no embedded newlines in fields, and one consistent delimiter." State grain: "One JSON object per CSV row, in source order." State edge cases: "If a row has missing keys, my transform returns the partial dict and write serializes whatever keys are present." Interviewers grade clear reasoning above silent-and-perfect.
Frequently asked questions
What is the Bloomberg data engineering interview process?
The Bloomberg data engineer interview process is a four-stage funnel: a 45-60 minute technical phone screen in CoderPad covering a medium-difficulty algorithmic problem with follow-ups, an optional online assessment for some roles (Byteboard / HackerRank format with SQL plus Python), a virtual onsite of three to five rounds covering coding plus system design plus behavioral plus fit, and a final decision call. Total elapsed time is typically about thirty days. The recent reported pass rate is 8% across thirteen samples — making it the most selective DE loop tracked in our blog series.
What programming languages does Bloomberg test for data engineering?
Bloomberg's data engineering interviews lean on Python and SQL, with Java or C++ appearing for some platform / infra roles. Python is production-quality — from abc import ABC, abstractmethod, csv.DictReader, type hints, generator-based streaming. SQL is at LeetCode-medium / DataLemur grade with a strong tilt toward window functions, rolling aggregates, and interval-overlap detection. PipeCode's Bloomberg Python practice anchors the Python side; the SQL flavor is covered by topic-level pages.
What SQL topics show up most in Bloomberg data engineering interviews?
The topics are narrow and consistent: window functions (SUM / AVG OVER (PARTITION BY symbol ORDER BY trade_date) for rolling totals), LAG/LEAD for adjacent-row comparisons (day-over-day change, streak detection), HAVING + COUNT for power-user filters, self-joins for pairwise comparisons, and EXISTS subqueries with interval crosschecks for overlap detection. PipeCode's window-function problems and interval problems cover these directly.
How difficult are Bloomberg data engineering interview questions?
Bloomberg data engineering interview questions are calibrated above generic LeetCode-medium for the algorithm half — closer to a production-code bar than a contest bar. The two-problem PipeCode set splits 1 EASY (Reverse Words in String) for warm-ups and 1 HARD (Chunked CSV to LDJSON Processor) for production-quality OOP. Reports describe the panel as friendly but rigorous; the 8% pass rate reflects code-review-grade judgment on edge cases, type hints, and streaming-friendly file I/O — not contest depth.
How should I prepare for a Bloomberg data engineering interview?
Solve the 2-problem Bloomberg practice set end to end — that maps the exact Python pattern coverage. Then back-fill: 10+ two-pointer / string-manipulation problems for the EASY phone-screen flavor, 10+ OOP / abstract-class problems for the HARD onsite Python round, and 20+ SQL window-function and interval-overlap problems for the SQL round. Add Bloomberg-specific behavioral prep — two real STAR stories with crisp numbers — and one read-through of the streaming stack (Kafka, Flink, Airflow, Parquet).
Does Bloomberg test object-oriented Python with abstract classes?
Yes — heavily. Half of recent Bloomberg DE coding rounds use an abstract base class (from abc import ABC, abstractmethod) and ask candidates to implement a concrete subclass. PipeCode's Chunked CSV to LDJSON Processor is the canonical PipeCode-Bloomberg problem in this family, and external interview reports confirm the pattern repeats with DataProcessor and RotatingFileSink framings. Master the ABC boilerplate and the streaming-generator pattern and this round becomes mechanical.
Start practicing Bloomberg data engineering problems
Reading patterns is not the same as typing them under time pressure. PipeCode pairs company-tagged Bloomberg problems with tests, AI feedback, and a coding environment so you can drill the exact Python OOP, two-pointer, and SQL window-function patterns Bloomberg asks — without the noise of generic algorithm prep.
Pipecode.ai is Leetcode for Data Engineering.
Browse Bloomberg practice →
Python for DE interviews course →