DEV Community

Gowtham Potureddi

Bloomberg Data Engineering Interview Questions: Full DE Prep Guide

Bloomberg data engineering interview questions sit at the intersection of three narrow, production-grade patterns. The first is Python two-pointer and string manipulation: reversing the words in a sentence with s.split() plus [::-1] plus ' '.join(...). The second is production-quality OOP with abstract classes: subclassing an ABC base with @abstractmethod load, transform, and write methods to stream a chunked CSV into line-delimited JSON without loading the whole file into memory. The third is SQL window functions for time-series and overlap analysis: SUM(volume) OVER (PARTITION BY symbol ORDER BY trade_date) for rolling totals, plus EXISTS subqueries with interval crosschecks for subscription-overlap detection. The schema you reason over feels like Bloomberg's own product (ticks, subscriptions, provider_feeds, corporate_actions), and the bar is fluency with two-pointer index arithmetic, from abc import ABC, abstractmethod, and PARTITION BY ... ORDER BY ... under tie-break and overlap edge cases.

This guide walks through the four topic clusters Bloomberg actually tests, each with a detailed topic explanation, per-sub-topic explanations with worked examples and solutions, and an interview-style problem with a full solution that explains why it works. The mix matches the curated 2-problem Bloomberg set (1 easy, 1 hard) plus one supplementary SQL section anchored on patterns Bloomberg's external interview reports surface heavily: a Python-and-SQL loop where stating the invariant out loud is half the score, and typing production-quality code on the first try is the other half. Strong data engineer interview prep for Bloomberg is less about contest difficulty (the pass rate sits at 8% across recent samples) and more about clean class boundaries, deterministic ordering, streaming-friendly file I/O, and the kind of code review you would survive on day three of the job.

Bold dark thumbnail for the PipeCode guide to Bloomberg data engineering interview questions, with SQL window-function and Python OOP chips in purple, green, and blue accents.


Top Bloomberg data engineering interview topics

From the Bloomberg data engineering practice set, the four numbered sections below follow this topic map (one row per H2):

| # | Topic (sections 1–4) | Why it shows up at Bloomberg |
| --- | --- | --- |
| 1 | The Bloomberg data engineering interview process | Four-stage funnel — phone screen (45-60 min) → online assessment (some roles) → onsite of 3-5 rounds → final / fit. ~30 days end to end. |
| 2 | Python two-pointer and string manipulation | Reverse Words in String (EASY) — s.split() + [::-1] + ' '.join(...), or two-pointer in-place reversal of a character array. |
| 3 | Abstract classes, OOP, and file I/O in Python | Chunked CSV to Line-Delimited JSON Processor (HARD) — subclass an abstract RecordProcessor, implement load, transform, write, and stream csv.DictReader rows into json.dumps(...) + '\n' without loading the whole file. |
| 4 | SQL window functions for time-series and overlap | Rolling-sum aggregates and Subscription Overlap — SUM(volume) OVER (PARTITION BY symbol ORDER BY trade_date) for rolling totals, plus EXISTS interval crosscheck for overlap detection. |

Bloomberg-flavor framing rule: Bloomberg's prompts model the company's own product — market-data ticks, subscription feeds, provider data with corporate actions, regulatory-grade entitlements. The interviewer is grading whether you map each business framing to the right primitive: reverse-and-rejoin → split + [::-1] + join; processor with pluggable transforms → ABC with @abstractmethod; rolling aggregates → SUM OVER (PARTITION BY symbol ORDER BY ts); interval overlap → EXISTS with < and > bound checks. State the mapping out loud, then type production-quality code.


1. The Bloomberg Data Engineering Interview Process

The Bloomberg DE interview funnel from phone screen to final round

The Bloomberg data engineer interview process is a four-stage funnel that takes about thirty days end to end: a 45-60 minute technical phone screen (longer than the 20-30 minute non-engineering equivalent), an optional online assessment for some roles, a virtual onsite of three to five rounds covering coding plus system design plus behavioral plus fit, and a final decision call. Recent samples report an 8% pass rate across thirteen Bloomberg DE candidates — making it the most selective DE loop of any company in this prep series.

The two stages most candidates misread are the technical phone screen (after introductions, the coding window is usually enough for only one strong problem plus follow-ups, so move fast on the boilerplate) and the behavioral round (interviewers drill deep into past project technical details, not just culture-fit small talk).

Phone screen — 45 to 60 minutes of coding fluency

Detailed explanation. Bloomberg's engineering phone screen is conducted in a CoderPad-style live editor and runs 45-60 minutes for technical roles. The format converges on one medium-difficulty algorithmic problem (trees, graphs, two-pointer, sliding window, dynamic programming) plus follow-ups on edge cases and complexity. Bloomberg specifically values production-quality code — type hints, edge-case guards, and a few inline tests — over a clever-but-fragile one-liner. State your assumptions before typing: input bounds, null/empty handling, in-place vs return-new.

Worked example. When the interviewer hands you "reverse the order of words in a sentence", a clean opener is "I'll assume the input is a non-null string with words separated by single spaces, no leading or trailing whitespace; if the prompt allows multiple spaces I'll collapse them with split() since split() with no args treats any run of whitespace as one delimiter." That single sentence earns the round.
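
That assumption is cheap to verify before committing to it; a ten-second check in the editor (the input string is illustrative):

```python
s = '  hello   world  '
# split() with no argument collapses any run of whitespace and
# drops leading/trailing whitespace entirely
print(s.split())      # ['hello', 'world']
# split(' ') with an explicit delimiter preserves every empty token
print(s.split(' '))   # ['', '', 'hello', '', '', 'world', '', '']
```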

Rule of thumb: state assumptions, then type. Two minutes of clarification is worth twenty minutes of rework.

Online assessment and the take-home variant

Detailed explanation. Some Bloomberg DE roles route through an online assessment (Byteboard-style or HackerRank-style) instead of, or in addition to, the live phone screen. Format varies by team: a 60-minute timed assessment with two SQL questions and one Python question, or a 90-minute take-home with a small data-pipeline implementation graded on code quality and testability. The signal is the same as the live screen — production-quality code, edge-case awareness, deterministic output.

Worked example. A take-home variant might be: "Read events.csv, normalize the timestamps to UTC, deduplicate on (user_id, event_id) keeping the latest, and write the result as line-delimited JSON to events.ndjson." Strong candidates ship a typed function with a __main__ guard, an inline unit test, and a README.md with assumptions — even when the prompt does not explicitly ask for them.

Rule of thumb: on take-homes, ship a typed function plus one unit test plus a one-paragraph README of assumptions. Bloomberg grades on production-readiness, not just correctness.
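
One hedged sketch of that take-home shape — the column names (user_id, event_id, ts), the ISO-8601 timestamp format with offsets, and the file names are assumptions for illustration, not a known Bloomberg rubric:

```python
import csv
import json
from datetime import datetime, timezone

def normalize_events(in_path: str, out_path: str) -> int:
    """Normalize ts to UTC, dedupe on (user_id, event_id) keeping the latest."""
    latest: dict = {}
    with open(in_path, newline='') as f:
        for row in csv.DictReader(f):
            # assumes ISO-8601 timestamps with offsets, e.g. 2026-04-01T09:00:00+05:00
            ts = datetime.fromisoformat(row['ts']).astimezone(timezone.utc)
            row['ts'] = ts.isoformat()
            key = (row['user_id'], row['event_id'])
            # UTC isoformat strings compare chronologically, so max-by-string works
            if key not in latest or row['ts'] > latest[key]['ts']:
                latest[key] = row
    with open(out_path, 'w') as out:
        for row in latest.values():
            out.write(json.dumps(row) + '\n')
    return len(latest)
```

A real submission would add the __main__ guard, an inline unit test, and the one-paragraph README of assumptions the paragraph above describes.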

Onsite — coding, design, behavioral, fit (3-5 rounds)

Detailed explanation. The onsite is a virtual loop of three to five rounds totaling four to five hours: a coding round (harder than the screen, often involving sliding-window aggregators or class-hierarchy refactors), a system design round (real-time market-data pipelines, dedup with sequence numbers, schema evolution, entitlements at read time), a behavioral round, and a fit round. Bloomberg's distinctive design moves include asking about exactly-once semantics under sink failures, late-arriving data with watermarks, and schema evolution across pipeline stages without breaking downstream consumers.

Worked example. "Design a pipeline that ingests trades from twenty venues, dedupes, enriches with reference data, and serves to downstream analytics consumers with a freshness SLA of 30 seconds." The clean answer routes through Kafka with idempotent producers, a Flink stateful operator keyed on (venue_id, sequence_number) for dedup, a side-input join against a slowly-changing reference table, and a CloudWatch alert on consumer lag. Naming idempotency, watermarks, and the SLA up front is what differentiates strong candidates.

Rule of thumb: every Bloomberg system-design answer should mention idempotency once, watermarks once, and the SLA / SLO once. Those three phrases earn the round.
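
The keyed dedup step in that answer is worth being able to sketch on a whiteboard; a plain-Python stand-in for the stateful operator (not Flink API code — the field names mirror the keying described above, and a real streaming job would keep the seen-set in keyed state with watermark- or TTL-driven cleanup):

```python
def dedupe_trades(trades):
    """One-pass dedup keyed on (venue_id, sequence_number)."""
    seen = set()
    out = []
    for t in trades:
        key = (t['venue_id'], t['sequence_number'])
        if key in seen:
            continue  # duplicate delivery (venue retry / reconnect replay)
        seen.add(key)
        out.append(t)
    return out

trades = [
    {'venue_id': 'NYSE', 'sequence_number': 1, 'price': 10.0},
    {'venue_id': 'NYSE', 'sequence_number': 1, 'price': 10.0},  # duplicate
    {'venue_id': 'BATS', 'sequence_number': 1, 'price': 10.1},
]
print(len(dedupe_trades(trades)))  # 2
```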

Behavioral round — STAR plus deep-dive into past project tech

Detailed explanation. Bloomberg's behavioral round is conventional STAR — Situation, Task, Action, Result — but interviewers go technical-deep on the projects you mention. Bring two real projects you owned end-to-end with crisp numbers: rows per day, peak QPS, latency p99, cost per query, post-incident retro. Generic "I led a migration" stories without numbers will not pass. Recent first-person reports (Taro, October 2024) explicitly note: "The interviewer asked past project-related questions and went into depth about the technical aspects related to those projects."

Worked example. When asked "tell me about a time you handled a data corruption incident," a strong answer names the source (a downstream consumer reported NaN totals on the daily revenue dashboard), the diagnosis (a corrupt parquet file from a vendor partition that passed the schema check but had a sentinel -1 instead of NULL in a numeric column), the fix (added a Great Expectations contract test on the column's value distribution, plus a postmortem doc shared with the vendor), and the result (zero similar incidents over the next six months, two related vendors signed up for the contract).

Common beginner mistakes

  • Treating the phone screen as a sprint — Bloomberg grades production-quality code, not the cleverest hack.
  • Skipping README.md and inline tests on take-homes.
  • Naming idempotency without explaining the sink — "my Kafka producer is idempotent" is not a complete answer; the sink decides exactly-once.
  • Bringing one STAR story to the behavioral round (interviewers double-click and the story runs out fast).
  • Generic past-project answers without numbers (rows / day, p99, cost).

Four-stage horizontal funnel diagram of the Bloomberg data engineering interview process from phone screen through online assessment and onsite to final decision, with average durations and PipeCode brand colors.

Practice: drill the Bloomberg DE panel before the live screen

COMPANY
Bloomberg — all DE problems
Bloomberg data engineering practice set

Practice →

PYTHON
Bloomberg — Python only
Bloomberg Python practice

Practice →


2. Python Two-Pointer and String Manipulation

Two-pointer iteration, string reversal, and split-rejoin in Python for data engineering

The first canonical Bloomberg Python pattern is string-level manipulation with two-pointer or split-rejoin primitives. The headline interview problem on the Bloomberg practice set, Reverse Words in String, is the textbook split-rejoin question: reverse the order of words in a sentence using only built-in string methods. The canonical answer is a single line — ' '.join(s.split()[::-1]) — but the interviewer is reading whether you can articulate the three primitives behind it (split, [::-1], join) and reach for the right tool when the constraints flip (immutable input, no built-ins, in-place character array).

Pro tip: Python's s.split() (with no argument) is not the same as s.split(' '). The no-arg form collapses runs of whitespace and ignores leading or trailing whitespace; the explicit ' ' form preserves empty strings between consecutive spaces. For "reverse words" prompts you almost always want the no-arg form because it handles ' hello world ' cleanly.

Splitting on whitespace — collapsing multiple spaces

The split invariant: s.split() with no argument splits on any run of whitespace and discards empty leading or trailing tokens; s.split(' ') with an explicit single-space delimiter preserves every empty token between consecutive spaces.

  • 'a b c'.split() → ['a', 'b', 'c'].
  • '  a  b  '.split() → ['a', 'b'] (leading and trailing whitespace dropped, runs collapsed).
  • '  a  b  '.split(' ') → ['', '', 'a', '', 'b', '', ''] (every run preserved).

Worked example. Three quick splits on the same input.

| input | call | output |
| --- | --- | --- |
| '  hello  world  ' | .split() | ['hello', 'world'] |
| '  hello  world  ' | .split(' ') | ['', '', 'hello', '', 'world', '', ''] |
| 'hello world' | .split(' ') | ['hello', 'world'] |

Worked-example solution.

```python
def words_collapsed(s: str) -> list[str]:
    return s.split()
```

Rule of thumb: default to s.split() (no arg) for natural-language input. Reach for s.split(' ') only when the prompt explicitly says preserve empty tokens.

Reversing a list with [::-1] vs reversed()

The reversal invariant: [::-1] is a slice that returns a new reversed list; reversed(seq) returns a lazy iterator; list.reverse() mutates in place and returns None.

  • words[::-1] — new list, original unchanged. O(n) time, O(n) extra memory.
  • list(reversed(words)) — new list via iterator, same cost.
  • words.reverse() — in-place mutation, returns None. O(n) time, O(1) extra memory.

Worked example. Three ways to reverse ['a', 'b', 'c'].

| call | result | mutates original |
| --- | --- | --- |
| words[::-1] | ['c', 'b', 'a'] | no |
| list(reversed(words)) | ['c', 'b', 'a'] | no |
| words.reverse() (then read words) | ['c', 'b', 'a'] | yes |

Worked-example solution.

```python
def reverse_new(words: list[str]) -> list[str]:
    return words[::-1]

def reverse_in_place(words: list[str]) -> None:
    words.reverse()
```

Rule of thumb: [::-1] for "return a new reversed copy"; list.reverse() for "minimize memory and the prompt allows mutation."

Two-pointer in-place reversal of a character array

The two-pointer invariant: swap arr[left] with arr[right], then move left += 1 and right -= 1, until left >= right. This runs in O(n) time and O(1) extra memory and is the canonical follow-up when the interviewer says "now do it without [::-1] or reversed()".

  • Initialization — left = 0, right = len(arr) - 1.
  • Swap — arr[left], arr[right] = arr[right], arr[left] (Python's tuple assignment makes this one line).
  • Termination — left < right keeps you safe at the middle; left == right would be a harmless self-swap on odd-length arrays.

Worked example. Reverse ['a', 'b', 'c', 'd'] in place.

| step | left | right | array |
| --- | --- | --- | --- |
| start | 0 | 3 | ['a','b','c','d'] |
| swap, advance | 1 | 2 | ['d','b','c','a'] |
| swap, advance | 2 | 1 | ['d','c','b','a'] |
| stop (left >= right) | 2 | 1 | ['d','c','b','a'] |

Worked-example solution.

```python
def reverse_in_place_chars(arr: list[str]) -> None:
    left, right = 0, len(arr) - 1
    while left < right:
        arr[left], arr[right] = arr[right], arr[left]
        left += 1
        right -= 1
```

Rule of thumb: the left < right condition (strictly less than) is the idiomatic termination: it stops exactly at the middle. The classic over-swap bug is looping over every index (for i in range(len(arr))), which swaps each pair twice and restores the original order; left <= right, by contrast, merely adds a harmless self-swap of the middle element on odd-length arrays.
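
The over-swap failure mode is easiest to see by looping over every index instead of stopping at the middle; a quick contrast (illustrative input):

```python
def reverse_buggy(arr: list[str]) -> None:
    # swaps every pair twice, so the array ends up back in original order
    for i in range(len(arr)):
        arr[i], arr[len(arr) - 1 - i] = arr[len(arr) - 1 - i], arr[i]

def reverse_ok(arr: list[str]) -> None:
    left, right = 0, len(arr) - 1
    while left < right:  # stop at the middle
        arr[left], arr[right] = arr[right], arr[left]
        left, right = left + 1, right - 1

a = ['a', 'b', 'c', 'd']
reverse_buggy(a)
print(a)  # ['a', 'b', 'c', 'd'] -- unchanged: every swap was undone
b = ['a', 'b', 'c', 'd']
reverse_ok(b)
print(b)  # ['d', 'c', 'b', 'a']
```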

Common beginner mistakes

  • Using s.split(' ') on real-world input and getting empty-string tokens that confuse downstream logic.
  • Calling list.reverse() and then trying to use the return value (it returns None).
  • Two-pointer loops that visit every index instead of stopping at the middle — each pair gets swapped twice, restoring the original order.
  • Reaching for re.split when str.split is enough (no-libraries phone-screen variants reject re).

PipeCode's Python for data engineering interviews course drills these primitives across forty-plus problems, including the two-pointer and string-manipulation variants Bloomberg's screen reaches for.

Python Interview Question on Reverse Words in String

Write a function reverse_words(s: str) -> str that returns s with the order of words reversed. Words are separated by one or more spaces; treat any run of whitespace as a single delimiter. Leading and trailing whitespace in the input must not appear in the output, and the output must have exactly one space between words.

Solution Using split-reverse-join

```python
def reverse_words(s: str) -> str:
    return ' '.join(s.split()[::-1])
```

Step-by-step trace (input s = '  the quick   brown fox  '):

| step | expression | value |
| --- | --- | --- |
| 1 | s.split() | ['the', 'quick', 'brown', 'fox'] |
| 2 | s.split()[::-1] | ['fox', 'brown', 'quick', 'the'] |
| 3 | ' '.join(s.split()[::-1]) | 'fox brown quick the' |

  1. s.split() with no argument — splits on any run of whitespace, discards empty leading/trailing tokens. The two leading spaces, the three between quick and brown, and the two trailing spaces all collapse cleanly.
  2. [::-1] slice — returns a new list with the words in reverse order; the original s is unchanged.
  3. ' '.join(...) — concatenates the reversed list into a single string with exactly one space between adjacent elements; returns 'fox brown quick the'.
  4. Return — the function returns the joined string directly; no intermediate variable is needed.

Output:

| input | result |
| --- | --- |
| 'the quick brown fox' | 'fox brown quick the' |
| ' hello world ' | 'world hello' |
| 'one' | 'one' |
| '' | '' (empty input → empty output) |
| ' ' | '' (whitespace-only input → empty output) |

Why this works — concept by concept:

  • No-arg split collapses whitespace — s.split() (no argument) splits on any run of whitespace and silently drops empty leading or trailing tokens; this single call handles the "multiple spaces" and "leading / trailing whitespace" edge cases without any conditional logic.
  • Slice reversal returns a new list — [::-1] builds a new list in O(n) time without mutating the source; the function stays referentially transparent.
  • Join with a single space — ' '.join(...) concatenates with exactly one delimiter between elements and zero delimiters at the ends; the output cannot have leading or trailing spaces by construction.
  • Empty input is naturally handled — ''.split() returns [], [][::-1] returns [], and ' '.join([]) returns ''; the empty-input case requires no special branch.
  • Three primitives, one line — chaining split, slice, and join is the canonical Pythonic answer; reaching for re.split or manual character iteration is over-engineering.
  • Cost — O(n) time where n is the input length, plus O(n) extra memory for the new list and joined string; n is constrained by the line length in any real-world prompt.

PYTHON
Bloomberg — two pointers / string
Reverse Words in String (Bloomberg)

Practice →

PYTHON
Topic — two pointers
Two-pointer Python problems

Practice →


3. Abstract Classes, OOP, and File I/O in Python

ABC, abstract methods, and chunked file processing in Python for data engineering

The second canonical Bloomberg Python pattern is production-quality OOP with abstract base classes and streaming file I/O. The headline interview problem on the Bloomberg practice set, Chunked CSV to Line-Delimited JSON Processor, is exactly this shape: subclass an abstract RecordProcessor base, implement the load, transform, and write abstract methods, and stream a CSV file row-by-row into a line-delimited JSON output without loading the whole file into memory. This pattern shows up in roughly half of recent Bloomberg DE coding rounds — confirmed by external interview reports flagging "abstract DataProcessor framework" and "RotatingFileSink class hierarchy" prompts.

Pro tip: Python's abc module enforces the abstract contract at instantiation time, not at class definition time. A class Foo(ABC) with an @abstractmethod bar can be defined without implementing bar, but Foo() raises TypeError: Can't instantiate abstract class Foo with abstract method bar. Subclasses that miss any abstract method fail the same way. This is the language feature Bloomberg interviewers are reading you for.
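
The enforcement point is easy to demo live in the editor; a minimal check (the class names are illustrative):

```python
from abc import ABC, abstractmethod

class Loader(ABC):
    @abstractmethod
    def load(self, path: str): ...

class PartialLoader(Loader):
    pass  # forgot to implement load -- still abstract

try:
    PartialLoader()
except TypeError as exc:
    print(exc)  # Can't instantiate abstract class PartialLoader ...

class CsvLoader(Loader):
    def load(self, path: str):
        return []

CsvLoader()  # fine: every abstract method is implemented
```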

from abc import ABC, abstractmethod — defining the contract

The abstract-class invariant: a class that inherits from ABC and decorates one or more methods with @abstractmethod cannot be instantiated directly; only subclasses that implement every abstract method can be instantiated.

  • from abc import ABC, abstractmethod — the standard import. ABC is a helper base class with ABCMeta as its metaclass.
  • @abstractmethod — decorator that marks a method as abstract. Place it on the method directly inside the class body.
  • Subclassing — class Concrete(Abstract): inherits all concrete methods and must override every @abstractmethod.

Worked example. Defining an abstract RecordProcessor with three abstract methods.

| method | purpose | signature |
| --- | --- | --- |
| load(path) | read the input | returns an iterable of records |
| transform(record) | reshape one record | returns the transformed record |
| write(record, out) | write one record | returns None (side effect) |

Worked-example solution.

```python
from abc import ABC, abstractmethod
from typing import Iterable, IO

class RecordProcessor(ABC):
    @abstractmethod
    def load(self, path: str) -> Iterable[dict]: ...

    @abstractmethod
    def transform(self, record: dict) -> dict: ...

    @abstractmethod
    def write(self, record: dict, out: IO[str]) -> None: ...
```

Rule of thumb: keep the abstract base small — three to five methods is the sweet spot. Abstract bases with twenty methods are a code smell.

Subclassing and implementing every abstract method

The subclass invariant: a concrete subclass must implement every method decorated with @abstractmethod in the parent (and every grandparent, transitively); missing even one keeps the subclass abstract.

  • class CsvToLdjsonProcessor(RecordProcessor): — declare inheritance.
  • Implement each abstract method — load, transform, and write all need concrete bodies (not pass or ...).
  • Concrete-only methods inherit unchanged — non-abstract methods on the parent are usable directly.

Worked example. A CsvToLdjsonProcessor that reads CSV and writes line-delimited JSON.

| abstract method | concrete implementation |
| --- | --- |
| load | csv.DictReader(open(path)) |
| transform | identity (return the row dict unchanged) |
| write | out.write(json.dumps(record) + '\n') |

Worked-example solution.

```python
import csv
import json
from typing import Iterable, IO

class CsvToLdjsonProcessor(RecordProcessor):
    def load(self, path: str) -> Iterable[dict]:
        with open(path, newline='') as f:
            yield from csv.DictReader(f)

    def transform(self, record: dict) -> dict:
        return record

    def write(self, record: dict, out: IO[str]) -> None:
        out.write(json.dumps(record) + '\n')
```

Rule of thumb: if your transform is the identity, leave it as the identity — interviewers reward not over-engineering as much as they reward implementing correctly.

Chunked CSV reading with csv.DictReader and line-delimited JSON output

The streaming invariant: csv.DictReader is a lazy iterator; iterating it yields one row at a time without loading the whole file. Pair it with a for loop that writes each transformed row to the output stream, and the program runs in O(1) extra memory regardless of input size.

  • csv.DictReader(f) — yields one dict per CSV row, using the first row as header keys.
  • yield from — the load method delegates iteration to DictReader without buffering.
  • for record in self.load(path): — the orchestrator iterates lazily; transform and write run per row.
  • Line-delimited JSON — json.dumps(record) + '\n' per row produces a file where each line is independently parseable.

Worked example. Streaming a 10-million-row CSV through the processor uses constant memory, not 10M-row memory.

| stage | memory profile |
| --- | --- |
| load | one row at a time |
| transform | one row at a time |
| write | one row at a time |
| total extra memory | O(1) |

Worked-example solution.

```python
def run(processor: RecordProcessor, in_path: str, out_path: str) -> int:
    count = 0
    with open(out_path, 'w') as out:
        for record in processor.load(in_path):
            transformed = processor.transform(record)
            processor.write(transformed, out)
            count += 1
    return count
```

Rule of thumb: never for record in list(processor.load(path)) — wrapping a generator in list() materializes the whole file and defeats the streaming guarantee.
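
The laziness is observable: wrap the row source in a counter and confirm that pulling two rows reads only two rows (the counter and the fake row source are illustrative stand-ins for csv.DictReader):

```python
from itertools import islice

rows_read = 0

def fake_csv_rows(n):
    """Stand-in for csv.DictReader over an n-row file."""
    global rows_read
    for i in range(n):
        rows_read += 1
        yield {'row': i}

# Pull just two rows from a "10-million-row file": only two are ever produced.
first_two = list(islice(fake_csv_rows(10_000_000), 2))
print(rows_read)  # 2
```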

Common beginner mistakes

  • Using pandas.read_csv(path) instead of csv.DictReader — loads the whole file, defeats the streaming invariant.
  • Forgetting newline='' on open() — Python's csv module needs that to handle quoted multi-line fields correctly.
  • Implementing load to return a list instead of yielding — same issue as the pandas mistake, scaled.
  • Writing JSON without the trailing '\n' — produces concatenated objects on one line, not line-delimited.
  • Skipping the with block on the output file — the file may not flush on early termination.

Two-panel diagram showing the abstract base class hierarchy on the left (RecordProcessor with abstract methods load, transform, write) and a concrete CsvToLdjsonProcessor subclass on the right, plus a horizontal data flow arrow indicating chunked CSV input being read, transformed, and written as line-delimited JSON output.

Python Interview Question on Chunked CSV to LDJSON Processor

Implement a class hierarchy that reads a CSV file row-by-row and writes the rows as line-delimited JSON to an output file, without loading the whole CSV into memory. The base class RecordProcessor must be abstract with three @abstractmethod methods: load(path) (returns an iterable of dict), transform(record) (returns a dict), and write(record, out) (writes one record and returns None). Implement a concrete subclass CsvToLdjsonProcessor that satisfies the contract. Provide a top-level run(processor, in_path, out_path) function that orchestrates the streaming pipeline and returns the row count.

Solution Using a Concrete Subclass of an Abstract RecordProcessor

```python
from __future__ import annotations
import csv
import json
from abc import ABC, abstractmethod
from typing import IO, Iterable

class RecordProcessor(ABC):
    @abstractmethod
    def load(self, path: str) -> Iterable[dict]: ...
    @abstractmethod
    def transform(self, record: dict) -> dict: ...
    @abstractmethod
    def write(self, record: dict, out: IO[str]) -> None: ...

class CsvToLdjsonProcessor(RecordProcessor):
    def load(self, path: str) -> Iterable[dict]:
        with open(path, newline='') as f:
            yield from csv.DictReader(f)

    def transform(self, record: dict) -> dict:
        return record

    def write(self, record: dict, out: IO[str]) -> None:
        out.write(json.dumps(record) + '\n')

def run(processor: RecordProcessor, in_path: str, out_path: str) -> int:
    count = 0
    with open(out_path, 'w') as out:
        for record in processor.load(in_path):
            transformed = processor.transform(record)
            processor.write(transformed, out)
            count += 1
    return count
```

Step-by-step trace (input events.csv with 3 rows):

```
name,age,city
alice,30,NYC
bob,25,SF
carol,40,LON
```

| step | action | output / side effect |
| --- | --- | --- |
| 1 | processor.load('events.csv') | yields {'name':'alice','age':'30','city':'NYC'} |
| 2 | transform(record) | returns the same dict (identity) |
| 3 | write(record, out) | writes {"name":"alice","age":"30","city":"NYC"}\n |
| 4 | next iteration | yields {'name':'bob','age':'25','city':'SF'} |
| 5 | transform + write | writes {"name":"bob","age":"25","city":"SF"}\n |
| 6 | next iteration | yields {'name':'carol','age':'40','city':'LON'} |
| 7 | transform + write | writes {"name":"carol","age":"40","city":"LON"}\n |
| 8 | iterator exhausted | count = 3 returned |

  1. Open the output file inside with — guarantees flush and close even on exceptions.
  2. processor.load(in_path) — returns a generator; iterating it lazily reads one row at a time from the CSV.
  3. processor.transform(record) — applies the per-record transformation (identity here).
  4. processor.write(transformed, out) — writes one JSON line per record.
  5. Increment count — after each successful write.
  6. Return count — at end of iteration.

Output:

```
{"name": "alice", "age": "30", "city": "NYC"}
{"name": "bob", "age": "25", "city": "SF"}
{"name": "carol", "age": "40", "city": "LON"}
```

Plus return value 3.

Why this works — concept by concept:

  • ABC + @abstractmethod enforces the contract — RecordProcessor cannot be instantiated directly; any subclass missing load, transform, or write raises TypeError at construction. The interface is enforced by the language, not by convention.
  • csv.DictReader yields lazily — it reads one row at a time as the loop pulls it; pairing it with yield from keeps the entire pipeline streaming.
  • Generator-based load — load returns an iterable, not a list; the caller iterates with for and never materializes the file.
  • Line-delimited JSON — json.dumps(record) + '\n' per row produces a file where each line is independently parseable, which is the canonical format for log streaming and incremental ingestion.
  • with-block on the output — the output handle flushes and closes deterministically; an exception mid-stream does not leave a partial file open.
  • Cost — O(1) extra memory regardless of input size (the generator holds at most one row), O(n) time where n is the input row count; no O(n) memory spike at any stage.

PYTHON
Bloomberg — OOP / abstract
Chunked CSV to LDJSON Processor (Bloomberg)

Practice →

PYTHON
Topic — OOP
OOP Python problems

Practice →


4. SQL Window Functions for Time-Series and Subscription Overlap

PARTITION BY, LAG, and rolling aggregates in SQL for time-series and overlap analysis

The third canonical Bloomberg interview pattern is SQL window functions on time-series data plus interval-overlap detection. Bloomberg's external interview reports converge here — a recent DataLemur compilation lists eight Bloomberg SQL questions covering rolling aggregates, monthly per-product averages, and ranking-tie semantics, and Interview Query flags Subscription Overlap as a flagship Bloomberg SQL question. The canonical primitives are SUM(volume) OVER (PARTITION BY symbol ORDER BY trade_date) for rolling totals, LAG/LEAD for adjacent-row comparisons, and EXISTS subqueries with interval crosschecks for overlap detection. PipeCode does not yet host a Bloomberg-tagged SQL problem, so this section is anchored on the topic-level SQL window functions practice page.

Pro tip: window functions evaluate after WHERE and before ORDER BY in the SQL pipeline. That is why you cannot filter WHERE rolling_total > 1000 in the same SELECT as the window — wrap the window in a CTE first, then filter in the outer query. This trips up most candidates on the first onsite SQL prompt.
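
A runnable illustration of that ordering rule, using SQLite (which supports window functions from 3.25 onward; the table and the 300-share threshold are illustrative):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE ticks (symbol TEXT, trade_date TEXT, volume INTEGER);
INSERT INTO ticks VALUES
  ('AAPL', '2026-04-01', 100),
  ('AAPL', '2026-04-02', 150),
  ('AAPL', '2026-04-03', 200);
""")

# The window alias is not visible to WHERE in the same SELECT,
# so compute it in a CTE and filter in the outer query.
rows = conn.execute("""
WITH running AS (
  SELECT symbol, trade_date, volume,
         SUM(volume) OVER (PARTITION BY symbol ORDER BY trade_date) AS running_volume
  FROM ticks
)
SELECT trade_date, running_volume
FROM running
WHERE running_volume > 300
ORDER BY trade_date
""").fetchall()
print(rows)  # [('2026-04-03', 450)]
```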

SUM / AVG OVER (PARTITION BY ... ORDER BY ...) for rolling totals

The rolling-aggregate invariant: SUM(metric) OVER (PARTITION BY entity ORDER BY ts) returns a per-row running total scoped to each entity, ordered by time, including all rows up to and including the current row by default.

  • PARTITION BY symbol — resets the running aggregate at each symbol boundary.
  • ORDER BY trade_date — defines the time direction of the cumulation.
  • Default frame — RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Override with ROWS BETWEEN N PRECEDING AND CURRENT ROW for a fixed-window rolling sum.

Worked example. A ticks(symbol, trade_date, volume) table.

| symbol | trade_date | volume | running_volume |
| --- | --- | --- | --- |
| AAPL | 2026-04-01 | 100 | 100 |
| AAPL | 2026-04-02 | 150 | 250 |
| AAPL | 2026-04-03 | 200 | 450 |
| MSFT | 2026-04-01 | 80 | 80 |
| MSFT | 2026-04-02 | 120 | 200 |

Worked-example solution.

```sql
SELECT symbol, trade_date, volume,
       SUM(volume) OVER (
         PARTITION BY symbol
         ORDER BY trade_date
       ) AS running_volume
FROM ticks;
```

Rule of thumb: if the prompt says "running" / "cumulative" / "year-to-date" / "month-to-date" per entity, reach for SUM OVER (PARTITION BY entity ORDER BY ts) before anything else.
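
When the prompt says "rolling 3-day total" rather than "running total", override the default frame with ROWS BETWEEN; a SQLite sketch (the data is illustrative, and note that 2 PRECEDING counts rows, not calendar days, so date gaps need a different treatment):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE ticks (symbol TEXT, trade_date TEXT, volume INTEGER);
INSERT INTO ticks VALUES
  ('AAPL', '2026-04-01', 100),
  ('AAPL', '2026-04-02', 150),
  ('AAPL', '2026-04-03', 200),
  ('AAPL', '2026-04-04', 50);
""")

rows = conn.execute("""
SELECT trade_date, volume,
       SUM(volume) OVER (
         PARTITION BY symbol
         ORDER BY trade_date
         ROWS BETWEEN 2 PRECEDING AND CURRENT ROW  -- current row + the 2 before it
       ) AS rolling_3d
FROM ticks
ORDER BY trade_date
""").fetchall()
for r in rows:
    print(r)
# ('2026-04-01', 100, 100)
# ('2026-04-02', 150, 250)
# ('2026-04-03', 200, 450)
# ('2026-04-04', 50, 400)
```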

LAG / LEAD for adjacent-row comparison

The lag-lead invariant: LAG(col, n) OVER (PARTITION BY entity ORDER BY ts) returns the value of col from n rows earlier within the same partition, ordered by ts. LEAD is the symmetric "n rows later" variant.

  • LAG(volume) OVER (...) — previous row's volume, or NULL for the first row in the partition.
  • LAG(volume, 2) — two rows back.
  • LAG(volume, 1, 0) — default 0 instead of NULL for missing rows.

Worked example. Day-over-day volume change per symbol.

| symbol | trade_date | volume | prev_volume | dod_change |
| ------ | ---------- | ------ | ----------- | ---------- |
| AAPL   | 2026-04-01 | 100    | NULL        | NULL       |
| AAPL   | 2026-04-02 | 150    | 100         | 50         |
| AAPL   | 2026-04-03 | 200    | 150         | 50         |

Worked-example solution.

SELECT symbol, trade_date, volume,
       LAG(volume) OVER (PARTITION BY symbol ORDER BY trade_date) AS prev_volume,
       volume - LAG(volume) OVER (PARTITION BY symbol ORDER BY trade_date) AS dod_change
FROM ticks;

Rule of thumb: day-over-day deltas, week-over-week resets, and "did this row break a streak?" all map to LAG. Use LEAD for "is the next event within X minutes?".
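The day-over-day pattern is easy to sanity-check with stdlib sqlite3 (SQLite 3.25+ for window functions) — note the NULL (None in Python) on the first row of the partition:

```python
import sqlite3

# LAG on the first row of a partition has no previous row, so the delta is
# NULL — exactly the behavior shown in the worked table above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ticks (symbol TEXT, trade_date TEXT, volume INTEGER);
INSERT INTO ticks VALUES
  ('AAPL', '2026-04-01', 100),
  ('AAPL', '2026-04-02', 150),
  ('AAPL', '2026-04-03', 200);
""")

rows = conn.execute("""
SELECT trade_date, volume,
       volume - LAG(volume) OVER (
         PARTITION BY symbol ORDER BY trade_date
       ) AS dod_change
FROM ticks
ORDER BY trade_date;
""").fetchall()

print(rows)
# [('2026-04-01', 100, None), ('2026-04-02', 150, 50), ('2026-04-03', 200, 50)]
```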

Subscription overlap via interval crosscheck (EXISTS subquery)

The overlap invariant: two intervals (start_a, end_a) and (start_b, end_b) overlap if and only if start_a <= end_b and start_b <= end_a. This is the canonical interval-overlap formula; everything else (partial overlap, full containment, identical intervals) collapses to this two-condition check.

  • EXISTS subquery — for each row, check if any other row from the same table satisfies the overlap condition.
  • s1.user_id <> s2.user_id — exclude the row from comparing with itself.
  • end_date IS NOT NULL — only completed subscriptions count for overlap detection in Bloomberg's framing.

Worked example. A subscriptions(user_id, start_date, end_date) table with four users.

| user_id | start_date | end_date   | overlaps?              |
| ------- | ---------- | ---------- | ---------------------- |
| 1       | 2026-01-01 | 2026-01-31 | ✓ (with users 2 and 3) |
| 2       | 2026-01-15 | 2026-01-17 | ✓ (with user 1)        |
| 3       | 2026-01-29 | 2026-02-04 | ✓ (with user 1)        |
| 4       | 2026-02-05 | 2026-02-10 |                        |

Worked-example solution.

SELECT s1.user_id,
       CASE WHEN EXISTS (
         SELECT 1 FROM subscriptions s2
         WHERE s2.user_id <> s1.user_id
           AND s2.end_date IS NOT NULL
           AND s1.start_date <= s2.end_date
           AND s2.start_date <= s1.end_date
       ) THEN 1 ELSE 0 END AS overlap
FROM subscriptions s1
WHERE s1.end_date IS NOT NULL;

Rule of thumb: the overlap formula is start_a <= end_b AND start_b <= end_a. Memorize it cold — every Bloomberg interval-detection prompt collapses to it.
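The formula is small enough to drill in pure Python. A sketch with illustrative dates — the inclusive <= means touching boundaries count as overlap:

```python
from datetime import date

def overlaps(a_start: date, a_end: date, b_start: date, b_end: date) -> bool:
    # Canonical inclusive interval-overlap check: the intervals intersect
    # iff each one starts no later than the other one ends.
    return a_start <= b_end and b_start <= a_end

assert overlaps(date(2026, 1, 1), date(2026, 1, 31),
                date(2026, 1, 15), date(2026, 1, 17))     # partial overlap
assert overlaps(date(2026, 1, 15), date(2026, 1, 17),
                date(2026, 1, 1), date(2026, 1, 31))      # full containment
assert overlaps(date(2026, 1, 1), date(2026, 1, 31),
                date(2026, 1, 31), date(2026, 2, 5))      # shared boundary
assert not overlaps(date(2026, 1, 1), date(2026, 1, 31),
                    date(2026, 2, 5), date(2026, 2, 10))  # disjoint
```

Swap <= for < only when the prompt says boundaries that merely touch do not count.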

Common beginner mistakes

  • Filtering WHERE running_volume > 1000 in the same SELECT as the window — windows evaluate after WHERE; wrap in a CTE first.
  • Forgetting PARTITION BY and getting one global running total across all symbols.
  • Using LAG(volume) without an ORDER BY inside the window — undefined ordering, undefined result.
  • Writing the overlap check as start_a < end_b AND start_b < end_a (strict) when the prompt allows touching boundaries — off-by-one on identical-day overlaps.
  • Forgetting s1.user_id <> s2.user_id — every row "overlaps with itself" and the answer is always 1.

[Figure: SUM(volume) OVER (PARTITION BY symbol ORDER BY trade_date) producing a rolling cumulative volume per stock symbol — input ticks table → window function → output table.]

SQL Interview Question on Detecting Subscription Overlap

Given a table subscriptions(user_id INT, start_date DATE, end_date DATE) where end_date IS NULL indicates an active subscription, return one row per user with overlap = 1 if the user's completed subscription window overlaps any other user's completed subscription, and overlap = 0 otherwise. Consider only rows where end_date IS NOT NULL. Use the canonical interval-overlap formula.

Solution Using EXISTS with Interval Crosscheck

SELECT
  s1.user_id,
  CASE WHEN EXISTS (
    SELECT 1
    FROM subscriptions s2
    WHERE s2.user_id   <> s1.user_id
      AND s2.end_date  IS NOT NULL
      AND s1.start_date <= s2.end_date
      AND s2.start_date <= s1.end_date
  ) THEN 1 ELSE 0 END AS overlap
FROM subscriptions s1
WHERE s1.end_date IS NOT NULL
ORDER BY s1.user_id;

Step-by-step trace (input subscriptions):

| user_id | start_date | end_date   |
| ------- | ---------- | ---------- |
| 1       | 2026-01-01 | 2026-01-31 |
| 2       | 2026-01-15 | 2026-01-17 |
| 3       | 2026-01-29 | 2026-02-04 |
| 4       | 2026-02-05 | 2026-02-10 |

  1. Filter to completed subscriptions — WHERE s1.end_date IS NOT NULL keeps all four rows here (every row has an end_date).
  2. For user 1 (2026-01-01 to 2026-01-31) — check EXISTS. Compare against user 2 (2026-01-15 to 2026-01-17): 2026-01-01 <= 2026-01-17 ✓ AND 2026-01-15 <= 2026-01-31 ✓ → overlap. EXISTS returns true, overlap = 1.
  3. For user 2 (2026-01-15 to 2026-01-17) — compare against user 1: 2026-01-15 <= 2026-01-31 ✓ AND 2026-01-01 <= 2026-01-17 ✓ → overlap. overlap = 1.
  4. For user 3 (2026-01-29 to 2026-02-04) — compare against user 1: 2026-01-29 <= 2026-01-31 ✓ AND 2026-01-01 <= 2026-02-04 ✓ → overlap. overlap = 1.
  5. For user 4 (2026-02-05 to 2026-02-10) — compare against every other user. User 1: 2026-02-05 <= 2026-01-31 ✗ (Feb 5 is after Jan 31). User 2: 2026-02-05 <= 2026-01-17 ✗. User 3: 2026-02-05 <= 2026-02-04 ✗. No overlap. overlap = 0.
  6. ORDER BY s1.user_id for stability.

Output:

| user_id | overlap |
| ------- | ------- |
| 1       | 1       |
| 2       | 1       |
| 3       | 1       |
| 4       | 0       |
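The full query can be replayed against the trace input with stdlib sqlite3 — ISO-8601 date strings compare correctly as text, so no date parsing is needed:

```python
import sqlite3

# Replay of the EXISTS interval-crosscheck solution on the four-user trace.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subscriptions (user_id INTEGER, start_date TEXT, end_date TEXT);
INSERT INTO subscriptions VALUES
  (1, '2026-01-01', '2026-01-31'),
  (2, '2026-01-15', '2026-01-17'),
  (3, '2026-01-29', '2026-02-04'),
  (4, '2026-02-05', '2026-02-10');
""")

rows = conn.execute("""
SELECT s1.user_id,
       CASE WHEN EXISTS (
         SELECT 1 FROM subscriptions s2
         WHERE s2.user_id <> s1.user_id
           AND s2.end_date IS NOT NULL
           AND s1.start_date <= s2.end_date
           AND s2.start_date <= s1.end_date
       ) THEN 1 ELSE 0 END AS overlap
FROM subscriptions s1
WHERE s1.end_date IS NOT NULL
ORDER BY s1.user_id;
""").fetchall()

print(rows)  # [(1, 1), (2, 1), (3, 1), (4, 0)]
```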

Why this works — concept by concept:

  • Interval overlap formula — two intervals (a_start, a_end) and (b_start, b_end) overlap iff a_start <= b_end AND b_start <= a_end; this two-condition check captures partial overlap, full containment, and shared boundaries in one expression.
  • EXISTS short-circuits — the subquery returns true the moment one matching row is found; for users with many overlapping peers, the engine does not evaluate every pair.
  • s1.user_id <> s2.user_id excludes self-overlap — without this guard, every row trivially overlaps itself and the answer is uniformly 1.
  • end_date IS NOT NULL filters incomplete rows — Bloomberg's framing only counts completed subscriptions; active subscriptions (open intervals) are excluded by the prompt.
  • CASE WHEN EXISTS ... THEN 1 ELSE 0 — produces a 0/1 flag column directly from the boolean predicate, no separate aggregation needed.
  • Cost — O(n²) comparisons in the worst case, since every row can be checked against every other. An index on (start_date, end_date) may let the planner prune candidates earlier, but without one, expect a nested-loop plan that scales quadratically on large tables.

SQL
Topic — window functions
Window function problems (all companies)

Practice →

SQL
Topic — intervals
Interval / overlap problems

Practice →


Tips to crack Bloomberg data engineering interviews

These are habits that move the needle in real Bloomberg DE loops — not a re-statement of the topics above.

Practice with Bloomberg's data shape

Bloomberg's interview prompts model market data and subscription feeds: ticks, quotes, trades, subscriptions, provider_feeds, corporate_actions. Drilling on order-line ecommerce schemas wastes prep time. Stick to event-shaped tables with a per-symbol-per-day grain, and pull problems from the window-functions topic page for shapes that match.

Master from abc import ABC, abstractmethod cold

Half of Bloomberg DE coding rounds reach for an abstract base class — confirmed by PipeCode #508 plus two of four recent external Bloomberg DE problems (DataProcessor, RotatingFileSink). Type the boilerplate from memory until it is muscle memory: class Base(ABC): @abstractmethod def method(self): ... plus a concrete subclass with every abstract method implemented. Drill on the OOP topic page.
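A minimal sketch of that boilerplate — the class and method names here are illustrative, echoing the DataProcessor framing from external reports, not an actual Bloomberg prompt:

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Iterable

class DataProcessor(ABC):
    """Abstract base: subclasses must implement load, transform, write."""

    @abstractmethod
    def load(self, path: str) -> Iterable[dict]: ...

    @abstractmethod
    def transform(self, row: dict) -> dict: ...

    @abstractmethod
    def write(self, rows: Iterable[dict]) -> None: ...

class UppercaseProcessor(DataProcessor):
    """Concrete subclass — every abstract method implemented, or TypeError."""

    def load(self, path: str) -> Iterable[dict]:
        # Stand-in for csv.DictReader streaming; yields one row lazily.
        yield {"symbol": "aapl"}

    def transform(self, row: dict) -> dict:
        return {k: v.upper() for k, v in row.items()}

    def write(self, rows: Iterable[dict]) -> None:
        for row in rows:
            print(row)

# DataProcessor() itself raises TypeError — abstract classes cannot be
# instantiated. The concrete subclass runs end to end:
p = UppercaseProcessor()
p.write(p.transform(row) for row in p.load("ticks.csv"))  # {'symbol': 'AAPL'}
```

Typing this skeleton cold, then filling in real load/transform/write bodies, is the shape of the round.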

Type your Python with from __future__ import annotations

Bloomberg grades production-quality code, and that includes type hints. Add from __future__ import annotations at the top, use list[dict] and Iterable[dict] rather than List[Dict], and annotate every function signature. Interviewers notice — and the type-checker catches the off-by-one before the interviewer does.
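A small illustrative example of the style — the dedupe helper is made up for demonstration, not a Bloomberg prompt:

```python
from __future__ import annotations

# With the future import, built-in generics (list[dict], set[tuple]) are
# valid in annotations even on older Python 3 versions, because annotations
# are stored lazily as strings rather than evaluated at definition time.
def dedupe(rows: list[dict]) -> list[dict]:
    """Drop exact-duplicate rows, preserving first-seen order."""
    seen: set[tuple] = set()
    out: list[dict] = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

print(dedupe([{"a": 1}, {"a": 1}, {"b": 2}]))  # [{'a': 1}, {'b': 2}]
```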

Drill window functions

PARTITION BY plus ORDER BY plus optional LAG/LEAD plus the ROWS BETWEEN N PRECEDING AND CURRENT ROW frame override is the Bloomberg SQL toolkit. Practice with SQL window-function problems until the syntax is reflex. PipeCode's SQL for data engineering interviews course drills these primitives across forty-plus problems.

Bring two STAR stories per behavioral round

Bloomberg's behavioral round goes technical-deep — interviewers drill past project details, not just culture-fit. Pick two real projects with crisp numbers (rows per day, p99, cost per query, post-incident retro) and rehearse the deep-dive Q&A. Generic teamwork stories will not pass.

Where to practice on PipeCode

| Skill lane | Practice path |
| ---------- | ------------- |
| Curated Bloomberg practice set | /explore/practice/company/bloomberg |
| Bloomberg Python practice | /explore/practice/company/bloomberg/python |
| Two-pointer + string in Python | /explore/practice/topic/two-pointers/python |
| String-manipulation in Python | /explore/practice/topic/string-manipulation/python |
| OOP + abstract classes | /explore/practice/topic/abstract-classes |
| File I/O + CSV parsing | /explore/practice/topic/file-io/python |
| SQL window functions | /explore/practice/topic/window-functions/sql |
| Intervals / overlap | /explore/practice/topic/intervals |
| All practice topics | /explore/practice/topics |
| Interview courses | /explore/courses |

Communication under time pressure

State assumptions before typing: "I'll assume the input CSV has a header row, no embedded newlines in fields, and one consistent delimiter." State grain: "One JSON object per CSV row, in source order." State edge cases: "If a row has missing keys, my transform returns the partial dict and write serializes whatever keys are present." Interviewers grade clear reasoning above silent-and-perfect.


Frequently asked questions

What is the Bloomberg data engineering interview process?

The Bloomberg data engineer interview process is a four-stage funnel: a 45-60 minute technical phone screen in CoderPad covering a medium-difficulty algorithmic problem with follow-ups, an optional online assessment for some roles (Byteboard / HackerRank format with SQL plus Python), a virtual onsite of three to five rounds covering coding plus system design plus behavioral plus fit, and a final decision call. Total elapsed time is typically about thirty days. The recent reported pass rate is 8% across thirteen samples — making it the most selective DE loop tracked in our blog series.

What programming languages does Bloomberg test for data engineering?

Bloomberg's data engineering interviews lean on Python and SQL, with Java or C++ appearing for some platform / infra roles. Python is graded at a production-quality bar: from abc import ABC, abstractmethod, csv.DictReader, type hints, generator-based streaming. SQL is at LeetCode-medium / DataLemur grade with a strong tilt toward window functions, rolling aggregates, and interval-overlap detection. PipeCode's Bloomberg Python practice anchors the Python side; the SQL flavor is covered by topic-level pages.

What SQL topics show up most in Bloomberg data engineering interviews?

The topics are narrow and consistent: window functions (SUM / AVG OVER (PARTITION BY symbol ORDER BY trade_date) for rolling totals), LAG/LEAD for adjacent-row comparisons (day-over-day change, streak detection), HAVING + COUNT for power-user filters, self-joins for pairwise comparisons, and EXISTS subqueries with interval crosschecks for overlap detection. PipeCode's window-function problems and interval problems cover these directly.

How difficult are Bloomberg data engineering interview questions?

Bloomberg data engineering interview questions are calibrated above generic LeetCode-medium for the algorithm half — closer to a production-code bar than a contest bar. The two-problem PipeCode set splits 1 EASY (Reverse Words in String) for warm-ups and 1 HARD (Chunked CSV to LDJSON Processor) for production-quality OOP. Reports describe the panel as friendly but rigorous; the 8% pass rate reflects code-review-grade judgment on edge cases, type hints, and streaming-friendly file I/O — not contest depth.

How should I prepare for a Bloomberg data engineering interview?

Solve the 2-problem Bloomberg practice set end to end — that maps the exact Python pattern coverage. Then back-fill: 10+ two-pointer / string-manipulation problems for the EASY phone-screen flavor, 10+ OOP / abstract-class problems for the HARD onsite Python round, and 20+ SQL window-function and interval-overlap problems for the SQL round. Add Bloomberg-specific behavioral prep — two real STAR stories with crisp numbers — and one read-through of the streaming stack (Kafka, Flink, Airflow, Parquet).

Does Bloomberg test object-oriented Python with abstract classes?

Yes — heavily. Half of recent Bloomberg DE coding rounds use an abstract base class (from abc import ABC, abstractmethod) and ask candidates to implement a concrete subclass. PipeCode's Chunked CSV to LDJSON Processor is the canonical PipeCode-Bloomberg problem in this family, and external interview reports confirm the pattern repeats with DataProcessor and RotatingFileSink framings. Master the ABC boilerplate and the streaming-generator pattern and this round becomes mechanical.


Start practicing Bloomberg data engineering problems

Reading patterns is not the same as typing them under time pressure. PipeCode pairs company-tagged Bloomberg problems with tests, AI feedback, and a coding environment so you can drill the exact Python OOP, two-pointer, and SQL window-function patterns Bloomberg asks — without the noise of generic algorithm prep.

Pipecode.ai is Leetcode for Data Engineering.

Browse Bloomberg practice →
Python for DE interviews course →
