Cisco data engineering interview questions are Python end-to-end with a fundamentals-and-decorators edge: four primitives — {v: k for k, v in d.items()} dict-comprehension key-value inversion, status-filter dicts via {k: v for k, v in d.items() if v['status'] == 'active'}, function-timing decorators built with functools.wraps and time.perf_counter(), and the greedy comparator-sort sorted(chunks, key=cmp_to_key(lambda a, b: -1 if a+b > b+a else 1)) that produces the maximum concatenated string. The framings are everyday DE Python — invert a lookup dict, filter records by metadata flag, time a slow function via decorator, build the biggest string by concatenating chunks in the right order.
This guide walks through the four topic clusters Cisco actually tests, each with a detailed topic explanation, per-sub-topic explanation with a worked example and its solution, and an interview-style problem with a full solution that explains why it works. The mix matches the curated 4-problem Cisco set (3 easy, 1 medium, no hard) — a Python-fundamentals hub that rewards dict-comprehension fluency, decorator awareness, and a single non-obvious sorting trick. There is no SQL in this loop; every minute of preparation should go to Python.
Top Cisco data engineering interview topics
From the Cisco data engineering practice set, the four numbered sections below follow this topic map (one row per H2):
| # | Topic (sections 1–4) | Why it shows up at Cisco |
|---|---|---|
| 1 | Python dict comprehensions for key-value inversion | Dictionary Key-Value Inversion — {v: k for k, v in d.items()}, plus duplicate-value handling. |
| 2 | Python hash tables and status filters | Dictionary Status Filter — {k: v for k, v in d.items() if v['status'] == 'active'} over nested dict values. |
| 3 | Python decorators and functools.wraps for function timing | Decorator Timing — @functools.wraps decorator pattern with time.perf_counter() and *args/**kwargs forwarding. |
| 4 | Python sorting and greedy for the maximum concatenated substring | Maximum Substring (MEDIUM) — sorted(chunks, key=cmp_to_key(lambda a, b: -1 if a+b > b+a else 1)) then ''.join(...). |
Python-fundamentals framing rule: Cisco's prompts are everyday data-engineering Python — dict inversions, status filters, function timing, max-concat strings. The interviewer is grading whether you map each business framing to the right primitive: invert lookup → dict comprehension; filter records → if cond clause inside comprehension; instrument a slow function → decorator + perf_counter; biggest concatenated string → greedy comparator sort, not lexicographic. State the mapping out loud.
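The four mappings can be sketched side by side — a minimal cheat sheet on toy data (names like records are illustrative, not from the problem set):

```python
from functools import wraps, cmp_to_key
import time

# Toy data for illustration
d = {'a': 1, 'b': 2}
records = {1: {'status': 'active'}, 2: {'status': 'archived'}}

# 1. Key-value inversion
inverted = {v: k for k, v in d.items()}          # {1: 'a', 2: 'b'}

# 2. Status filter over nested dict values
active = {k: v for k, v in records.items() if v.get('status') == 'active'}

# 3. Timing decorator
def timed(fn):
    @wraps(fn)
    def inner(*args, **kwargs):
        t0 = time.perf_counter()
        result = fn(*args, **kwargs)
        print(f"{fn.__name__} took {time.perf_counter() - t0:.4f} s")
        return result
    return inner

# 4. Greedy comparator sort for the maximum concatenated string
chunks = ['3', '30', '9']
biggest = ''.join(sorted(chunks, key=cmp_to_key(lambda a, b: -1 if a + b > b + a else 1)))
# biggest == '9330'
```

If you can write all four from memory, the rest of this guide is refinement.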
1. Python Dict Comprehensions for Key-Value Inversion
Dict comprehensions and inverted lookups in Python for data engineering
"Invert a dict so values become keys and keys become values" is the canonical dict-comprehension interview prompt at Cisco. The mental model: {v: k for k, v in d.items()} swaps each pair in one line; the result is a new dict mapping the original values to their original keys. Same primitive powers any "flip a lookup table" or "build a reverse index" question — invert a country-code map, build a name → id dict from an id → name dict.
Pro tip: Dict-inversion silently overwrites duplicates. If two original keys share a value, only one survives in the inverted dict — and which one is insertion-order-dependent (last wins in Python 3.7+). State the duplicate-handling rule out loud; the interviewer will probe it.
Dict comprehension syntax: {expr_k: expr_v for ... in ...}
The dict-comprehension invariant: {key_expr: value_expr for x in iter} builds a dict by evaluating both expressions for each element in the iterable. Reading order: key_expr, then value_expr, then for x in iter. Execution order: scan iter, evaluate both expressions, insert into the dict.
- {k: v for k, v in d.items()} — copy of d (no transformation).
- {k: v * 2 for k, v in d.items()} — copy with doubled values.
- {k: v for k, v in d.items() if v > 0} — filter clause; only positive values pass.
- Set comprehension — {expr for ... in ...} (no colon) — different syntax, different output type.
Worked example. Build a square-roots dict from a list of perfect squares.
| input | output |
|---|---|
| [1, 4, 9, 16] | {1: 1, 4: 2, 9: 3, 16: 4} |
Worked-example solution.
nums = [1, 4, 9, 16]
sqrts = {n: int(n ** 0.5) for n in nums}
print(sqrts)
# {1: 1, 4: 2, 9: 3, 16: 4}
Rule of thumb: if the loop body is "compute key, compute value, insert into dict," it's a dict comprehension; reach for it before for n in nums: d[n] = ....
Inversion: swap k and v in the comprehension head
The inversion invariant: {v: k for k, v in d.items()} swaps each pair. The original key becomes the new value; the original value becomes the new key. Both halves must be hashable (the new key especially — original values that are lists or dicts will fail with TypeError).
- {v: k for k, v in d.items()} — straight inversion.
- {v: [k for k, v2 in d.items() if v2 == v] for v in set(d.values())} — grouping form; preserves all duplicate keys.
- Hashability check — original values must be hashable; lists/dicts will raise.
- d.items() — yields (key, value) tuples; perfect for unpacking in the comprehension head.
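The hashability failure mode is worth seeing once — a minimal sketch (toy dict, illustrative names) of what happens when original values are lists:

```python
# Original values that are lists cannot become dict keys
d = {'a': [1, 2], 'b': [3]}
try:
    inv = {v: k for k, v in d.items()}
except TypeError as e:
    inv = None
    print(f"inversion failed: {e}")  # unhashable type: 'list'
```

Converting such values to tuples first (tuple(v)) is the usual workaround when the inversion genuinely needs them as keys.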
Worked example. Invert a small lookup dict.
| input | output |
|---|---|
| {'a': 1, 'b': 2, 'c': 3} | {1: 'a', 2: 'b', 3: 'c'} |
Worked-example solution.
d = {'a': 1, 'b': 2, 'c': 3}
inv = {v: k for k, v in d.items()}
print(inv)
# {1: 'a', 2: 'b', 3: 'c'}
Rule of thumb: {v: k for k, v in d.items()} is the canonical one-liner; if the interviewer asks for "all keys per value" instead, switch to the grouping form.
Duplicate-value handling: last-wins vs preserve-all-via-list
The duplicate invariant: when two original keys share a value, the inverted dict can only keep one — the last one inserted wins (Python 3.7+) or is implementation-defined on older versions. To preserve all original keys per value, group into a defaultdict(list).
- Last-wins behavior — {v: k for k, v in d.items()} silently drops earlier duplicates.
- Preserve-all form — defaultdict(list) plus a loop, or the grouping comprehension over set(d.values()).
- set(d.values()) — distinct values; useful when you need to know whether duplicates exist at all.
- Worked example: with {'a': 1, 'b': 2, 'c': 1}, the simple inversion returns {1: 'c', 2: 'b'} (loses 'a'); the grouping form returns {1: ['a', 'c'], 2: ['b']}.
Worked example. Demonstrate both behaviors on {'a': 1, 'b': 2, 'c': 1}.
| approach | output |
|---|---|
| {v: k for k, v in d.items()} | {1: 'c', 2: 'b'} (last wins) |
| defaultdict(list) group | {1: ['a', 'c'], 2: ['b']} |
Worked-example solution.
from collections import defaultdict
d = {'a': 1, 'b': 2, 'c': 1}
# Last-wins inversion
inv_simple = {v: k for k, v in d.items()}
print(inv_simple) # {1: 'c', 2: 'b'}
# Preserve-all grouping
inv_group = defaultdict(list)
for k, v in d.items():
    inv_group[v].append(k)
print(dict(inv_group)) # {1: ['a', 'c'], 2: ['b']}
Rule of thumb: always ask the interviewer about duplicate semantics before inverting; "what should happen if two keys share a value?" is the question that earns points.
Common beginner mistakes
- Trying to invert a dict whose values are unhashable (lists, dicts) — TypeError on dict-key creation.
- Using {k: v} instead of {v: k} — that's a copy, not an inversion.
- Building a list of (v, k) tuples and converting via dict(...) — works but verbose; the comprehension is one line.
- Assuming "first wins" on duplicate values — Python 3.7+ is "last wins" by insertion order.
- Forgetting .items() and iterating for k in d — gets keys only; you need both halves.
Python Interview Question on Dictionary Key-Value Inversion
Given a dict d whose values are hashable, return a new dict where each value becomes a key mapping to the original key. If two original keys share a value, the lexicographically smallest original key should be kept.
def invert_min_key(d):
    ...
Solution Using a min-aware loop over d.items()
def invert_min_key(d):
    inv = {}
    for k, v in d.items():
        if v not in inv or k < inv[v]:
            inv[v] = k
    return inv
Step-by-step trace (input: {'b': 1, 'a': 1, 'c': 2}):
| step | k | v | inv before | action | inv after |
|---|---|---|---|---|---|
| 1 | b | 1 | {} | 1 not in inv → set | {1: 'b'} |
| 2 | a | 1 | {1: 'b'} | 'a' < 'b' → replace | {1: 'a'} |
| 3 | c | 2 | {1: 'a'} | 2 not in inv → set | {1: 'a', 2: 'c'} |
- Initialize — empty inv.
- Row ('b', 1) — 1 not in inv; insert inv[1] = 'b'.
- Row ('a', 1) — 1 already in inv with value 'b'; 'a' < 'b' lexicographically, so replace.
- Row ('c', 2) — 2 not in inv; insert inv[2] = 'c'.
- Return — {1: 'a', 2: 'c'}.
Output:
| return value |
|---|
| {1: 'a', 2: 'c'} |
Why this works — concept by concept:
- d.items() — yields (key, value) pairs in insertion order; required to access both halves.
- v not in inv — first-touch detection; the value hasn't been seen yet, so we insert directly.
- k < inv[v] — lexicographic comparison against the already-stored key; replaces it only if the current key is smaller, enforcing the min-key tie-break.
- Single pass — one walk over the dict; no second loop, no auxiliary structures beyond the result.
- Hashability — original values must be hashable to become keys; the prompt guarantees this.
- O(N) time, O(K) space — N original entries, K distinct values in the result.
2. Python Hash Tables and Status Filters
Filter-comprehensions over nested dict values in Python for data engineering
"Filter a dict-of-dicts by a status field" is the canonical hash-table interview prompt at Cisco. The mental model: {k: v for k, v in d.items() if v['status'] == 'active'} keeps only entries whose nested status matches. Same primitive powers any "filter records by metadata flag" question — keep enabled users, drop archived orders, surface pending jobs.
Pro tip: Always guard against missing keys in the predicate. v['status'] raises KeyError if any record lacks the field; v.get('status') == 'active' is the safer form. Settle the missing-key rule with the interviewer — "drop missing-status records" or "treat missing as inactive" — both are reasonable, but they need to be explicit.
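The two access styles diverge exactly when a record is missing the field — a minimal sketch (toy users dict, illustrative names):

```python
users = {
    1: {'name': 'Alice', 'status': 'active'},
    2: {'name': 'Dave'},  # schema-missing: no 'status' field
}

# Direct access crashes on the record that lacks the field
try:
    {k: v for k, v in users.items() if v['status'] == 'active'}
except KeyError as e:
    print(f"KeyError on missing field: {e}")  # KeyError: 'status'

# .get treats missing as "not active" and drops the record cleanly
safe = {k: v for k, v in users.items() if v.get('status') == 'active'}
print(safe)  # {1: {'name': 'Alice', 'status': 'active'}}
```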
Filter-only comprehension: {k: v for k, v in d.items() if cond}
The filter invariant: the if cond clause runs after unpacking but before insertion; only entries where cond is truthy are kept. The expression slots k and v are unchanged from the original — this is a filter, not a map.
- {k: v for k, v in d.items() if cond} — filter-only.
- {k: f(v) for k, v in d.items() if cond} — filter-then-map.
- Multiple if clauses — chain if cond1 if cond2; both must be truthy.
- Negative filter — if not cond for "drop matching" semantics.
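The chained-if and negative-filter variants in one minimal sketch (toy dict for illustration):

```python
d = {'a': 1, 'b': -2, 'c': 3, 'd': 10}

# Chained if clauses: both must be truthy (same as `if v > 0 and v < 5`)
small_positive = {k: v for k, v in d.items() if v > 0 if v < 5}
print(small_positive)  # {'a': 1, 'c': 3}

# Negative filter: keep everything that does NOT match
non_negative = {k: v for k, v in d.items() if not v < 0}
print(non_negative)  # {'a': 1, 'c': 3, 'd': 10}
```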
Worked example. Keep only positive entries.
| input | output |
|---|---|
| {'a': 1, 'b': -2, 'c': 3, 'd': 0} | {'a': 1, 'c': 3} |
Worked-example solution.
d = {'a': 1, 'b': -2, 'c': 3, 'd': 0}
positive = {k: v for k, v in d.items() if v > 0}
print(positive)
# {'a': 1, 'c': 3}
Rule of thumb: filter-comprehensions read top-to-bottom: "for each pair, if condition, keep it." Reach for them before for k in list(d): if not cond: del d[k] (which mutates).
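The mutation hazard the rule of thumb warns about is easy to demonstrate — a minimal sketch (toy dict for illustration):

```python
d = {'a': 1, 'b': -2, 'c': 3}

# Deleting from the dict while iterating it raises RuntimeError
try:
    for k in d:
        if d[k] < 0:
            del d[k]
except RuntimeError as e:
    print(e)  # dictionary changed size during iteration

# The comprehension builds a fresh dict and never touches the original
kept = {k: v for k, v in {'a': 1, 'b': -2, 'c': 3}.items() if v >= 0}
print(kept)  # {'a': 1, 'c': 3}
```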
Nested-attribute predicates: v['status'] == 'active'
The nested-predicate invariant: inside the if clause, v is the inner dict; access fields with v['key'] or v.get('key'). The .get form is safer when the field is optional; use [] only when the schema guarantees it.
- v['status'] — direct access; raises KeyError if missing.
- v.get('status') — returns None if missing.
- v.get('status') == 'active' — defensive equality check.
- v.get('status', 'unknown') == 'active' — explicit default for missing.
Worked example. Filter users dict by status.
| input | output |
|---|---|
| {1: {'name': 'Alice', 'status': 'active'}, 2: {'name': 'Bob', 'status': 'inactive'}, 3: {'name': 'Carol', 'status': 'active'}} | {1: {...Alice...}, 3: {...Carol...}} |
Worked-example solution.
users = {
    1: {'name': 'Alice', 'status': 'active'},
    2: {'name': 'Bob', 'status': 'inactive'},
    3: {'name': 'Carol', 'status': 'active'},
}
active = {k: v for k, v in users.items() if v.get('status') == 'active'}
print(active)
# {1: {'name': 'Alice', 'status': 'active'}, 3: {'name': 'Carol', 'status': 'active'}}
Rule of thumb: .get for optional fields, [] for required ones; mixing them is a sign of unclear thinking about the data shape.
Combining filter and projection: return only certain fields per match
The combined invariant: the value_expr slot can transform v per match; common patterns are projecting a subset of fields, computing a derived value, or wrapping with a label. Keep the projection inside the comprehension; don't post-process with a second loop.
- {k: v['name'] for k, v in d.items() if cond} — project a single field.
- {k: {'name': v['name'], 'role': v.get('role', 'guest')} for k, v in d.items() if cond} — pick a subset of fields.
- {k: f(v) for k, v in d.items() if cond} — call a transform function.
- Nested comprehensions for re-shaping — pair with .items() of the inner dict.
Worked example. Project active users to {id: name}.
| input | output |
|---|---|
| (same as above) | {1: 'Alice', 3: 'Carol'} |
Worked-example solution.
active_names = {
    k: v['name']
    for k, v in users.items()
    if v.get('status') == 'active'
}
print(active_names)
# {1: 'Alice', 3: 'Carol'}
Rule of thumb: one comprehension can express "filter + project" cleanly; resist the temptation to break it into two passes.
Common beginner mistakes
- Using v['status'] without .get and crashing on schema-missing records.
- Putting the if before the for — that's a conditional expression, not a filter clause; a syntax error in dict comprehensions.
- Mutating d in place during iteration — RuntimeError: dictionary changed size during iteration.
- Comparing v.get('status') is 'active' — works for short interned strings but is unreliable; use ==.
- Forgetting that a comprehension creates a new dict; the original d is unchanged.
Python Interview Question on Dictionary Status Filter
Given a dict-of-dicts where each value contains at least a status key, return a new dict containing only entries whose status is 'active'. Records missing the status field should be dropped (not raise).
def filter_active(d):
    ...
Solution Using a filter dict-comprehension with .get
def filter_active(d):
    return {k: v for k, v in d.items() if v.get('status') == 'active'}
Step-by-step trace (input: 4 user records):
| id | name | status |
|---|---|---|
| 1 | Alice | active |
| 2 | Bob | inactive |
| 3 | Carol | active |
| 4 | Dave | (missing) |
- Iterate — (1, {...Alice, active}), (2, {...Bob, inactive}), (3, {...Carol, active}), (4, {...Dave, no status}).
- Test the predicate per pair —
  - id 1: v.get('status') == 'active' → 'active' == 'active' → True → keep.
  - id 2: 'inactive' == 'active' → False → drop.
  - id 3: True → keep.
  - id 4: v.get('status') → None; None == 'active' → False → drop.
- Build the result dict — {1: {...}, 3: {...}}.
Output:
| id | name | status |
|---|---|---|
| 1 | Alice | active |
| 3 | Carol | active |
Why this works — concept by concept:
- d.items() — yields (key, value) pairs over the outer dict.
- v.get('status') — safe access; returns None for a missing field instead of raising KeyError.
- == 'active' — None == 'active' is False, so missing-field records drop cleanly.
- Filter clause if ... — runs after unpacking, before insertion; failed predicates produce no output entry.
- New dict — the comprehension constructs a new dict; the original is untouched, no mutation gotcha.
- O(N) time, O(K) space — N original entries, K matching entries in the output.
3. Python Decorators and functools.wraps for Function Timing
Decorator wrappers and high-resolution timing in Python for data engineering
"Build a decorator that times a function and prints elapsed seconds" is Cisco's signature decorator interview prompt. The mental model: a decorator is a function that takes a function and returns a wrapper; the wrapper captures t0 = perf_counter() before calling the wrapped function, captures t1 after, and prints or logs the difference. The same pattern powers any cross-cutting instrumentation — logging, retry, caching, auth checks.
Pro tip: Always wrap with @functools.wraps(fn). Without it, the wrapped function's __name__, __doc__, and __module__ are clobbered to the wrapper's, which breaks introspection (Sphinx docs, debuggers, decorator stacks). It's a one-line ritual that interviewers expect to see.
Decorator skeleton: def decorator(fn): @wraps(fn) def inner(*a, **kw): ... return inner
The decorator invariant: a decorator is a higher-order function that takes a callable and returns a new callable with the same signature. The wrapper accepts *args and **kwargs to forward whatever the original function accepts.
- Outer function — takes fn, the function being decorated.
- @functools.wraps(fn) — preserves __name__, __doc__, etc.
- Inner function — accepts *args, **kwargs to be call-signature-agnostic.
- Call the original — result = fn(*args, **kwargs).
- Return the result — the return value must propagate through the wrapper.
Worked example. Skeleton decorator that prints 'before' and 'after'.
import functools
def announce(fn):
    @functools.wraps(fn)
    def inner(*args, **kwargs):
        print('before')
        result = fn(*args, **kwargs)
        print('after')
        return result
    return inner

@announce
def add(a, b):
    return a + b

print(add(2, 3))
# before
# after
# 5
Rule of thumb: every wrapper accepts *args, **kwargs unless you're writing a single-purpose decorator for a known signature.
time.perf_counter() for high-resolution elapsed seconds
The timing invariant: perf_counter() returns a monotonic float measuring seconds since an arbitrary start point; differences between two calls give elapsed time with sub-microsecond resolution on most platforms. Use it for benchmarks; never use time.time() (which can jump backward across NTP adjustments).
- time.perf_counter() — monotonic, high-resolution.
- time.time() — wall clock; can move backward; avoid for elapsed time.
- time.process_time() — CPU time only; use for compute-bound profiling.
- time.perf_counter_ns() — nanosecond integer; same monotonic source.
Worked example. Time time.sleep(0.05).
import time
t0 = time.perf_counter()
time.sleep(0.05)
t1 = time.perf_counter()
print(f"elapsed = {t1 - t0:.4f} s")
# elapsed = 0.0501 s (approximately)
Rule of thumb: perf_counter for elapsed; time.time for "what is the current wall clock"; never confuse them.
Preserving metadata with functools.wraps
The metadata invariant: @functools.wraps(fn) copies __name__, __doc__, __wrapped__, and __module__ from fn to the wrapper. Without it, decorated functions look like the wrapper to introspection, which breaks debugging tools and stacked decorators.
- @functools.wraps(fn) — the official one-liner.
- functools.update_wrapper(inner, fn) — the equivalent imperative form.
- Without wraps — add.__name__ becomes 'inner' instead of 'add'.
- __wrapped__ — wraps sets this so callers can recover the original function.
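Besides __name__, the docstring and the __wrapped__ back-reference are worth seeing — a minimal sketch (passthrough and h are illustrative names):

```python
import functools

def passthrough(fn):
    @functools.wraps(fn)
    def inner(*args, **kwargs):
        return fn(*args, **kwargs)
    return inner

@passthrough
def h():
    """docstring survives"""

print(h.__name__)         # 'h' — preserved by wraps
print(h.__doc__)          # 'docstring survives'
original = h.__wrapped__  # the undecorated function, recoverable
print(original is h)      # False — h is the wrapper
```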
Worked example. Compare __name__ with and without wraps.
import functools
def bad(fn):
    def inner(*args, **kwargs):
        return fn(*args, **kwargs)
    return inner

def good(fn):
    @functools.wraps(fn)
    def inner(*args, **kwargs):
        return fn(*args, **kwargs)
    return inner

@bad
def f(): pass

@good
def g(): pass

print(f.__name__) # 'inner' (bad — clobbered)
print(g.__name__) # 'g' (good — preserved)
Rule of thumb: always reach for @functools.wraps; the one-line cost is always smaller than the debugging cost of forgetting it.
Common beginner mistakes
- Forgetting @functools.wraps — silently breaks introspection.
- Wrapping with a fixed signature like def inner(a, b) — fails on functions with different arities; always use *args, **kwargs.
- Using time.time() for elapsed time — non-monotonic; can yield negative deltas.
- Forgetting to return result from the wrapper — the decorated function returns None.
- Capturing t0 outside the wrapper — measures decoration time, not call time.
Python Interview Question on Decorator Timing
Write a decorator timed that, when applied to any function, prints "<funcname> took <seconds> s" after each call and returns the function's original result. Preserve the wrapped function's name and docstring.
@timed
def slow_query(...):
    ...
Solution Using functools.wraps and time.perf_counter
import functools
import time
def timed(fn):
    @functools.wraps(fn)
    def inner(*args, **kwargs):
        t0 = time.perf_counter()
        result = fn(*args, **kwargs)
        t1 = time.perf_counter()
        print(f"{fn.__name__} took {t1 - t0:.4f} s")
        return result
    return inner
Step-by-step trace (decorate and call a function):
@timed
def slow_query(n):
    """Pretend to query a database."""
    time.sleep(0.05)
    return n * 2

result = slow_query(7)
- Decoration — slow_query = timed(slow_query); slow_query now refers to the wrapper, but __name__ and __doc__ are preserved by wraps.
- Call slow_query(7) — control enters inner with args=(7,), kwargs={}.
- t0 = time.perf_counter() — capture the start.
- result = fn(*args, **kwargs) — invokes the original slow_query(7); time.sleep(0.05); returns 14.
- t1 = time.perf_counter() — capture the end.
- print(f"{fn.__name__} took {t1 - t0:.4f} s") — prints "slow_query took 0.0501 s" (approximately).
- return result — the wrapper returns 14 to the caller.
Output:
| stdout | return value |
|---|---|
| slow_query took 0.0501 s | 14 |
Why this works — concept by concept:
- Outer def timed(fn) — takes the wrapped function and returns a new callable; the canonical decorator skeleton.
- @functools.wraps(fn) — preserves slow_query.__name__, __doc__, __module__; without it, the print line would say inner took ....
- *args, **kwargs — accepts any signature; the decorator works on functions of any arity.
- time.perf_counter() — monotonic high-resolution timer; safe for elapsed measurements.
- fn(*args, **kwargs) — invokes the original function with the forwarded arguments.
- return result — propagates the return value; without it, callers get None and silently break.
- O(1) overhead per call — two perf_counter reads + one print + one delegated call.
4. Python Sorting and Greedy for the Maximum Concatenated Substring
Greedy comparator-sort for max concatenation in Python for data engineering
"Given a list of digit chunks, concatenate them in the order that produces the maximum numeric string" is Cisco's signature MEDIUM problem (#287 Maximum Substring). The mental model: lexicographic sort fails because '9' sorts after '12' alphabetically but 9 is greater than 12 numerically; instead, sort with a custom comparator that asks "does a + b come before b + a when concatenated?". The pair (a, b) should go in whichever order produces the larger concatenation. This greedy comparator extends to a total order; sorting then concatenating yields the maximum.
Pro tip: The comparator a + b > b + a is the classic "greedy on pairs implies global optimum" trick — and it's not obvious from first principles. Memorize it: when concatenating chunks, you compare their string-concat results, not the chunks themselves.
Why lexicographic sort fails: '9' > '12' but 912 > 129
The lexicographic-fail invariant: string comparison is character-by-character; '9' > '12' because '9' > '1' at index 0. But concatenation order matters numerically: 9 + 12 = '912' (= 912) is larger than 12 + 9 = '129' (= 129). Lexicographic sort puts '12' before '9', producing '129...' — wrong.
- '9' > '12' — string comparison is True (character by character).
- int('912') > int('129') — also True; concat order matters.
- sorted(['9', '12']) — ['12', '9']; descending: ['9', '12'] → concat = '912' (correct here!).
- Counter-example — ['3', '30']. Lexicographic descending: ['30', '3'] → '303'. But '330' is bigger.
Worked example. Lexicographic descending fails on ['3', '30'].
| sort | concat | numeric |
|---|---|---|
| sorted(['3', '30'], reverse=True) → ['30', '3'] | '303' | 303 |
| custom comparator → ['3', '30'] | '330' | 330 |
Worked-example solution.
chunks = ['3', '30']
print(sorted(chunks, reverse=True)) # ['30', '3'] — wrong order
# Reason: '3' < '30' lexicographically; reverse puts '30' first
# but '3' + '30' = '330' is bigger than '30' + '3' = '303'.
Rule of thumb: when ordering strings for concatenation, never trust lexicographic sort; use the pairwise concatenation comparator.
Custom key: cmp_to_key(lambda a, b: -1 if a+b > b+a else 1)
The comparator invariant: functools.cmp_to_key(cmp) adapts a two-argument comparator to a key= callable for sorted. The comparator returns negative if a should come before b, positive if b should come before a, zero if equal.
- from functools import cmp_to_key — the import.
- cmp_to_key(lambda a, b: -1 if a+b > b+a else (1 if a+b < b+a else 0)) — the full three-way form.
- Shorthand — cmp_to_key(lambda a, b: -1 if a+b > b+a else 1) works when ties are acceptable.
- sorted(chunks, key=cmp_to_key(...)) — applies the comparator pairwise.
Worked example. Sort ['9', '30', '34', '5', '3'].
Pairwise comparisons:
- '9' vs '30': '930' > '309' → 9 first.
- '9' vs '5': '95' > '59' → 9 first.
- '5' vs '34': '534' > '345' → 5 first.
- '34' vs '3': '343' > '334' → 34 first.
- '3' vs '30': '330' > '303' → 3 first.
After full sort: ['9', '5', '34', '3', '30']. Concat: '9534330'.
Worked-example solution.
from functools import cmp_to_key
chunks = ['9', '30', '34', '5', '3']
ordered = sorted(chunks, key=cmp_to_key(lambda a, b: -1 if a + b > b + a else 1))
print(ordered)
# ['9', '5', '34', '3', '30']
Rule of thumb: a + b > b + a is the comparator that always works for "max concatenation"; memorize the pairwise form.
Concatenation after sort: ''.join(sorted_chunks)
The join invariant: after the comparator-sort produces the right order, ''.join(...) concatenates them into the final string. str.join is O(N) total for N chunks; faster and clearer than chained +=.
- ''.join(seq) — concatenate the strings in seq with no separator.
- '-'.join(seq) — with a separator (a different problem).
- sum(seq, '') — Python refuses it outright with TypeError ("sum() can't sum strings"); ''.join is the blessed path.
- Edge case — if all chunks are '0', the result is '000...'; convention often says return '0' instead. Check the prompt.
Worked example. Apply join to the sorted list.
| sorted | concat |
|---|---|
| ['9', '5', '34', '3', '30'] | '9534330' |
Worked-example solution.
ordered = ['9', '5', '34', '3', '30']
result = ''.join(ordered)
print(result)
# '9534330'
Rule of thumb: ''.join is the only correct way to concatenate a list of strings in Python; never use a += loop.
Common beginner mistakes
- Sorting lexicographically with reverse=True and assuming it works — fails on ['3', '30'].
- Forgetting from functools import cmp_to_key — NameError.
- Returning a list instead of a string — read the prompt carefully.
- Mutating the input — use sorted(...) (returns a new list) instead of .sort() (in place).
- Mixing string chunks and integer chunks — convert to strings first; the comparator depends on string concatenation.
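If the chunks arrive as integers, the conversion is one map(str, ...) away — a minimal sketch (toy input for illustration):

```python
from functools import cmp_to_key

nums = [3, 30, 9]              # integer chunks
chunks = list(map(str, nums))  # convert first; the comparator concatenates strings
best = ''.join(sorted(chunks, key=cmp_to_key(lambda a, b: -1 if a + b > b + a else 1)))
print(best)  # '9330'
```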
Python Interview Question on the Maximum Concatenated Substring
Given a list of non-negative integer chunks (as strings), return the largest possible numeric string formed by concatenating them in some order.
def max_concat(chunks):
    ...
Solution Using cmp_to_key with the a + b > b + a comparator
from functools import cmp_to_key
def max_concat(chunks):
    if not chunks:
        return ''
    ordered = sorted(chunks, key=cmp_to_key(lambda a, b: -1 if a + b > b + a else 1))
    result = ''.join(ordered)
    # Handle edge case where all inputs are zero
    return '0' if result.lstrip('0') == '' else result
Step-by-step trace (input: ['9', '30', '34', '5', '3']):
- Empty check — the list is non-empty, continue.
- Build the comparator — cmp_to_key(lambda a, b: -1 if a + b > b + a else 1).
- Pairwise sort — applies the comparator across pairs:
  - '9' + '30' = '930' vs '30' + '9' = '309' → 9 first.
  - '9' + '34' = '934' vs '34' + '9' = '349' → 9 first.
  - '5' + '34' = '534' vs '34' + '5' = '345' → 5 first.
  - '34' + '3' = '343' vs '3' + '34' = '334' → 34 first.
  - '3' + '30' = '330' vs '30' + '3' = '303' → 3 first.
- Sorted result — ['9', '5', '34', '3', '30'].
- Join — '9534330'.
- Zero-strip check — '9534330'.lstrip('0') is '9534330', non-empty; return as-is.
Output:
| return value |
|---|
| '9534330' |
Why this works — concept by concept:
- Pairwise comparator — the a + b > b + a test on strings answers "if I had only these two chunks, which order produces the bigger concatenation"; this local greedy choice extends to a global total order.
- cmp_to_key adapter — Python's sorted takes a key= callable, not a cmp= callable; cmp_to_key bridges the API gap.
- String-concatenation comparison — comparing concatenated strings bypasses the integer-vs-string mismatch; it works for any chunk length.
- ''.join(ordered) — O(N) total concatenation; correct and idiomatic.
- Zero-strip edge case — ['0', '00'] would produce '000'; convention says return '0' for all-zero inputs.
- O(N log N · L) time — N log N comparisons, each an O(L) string concat where L is the average chunk length; O(N · L) space for the sorted output.
Tips to Crack Cisco Data Engineering Interviews
SQL is not on this loop — skip it
The curated Cisco practice set is 4 Python, 0 SQL. Spending interview-prep time on SQL window functions for a Cisco data-engineering interview is a misallocation; spend it on dict comprehensions, status filters, decorators, and the greedy comparator-sort. If a SQL question appears, treat it as a different role family rather than the canonical loop covered here.
Drill the four Cisco Python primitives, in order
The four primitives in the curated Cisco Python practice set are dict-comprehension key-value inversion (#91), status-filter dict comprehension (#96), @functools.wraps decorator with time.perf_counter (#97), and cmp_to_key greedy comparator-sort (#287). Each maps to a specific module: vanilla dict for #91 and #96, functools + time for #97, functools.cmp_to_key for #287. Memorize the imports as muscle memory.
Decorator fluency is a Cisco signature
#97 Decorator Timing is the make-or-break easy-tier round at Cisco. Candidates who can't fluently write @functools.wraps, *args, **kwargs, and time.perf_counter lose the decorator round. Drill the decorator practice page and the higher-order functions practice page until you can write the timing decorator from memory in under two minutes.
The maximum-substring greedy is the medium-tier ceiling
#287 Maximum Substring is the only MEDIUM, and the trick is recognizing that lexicographic sort fails. Candidates who reach for sorted(chunks, reverse=True) lose the round; the right move is cmp_to_key(lambda a, b: -1 if a+b > b+a else 1). Drill the greedy practice page and the sorting practice page until the comparator pattern is automatic.
Easy-tier discipline matters
Three of four Cisco problems are EASY-tier. Easy at Cisco does not mean trivial; it means the interviewer expects zero hesitation, idiomatic Python, and an articulated invariant. A correct EASY answer with stuttering or an off-by-one is graded worse than a correct MEDIUM answer with the same bug, because the EASY signal is supposed to be "this candidate writes Python fluently." The Cisco easy practice page is the right warmup.
Where to practice on PipeCode
Start with the Cisco practice page for the curated 4-problem set. After that, drill the matching topic pages: dictionary, hash-table, decorator, higher-order-functions, sorting, greedy. The interview courses page bundles the Python course if you want a structured curriculum. For broad coverage, browse by topic.
Communication and approach under time pressure
Talk through the invariant first, the brute force second, and the optimal third — even if you go straight to the optimal in code. Interviewers grade process as much as the final answer. Leave 5 minutes at the end of each problem for an edge-case sweep: empty dict, missing fields, all-zero inputs, decorator on a function that raises. The most common "almost passed" failure mode is correct happy-path code that crashes on edge cases — a 30-second sweep prevents it.
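That last edge case — a decorator on a function that raises — has a clean answer: a try/finally wrapper. This is a sketch of a hypothetical timed_safe variant (not part of the Cisco set) that still reports elapsed time when the wrapped function throws:

```python
import functools
import time

def timed_safe(fn):
    """Hypothetical variant of the timing decorator: reports elapsed
    time even when fn raises, and lets the exception propagate."""
    @functools.wraps(fn)
    def inner(*args, **kwargs):
        t0 = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:  # runs on both return and raise
            print(f"{fn.__name__} took {time.perf_counter() - t0:.4f} s")
    return inner

@timed_safe
def flaky_query():
    raise ValueError("db timeout")

try:
    flaky_query()
except ValueError as e:
    caught = str(e)  # timing line printed; exception still reached the caller
```

Mentioning the try/finally variant unprompted during the edge-case sweep is exactly the kind of signal interviewers reward.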
Frequently Asked Questions
What is the Cisco data engineering interview process like?
Cisco's data engineering interview opens with a recruiter screen, then a technical phone screen with one Python coding problem, then an onsite (or virtual onsite) of three to four rounds: two Python coding rounds (dict comprehensions, status filters, decorators, greedy sorting), one data-modeling or system-design discussion (telemetry pipelines, network analytics), and one behavioral round. The coding rounds are heavy on the patterns covered in this guide.
What Python topics does Cisco test for data engineers?
Cisco Python interviews concentrate on four primitives that map directly to the curated 4-problem set: dict-comprehension key-value inversion (#91), filter dict-comprehension over nested status fields (#96), @functools.wraps decorator with time.perf_counter (#97), and cmp_to_key greedy comparator-sort (#287). Memorize the imports: from functools import wraps, cmp_to_key; import time; vanilla dict for the comprehensions.
How hard are Cisco data engineering interview questions?
The Cisco set is 3 easy + 1 medium + 0 hard. The MEDIUM problem (#287 Maximum Substring) hinges on recognizing that lexicographic sort fails and switching to the a + b > b + a greedy comparator. The three EASY problems test fundamentals: dict comprehension, filter comprehension, and decorator-with-functools.wraps. Stuttering on any EASY problem signals lack of fluency.
Are decorators and dict comprehensions common in Cisco interviews?
Yes — three of four Cisco problems test these patterns explicitly. #91 and #96 are dict comprehensions; #97 is decorators. If you cannot fluently write {v: k for k, v in d.items()}, {k: v for k, v in d.items() if v.get('status') == 'active'}, and a @functools.wraps decorator skeleton with *args, **kwargs, you cannot pass three of the four Cisco rounds. Drill these patterns until they are reflexive.
How many Cisco practice problems should I solve before the interview?
Solve all 4 problems on the Cisco practice page end-to-end — untimed first, then timed at 25 minutes per problem. After that, broaden to 20 to 30 additional Python problems spread across the matching topic pages: dictionary, hash-table, decorator, higher-order-functions, sorting, greedy. The Cisco Python practice page and the Cisco easy practice page are the right surfaces for the final week of prep.