<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Feng Zhang</title>
    <description>The latest articles on DEV Community by Feng Zhang (@feng_zhang_cedb4581bee881).</description>
    <link>https://dev.to/feng_zhang_cedb4581bee881</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3875738%2Fd8a58adf-1466-4b32-9d75-041250f25bda.png</url>
      <title>DEV Community: Feng Zhang</title>
      <link>https://dev.to/feng_zhang_cedb4581bee881</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/feng_zhang_cedb4581bee881"/>
    <language>en</language>
    <item>
      <title>Product Metric Design And Diagnostic Deep Dives Explained — Tech Interview Concept (2026)</title>
      <dc:creator>Feng Zhang</dc:creator>
      <pubDate>Wed, 01 Jul 2026 14:25:38 +0000</pubDate>
      <link>https://dev.to/feng_zhang_cedb4581bee881/product-metric-design-and-diagnostic-deep-dives-explained-tech-interview-concept-2026-50p8</link>
      <guid>https://dev.to/feng_zhang_cedb4581bee881/product-metric-design-and-diagnostic-deep-dives-explained-tech-interview-concept-2026-50p8</guid>
      <description>&lt;p&gt;A product metric design interview is usually a test of judgment, not memorization. You get an ambiguous product or integrity problem, then you need to turn it into a measurement plan a real team could act on.&lt;/p&gt;

&lt;p&gt;The interviewer is checking whether you can keep these separate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What the product should optimize&lt;/li&gt;
&lt;li&gt;What the data can reliably observe&lt;/li&gt;
&lt;li&gt;What might be biased, gamed, or misleading&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This article adapts the main ideas from PracHub's guide to &lt;a href="https://prachub.com/concepts/product-metric-design-and-diagnostic-deep-dives?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=backlinks" rel="noopener noreferrer"&gt;product metric design and diagnostic investigations&lt;/a&gt;, with a focus on how to structure your answer in a Data Scientist interview.&lt;/p&gt;

&lt;h2&gt;
  
  
  Start with the product goal, not the metric
&lt;/h2&gt;

&lt;p&gt;A weak answer starts with a list:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DAU&lt;/li&gt;
&lt;li&gt;posts&lt;/li&gt;
&lt;li&gt;clicks&lt;/li&gt;
&lt;li&gt;retention&lt;/li&gt;
&lt;li&gt;reports&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That sounds busy, but it does not explain what success means.&lt;/p&gt;

&lt;p&gt;A stronger answer starts by clarifying how the product is supposed to work. For example, if the prompt is "Define success metrics for a Circles feature," you might say:&lt;/p&gt;

&lt;p&gt;"I will treat Circles as a community product meant to deepen meaningful interaction among smaller groups. Success should be sustained, high-quality engagement without safety issues or notification fatigue."&lt;/p&gt;

&lt;p&gt;That short framing does a lot of work. It tells the interviewer you will not blindly optimize raw activity. A feature can create more posts and still make the product worse if those posts are low quality, spammy, or annoying.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pick a north-star metric that maps to durable value
&lt;/h2&gt;

&lt;p&gt;A north-star metric should capture product value, not surface activity.&lt;/p&gt;

&lt;p&gt;For a community product like Circles, raw joins or raw posts are easy to inflate. Users may join once and never return. Creators may post low-effort content. Spam accounts may create noisy groups.&lt;/p&gt;

&lt;p&gt;A better primary metric could be:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;weekly active circle members with meaningful two-sided interactions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then normalize it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;meaningful interactions / eligible circle members
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;meaningful interactions / eligible impressions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The denominator matters because each version answers a different question.&lt;/p&gt;

&lt;p&gt;Per-member metrics ask whether members are getting value. Per-impression metrics ask whether exposed content creates useful engagement. A third option, meaningful interactions per session, asks whether Circles changes behavior during active use. Raw counts hide these differences.&lt;/p&gt;
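
&lt;p&gt;To make the denominator choice concrete, here is a minimal Python sketch. The counts and the &lt;code&gt;rate&lt;/code&gt; helper are hypothetical and exist only to show that the same numerator gives very different readings depending on what it is divided by.&lt;/p&gt;

```python
# Hypothetical weekly aggregates for a Circles-like feature (illustrative numbers only).
weekly = {
    "meaningful_interactions": 1200,
    "eligible_members": 4000,
    "eligible_impressions": 60000,
    "sessions": 15000,
}


def rate(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    if denominator == 0:
        return None
    return numerator / denominator


# Same numerator, three denominators, three different questions.
per_member = rate(weekly["meaningful_interactions"], weekly["eligible_members"])
per_impression = rate(weekly["meaningful_interactions"], weekly["eligible_impressions"])
per_session = rate(weekly["meaningful_interactions"], weekly["sessions"])

print(f"per eligible member:     {per_member:.3f}")
print(f"per eligible impression: {per_impression:.3f}")
print(f"per session:             {per_session:.3f}")
```

&lt;p&gt;In an interview, name the denominator out loud and say which question it answers before quoting any number.&lt;/p&gt;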

&lt;p&gt;For a B2B chat product, the north-star metric might be qualified conversation starts, not total messages. A qualified conversation could require both parties to participate, or require that the conversation passes a basic quality threshold.&lt;/p&gt;

&lt;p&gt;Define the unit of value before you define the count.&lt;/p&gt;

&lt;h2&gt;
  
  
  Build a metric tree
&lt;/h2&gt;

&lt;p&gt;A metric tree helps you avoid treating metric design as a bag of unrelated numbers.&lt;/p&gt;

&lt;p&gt;A useful structure is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Outcome metric&lt;/li&gt;
&lt;li&gt;Input metrics&lt;/li&gt;
&lt;li&gt;Diagnostic metrics&lt;/li&gt;
&lt;li&gt;Guardrails&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For B2B chat, that might look like:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Example metrics&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Outcome&lt;/td&gt;
&lt;td&gt;Qualified conversation starts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inputs&lt;/td&gt;
&lt;td&gt;Response rate, time-to-first-response&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Diagnostics&lt;/td&gt;
&lt;td&gt;Exposure rate, click-through rate, reply depth&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Guardrails&lt;/td&gt;
&lt;td&gt;Blocks, spam reports, opt-outs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This structure lets you explain why a metric moved.&lt;/p&gt;

&lt;p&gt;If qualified conversations dropped, maybe fewer users saw the entry point. Maybe users clicked but did not send messages. Maybe messages were sent, but businesses stopped responding. Each diagnosis points to a different product issue.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use guardrails to block bad launches
&lt;/h2&gt;

&lt;p&gt;A positive primary metric does not mean the launch is safe.&lt;/p&gt;

&lt;p&gt;Guardrail metrics protect user experience, integrity, and ecosystem health. Common guardrails include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;hide rate&lt;/li&gt;
&lt;li&gt;report rate&lt;/li&gt;
&lt;li&gt;block rate&lt;/li&gt;
&lt;li&gt;unfollow rate&lt;/li&gt;
&lt;li&gt;session length&lt;/li&gt;
&lt;li&gt;notification opt-outs&lt;/li&gt;
&lt;li&gt;harmful-content prevalence&lt;/li&gt;
&lt;li&gt;advertiser complaints&lt;/li&gt;
&lt;li&gt;support contacts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For Circles, guardrails might include mute rate, leave rate, reports, blocks, notification opt-outs, and displacement from broader feed engagement.&lt;/p&gt;

&lt;p&gt;That last one is easy to miss. A feature may increase activity inside Circles while reducing healthy engagement elsewhere. If the new product fragments the social graph or pushes spammy invites, the top-line metric may look good while the broader product gets worse.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cohort before you trust the average
&lt;/h2&gt;

&lt;p&gt;Averages can hide the real story.&lt;/p&gt;

&lt;p&gt;Cut the results by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;new vs existing users&lt;/li&gt;
&lt;li&gt;market&lt;/li&gt;
&lt;li&gt;device class&lt;/li&gt;
&lt;li&gt;language&lt;/li&gt;
&lt;li&gt;creator size&lt;/li&gt;
&lt;li&gt;business type&lt;/li&gt;
&lt;li&gt;group size&lt;/li&gt;
&lt;li&gt;spam-risk tier&lt;/li&gt;
&lt;li&gt;prior engagement&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In a Meta-style interview, you should ask whether gains are broad-based or concentrated in a small segment. For example, Circles may help highly connected users while doing little for new users. A B2B chat change may help large businesses but hurt smaller ones that cannot respond quickly.&lt;/p&gt;

&lt;p&gt;This is also where fairness and integrity concerns enter the answer. A harmful-content system that reduces measured prevalence overall may still perform poorly for a language group with weaker labels or lower reviewer coverage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Match attribution windows to the product mechanism
&lt;/h2&gt;

&lt;p&gt;The time window should match how value appears.&lt;/p&gt;

&lt;p&gt;A chat product may need same-day response metrics and 7-day retention. A community product may need 14-day or 28-day return behavior. Harmful-content outcomes may need delayed labels because review, appeals, and classifier updates take time.&lt;/p&gt;

&lt;p&gt;A window that is too short misses downstream value. A window that is too long adds noise and confounding.&lt;/p&gt;

&lt;p&gt;Say this explicitly in the interview. It shows that you understand measurement as a product decision, not just a query.&lt;/p&gt;

&lt;h2&gt;
  
  
  Choose the right randomization unit
&lt;/h2&gt;

&lt;p&gt;Experiment design starts with the unit of randomization.&lt;/p&gt;

&lt;p&gt;User-level randomization works when the experience is isolated. Networked products are harder. For communities, pages, advertisers, threads, or circles, users interact with each other. One user's treatment can affect another user's experience.&lt;/p&gt;

&lt;p&gt;That means you may need community-level, page-level, advertiser-level, or thread-level randomization.&lt;/p&gt;

&lt;p&gt;You should also define the estimand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;direct effect&lt;/li&gt;
&lt;li&gt;spillover effect&lt;/li&gt;
&lt;li&gt;total ecosystem effect&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, if some Circle members receive a new invite flow and others do not, treated and control users still interact inside the same groups. A user-level A/B test may then underestimate or distort the effect.&lt;/p&gt;
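
&lt;p&gt;One way to keep treated and control users out of the same group is to hash the circle ID instead of the user ID. The sketch below is an assumption about how that could look, not a description of any real experimentation platform. The &lt;code&gt;assign_arm&lt;/code&gt; helper, the salt string, and the membership table are all hypothetical.&lt;/p&gt;

```python
import hashlib


def assign_arm(unit_id, experiment_salt="circles_invite_v1"):
    """Deterministically map a randomization unit to 'treatment' or 'control'."""
    digest = hashlib.sha256(f"{experiment_salt}:{unit_id}".encode("utf-8")).hexdigest()
    return "treatment" if int(digest, 16) % 2 == 0 else "control"


# Hypothetical membership table: user ID mapped to circle ID.
memberships = {"user_a": "circle_1", "user_b": "circle_1", "user_c": "circle_2"}

# Circle-level assignment: every member inherits the arm of their circle,
# so members of one circle never see different experiences.
arms = {user: assign_arm(circle) for user, circle in memberships.items()}
print(arms)
```

&lt;p&gt;This sketch assumes each user belongs to exactly one circle. Users in several circles need an explicit rule, and the analysis should treat the circle, not the user, as the unit when computing variance.&lt;/p&gt;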

&lt;p&gt;If randomization is not possible, you can propose a retrospective cohort design with matching or difference-in-differences. Keep the caveat clear: observational methods rest on stronger assumptions that the data cannot fully verify, such as no unmeasured confounding or parallel trends.&lt;/p&gt;

&lt;h2&gt;
  
  
  Think about power, especially for rare events
&lt;/h2&gt;

&lt;p&gt;Rare events are hard to measure. Spam exposure, harmful-content reports, severe abuse, and appeals may have very low base rates.&lt;/p&gt;

&lt;p&gt;A rough minimum detectable effect relationship is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MDE ≈ (z_alpha/2 + z_beta) * sqrt(2 * sigma^2 / n)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



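&lt;p&gt;Here &lt;code&gt;sigma^2&lt;/code&gt; is the per-unit variance of the metric and &lt;code&gt;n&lt;/code&gt; is the sample size per arm. The takeaway is that smaller effects, noisier metrics, and rare events need more data.&lt;/p&gt;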
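
&lt;p&gt;The formula above can be evaluated with the Python standard library. This is a rough sketch for a binary metric, assuming equal-sized arms and a Bernoulli variance of &lt;code&gt;p * (1 - p)&lt;/code&gt;. The function name &lt;code&gt;mde_absolute&lt;/code&gt;, the base rates, and the sample size are illustrative assumptions.&lt;/p&gt;

```python
from math import sqrt
from statistics import NormalDist


def mde_absolute(base_rate, n_per_arm, alpha=0.05, power=0.8):
    """Approximate absolute MDE for a two-arm test on a binary metric."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    variance = base_rate * (1 - base_rate)  # sigma^2 for a Bernoulli outcome
    return (z_alpha + z_beta) * sqrt(2 * variance / n_per_arm)


# Compare a common event with a rare one at the same sample size.
for base_rate in (0.20, 0.001):
    mde = mde_absolute(base_rate, n_per_arm=100_000)
    print(f"base rate {base_rate}: absolute MDE {mde:.5f}, relative MDE {mde / base_rate:.1%}")
```

&lt;p&gt;At the same sample size, the rare event has a smaller absolute MDE but a much larger relative one. That is why a test sized for an engagement metric usually cannot detect a modest change in a rare harm metric.&lt;/p&gt;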

&lt;p&gt;For low base-rate outcomes, you can consider:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;aggregated exposure units&lt;/li&gt;
&lt;li&gt;longer test duration&lt;/li&gt;
&lt;li&gt;stratification&lt;/li&gt;
&lt;li&gt;higher-signal proxy labels&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Do not promise that a short test can detect rare harm reliably. That is exactly the kind of overconfidence interviewers watch for.&lt;/p&gt;

&lt;h2&gt;
  
  
  Treat proxy metrics with suspicion
&lt;/h2&gt;

&lt;p&gt;Proxy metrics are useful because they are often fast and available. They are also dangerous.&lt;/p&gt;

&lt;p&gt;For harmful content, user reports are visible and timely. But reports are biased by user awareness, culture, language, and reporting propensity. More reports could mean more harm, better reporting UX, higher user awareness, or more total usage.&lt;/p&gt;

&lt;p&gt;Reports are not ground truth.&lt;/p&gt;

&lt;p&gt;A stronger harmful-content evaluation combines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;user reports&lt;/li&gt;
&lt;li&gt;human review labels&lt;/li&gt;
&lt;li&gt;classifier scores&lt;/li&gt;
&lt;li&gt;prevalence estimates&lt;/li&gt;
&lt;li&gt;severity-weighted harm metrics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Severity matters. Counting all violations equally treats mild spam and severe abuse as the same kind of event.&lt;/p&gt;

&lt;p&gt;A better metric is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;severity-weighted prevalence =
sum(exposures_i * severity_i) / total eligible exposures
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The severity buckets should be transparent, and calibration checks should verify that labels are consistent enough to support decisions.&lt;/p&gt;
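
&lt;p&gt;A small Python sketch of the severity-weighted prevalence formula follows. The categories, weights, and exposure counts are hypothetical; real weights would come from a policy team, not from this example.&lt;/p&gt;

```python
# Hypothetical severity weights and violating-exposure counts (illustrative only).
severity_weights = {"mild_spam": 1.0, "harassment": 5.0, "severe_abuse": 20.0}
violating_exposures = {"mild_spam": 900, "harassment": 80, "severe_abuse": 20}
total_eligible_exposures = 1_000_000


def severity_weighted_prevalence(exposures, weights, total):
    """sum(exposures_i * severity_i) / total eligible exposures."""
    weighted = sum(count * weights[category] for category, count in exposures.items())
    return weighted / total


raw_prevalence = sum(violating_exposures.values()) / total_eligible_exposures
weighted_prevalence = severity_weighted_prevalence(
    violating_exposures, severity_weights, total_eligible_exposures
)
print(f"raw prevalence:      {raw_prevalence:.4f}")
print(f"weighted prevalence: {weighted_prevalence:.4f}")
```

&lt;p&gt;In this toy data, severe abuse is 2% of violating exposures by count but contributes as much weighted harm as harassment, which a flat count would hide.&lt;/p&gt;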

&lt;h2&gt;
  
  
  Use a diagnostic funnel for investigations
&lt;/h2&gt;

&lt;p&gt;When a metric moves, avoid guessing. Use a funnel:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;exposure -&amp;gt; action -&amp;gt; quality -&amp;gt; retention -&amp;gt; harm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If a product metric drops, ask:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Did fewer users become eligible?&lt;/li&gt;
&lt;li&gt;Did fewer users see the surface?&lt;/li&gt;
&lt;li&gt;Did fewer users act after exposure?&lt;/li&gt;
&lt;li&gt;Did the quality of actions change?&lt;/li&gt;
&lt;li&gt;Did retention move?&lt;/li&gt;
&lt;li&gt;Did harm or negative feedback move?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This keeps the answer analytical. It also mirrors how real product teams debug launches.&lt;/p&gt;
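
&lt;p&gt;The funnel questions above can be turned into a quick step-by-step comparison. The sketch below uses made-up counts for two weeks and a hypothetical &lt;code&gt;step_conversions&lt;/code&gt; helper. It leaves out the harm step, since harm is tracked as a rate on its own rather than as a conversion.&lt;/p&gt;

```python
# Hypothetical funnel counts for two weeks (illustrative only).
funnel_steps = ["eligible", "exposed", "acted", "quality_action", "retained"]
last_week = {"eligible": 10000, "exposed": 8000, "acted": 2400, "quality_action": 1200, "retained": 600}
this_week = {"eligible": 10000, "exposed": 8000, "acted": 1600, "quality_action": 800, "retained": 400}


def step_conversions(counts, steps):
    """Conversion rate from each funnel step to the next one."""
    rates = {}
    for previous, current in zip(steps, steps[1:]):
        label = f"{previous} to {current}"
        rates[label] = counts[current] / counts[previous] if counts[previous] else 0.0
    return rates


before = step_conversions(last_week, funnel_steps)
after = step_conversions(this_week, funnel_steps)

# The step with the most negative change is where the investigation should start.
changes = {label: after[label] - before[label] for label in before}
largest_drop = min(changes, key=changes.get)
print(largest_drop, round(changes[largest_drop], 3))
```

&lt;p&gt;Here retained users fell by a third, but every step except exposure-to-action kept its conversion rate. That points the investigation at the action step, not at retention.&lt;/p&gt;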

&lt;p&gt;Before interpreting a movement, check measurement validity:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;logging coverage&lt;/li&gt;
&lt;li&gt;denominator definitions&lt;/li&gt;
&lt;li&gt;duplicate events&lt;/li&gt;
&lt;li&gt;bot or spam filtering&lt;/li&gt;
&lt;li&gt;experiment balance&lt;/li&gt;
&lt;li&gt;sample-ratio mismatch&lt;/li&gt;
&lt;li&gt;missing labels&lt;/li&gt;
&lt;li&gt;metric backfills&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You do not need to design the ingestion system in a product metric interview. You do need to know when the measurement is untrustworthy.&lt;/p&gt;
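
&lt;p&gt;As one example of a validity check, sample-ratio mismatch can be tested with a chi-square goodness-of-fit test. The sketch below uses only the standard library and assumes a two-arm experiment. The function name &lt;code&gt;srm_p_value&lt;/code&gt; and the counts are hypothetical.&lt;/p&gt;

```python
from math import erfc, sqrt


def srm_p_value(control_count, treatment_count, expected_control_share=0.5):
    """Chi-square goodness-of-fit p-value (1 degree of freedom) for the observed split."""
    total = control_count + treatment_count
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = (
        (control_count - expected_control) ** 2 / expected_control
        + (treatment_count - expected_treatment) ** 2 / expected_treatment
    )
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(chi2 / 2)).
    return erfc(sqrt(chi2 / 2))


print(srm_p_value(50_000, 50_300))  # small imbalance
print(srm_p_value(50_000, 52_000))  # large imbalance
```

&lt;p&gt;A very small p-value means the observed split is unlikely under the intended allocation. In that case, treat the experiment readout as suspect until the assignment or logging issue is found.&lt;/p&gt;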

&lt;h2&gt;
  
  
  A compact answer pattern for interviews
&lt;/h2&gt;

&lt;p&gt;For a metric design prompt, use this flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Clarify the product goal.&lt;/li&gt;
&lt;li&gt;State assumptions.&lt;/li&gt;
&lt;li&gt;Define the primary metric.&lt;/li&gt;
&lt;li&gt;Add supporting funnel metrics.&lt;/li&gt;
&lt;li&gt;Add guardrails.&lt;/li&gt;
&lt;li&gt;Discuss cohorts and denominators.&lt;/li&gt;
&lt;li&gt;Explain experiment design.&lt;/li&gt;
&lt;li&gt;Name likely diagnostics if the result moves.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For Circles, that might be:&lt;/p&gt;

&lt;p&gt;"Success is weekly active circle members with meaningful two-sided interactions, normalized by eligible members. I would support that with circle creation, invite acceptance, posting, comment depth, repeat participation, and 7-day or 28-day retention. Guardrails would include mutes, leaves, reports, blocks, notification opt-outs, and displacement from broader feed engagement. I would run an A/B test if possible, with user-level or circle-level assignment depending on spillovers. I would cut by new users, highly connected users, small markets, and baseline sharing behavior."&lt;/p&gt;

&lt;p&gt;That is a defensible answer because it ties metrics to the decision the team needs to make.&lt;/p&gt;

&lt;p&gt;If you want more prompts to practice this style, PracHub has a set of &lt;a href="https://prachub.com/interview-questions?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=backlinks" rel="noopener noreferrer"&gt;data science and product interview questions&lt;/a&gt;. For the full concept breakdown, use the original PracHub guide on &lt;a href="https://prachub.com/concepts/product-metric-design-and-diagnostic-deep-dives?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=backlinks" rel="noopener noreferrer"&gt;product metric design and diagnostic investigations&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>interview</category>
      <category>career</category>
      <category>analyticsexperimentation</category>
      <category>programming</category>
    </item>
    <item>
      <title>Amazon Machine Learning Engineer Interview Cheatsheet 2026</title>
      <dc:creator>Feng Zhang</dc:creator>
      <pubDate>Wed, 24 Jun 2026 14:26:21 +0000</pubDate>
      <link>https://dev.to/feng_zhang_cedb4581bee881/amazon-machine-learning-engineer-interview-cheatsheet-2026-2n5k</link>
      <guid>https://dev.to/feng_zhang_cedb4581bee881/amazon-machine-learning-engineer-interview-cheatsheet-2026-2n5k</guid>
      <description>&lt;p&gt;If you are preparing for an Amazon Machine Learning Engineer interview, expect more than "I know the model." You need to explain how the model works, how you would implement it, what breaks in production, and how you would decide whether it is good enough to ship.&lt;/p&gt;

&lt;p&gt;The longer &lt;a href="https://prachub.com/interview-prep/amazon-machine-learning-engineer-interview-prep?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=backlinks" rel="noopener noreferrer"&gt;PracHub Amazon Machine Learning Engineer interview prep guide&lt;/a&gt; breaks this down by interview stage. This article condenses the highest-signal areas into a study guide you can use before a technical screen or onsite.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Amazon is likely testing
&lt;/h2&gt;

&lt;p&gt;For an MLE role, Amazon interviewers usually care about four things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can you reason from ML theory to working code?&lt;/li&gt;
&lt;li&gt;Can you design systems that train, evaluate, and serve models reliably?&lt;/li&gt;
&lt;li&gt;Can you debug models using metrics, data, and experiments?&lt;/li&gt;
&lt;li&gt;Can you explain tradeoffs around latency, cost, memory, and quality?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A good answer does not stop at "use a Transformer" or "train XGBoost." You should be able to talk through tensor shapes, masks, evaluation gaps, distributed training, sparse data, online metrics, and deployment risk.&lt;/p&gt;

&lt;h2&gt;
  
  
  Transformers: know the internals, not just the vocabulary
&lt;/h2&gt;

&lt;p&gt;Transformers are one of the highest-yield topics for an Amazon MLE interview. Be ready to explain scaled dot-product attention:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Attention(Q, K, V) = softmax((QK^T / sqrt(d_k)) + M)V
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, &lt;code&gt;M&lt;/code&gt; is often an additive mask. Allowed positions get &lt;code&gt;0&lt;/code&gt;; blocked positions get &lt;code&gt;-inf&lt;/code&gt;. The &lt;code&gt;sqrt(d_k)&lt;/code&gt; scaling keeps attention logits from getting too large and saturating the softmax.&lt;/p&gt;

&lt;p&gt;For implementation questions, shape reasoning matters. Given input &lt;code&gt;X&lt;/code&gt; with shape &lt;code&gt;B x T x d_model&lt;/code&gt;, multi-head attention projects it into &lt;code&gt;Q&lt;/code&gt;, &lt;code&gt;K&lt;/code&gt;, and &lt;code&gt;V&lt;/code&gt;, then reshapes them into something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;B x num_heads x T x head_dim
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The attention score tensor then has shape:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;B x num_heads x T x T
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A common bug is calling &lt;code&gt;.view()&lt;/code&gt; right after a transpose. The transpose returns a non-contiguous tensor, and &lt;code&gt;.view()&lt;/code&gt; raises an error when the requested shape is incompatible with the existing strides. In PyTorch, call &lt;code&gt;.contiguous()&lt;/code&gt; first or use &lt;code&gt;.reshape()&lt;/code&gt;, which copies only when it has to.&lt;/p&gt;

&lt;p&gt;For decoder-only models, causal masking is mandatory. Token &lt;code&gt;t&lt;/code&gt; can only attend to positions &lt;code&gt;&amp;lt;= t&lt;/code&gt;. If you forget this, the model can leak future labels during training. The loss may look great, but generation will fail.&lt;/p&gt;
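
&lt;p&gt;To see the additive mask at work without any framework, here is a single-head sketch in plain Python. It is a teaching aid under simple assumptions (one head, no batch dimension, lists of row vectors), not how a production kernel is written. The helper names &lt;code&gt;causal_mask&lt;/code&gt; and &lt;code&gt;attention&lt;/code&gt; are made up for this example.&lt;/p&gt;

```python
import math

NEG_INF = float("-inf")


def softmax(row):
    """Numerically stable softmax; exp(-inf) evaluates to 0.0 for masked entries."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [value / total for value in exps]


def causal_mask(seq_len):
    """Additive mask: 0.0 where position j is at or before i, -inf otherwise."""
    return [[0.0] * (i + 1) + [NEG_INF] * (seq_len - i - 1) for i in range(seq_len)]


def attention(Q, K, V, mask):
    """Single-head scaled dot-product attention: softmax(QK^T / sqrt(d_k) + M) V."""
    d_k = len(K[0])
    output, weights = [], []
    for i, q in enumerate(Q):
        logits = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) + mask[i][j]
            for j, k in enumerate(K)
        ]
        row = softmax(logits)
        weights.append(row)
        output.append([sum(w * v[c] for w, v in zip(row, V)) for c in range(len(V[0]))])
    return output, weights


# Toy input with T = 3 tokens and d_k = 2; Q, K, and V share values for brevity.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
output, weights = attention(X, X, X, causal_mask(3))
for row in weights:
    print([round(w, 3) for w in row])
```

&lt;p&gt;Every entry above the diagonal of the weight matrix is exactly zero, so the first token can only attend to itself. Asserting that is also a cheap unit test for causal leakage.&lt;/p&gt;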

&lt;p&gt;You should also know the standard GPT-style block:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;x = x + attention(LayerNorm(x))
x = x + MLP(LayerNorm(x))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pre-norm layout is common because it helps gradient flow in deeper models. Post-norm matches the original Transformer pattern, but can be harder to train at scale.&lt;/p&gt;

&lt;p&gt;LayerNorm is another frequent follow-up. It normalizes across the hidden dimension for each token independently:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LN(x) = gamma * (x - mean) / sqrt(variance + epsilon) + beta
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Unlike BatchNorm, LayerNorm does not depend on batch statistics. That helps with variable batch sizes, sequence models, and autoregressive inference.&lt;/p&gt;
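
&lt;p&gt;The LayerNorm formula maps directly to a few lines of plain Python. This sketch normalizes one token at a time and uses the biased variance (dividing by the hidden size). The function name, toy vectors, and parameter values are illustrative assumptions.&lt;/p&gt;

```python
import math


def layer_norm(x, gamma, beta, epsilon=1e-5):
    """Normalize one token's hidden vector, then apply the learned scale and shift."""
    mean = sum(x) / len(x)
    variance = sum((value - mean) ** 2 for value in x) / len(x)  # biased variance
    return [
        g * (value - mean) / math.sqrt(variance + epsilon) + b
        for value, g, b in zip(x, gamma, beta)
    ]


# Two tokens with very different scales; each is normalized independently.
tokens = [[1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 10.0, 50.0]]
gamma, beta = [1.0] * 4, [0.0] * 4
normalized = [layer_norm(token, gamma, beta) for token in tokens]
for row in normalized:
    print([round(value, 3) for value in row])
```

&lt;p&gt;Nothing in the function looks at other tokens or other examples in the batch, which is the property that makes LayerNorm behave the same at batch size 1 during autoregressive inference.&lt;/p&gt;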

&lt;h2&gt;
  
  
  LLMs: connect architecture to operations
&lt;/h2&gt;

&lt;p&gt;For LLM questions, you need to move between model internals and production behavior.&lt;/p&gt;

&lt;p&gt;A strong answer covers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Decoder-only Transformer architecture&lt;/li&gt;
&lt;li&gt;Tokenization with BPE, WordPiece, or SentencePiece&lt;/li&gt;
&lt;li&gt;Pretraining with next-token prediction&lt;/li&gt;
&lt;li&gt;Instruction tuning with prompt-response data&lt;/li&gt;
&lt;li&gt;Preference alignment methods such as RLHF or DPO&lt;/li&gt;
&lt;li&gt;Fine-tuning choices such as full fine-tuning, LoRA, QLoRA, prefix tuning, and prompt tuning&lt;/li&gt;
&lt;li&gt;Evaluation beyond perplexity&lt;/li&gt;
&lt;li&gt;Serving constraints such as KV cache memory, throughput, and p99 latency&lt;/li&gt;
&lt;/ul&gt;
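
&lt;p&gt;For the KV cache item in the list above, a back-of-the-envelope size estimate is often enough in an interview. The sketch below assumes standard attention where keys and values are cached for every layer and position. The model configuration is hypothetical and does not describe any specific model.&lt;/p&gt;

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch_size, bytes_per_value=2):
    """Bytes needed to cache keys and values for every layer and position."""
    # The factor 2 covers one key tensor and one value tensor per layer.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return per_token * seq_len * batch_size


# Hypothetical configuration: 32 layers, 32 KV heads of size 128, 16-bit values.
cache = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128, seq_len=4096, batch_size=1)
print(f"{cache / 1024**3:.2f} GiB per sequence")
```

&lt;p&gt;The estimate grows linearly with sequence length and batch size, which is why long contexts and high concurrency compete for the same accelerator memory.&lt;/p&gt;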

&lt;p&gt;Perplexity is useful, but it is not enough. It measures next-token likelihood, not whether the model follows instructions, refuses unsafe requests correctly, produces grounded answers, or gives useful task outputs.&lt;/p&gt;

&lt;p&gt;For a validation-system design question, structure your answer around:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Evaluation data&lt;br&gt;&lt;br&gt;
Use golden prompts, task-specific benchmarks, adversarial sets, regression cases from past failures, and production-like prompts sampled in a privacy-safe way.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Metrics&lt;br&gt;&lt;br&gt;
Include exact match where it fits, rubric scores, human preference win rate, hallucination or groundedness for RAG, toxicity or safety rates, refusal correctness, latency &lt;code&gt;p50/p95/p99&lt;/code&gt;, tokens per second, and cost per request.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;System components&lt;br&gt;&lt;br&gt;
Mention a model registry, prompt/version registry, evaluation runner, deterministic inference harness, result store, dashboard, and deployment gates.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Online validation&lt;br&gt;&lt;br&gt;
Use shadow tests, canary rollout, alerts for regressions, drift checks, and rollback criteria.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If the system is RAG-based, model quality depends on more than weights. Retrieval, chunking, embedding quality, ranking, prompt assembly, citation grounding, and index freshness all matter. Good evaluation should include retrieval recall@k, answer faithfulness, source attribution, and latency budget split across retrieval and generation.&lt;/p&gt;

&lt;h2&gt;
  
  
  MoE: sparse compute has systems costs
&lt;/h2&gt;

&lt;p&gt;Mixture-of-Experts models often replace dense MLP layers with multiple expert networks and a learned router. A token may be sent to the top-1 or top-2 experts.&lt;/p&gt;

&lt;p&gt;The benefit is that the model can have more total parameters without activating all of them for every token. The cost is systems complexity.&lt;/p&gt;
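
&lt;p&gt;A minimal sketch of top-k routing for a single token is shown below. It assumes the router produces one logit per expert and that the selected weights are renormalized to sum to 1, which is one common choice rather than the only one. The function name and logits are hypothetical.&lt;/p&gt;

```python
import math


def top_k_route(router_logits, k=2):
    """Pick the k highest-scoring experts and renormalize their softmax weights."""
    peak = max(router_logits)
    exps = [math.exp(logit - peak) for logit in router_logits]
    total = sum(exps)
    probs = [value / total for value in exps]
    # Indices of the k experts with the highest routing probability.
    chosen = sorted(range(len(probs)), key=lambda index: probs[index], reverse=True)[:k]
    norm = sum(probs[index] for index in chosen)
    return {index: probs[index] / norm for index in chosen}


# Hypothetical router logits for one token across four experts.
routing = top_k_route([2.0, 0.5, 1.0, -1.0], k=2)
print(routing)
```

&lt;p&gt;Only the chosen experts run for this token. Because the choice depends on the input, the number of tokens each expert receives varies from batch to batch.&lt;/p&gt;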

&lt;p&gt;In an interview, avoid saying "MoE is more efficient" without explaining the tradeoff. Good answers mention:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Load-balancing losses&lt;/li&gt;
&lt;li&gt;Expert collapse risk&lt;/li&gt;
&lt;li&gt;Capacity factors&lt;/li&gt;
&lt;li&gt;Token dropping during overload&lt;/li&gt;
&lt;li&gt;Distributed &lt;code&gt;all-to-all&lt;/code&gt; communication&lt;/li&gt;
&lt;li&gt;Harder batching because routing is data-dependent&lt;/li&gt;
&lt;li&gt;Higher risk around p99 latency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Dense models are simpler to serve. MoE models can scale parameter count better relative to FLOPs, but routing and communication make training and serving harder.&lt;/p&gt;

&lt;h2&gt;
  
  
  XGBoost: understand why it is fast
&lt;/h2&gt;

&lt;p&gt;Amazon MLE interviews may still test classic ML, especially for tabular problems. XGBoost is a common topic because it mixes algorithm knowledge with systems thinking.&lt;/p&gt;

&lt;p&gt;Gradient boosting builds an additive model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;y_hat_i^(t) = y_hat_i^(t-1) + eta * f_t(x_i)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each new tree fits the negative gradient of the loss with respect to the current predictions, which for squared error is just the residual. This means boosting rounds are sequential. Tree &lt;code&gt;t&lt;/code&gt; depends on predictions from earlier trees.&lt;/p&gt;

&lt;p&gt;The parallelism is inside each tree:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;across candidate splits&lt;/li&gt;
&lt;li&gt;across features&lt;/li&gt;
&lt;li&gt;across data partitions&lt;/li&gt;
&lt;li&gt;across histogram bins&lt;/li&gt;
&lt;li&gt;across workers in distributed training&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;XGBoost uses second-order information. Split scoring uses gradients and Hessians, with regularization terms such as &lt;code&gt;lambda&lt;/code&gt; and &lt;code&gt;gamma&lt;/code&gt;. You do not need to derive every line from memory, but you should be able to explain that XGBoost uses both first and second derivatives to score split quality.&lt;/p&gt;
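
&lt;p&gt;The split score can be written as a short function. The sketch below follows the standard second-order gain formula, with &lt;code&gt;reg_lambda&lt;/code&gt; and &lt;code&gt;gamma&lt;/code&gt; standing in for the regularization terms named above. The gradient and Hessian sums are made-up values for a squared-error loss, where each Hessian equals 1.&lt;/p&gt;

```python
def leaf_score(grad_sum, hess_sum, reg_lambda):
    """Structure score of one leaf: G^2 / (H + lambda)."""
    return grad_sum ** 2 / (hess_sum + reg_lambda)


def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """Gain of splitting a node into left and right children, minus the split penalty."""
    parent = leaf_score(g_left + g_right, h_left + h_right, reg_lambda)
    children = leaf_score(g_left, h_left, reg_lambda) + leaf_score(g_right, h_right, reg_lambda)
    return 0.5 * (children - parent) - gamma


# Hypothetical sums: two rows on each side, gradients pointing in opposite directions.
print(split_gain(g_left=-4.0, h_left=2.0, g_right=4.0, h_right=2.0))
print(split_gain(g_left=-4.0, h_left=2.0, g_right=4.0, h_right=2.0, gamma=6.0))
```

&lt;p&gt;The second call shows the role of &lt;code&gt;gamma&lt;/code&gt;: the same split becomes unprofitable once the per-split penalty exceeds the raw gain, so the tree would not grow there.&lt;/p&gt;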

&lt;p&gt;For large datasets, exact split search can be expensive. Histogram-based split finding buckets continuous values into a fixed number of quantile bins, usually far fewer than the number of distinct raw values. Workers build local histograms of gradient and Hessian sums, then reduce them. This gives better cache behavior and lower memory use, with some loss in split precision.&lt;/p&gt;

&lt;p&gt;Also know why sparse handling matters. XGBoost learns a default direction for missing values, which helps with sparse one-hot data and missing feature values.&lt;/p&gt;

&lt;h2&gt;
  
  
  PyTorch implementation questions: be concrete
&lt;/h2&gt;

&lt;p&gt;For "Implement a decoder-only GPT-style Transformer," start by clarifying scope:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Should I implement a minimal PyTorch module with embeddings, positional encoding, masked multi-head attention, MLP blocks, and logits, or should I include training and generation too?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Then state assumptions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Input token IDs have shape &lt;code&gt;B x T&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Vocabulary size is &lt;code&gt;V&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Embedding dimension is &lt;code&gt;C&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Number of heads divides &lt;code&gt;C&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Output logits have shape &lt;code&gt;B x T x V&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Talk through token embeddings, positional embeddings or RoPE, stacked pre-norm blocks, causal masking, output projection, and loss.&lt;/p&gt;

&lt;p&gt;Call out edge cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;T&lt;/code&gt; exceeds configured context length&lt;/li&gt;
&lt;li&gt;mask broadcasting is wrong&lt;/li&gt;
&lt;li&gt;train/eval dropout behavior differs&lt;/li&gt;
&lt;li&gt;causal mask is missing&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;.view()&lt;/code&gt; is used on a non-contiguous tensor&lt;/li&gt;
&lt;li&gt;generation lacks a KV cache&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A good implementation answer includes unit tests for shape, causal leakage, and a tiny overfit test to verify the model can learn.&lt;/p&gt;

&lt;h2&gt;
  
  
  Behavioral answers still need metrics
&lt;/h2&gt;

&lt;p&gt;The source guide groups behavioral preparation under leadership principles, ownership, and measurable impact. For Amazon, that phrasing matters.&lt;/p&gt;

&lt;p&gt;Do not give vague stories like "I improved model performance." Give the situation, your decision, the tradeoff, the result, and the metric. For an MLE, strong stories often include model quality, latency, cost, reliability, data quality, rollback decisions, or experiment design.&lt;/p&gt;

&lt;p&gt;For example, a better answer sounds like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"We had a relevance regression after a feature pipeline change. I traced the issue to offline/online feature mismatch, added validation checks before promotion, and reduced bad launches in that area."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The exact numbers depend on your experience, but the structure should make your ownership clear.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final prep checklist
&lt;/h2&gt;

&lt;p&gt;Before the interview, make sure you can answer these without notes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Derive and explain scaled dot-product attention&lt;/li&gt;
&lt;li&gt;Trace Transformer tensor shapes through multi-head attention&lt;/li&gt;
&lt;li&gt;Explain causal masking and label leakage&lt;/li&gt;
&lt;li&gt;Compare LayerNorm and BatchNorm&lt;/li&gt;
&lt;li&gt;Discuss KV cache memory and autoregressive latency&lt;/li&gt;
&lt;li&gt;Explain why perplexity is not enough for LLM evaluation&lt;/li&gt;
&lt;li&gt;Design an LLM validation system with offline and online gates&lt;/li&gt;
&lt;li&gt;Explain MoE routing and serving tradeoffs&lt;/li&gt;
&lt;li&gt;Explain XGBoost histogram split finding and boosting-round dependency&lt;/li&gt;
&lt;li&gt;Write a minimal PyTorch Transformer block&lt;/li&gt;
&lt;li&gt;Tie every model choice to quality, latency, cost, or reliability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to drill with targeted prompts, the &lt;a href="https://prachub.com/interview-questions?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=backlinks" rel="noopener noreferrer"&gt;PracHub interview questions library&lt;/a&gt; has practice questions across ML theory, system design, coding, and behavioral topics.&lt;/p&gt;

&lt;p&gt;For the full role-specific breakdown, use the &lt;a href="https://prachub.com/interview-prep/amazon-machine-learning-engineer-interview-prep?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=backlinks" rel="noopener noreferrer"&gt;Amazon Machine Learning Engineer interview prep guide on PracHub&lt;/a&gt; as your main checklist.&lt;/p&gt;

</description>
      <category>interview</category>
      <category>career</category>
      <category>amazon</category>
      <category>machinelearningengineer</category>
    </item>
    <item>
      <title>Notifications And Lifecycle Engagement Explained — Tech Interview Concept (2026)</title>
      <dc:creator>Feng Zhang</dc:creator>
      <pubDate>Wed, 17 Jun 2026 14:26:03 +0000</pubDate>
      <link>https://dev.to/feng_zhang_cedb4581bee881/notifications-and-lifecycle-engagement-explained-tech-interview-concept-2026-54bg</link>
      <guid>https://dev.to/feng_zhang_cedb4581bee881/notifications-and-lifecycle-engagement-explained-tech-interview-concept-2026-54bg</guid>
      <description>&lt;p&gt;Notifications are easy to measure badly.&lt;/p&gt;

&lt;p&gt;If a push campaign gets more clicks, did it create real engagement, or did it interrupt people who were already likely to open the app? If a ranking model lifts CTR, did it improve relevance, or did it learn to send curiosity bait? If dormant users come back today, do they stick around next month, or do they disable notifications?&lt;/p&gt;

&lt;p&gt;That is what interviewers are getting at when they ask about notification and lifecycle engagement metrics. The original PracHub concept post on &lt;a href="https://prachub.com/concepts/notifications-and-lifecycle-engagement?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=backlinks" rel="noopener noreferrer"&gt;Notifications and Lifecycle Engagement&lt;/a&gt; covers the concept, but this article focuses on the answer pattern you can use in a data science or product analytics interview.&lt;/p&gt;

&lt;h2&gt;
  
  
  What interviewers are really testing
&lt;/h2&gt;

&lt;p&gt;A weak answer sounds like this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I would track CTR, opens, DAU, and retention."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is a metric list, not an evaluation plan.&lt;/p&gt;

&lt;p&gt;A stronger answer explains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What product decision the metric supports&lt;/li&gt;
&lt;li&gt;What the primary outcome is&lt;/li&gt;
&lt;li&gt;Which metrics are drivers&lt;/li&gt;
&lt;li&gt;Which metrics are guardrails&lt;/li&gt;
&lt;li&gt;How the experiment estimates incremental impact&lt;/li&gt;
&lt;li&gt;How you handle fatigue, delayed outcomes, and selection bias&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a company like Meta, notifications can bring users back and help build habits. They can also annoy users, increase opt-outs, and reduce long-term trust. The interview is testing whether you can separate short-term movement from durable product value.&lt;/p&gt;

&lt;h2&gt;
  
  
  Start with the product goal
&lt;/h2&gt;

&lt;p&gt;Before naming metrics, clarify the goal.&lt;/p&gt;

&lt;p&gt;A notification system may be trying to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reactivate dormant users&lt;/li&gt;
&lt;li&gt;Improve relevance of pushes&lt;/li&gt;
&lt;li&gt;Increase marketplace actions&lt;/li&gt;
&lt;li&gt;Reduce notification fatigue&lt;/li&gt;
&lt;li&gt;Test a new ranking or sending policy&lt;/li&gt;
&lt;li&gt;Personalize volume caps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The right metric depends on the goal. A reactivation system may care about &lt;code&gt;D7_retained_active_users&lt;/code&gt;. A marketplace notification may care about &lt;code&gt;listing_detail_views&lt;/code&gt;, &lt;code&gt;saves&lt;/code&gt;, or &lt;code&gt;seller_messages&lt;/code&gt;. A fatigue-reduction project may use &lt;code&gt;disable_push_rate&lt;/code&gt; or &lt;code&gt;mute_rate&lt;/code&gt; as the primary outcome.&lt;/p&gt;

&lt;p&gt;This step matters because it ties measurement to an actual product decision.&lt;/p&gt;

&lt;h2&gt;
  
  
  Build a metric hierarchy
&lt;/h2&gt;

&lt;p&gt;For notification experiments, organize metrics into three layers.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Primary metric
&lt;/h3&gt;

&lt;p&gt;This is the metric you would use to make the launch decision.&lt;/p&gt;

&lt;p&gt;Good candidates include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;D7_retained_active_users&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;D28_retained_active_users&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Incremental &lt;code&gt;sessions_per_user&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;meaningful_sessions&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Downstream actions such as messages, purchases, comments, or listing views&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The primary metric should capture user or business value, not raw notification volume.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Driver metrics
&lt;/h3&gt;

&lt;p&gt;These explain why the primary metric moved.&lt;/p&gt;

&lt;p&gt;Common driver metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;notification_open_rate&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;notification_click_through_rate&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;session_starts_from_notification&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;notification-attributed_sessions&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;downstream_conversion&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;CTR is useful here, but it should rarely be the launch metric.&lt;/p&gt;

&lt;p&gt;The formula is simple:&lt;/p&gt;

&lt;p&gt;$$CTR = \frac{\text{notification clicks}}{\text{notifications delivered}}$$&lt;/p&gt;

&lt;p&gt;The problem is what CTR rewards. It can favor clickbait, curiosity, or over-targeting users who were already active. A notification that gets many clicks may still reduce retention if users feel spammed.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Guardrail metrics
&lt;/h3&gt;

&lt;p&gt;Guardrails protect user trust and long-term health.&lt;/p&gt;

&lt;p&gt;Use metrics such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;disable_push_rate&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;mute_rate&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;uninstall_rate&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;hide_notification_rate&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;negative_feedback_rate&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;notifications_sent_per_user&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;complaints&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Quality metrics like &lt;code&gt;meaningful_interactions_per_session&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A treatment that lifts DAU but also raises push disables may be a bad trade.&lt;/p&gt;

&lt;h2&gt;
  
  
  Design the experiment around assignment, not clicks
&lt;/h2&gt;

&lt;p&gt;For most notification policies, randomize at the user level.&lt;/p&gt;

&lt;p&gt;Control users get the existing policy. Treatment users are eligible for the new notification policy, ranking model, or sending rule.&lt;/p&gt;

&lt;p&gt;The treatment should not be defined as "clicked a notification" or "received a notification." Those are post-treatment events. If you analyze only users who clicked, you introduce selection bias because the treatment itself affects who receives, sees, and clicks notifications.&lt;/p&gt;

&lt;p&gt;Use intent-to-treat analysis as the primary estimate:&lt;/p&gt;

&lt;p&gt;$$ITT = E[Y \mid Z=1] - E[Y \mid Z=0]$$&lt;/p&gt;

&lt;p&gt;Here, &lt;code&gt;Z&lt;/code&gt; is assignment to treatment.&lt;/p&gt;

&lt;p&gt;This estimates the effect of being assigned to the new policy. Some assigned users may never receive a notification during the experiment. That is fine. ITT matches the product decision: should we launch this policy to eligible users?&lt;/p&gt;
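
&lt;p&gt;As a minimal sketch of the ITT estimate above, the Python below compares mean outcomes by assignment only. The records, the field names &lt;code&gt;assigned&lt;/code&gt;, &lt;code&gt;received&lt;/code&gt;, and &lt;code&gt;retained_d7&lt;/code&gt;, and the helper &lt;code&gt;itt_estimate&lt;/code&gt; are all hypothetical and made up for illustration.&lt;/p&gt;

```python
from statistics import mean

# Hypothetical per-user records: assignment flag Z, whether a notification was
# actually received, and the outcome Y (1 if the user was active on day 7).
users = [
    {"assigned": 1, "received": 1, "retained_d7": 1},
    {"assigned": 1, "received": 0, "retained_d7": 0},
    {"assigned": 1, "received": 1, "retained_d7": 1},
    {"assigned": 1, "received": 0, "retained_d7": 1},
    {"assigned": 0, "received": 0, "retained_d7": 0},
    {"assigned": 0, "received": 1, "retained_d7": 1},
    {"assigned": 0, "received": 0, "retained_d7": 0},
    {"assigned": 0, "received": 0, "retained_d7": 1},
]


def itt_estimate(rows, outcome):
    """Difference in mean outcome between assigned groups: E[Y|Z=1] - E[Y|Z=0]."""
    treated = [r[outcome] for r in rows if r["assigned"] == 1]
    control = [r[outcome] for r in rows if r["assigned"] == 0]
    return mean(treated) - mean(control)


# The "received" field is deliberately ignored: everyone stays in their assigned arm.
itt = itt_estimate(users, "retained_d7")
print(f"ITT on D7 retention: {itt:+.2f}")
```

&lt;p&gt;Note that the treatment users who never received a notification still count toward the treatment average. Filtering on &lt;code&gt;received&lt;/code&gt; would turn this into the exposed-user comparison the section warns against.&lt;/p&gt;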

&lt;p&gt;You can report treatment-on-treated as a secondary diagnostic, but be careful. Exposure is itself affected by the treatment, so comparing exposed users across arms can be misleading. If you need a per-exposed-user effect, report exposure rates by arm or use assignment as an instrumental variable for exposure, with clear caveats.&lt;/p&gt;

&lt;h2&gt;
  
  
  Watch for interference
&lt;/h2&gt;

&lt;p&gt;User-level randomization works well for many notification systems, but social notifications can create spillovers.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Your friend commented"&lt;/li&gt;
&lt;li&gt;"Someone tagged you"&lt;/li&gt;
&lt;li&gt;"A creator you follow posted"&lt;/li&gt;
&lt;li&gt;Marketplace messages tied to listings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One user's treatment can generate messages or activity that affects another user. That violates SUTVA, the assumption that one unit's treatment does not affect another unit's outcome.&lt;/p&gt;

&lt;p&gt;In those cases, consider cluster randomization. The cluster could be a conversation, household, creator-follower graph component, marketplace listing neighborhood, or another unit that captures likely spillovers.&lt;/p&gt;

&lt;p&gt;Cluster experiments need more sample size because observations inside a cluster are correlated. The design effect is:&lt;/p&gt;

&lt;p&gt;$$DE = 1 + (m-1)\rho$$&lt;/p&gt;

&lt;p&gt;Here, &lt;code&gt;m&lt;/code&gt; is cluster size and &lt;code&gt;rho&lt;/code&gt; is the intra-cluster correlation.&lt;/p&gt;
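
&lt;p&gt;To make the design effect concrete, here is a small Python sketch. It assumes equal-sized clusters, and the inputs (20 users per cluster, an intra-cluster correlation of 0.05, a 10,000-user baseline) are invented numbers, not figures from any real experiment. The helper names are hypothetical.&lt;/p&gt;

```python
def design_effect(cluster_size, icc):
    """DE = 1 + (m - 1) * rho, assuming equal-sized clusters."""
    return 1 + (cluster_size - 1) * icc


def inflated_sample_size(n_independent, cluster_size, icc):
    """Users needed under clustering to match n_independent independent users."""
    return round(n_independent * design_effect(cluster_size, icc))


# Assumed inputs: 20 users per cluster, intra-cluster correlation of 0.05.
de = design_effect(20, 0.05)
n_needed = inflated_sample_size(10_000, 20, 0.05)
print(f"design effect = {de:.2f}, users needed = {n_needed}")
```

&lt;p&gt;Even a small correlation nearly doubles the required sample here, which is why the number of clusters usually matters more than the number of users inside each one.&lt;/p&gt;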

&lt;h2&gt;
  
  
  Plan for small effects
&lt;/h2&gt;

&lt;p&gt;Retention and opt-out effects can be small, so power matters.&lt;/p&gt;

&lt;p&gt;For a two-sample comparison of means, a rough sample size formula is:&lt;/p&gt;

&lt;p&gt;$$n \approx \frac{2\sigma^2(z_{1-\alpha/2}+z_{1-\beta})^2}{\delta^2}$$&lt;/p&gt;

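&lt;p&gt;Here, &lt;code&gt;n&lt;/code&gt; is the sample size per group, &lt;code&gt;sigma^2&lt;/code&gt; is the outcome variance, and &lt;code&gt;delta&lt;/code&gt; is the minimum detectable effect. For binary metrics, use &lt;code&gt;p(1-p)&lt;/code&gt; as the variance, where &lt;code&gt;p&lt;/code&gt; is the baseline rate.&lt;/p&gt;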
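
&lt;p&gt;The formula above can be evaluated with the Python standard library. This is a rough sketch, not a full power analysis: the 40 percent baseline retention, the 0.5-point minimum detectable effect, and the defaults of 5 percent significance and 80 percent power are assumptions chosen for illustration, and the result is a per-group sample size.&lt;/p&gt;

```python
from statistics import NormalDist


def sample_size_per_group(sigma2, delta, alpha=0.05, power=0.80):
    """n ~ 2 * sigma^2 * (z_{1-alpha/2} + z_{1-beta})^2 / delta^2, per group."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)  # power equals 1 - beta
    return 2 * sigma2 * (z_alpha + z_beta) ** 2 / delta ** 2


# Assumed example: baseline D7 retention of 40 percent, detect a 0.5-point change.
p = 0.40
n = sample_size_per_group(sigma2=p * (1 - p), delta=0.005)
print(f"about {n:,.0f} users per group")
```

&lt;p&gt;Because &lt;code&gt;delta&lt;/code&gt; is squared in the denominator, halving the minimum detectable effect roughly quadruples the required sample.&lt;/p&gt;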

&lt;p&gt;If you have strong pre-period behavior, use CUPED or regression adjustment to reduce variance. Good covariates include pre-experiment:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;sessions_per_user&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;notification_clicks&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;active_days&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The covariates must be measured before assignment. This improves sensitivity without changing the estimand.&lt;/p&gt;
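
&lt;p&gt;A minimal CUPED sketch, assuming a single pre-period covariate: the outcome is shifted by &lt;code&gt;theta&lt;/code&gt; times the centered covariate, where &lt;code&gt;theta&lt;/code&gt; is the covariance of covariate and outcome divided by the covariate variance. The data and the helper name &lt;code&gt;cuped_adjust&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
from statistics import mean, variance


def cuped_adjust(y, x):
    """Return y - theta * (x - mean(x)), with theta = cov(x, y) / var(x)."""
    x_bar, y_bar = mean(x), mean(y)
    cov_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (len(x) - 1)
    theta = cov_xy / variance(x)
    return [b - theta * (a - x_bar) for a, b in zip(x, y)]


# Hypothetical data: pre-experiment sessions (x) and in-experiment sessions (y).
pre_sessions = [2, 5, 1, 8, 4, 7, 3, 6]
exp_sessions = [3, 6, 1, 9, 4, 8, 4, 6]

adjusted = cuped_adjust(exp_sessions, pre_sessions)
print(f"mean before: {mean(exp_sessions):.3f}, after: {mean(adjusted):.3f}")
print(f"variance before: {variance(exp_sessions):.3f}, after: {variance(adjusted):.3f}")
```

&lt;p&gt;The adjustment leaves the mean unchanged and shrinks the variance. In a real experiment you would typically estimate &lt;code&gt;theta&lt;/code&gt; on users pooled across both arms, then compare the adjusted outcome by assignment.&lt;/p&gt;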

&lt;h2&gt;
  
  
  Segment by lifecycle stage
&lt;/h2&gt;

&lt;p&gt;Notification impact is rarely uniform.&lt;/p&gt;

&lt;p&gt;Analyze cohorts such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;New users&lt;/li&gt;
&lt;li&gt;Dormant users&lt;/li&gt;
&lt;li&gt;Power users&lt;/li&gt;
&lt;li&gt;Notification-heavy users&lt;/li&gt;
&lt;li&gt;Low-intent users&lt;/li&gt;
&lt;li&gt;Users with prior disables or mutes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A policy may help dormant users come back while annoying already-active users. A broad average can hide that pattern.&lt;/p&gt;

&lt;p&gt;This is especially relevant for lifecycle engagement. The same push can feel helpful to one user and spammy to another. Segment analysis can support personalization, caps, or targeted rollout instead of a full launch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Measure beyond the first click
&lt;/h2&gt;

&lt;p&gt;Notifications often have delayed costs.&lt;/p&gt;

&lt;p&gt;Common patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CTR rises, but &lt;code&gt;disable_push_rate&lt;/code&gt; rises later&lt;/li&gt;
&lt;li&gt;DAU increases, but &lt;code&gt;D28_retention&lt;/code&gt; falls&lt;/li&gt;
&lt;li&gt;Sessions increase, but session quality drops&lt;/li&gt;
&lt;li&gt;Short-term reactivation fades after users habituate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use a window that matches the product goal. If the goal is reactivation, D1 may be too short. D7 or D28 can show whether users came back again. For long-term fatigue, use longer experiments, staggered rollouts, or holdouts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Control multiple testing
&lt;/h2&gt;

&lt;p&gt;Notification systems have many surfaces, cohorts, and outcomes. If you slice enough, some result will look significant by chance.&lt;/p&gt;

&lt;p&gt;A clean answer says you would predefine:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Primary metric&lt;/li&gt;
&lt;li&gt;Main guardrails&lt;/li&gt;
&lt;li&gt;Evaluation window&lt;/li&gt;
&lt;li&gt;High-risk cohorts&lt;/li&gt;
&lt;/ul&gt;

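&lt;p&gt;For many exploratory slices, use false discovery rate control such as Benjamini-Hochberg. For a small set of confirmatory metrics where any single false positive is costly, a family-wise correction such as Bonferroni may be more appropriate. Keep in mind that Bonferroni is conservative: it lowers power, so it makes real guardrail harm harder to detect.&lt;/p&gt;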
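
&lt;p&gt;The Benjamini-Hochberg step-up procedure mentioned above is short enough to sketch directly. The p-values below are invented for eight hypothetical cohort slices, and the function name is an assumption for illustration.&lt;/p&gt;

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return booleans: True where the hypothesis is rejected at the given FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    # Find the largest rank k whose p-value is at or below (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if rank / m * fdr >= p_values[idx]:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or before the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from eight exploratory cohort slices.
slice_p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
print(benjamini_hochberg(slice_p_values))
```

&lt;p&gt;With these made-up values the procedure keeps the first two slices. A Bonferroni threshold of 0.05 / 8 would keep only the first, which illustrates how much more conservative it is.&lt;/p&gt;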

&lt;p&gt;If you want more practice with these interview-style pivots, PracHub has related &lt;a href="https://prachub.com/interview-questions?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=backlinks" rel="noopener noreferrer"&gt;data science and product interview questions&lt;/a&gt; that cover experimentation, metrics, ranking, and causal inference.&lt;/p&gt;

&lt;h2&gt;
  
  
  Worked example: general notification policy
&lt;/h2&gt;

&lt;p&gt;Suppose the prompt is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Define metrics and design experiments for notifications."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A strong answer could sound like this:&lt;/p&gt;

&lt;p&gt;First, clarify the goal. Assume we are testing a new push notification ranking policy for a social app. The goal is to increase meaningful engagement and retention without increasing fatigue.&lt;/p&gt;

&lt;p&gt;Primary metric:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;D7_retained_active_users&lt;/code&gt; or incremental &lt;code&gt;sessions_per_user&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Driver metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;notification_open_rate&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;notification-attributed_sessions&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;downstream_actions&lt;/code&gt;, such as comments, messages, or shares&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Guardrails:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;disable_push_rate&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;mute_rate&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;uninstall_rate&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;notifications_sent_per_user&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;negative_feedback_rate&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;meaningful_interactions_per_session&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Experiment design:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Randomize eligible users into control and treatment&lt;/li&gt;
&lt;li&gt;Control keeps the current policy&lt;/li&gt;
&lt;li&gt;Treatment is eligible for the new ranking or sending policy&lt;/li&gt;
&lt;li&gt;Analyze by assignment using ITT&lt;/li&gt;
&lt;li&gt;Avoid conditioning on users who clicked or received a notification&lt;/li&gt;
&lt;li&gt;Use cluster randomization if social spillovers are strong&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Decision rule:&lt;/p&gt;

&lt;p&gt;Ship only if the primary metric improves and guardrails do not show statistically or practically meaningful harm. If the lift is concentrated in dormant users but guardrail harm appears among power users, consider personalization or volume caps instead of a full rollout.&lt;/p&gt;

&lt;h2&gt;
  
  
  Worked example: similar-listing notifications
&lt;/h2&gt;

&lt;p&gt;Now suppose the prompt is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"How would you evaluate a similar-listing notification feature?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The product goal is narrower. You want to know whether notifying users about similar marketplace listings helps them find relevant items without feeling spammed.&lt;/p&gt;

&lt;p&gt;Primary metrics could include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Incremental &lt;code&gt;listing_detail_views&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;saves&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;seller_messages&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Purchase-intent actions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Guardrails:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;notification_disable_rate&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;hide_rate&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Lower engagement with future marketplace notifications&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Randomize eligible users who viewed or saved a listing. Do not restrict the comparison to people who receive the notification, because receiving it is determined by the policy being tested and is therefore a post-assignment event.&lt;/p&gt;

&lt;p&gt;Use an evaluation window that includes delayed actions. A user may click today but message a seller two days later. Segment by intent strength too. Recent searchers may benefit, while casual browsers may find the same push irrelevant.&lt;/p&gt;

&lt;h2&gt;
  
  
  The common traps
&lt;/h2&gt;

&lt;p&gt;Avoid these mistakes in an interview:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Optimizing only for CTR&lt;/li&gt;
&lt;li&gt;Analyzing only users who opened the notification&lt;/li&gt;
&lt;li&gt;Listing metrics without a launch rule&lt;/li&gt;
&lt;li&gt;Ignoring opt-outs, mutes, and uninstalls&lt;/li&gt;
&lt;li&gt;Treating all lifecycle cohorts the same&lt;/li&gt;
&lt;li&gt;Missing network spillovers in social notifications&lt;/li&gt;
&lt;li&gt;Reading too much into short-term DAU lift&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A good interview answer is causal, decision-oriented, and honest about tradeoffs. You are not trying to prove notifications work. You are trying to estimate whether a specific policy creates incremental value without damaging retention or trust.&lt;/p&gt;

&lt;p&gt;For a more compact interview-prep version of this framework, use the PracHub write-up on &lt;a href="https://prachub.com/concepts/notifications-and-lifecycle-engagement?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=backlinks" rel="noopener noreferrer"&gt;Notifications and Lifecycle Engagement&lt;/a&gt; as a reference before practicing mock answers.&lt;/p&gt;

</description>
      <category>interview</category>
      <category>career</category>
      <category>programming</category>
      <category>tech</category>
    </item>
    <item>
      <title>Uber Data Scientist Interview Cheatsheet 2026</title>
      <dc:creator>Feng Zhang</dc:creator>
      <pubDate>Wed, 10 Jun 2026 14:25:36 +0000</pubDate>
      <link>https://dev.to/feng_zhang_cedb4581bee881/uber-data-scientist-interview-cheatsheet-2026-fal</link>
      <guid>https://dev.to/feng_zhang_cedb4581bee881/uber-data-scientist-interview-cheatsheet-2026-fal</guid>
      <description>&lt;p&gt;If you're preparing for an Uber Data Scientist interview, the hard part is not memorizing formulas. It is knowing how Uber frames data science problems: marketplace effects, experiment validity, ETA quality, and metric definitions that do not fall apart under edge cases.&lt;/p&gt;

&lt;p&gt;This post is a condensed rewrite of PracHub's &lt;a href="https://prachub.com/interview-prep/uber-data-scientist-interview-prep?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=backlinks" rel="noopener noreferrer"&gt;Uber Data Scientist interview prep guide&lt;/a&gt;, focused on the themes that come up in technical screens and onsite rounds.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Uber is really testing
&lt;/h2&gt;

&lt;p&gt;Across SQL, product analytics, experimentation, and stats, interviewers want to see whether you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;define the metric correctly&lt;/li&gt;
&lt;li&gt;choose the right unit of analysis&lt;/li&gt;
&lt;li&gt;avoid leakage and bad denominators&lt;/li&gt;
&lt;li&gt;reason about interference in a two-sided marketplace&lt;/li&gt;
&lt;li&gt;separate model quality from business impact&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last one matters a lot. Lower prediction error does not automatically mean a better rider experience. A statistically significant A/B test result does not automatically mean "launch."&lt;/p&gt;

&lt;h2&gt;
  
  
  1) SQL: can you build defensible metrics from messy event data?
&lt;/h2&gt;

&lt;p&gt;Uber SQL questions often look simple at first. Then they turn into deduping events, picking the correct grain, and handling time windows without leaking future information.&lt;/p&gt;

&lt;p&gt;Topics that come up often:&lt;/p&gt;

&lt;h3&gt;
  
  
  Window functions you should be comfortable with
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Last or first event per entity&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Use &lt;code&gt;ROW_NUMBER()&lt;/code&gt; with a deterministic sort:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;ROW_NUMBER&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;
  &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;event_ts&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event_id&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the standard pattern for "latest trip per rider" or "first exposure per user."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rolling metrics&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For time-series summaries, know how to write rolling averages by partition:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;metric&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;
  &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;dt&lt;/span&gt;
  &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt; &lt;span class="k"&gt;PRECEDING&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="k"&gt;CURRENT&lt;/span&gt; &lt;span class="k"&gt;ROW&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Top-N logic&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You should know when to use &lt;code&gt;RANK&lt;/code&gt;, &lt;code&gt;DENSE_RANK&lt;/code&gt;, and &lt;code&gt;ROW_NUMBER&lt;/code&gt;, and be able to explain tie behavior clearly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cohort conversion and CTR&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A common failure mode is inflated CTR after joining impressions to clicks. If one impression maps to multiple clicks, the join repeats that impression row, so &lt;code&gt;COUNT(*)&lt;/code&gt; no longer counts impressions. You need to define the denominator once, dedupe at the right grain, and use explicit attribution windows like &lt;code&gt;click_ts &amp;lt;= impression_ts + interval '48 hours'&lt;/code&gt;.&lt;/p&gt;
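
&lt;p&gt;Here is a small, self-contained illustration of that failure mode, run through Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module so it can be executed anywhere. The tables, columns, and rows are hypothetical, timestamps are epoch seconds, and the 48-hour window is written as 172800 seconds. Treat it as one possible way to dedupe, not as Uber's actual schema or query.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE impressions (impression_id INTEGER, user_id INTEGER, impression_ts INTEGER);
CREATE TABLE clicks (click_id INTEGER, impression_id INTEGER, click_ts INTEGER);
-- Impression 1 has two clicks; impression 3 has one click outside the window.
INSERT INTO impressions VALUES (1, 10, 0), (2, 10, 100), (3, 11, 200), (4, 12, 300);
INSERT INTO clicks VALUES (1, 1, 50), (2, 1, 60), (3, 3, 200200);
""")

# Naive version: counts every click row, including repeats and late clicks.
naive_ctr = conn.execute("""
    SELECT 1.0 * COUNT(c.click_id) / COUNT(DISTINCT i.impression_id)
    FROM impressions i
    LEFT JOIN clicks c ON c.impression_id = i.impression_id
""").fetchone()[0]

# Deduped version: one row per impression, flagged if any click lands within 48 hours.
deduped_ctr = conn.execute("""
    SELECT AVG(clicked)
    FROM (
        SELECT i.impression_id,
               MAX(CASE WHEN c.click_ts BETWEEN i.impression_ts
                                        AND i.impression_ts + 172800
                        THEN 1 ELSE 0 END) AS clicked
        FROM impressions i
        LEFT JOIN clicks c ON c.impression_id = i.impression_id
        GROUP BY i.impression_id
    )
""").fetchone()[0]

print(f"naive CTR = {naive_ctr:.2f}, deduped CTR = {deduped_ctr:.2f}")
```

&lt;p&gt;The grouped subquery fixes the grain at one row per impression before anything is averaged, so repeated clicks and out-of-window clicks cannot inflate the rate.&lt;/p&gt;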

&lt;p&gt;&lt;strong&gt;Date spine joins&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;These matter for rolling averages and anomaly detection. Generate all dates first, then left join events, and fill missing counts with zero.&lt;/p&gt;
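
&lt;p&gt;The same idea in plain Python, as a sketch: build the full list of dates, look up each one with a default of zero, then compute a trailing average. The trip counts and helper names are hypothetical, and the 3-day window is an arbitrary choice to keep the example short.&lt;/p&gt;

```python
from datetime import date, timedelta

# Hypothetical daily trip counts with gaps (no rows for days with zero trips).
trips_by_day = {date(2026, 1, 1): 12, date(2026, 1, 2): 9, date(2026, 1, 5): 14}


def date_spine(start, end):
    """Every calendar date from start to end, inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def trailing_average(values, window):
    """Average over the current value and up to window - 1 earlier ones."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


spine = date_spine(date(2026, 1, 1), date(2026, 1, 7))
daily = [trips_by_day.get(d, 0) for d in spine]  # like a left join, missing days become 0
rolling = trailing_average(daily, window=3)

for d, count, avg in zip(spine, daily, rolling):
    print(d.isoformat(), count, round(avg, 2))
```

&lt;p&gt;Without the spine, a rolling window over the three existing rows would average 12, 9, and 14 as if they were consecutive days, hiding the quiet days entirely.&lt;/p&gt;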

&lt;p&gt;&lt;strong&gt;Timezone-aware aggregation&lt;/strong&gt;&lt;/p&gt;

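&lt;p&gt;If you analyze market-level data, local time matters. San Francisco is at UTC-8 in January, so a raw UTC day boundary falls at 4 p.m. local time and splits a single local evening across two dates. Convert timestamps to the market's local time before truncating to a day.&lt;/p&gt;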

&lt;h3&gt;
  
  
  Common SQL mistakes
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;counting rows after a one-to-many join&lt;/li&gt;
&lt;li&gt;using future rows in a rolling metric&lt;/li&gt;
&lt;li&gt;treating &lt;code&gt;RANK&lt;/code&gt; and &lt;code&gt;ROW_NUMBER&lt;/code&gt; as interchangeable&lt;/li&gt;
&lt;li&gt;skipping timezone conversion before &lt;code&gt;DATE_TRUNC&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want realistic drills for this style of question, PracHub has a set of &lt;a href="https://prachub.com/interview-questions?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=backlinks" rel="noopener noreferrer"&gt;data science interview practice questions&lt;/a&gt; that match the patterns above.&lt;/p&gt;

&lt;h2&gt;
  
  
  2) ETA questions: accuracy is only part of the problem
&lt;/h2&gt;

&lt;p&gt;ETA is one of the clearest examples of how Uber expects product sense and statistical judgment to work together.&lt;/p&gt;

&lt;p&gt;An interviewer is not looking for "we reduced MAE, so the model is better." They want you to think through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what the ETA label is&lt;/li&gt;
&lt;li&gt;how to evaluate prediction quality&lt;/li&gt;
&lt;li&gt;whether the prediction is calibrated&lt;/li&gt;
&lt;li&gt;how uncertainty should be measured&lt;/li&gt;
&lt;li&gt;what user behavior changes after ETA changes&lt;/li&gt;
&lt;li&gt;how interference breaks naive A/B testing&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Start with label definition
&lt;/h3&gt;

&lt;p&gt;You need to ask what ETA means in the question.&lt;/p&gt;

&lt;p&gt;Is it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;request-to-pickup time?&lt;/li&gt;
&lt;li&gt;pickup-to-dropoff time?&lt;/li&gt;
&lt;li&gt;total trip duration?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The target has to match the user-facing promise. Cancellations, reassignment, batching, and no-shows all affect the label definition.&lt;/p&gt;

&lt;h3&gt;
  
  
  Know the evaluation metrics and what they miss
&lt;/h3&gt;

&lt;p&gt;Uber cares about more than one error metric:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MAE&lt;/strong&gt; is easy to interpret in minutes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RMSE&lt;/strong&gt; penalizes large misses&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;median absolute error&lt;/strong&gt; is more stable with outliers like airports or events&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;bias&lt;/strong&gt; tells you whether the model is systematically optimistic or pessimistic&lt;/li&gt;
&lt;/ul&gt;
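
&lt;p&gt;The four metrics in the list above can be computed side by side to see how they disagree. This is a sketch on five made-up trips, with errors defined as predicted minus actual so that a negative bias means the ETA was too short. The function name and data are assumptions for illustration.&lt;/p&gt;

```python
from math import sqrt
from statistics import mean, median


def eta_error_summary(predicted, actual):
    """Return MAE, RMSE, median absolute error, and bias, all in minutes."""
    errors = [p - a for p, a in zip(predicted, actual)]  # negative = ETA too short
    abs_errors = [abs(e) for e in errors]
    return {
        "mae": mean(abs_errors),
        "rmse": sqrt(mean([e * e for e in errors])),
        "median_ae": median(abs_errors),
        "bias": mean(errors),
    }


# Hypothetical ETAs in minutes; the last trip is an airport-style outlier.
predicted_eta = [5, 7, 4, 10, 6]
actual_wait = [6, 7, 5, 9, 16]
summary = eta_error_summary(predicted_eta, actual_wait)
print(summary)
```

&lt;p&gt;On this toy data the single 10-minute miss pulls RMSE well above MAE, while the median absolute error stays at 1 minute. That gap is the reason to report more than one metric.&lt;/p&gt;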

&lt;p&gt;You should also say you would segment results by city, time of day, weather, airport, and trip type.&lt;/p&gt;

&lt;h3&gt;
  
  
  Calibration matters
&lt;/h3&gt;

&lt;p&gt;If the app says 5 minutes and riders usually wait 7, the model is underestimating. That can increase conversion in the short run and hurt trust later.&lt;/p&gt;

&lt;p&gt;Reliability curves by ETA bucket are often more useful than one aggregate accuracy score.&lt;/p&gt;

&lt;h3&gt;
  
  
  Uncertainty matters too
&lt;/h3&gt;

&lt;p&gt;For dispatch and UX decisions, intervals can matter as much as point estimates. A 90% prediction interval should contain the actual arrival time about 90% of the time. Coverage and interval width are both relevant.&lt;/p&gt;
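
&lt;p&gt;Coverage and width are both easy to check empirically. The sketch below assumes you already have lower and upper bounds for nominal 90% intervals. The ten trips are invented, and &lt;code&gt;interval_report&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
from statistics import mean


def interval_report(lower, upper, actual):
    """Empirical coverage and average width for a set of prediction intervals."""
    covered = [1 if hi >= a >= lo else 0 for lo, hi, a in zip(lower, upper, actual)]
    widths = [hi - lo for lo, hi in zip(lower, upper)]
    return mean(covered), mean(widths)


# Hypothetical nominal-90% intervals (minutes) and the observed arrival times.
lower = [3, 4, 2, 6, 5, 3, 7, 4, 2, 5]
upper = [7, 9, 6, 12, 10, 8, 14, 9, 5, 11]
actual = [5, 8, 7, 9, 6, 4, 15, 5, 3, 10]

coverage, avg_width = interval_report(lower, upper, actual)
print(f"coverage = {coverage:.0%}, average width = {avg_width:.1f} min")
```

&lt;p&gt;Here the intervals cover 8 of 10 trips, below the nominal 90%. Widening them would fix coverage but make the ETA range less useful, which is why both numbers need to be reported together.&lt;/p&gt;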

&lt;h3&gt;
  
  
  Connect ETA to business outcomes
&lt;/h3&gt;

&lt;p&gt;A good answer separates model metrics from business metrics.&lt;/p&gt;

&lt;p&gt;Examples of business outcomes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;request conversion&lt;/li&gt;
&lt;li&gt;cancellation rate&lt;/li&gt;
&lt;li&gt;completed trips&lt;/li&gt;
&lt;li&gt;pickup delay&lt;/li&gt;
&lt;li&gt;rider satisfaction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Guardrails might include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;driver idle time&lt;/li&gt;
&lt;li&gt;acceptance rate&lt;/li&gt;
&lt;li&gt;surge exposure&lt;/li&gt;
&lt;li&gt;support contacts&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  3) Uber experiments are often not standard A/B tests
&lt;/h2&gt;

&lt;p&gt;This is where many candidates get too generic.&lt;/p&gt;

&lt;p&gt;For consumer apps, user-level randomization is often fine. At Uber, treatment can affect shared supply. One rider's treatment can change another rider's outcome. That means &lt;code&gt;SUTVA&lt;/code&gt;, the assumption that one unit's treatment does not affect another unit's outcome, may fail.&lt;/p&gt;

&lt;h3&gt;
  
  
  When interference matters
&lt;/h3&gt;

&lt;p&gt;If treatment changes dispatch, pricing, ETA display, or demand, untreated users may still be affected.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a rider-facing ETA change shifts demand in a neighborhood&lt;/li&gt;
&lt;li&gt;a driver incentive changes driver supply for everyone nearby&lt;/li&gt;
&lt;li&gt;a marketplace ranking change affects matching outcomes across groups&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you ignore that, your experiment readout may look precise and still be wrong.&lt;/p&gt;

&lt;h3&gt;
  
  
  Know when to propose switchback experiments
&lt;/h3&gt;

&lt;p&gt;For marketplace changes, Uber often needs geo-time randomization instead of user-level assignment.&lt;/p&gt;

&lt;p&gt;A strong answer for an ETA or dispatch experiment usually includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the estimand&lt;/li&gt;
&lt;li&gt;the randomization design&lt;/li&gt;
&lt;li&gt;primary metrics and guardrails&lt;/li&gt;
&lt;li&gt;the inference plan&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A reasonable design is a switchback experiment with city-zone-hour cells. You randomize treatment by zone and time block, then analyze results with cluster-robust standard errors or a regression with time and geography fixed effects.&lt;/p&gt;

&lt;p&gt;Do not use naive row-level standard errors if the design is clustered.&lt;/p&gt;

&lt;h3&gt;
  
  
  Power is different under clustering
&lt;/h3&gt;

&lt;p&gt;For clustered experiments, you need to account for design effect:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;DEFF = 1 + (m - 1) * rho&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;where &lt;code&gt;m&lt;/code&gt; is cluster size and &lt;code&gt;rho&lt;/code&gt; is intra-cluster correlation.&lt;/p&gt;

&lt;p&gt;That means more events inside the same cluster do not help as much as people expect. More independent clusters or time blocks usually matter more.&lt;/p&gt;

&lt;h2&gt;
  
  
  4) A/B testing answers need a decision framework
&lt;/h2&gt;

&lt;p&gt;A lot of candidates list metrics and stop there. Uber wants a launch recommendation, not a metrics dump.&lt;/p&gt;

&lt;p&gt;A solid structure is:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Define the objective
&lt;/h3&gt;

&lt;p&gt;Example: Does a promo targeting change increase completed trips or gross bookings at an acceptable promo cost and contribution margin?&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Pick the right randomization unit
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;rider_id&lt;/code&gt; for rider promos&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;driver_id&lt;/code&gt; for driver incentives&lt;/li&gt;
&lt;li&gt;geo or switchback for marketplace changes with spillovers&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Choose one primary metric
&lt;/h3&gt;

&lt;p&gt;Possible primary metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;completed trips per user&lt;/li&gt;
&lt;li&gt;conversion rate&lt;/li&gt;
&lt;li&gt;gross bookings&lt;/li&gt;
&lt;li&gt;variable contribution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then add a short list of guardrails:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cancellation rate&lt;/li&gt;
&lt;li&gt;ETA&lt;/li&gt;
&lt;li&gt;surge rate&lt;/li&gt;
&lt;li&gt;driver utilization&lt;/li&gt;
&lt;li&gt;support contact rate&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Check validity before interpretation
&lt;/h3&gt;

&lt;p&gt;You should mention:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;sample ratio mismatch&lt;/li&gt;
&lt;li&gt;exposure correctness&lt;/li&gt;
&lt;li&gt;pre-treatment balance&lt;/li&gt;
&lt;li&gt;logging completeness&lt;/li&gt;
&lt;li&gt;novelty or day-of-week effects&lt;/li&gt;
&lt;/ul&gt;
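
&lt;p&gt;The sample ratio mismatch check from the list above is a chi-square goodness-of-fit test on the assignment counts. The sketch below uses only the standard library and relies on the fact that, with one degree of freedom, the chi-square survival function equals &lt;code&gt;erfc(sqrt(x / 2))&lt;/code&gt;. The counts are hypothetical, and the threshold for calling a mismatch is a team convention that this guide does not specify.&lt;/p&gt;

```python
from math import erfc, sqrt


def srm_p_value(n_control, n_treatment, expected_treatment_share=0.5):
    """Chi-square goodness-of-fit test (1 degree of freedom) on assignment counts."""
    total = n_control + n_treatment
    exp_t = total * expected_treatment_share
    exp_c = total - exp_t
    chi2 = (n_treatment - exp_t) ** 2 / exp_t + (n_control - exp_c) ** 2 / exp_c
    # With 1 degree of freedom, the chi-square survival function is erfc(sqrt(chi2 / 2)).
    return erfc(sqrt(chi2 / 2))


# Hypothetical counts from an intended 50/50 rider-level split.
p_value = srm_p_value(n_control=50_000, n_treatment=50_700)
print(f"SRM p-value = {p_value:.4f}")
```

&lt;p&gt;A very small p-value suggests the split itself is broken, for example through an assignment or logging bug. In that case the right move is to debug the pipeline before reading any metric.&lt;/p&gt;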

&lt;h3&gt;
  
  
  5. Make the recommendation based on practical value
&lt;/h3&gt;

&lt;p&gt;Do not say "p &amp;lt; 0.05, ship it."&lt;/p&gt;

&lt;p&gt;A result can be statistically significant and still be a bad launch if contribution drops, promo spend gets out of control, or marketplace health gets worse.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final prep advice
&lt;/h2&gt;

&lt;p&gt;If you're studying for this interview, spend less time on abstract ML talk and more time on clean definitions, marketplace-aware experiment design, and SQL execution details. That is where many answers get weak.&lt;/p&gt;

&lt;p&gt;The full &lt;a href="https://prachub.com/interview-prep/uber-data-scientist-interview-prep?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=backlinks" rel="noopener noreferrer"&gt;Uber Data Scientist interview prep guide on PracHub&lt;/a&gt; goes deeper on ETA evaluation, A/B testing, SQL patterns, and practice prompts. If you want to pressure-test yourself, work through timed &lt;a href="https://prachub.com/interview-questions?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=backlinks" rel="noopener noreferrer"&gt;practice questions here&lt;/a&gt; and say your answer out loud like you're already in the interview.&lt;/p&gt;

</description>
      <category>interview</category>
      <category>career</category>
      <category>uber</category>
      <category>datascientist</category>
    </item>
    <item>
      <title>CTR And Engagement Metrics Explained — Tech Interview Concept (2026)</title>
      <dc:creator>Feng Zhang</dc:creator>
      <pubDate>Wed, 27 May 2026 14:24:49 +0000</pubDate>
      <link>https://dev.to/feng_zhang_cedb4581bee881/ctr-and-engagement-metrics-explained-tech-interview-concept-2026-9ad</link>
      <guid>https://dev.to/feng_zhang_cedb4581bee881/ctr-and-engagement-metrics-explained-tech-interview-concept-2026-9ad</guid>
      <description>&lt;p&gt;CTR questions look simple in interviews until you realize the interviewer is not asking for a formula. They want to know whether you can define exposure correctly, separate shallow clicks from useful engagement, and decide what to do when metrics move in opposite directions.&lt;/p&gt;

&lt;p&gt;This post is adapted from PracHub's original breakdown of &lt;a href="https://prachub.com/concepts/ctr-and-engagement-metrics?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=backlinks" rel="noopener noreferrer"&gt;CTR and engagement metrics&lt;/a&gt;, but rewritten as a standalone guide for data science and product analytics interviews.&lt;/p&gt;

&lt;h2&gt;
  
  
  What interviewers are really testing
&lt;/h2&gt;

&lt;p&gt;On ranking-heavy products like a home feed, carousel, Shopping surface, fresh content module, or video feed, small product changes can move several metrics at once:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CTR goes up&lt;/li&gt;
&lt;li&gt;saves stay flat&lt;/li&gt;
&lt;li&gt;reports rise&lt;/li&gt;
&lt;li&gt;retention drops&lt;/li&gt;
&lt;li&gt;impressions per user jump because the UI exposes more content&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your job is to reason through that mess without jumping to the wrong conclusion.&lt;/p&gt;

&lt;p&gt;Interviewers want to hear that you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;define metrics precisely&lt;/li&gt;
&lt;li&gt;explain what changed causally, not just descriptively&lt;/li&gt;
&lt;li&gt;separate product impact from logging issues or mix shifts&lt;/li&gt;
&lt;li&gt;make a launch recommendation under uncertainty&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your answer stays at "CTR is clicks divided by impressions," it is too shallow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Start with the metric definition
&lt;/h2&gt;

&lt;p&gt;Yes, CTR is usually:&lt;/p&gt;

&lt;p&gt;$$CTR = \frac{\text{clicks}}{\text{impressions}}$$&lt;/p&gt;

&lt;p&gt;But the grain matters a lot.&lt;/p&gt;

&lt;p&gt;You might be talking about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;item-level CTR&lt;/li&gt;
&lt;li&gt;user-level average CTR&lt;/li&gt;
&lt;li&gt;session-level CTR&lt;/li&gt;
&lt;li&gt;surface-level CTR&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are different metrics with different interpretations. In experiment analysis, user-level aggregation is often the safer choice because heavy users can otherwise dominate the estimate.&lt;/p&gt;

&lt;p&gt;That is one of the easiest points to miss in an interview. If the surface is personalized, the user is usually the right unit for inference.&lt;/p&gt;
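&lt;p&gt;To make the grain point concrete, here is a minimal sketch with invented numbers: one heavy user and three light users. The data and variable names are hypothetical.&lt;/p&gt;

```python
from statistics import mean

# Hypothetical per-user logs: (user_id, impressions, clicks).
logs = [
    ("heavy_user", 1000, 20),
    ("light_a", 10, 2),
    ("light_b", 10, 2),
    ("light_c", 10, 2),
]

# Pooled CTR: total clicks over total impressions.
# The heavy user contributes almost all of the denominator.
total_clicks = sum(clicks for _, _, clicks in logs)
total_impressions = sum(impressions for _, impressions, _ in logs)
pooled_ctr = total_clicks / total_impressions

# User-level CTR: compute each user's CTR, then average across users.
user_level_ctr = mean(clicks / impressions for _, impressions, clicks in logs)

print(f"pooled CTR:     {pooled_ctr:.4f}")
print(f"user-level CTR: {user_level_ctr:.4f}")
```

&lt;p&gt;The pooled figure is close to the heavy user's own 2% CTR, while the user-level average reflects that three of the four users click on 20% of what they see. Neither is wrong, but they answer different questions, so say which one you mean.&lt;/p&gt;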

&lt;h2&gt;
  
  
  CTR alone is not the goal
&lt;/h2&gt;

&lt;p&gt;A strong answer treats CTR as an intermediate metric, not the business objective.&lt;/p&gt;

&lt;p&gt;For Pinterest-style surfaces, engagement quality can include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;pin clicks&lt;/li&gt;
&lt;li&gt;saves&lt;/li&gt;
&lt;li&gt;closeups&lt;/li&gt;
&lt;li&gt;outbound clicks&lt;/li&gt;
&lt;li&gt;follows&lt;/li&gt;
&lt;li&gt;board adds&lt;/li&gt;
&lt;li&gt;video starts&lt;/li&gt;
&lt;li&gt;video completes&lt;/li&gt;
&lt;li&gt;Shopping product clicks&lt;/li&gt;
&lt;li&gt;return visits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A +3% CTR result does not mean the launch is good. If saves are flat and hide or report rates get worse, you may have made the feed more clickbaity rather than more useful.&lt;/p&gt;

&lt;p&gt;That is why you need a metric hierarchy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Build a metric hierarchy before you interpret anything
&lt;/h2&gt;

&lt;p&gt;A clean interview answer usually has three layers:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Primary success metric
&lt;/h3&gt;

&lt;p&gt;Pick the metric closest to user value for that surface.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;saves per user&lt;/li&gt;
&lt;li&gt;engaged sessions per user&lt;/li&gt;
&lt;li&gt;shopping-engaged sessions&lt;/li&gt;
&lt;li&gt;product detail clicks per user&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Diagnostic metrics
&lt;/h3&gt;

&lt;p&gt;These tell you where movement came from.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;impressions per user&lt;/li&gt;
&lt;li&gt;module visibility&lt;/li&gt;
&lt;li&gt;viewport rate&lt;/li&gt;
&lt;li&gt;click position&lt;/li&gt;
&lt;li&gt;downstream save rate&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Guardrails
&lt;/h3&gt;

&lt;p&gt;These stop you from shipping a bad tradeoff.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;hides&lt;/li&gt;
&lt;li&gt;reports&lt;/li&gt;
&lt;li&gt;session exits&lt;/li&gt;
&lt;li&gt;perceived latency&lt;/li&gt;
&lt;li&gt;creator or content diversity&lt;/li&gt;
&lt;li&gt;overall home feed engagement&lt;/li&gt;
&lt;li&gt;retention&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you list ten metrics without saying which one decides the launch, your answer will sound scattered.&lt;/p&gt;

&lt;h2&gt;
  
  
  Exposure definition is where a lot of people fail
&lt;/h2&gt;

&lt;p&gt;In ranking systems, "impression" is often the hardest metric to define correctly.&lt;/p&gt;

&lt;p&gt;An impression should mean the user had a reasonable chance to see the item. It should not mean "the server ranked it."&lt;/p&gt;

&lt;p&gt;For a carousel, you should distinguish between:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;module rendered&lt;/li&gt;
&lt;li&gt;module in viewport&lt;/li&gt;
&lt;li&gt;item impression&lt;/li&gt;
&lt;li&gt;click&lt;/li&gt;
&lt;li&gt;post-click engagement&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That distinction matters because a ranking or UI change can alter the denominator mechanically. If more items count as impressions because users scroll farther or because the module renders differently, CTR can fall even if the product got better.&lt;/p&gt;

&lt;h2&gt;
  
  
  Watch for denominator effects
&lt;/h2&gt;

&lt;p&gt;This is one of the best points you can bring into an interview.&lt;/p&gt;

&lt;p&gt;Suppose a recommendation launch shows more content lower in the feed. You may get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;more impressions per user&lt;/li&gt;
&lt;li&gt;more clicks per user&lt;/li&gt;
&lt;li&gt;more saves per user&lt;/li&gt;
&lt;li&gt;lower raw CTR&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is not contradictory. The denominator grew faster than the numerator.&lt;/p&gt;

&lt;p&gt;So when CTR drops, do not stop there. Look at both rates and volumes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CTR&lt;/li&gt;
&lt;li&gt;clicks per user&lt;/li&gt;
&lt;li&gt;impressions per user&lt;/li&gt;
&lt;li&gt;saves per user&lt;/li&gt;
&lt;li&gt;save rate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A strong candidate says, "I want to know whether user value fell, or whether the exposure mix changed."&lt;/p&gt;
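&lt;p&gt;A small worked example of the denominator effect, using made-up arm totals. The numbers are illustrative only, and "save rate" is assumed here to mean saves per click.&lt;/p&gt;

```python
# Hypothetical arm totals, invented for illustration only.
control = {"users": 1000, "impressions": 20000, "clicks": 1000, "saves": 200}
treatment = {"users": 1000, "impressions": 30000, "clicks": 1200, "saves": 260}


def summarize(arm: dict) -> dict:
    """Return the rate and per-user volume metrics for one arm."""
    return {
        "ctr": arm["clicks"] / arm["impressions"],
        "clicks_per_user": arm["clicks"] / arm["users"],
        "impressions_per_user": arm["impressions"] / arm["users"],
        "saves_per_user": arm["saves"] / arm["users"],
        # Assumption: save rate is defined as saves per click.
        "save_rate": arm["saves"] / arm["clicks"],
    }


control_summary = summarize(control)
treatment_summary = summarize(treatment)
for name in control_summary:
    print(f"{name}: {control_summary[name]:.3f} -> {treatment_summary[name]:.3f}")
```

&lt;p&gt;In this sketch CTR falls from 5% to 4%, yet clicks per user, saves per user, and save rate all rise, because impressions grew by 50% while clicks grew by 20%.&lt;/p&gt;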

&lt;h2&gt;
  
  
  How to talk about experiment design
&lt;/h2&gt;

&lt;p&gt;For personalized feeds, randomize at the user level.&lt;/p&gt;

&lt;p&gt;Why? Because recommendation exposure and engagement history are correlated across sessions. Item-level randomization can create an inconsistent user experience and can contaminate training signals.&lt;/p&gt;

&lt;p&gt;You should also mention experiment validity checks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;randomization balance&lt;/li&gt;
&lt;li&gt;ramp timing&lt;/li&gt;
&lt;li&gt;sample ratio mismatch&lt;/li&gt;
&lt;li&gt;pre-period comparability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the assignment is broken, the metric read is not trustworthy.&lt;/p&gt;

&lt;p&gt;For surfaces with social or marketplace effects, spillovers can matter. In those cases, you may need to think beyond pure user-level analysis and discuss creator-level or geo-level effects.&lt;/p&gt;

&lt;h2&gt;
  
  
  Match the stats method to the metric
&lt;/h2&gt;

&lt;p&gt;Interviewers like this because it separates people who know product metrics from people who know how to analyze them.&lt;/p&gt;

&lt;p&gt;Binary per-user outcomes, such as whether a user clicked at all, can be analyzed with a two-proportion test. But engagement per user is often:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;heavy-tailed&lt;/li&gt;
&lt;li&gt;zero-inflated&lt;/li&gt;
&lt;li&gt;noisy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Common approaches include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;user-level means with robust standard errors&lt;/li&gt;
&lt;li&gt;bootstrap&lt;/li&gt;
&lt;li&gt;winsorization sensitivity checks&lt;/li&gt;
&lt;li&gt;delta method for ratios&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One important warning: do not treat clicks and impressions as independent row-level observations when you analyze ratio metrics. That usually gives false confidence.&lt;/p&gt;
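&lt;p&gt;One way to respect that warning is the delta method for a ratio of per-user means. The sketch below treats users, not rows, as the independent units. The function name and the sample data are hypothetical, and the naive binomial standard error is included only for contrast.&lt;/p&gt;

```python
from statistics import mean


def ratio_with_delta_se(clicks: list, impressions: list) -> tuple:
    """CTR as a ratio of per-user means, with a delta-method standard error."""
    n = len(clicks)
    mean_c, mean_i = mean(clicks), mean(impressions)
    ratio = mean_c / mean_i
    var_c = sum((c - mean_c) ** 2 for c in clicks) / (n - 1)
    var_i = sum((i - mean_i) ** 2 for i in impressions) / (n - 1)
    cov = sum((c - mean_c) * (i - mean_i)
              for c, i in zip(clicks, impressions)) / (n - 1)
    # First-order Taylor approximation of Var(mean_c / mean_i).
    var_ratio = (var_c - 2 * ratio * cov + ratio ** 2 * var_i) / (n * mean_i ** 2)
    return ratio, max(var_ratio, 0.0) ** 0.5


# Hypothetical data: one user produces every click.
clicks = [0, 0, 0, 0, 40]
impressions = [90, 110, 100, 95, 105]

ctr, delta_se = ratio_with_delta_se(clicks, impressions)
# Naive SE that wrongly treats each impression as an independent trial.
naive_se = (ctr * (1 - ctr) / sum(impressions)) ** 0.5
print(ctr, delta_se, naive_se)
```

&lt;p&gt;With clicks concentrated in a single user, the user-level standard error comes out several times larger than the row-level one, which is the false confidence the warning is about.&lt;/p&gt;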

&lt;p&gt;You should also mention power and minimum detectable effect. Tiny CTR lifts can be statistically significant on high-traffic surfaces and still be too small to matter for the business.&lt;/p&gt;

&lt;p&gt;The sample size framing from the source is:&lt;/p&gt;

&lt;p&gt;$$n \approx \frac{2\sigma^2(z_{1-\alpha/2}+z_{1-\beta})^2}{\Delta^2}$$&lt;/p&gt;

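&lt;p&gt;where $n$ is the sample size per group, $\sigma^2$ is the variance of the metric at the unit of analysis, $\alpha$ is the significance level, $1-\beta$ is the target power, and $\Delta$ is the minimum detectable effect.&lt;/p&gt;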

&lt;p&gt;The right follow-up is, "What effect size would change a launch decision?"&lt;/p&gt;
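&lt;p&gt;The formula above is easy to turn into a quick calculator. This is a minimal sketch using only the standard library. The function name, the defaults of 5% significance and 80% power, and the example inputs are assumptions.&lt;/p&gt;

```python
from math import ceil
from statistics import NormalDist


def users_per_arm(sigma: float, mde: float,
                  alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate per-arm sample size for a two-sided, two-sample comparison."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * sigma ** 2 * (z_alpha + z_beta) ** 2 / mde ** 2)


# Hypothetical metric with standard deviation 1.0 per user.
n_for_mde_010 = users_per_arm(sigma=1.0, mde=0.10)
n_for_mde_005 = users_per_arm(sigma=1.0, mde=0.05)
print(n_for_mde_010, n_for_mde_005)
```

&lt;p&gt;Because the effect size is squared in the denominator, halving the minimum detectable effect roughly quadruples the users you need per arm.&lt;/p&gt;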

&lt;h2&gt;
  
  
  CUPED is worth bringing up
&lt;/h2&gt;

&lt;p&gt;If the interviewer asks how to improve experiment sensitivity, CUPED is a solid answer.&lt;/p&gt;

&lt;p&gt;The adjustment is:&lt;/p&gt;

&lt;p&gt;$$Y_{adj}=Y-\theta(X-\bar{X}),\quad \theta=\frac{Cov(Y,X)}{Var(X)}$$&lt;/p&gt;

&lt;p&gt;Use it when pre-period behavior predicts post-period behavior and treatment cannot affect that pre-period covariate. It is especially useful for noisy user-level metrics like saves per user or Shopping clicks.&lt;/p&gt;

&lt;p&gt;You do not need a long derivation. Just show that you know when it helps.&lt;/p&gt;
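&lt;p&gt;If it helps to see the adjustment as code, here is a minimal sketch of the formula above. The function name and the toy data are hypothetical. In practice theta would typically be estimated on the pooled data from both arms.&lt;/p&gt;

```python
from statistics import mean, variance


def cuped_adjust(y: list, x: list) -> tuple:
    """Return CUPED-adjusted outcomes and theta = Cov(Y, X) / Var(X)."""
    x_bar, y_bar = mean(x), mean(y)
    n = len(y)
    cov = sum((yi - y_bar) * (xi - x_bar) for yi, xi in zip(y, x)) / (n - 1)
    theta = cov / variance(x)
    adjusted = [yi - theta * (xi - x_bar) for yi, xi in zip(y, x)]
    return adjusted, theta


# Hypothetical per-user values: pre-period saves (x) and in-experiment saves (y).
pre_period = [1, 2, 3, 4, 5, 6]
in_experiment = [1.2, 1.9, 3.4, 3.8, 5.1, 6.3]

adjusted, theta = cuped_adjust(in_experiment, pre_period)
print(f"theta = {theta:.3f}")
print(f"variance before: {variance(in_experiment):.3f}")
print(f"variance after:  {variance(adjusted):.3f}")
```

&lt;p&gt;The adjusted metric keeps the same mean as the original but has lower variance when the pre-period covariate is strongly correlated with the outcome, which is where the sensitivity gain comes from.&lt;/p&gt;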

&lt;h2&gt;
  
  
  A good framework for "CTR dropped after a recommendation launch"
&lt;/h2&gt;

&lt;p&gt;This is the kind of case question you might get directly.&lt;/p&gt;

&lt;p&gt;A solid structure has four parts:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Validate the metric path
&lt;/h3&gt;

&lt;p&gt;Ask how impressions and clicks are defined.&lt;/p&gt;

&lt;p&gt;Check:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ranked impression vs viewport impression&lt;/li&gt;
&lt;li&gt;deduping rules&lt;/li&gt;
&lt;li&gt;whether the new system changed logging or counting logic&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Validate the experiment
&lt;/h3&gt;

&lt;p&gt;Ask whether it was an A/B test or full rollout.&lt;/p&gt;

&lt;p&gt;Check:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;randomization balance&lt;/li&gt;
&lt;li&gt;sample ratio mismatch&lt;/li&gt;
&lt;li&gt;pre-period similarity&lt;/li&gt;
&lt;li&gt;ramp timing&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Decompose the funnel
&lt;/h3&gt;

&lt;p&gt;Since:&lt;/p&gt;

&lt;p&gt;$$CTR = \frac{\text{clicks}}{\text{impressions}}$$&lt;/p&gt;

&lt;p&gt;ask:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Did clicks fall?&lt;/li&gt;
&lt;li&gt;Did impressions rise?&lt;/li&gt;
&lt;li&gt;Did position mix shift toward lower slots?&lt;/li&gt;
&lt;li&gt;Did visibility or viewport rates change?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Segment for diagnosis
&lt;/h3&gt;

&lt;p&gt;After checking the overall result, cut by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;new vs returning users&lt;/li&gt;
&lt;li&gt;heavy vs light users&lt;/li&gt;
&lt;li&gt;platform&lt;/li&gt;
&lt;li&gt;country&lt;/li&gt;
&lt;li&gt;content type&lt;/li&gt;
&lt;li&gt;fresh vs evergreen content&lt;/li&gt;
&lt;li&gt;video vs static&lt;/li&gt;
&lt;li&gt;shopping-intent users&lt;/li&gt;
&lt;li&gt;position or session intent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key judgment call is this: lower CTR may be acceptable if deeper value improves. If saves per user, long-clicks, Shopping conversions, or retention go up while low-quality clicks go down, "CTR dropped" is not enough reason to roll back.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Shopping version of this question
&lt;/h2&gt;

&lt;p&gt;For a Shopping launch, CTR is even less likely to be the final metric that matters.&lt;/p&gt;

&lt;p&gt;Primary metrics may shift toward:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;product detail clicks per user&lt;/li&gt;
&lt;li&gt;merchant outbound clicks&lt;/li&gt;
&lt;li&gt;add-to-cart proxies&lt;/li&gt;
&lt;li&gt;shopping-engaged sessions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And your guardrails still matter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;overall home feed engagement&lt;/li&gt;
&lt;li&gt;user trust&lt;/li&gt;
&lt;li&gt;retention&lt;/li&gt;
&lt;li&gt;cannibalization of organic pin engagement&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You should also mention heterogeneity. Users with shopping intent may benefit, while casual browsers may see irrelevant commerce content. That changes how you think about targeting and interpretation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common mistakes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Treating CTR as the business goal
&lt;/h3&gt;

&lt;p&gt;It is usually a diagnostic or intermediate metric.&lt;/p&gt;

&lt;h3&gt;
  
  
  Ignoring exposure changes
&lt;/h3&gt;

&lt;p&gt;A CTR drop can come from more low-intent impressions, different positions, or a new UI module.&lt;/p&gt;

&lt;h3&gt;
  
  
  Listing metrics without a decision rule
&lt;/h3&gt;

&lt;p&gt;Say what is primary, what is a guardrail, and what tradeoff is acceptable.&lt;/p&gt;

&lt;p&gt;A much better interview line is:&lt;/p&gt;

&lt;p&gt;"Launch if saves per user or shopping-engaged sessions improve without meaningful degradation in retention, home feed engagement, or negative feedback."&lt;/p&gt;

&lt;h2&gt;
  
  
  Final takeaway
&lt;/h2&gt;

&lt;p&gt;If you get a CTR question in an interview, do not answer it like a spreadsheet exercise. Define exposure carefully. Use user-level reasoning. Check denominator effects. Separate shallow clicks from meaningful engagement. Then make a decision with guardrails, not with one ratio.&lt;/p&gt;

&lt;p&gt;If you want the original concept note, formulas, and interview framing, read the full PracHub post on &lt;a href="https://prachub.com/concepts/ctr-and-engagement-metrics?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=backlinks" rel="noopener noreferrer"&gt;CTR and engagement metrics&lt;/a&gt;. If you want more practice in this style, PracHub also has a set of &lt;a href="https://prachub.com/interview-questions?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=backlinks" rel="noopener noreferrer"&gt;interview questions&lt;/a&gt; on metrics, experimentation, and product data science.&lt;/p&gt;

</description>
      <category>interview</category>
      <category>career</category>
      <category>analyticsexperimentation</category>
      <category>programming</category>
    </item>
    <item>
      <title>Meta Data Scientist Interview Cheatsheet 2026</title>
      <dc:creator>Feng Zhang</dc:creator>
      <pubDate>Sun, 24 May 2026 04:01:10 +0000</pubDate>
      <link>https://dev.to/feng_zhang_cedb4581bee881/meta-data-scientist-interview-cheatsheet-2026-i2d</link>
      <guid>https://dev.to/feng_zhang_cedb4581bee881/meta-data-scientist-interview-cheatsheet-2026-i2d</guid>
      <description>&lt;p&gt;If you're preparing for Meta data scientist interviews, one pattern shows up fast: the bar is not "can you compute a metric?" It is "can you define the right metric, design a clean experiment, and explain tradeoffs like an owner?"&lt;/p&gt;

&lt;p&gt;This article pulls together the most interview-relevant parts of PracHub's &lt;a href="https://prachub.com/interview-prep/meta-data-scientist-interview-prep?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=backlinks" rel="noopener noreferrer"&gt;Meta Data Scientist interview prep guide&lt;/a&gt;, with a focus on areas candidates often get pressed on: notification analytics, A/B testing, cluster randomization, and SQL event logs.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Meta interviewers are usually testing
&lt;/h2&gt;

&lt;p&gt;Across technical screens and onsites, the questions often sound broad:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"How would you evaluate similar-listing notifications?"&lt;/li&gt;
&lt;li&gt;"Design an experiment for a new ads ranking model"&lt;/li&gt;
&lt;li&gt;"Write SQL to compute engagement or call metrics"&lt;/li&gt;
&lt;li&gt;"What would you do if there is interference between users?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The underlying skill is the same. You need to move from raw events or product ideas to a decision-ready analysis. That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;defining eligibility&lt;/li&gt;
&lt;li&gt;choosing a randomization unit&lt;/li&gt;
&lt;li&gt;picking a primary metric&lt;/li&gt;
&lt;li&gt;adding guardrails&lt;/li&gt;
&lt;li&gt;checking whether the observed impact is actually incremental&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your answer stays at the dashboard level, it will usually feel weak.&lt;/p&gt;




&lt;h2&gt;
  
  
  Notification analytics is a causal question, not a CTR question
&lt;/h2&gt;

&lt;p&gt;A common interview prompt is some variation of push notifications or similar-listing alerts. The mistake many candidates make is optimizing for click-through rate.&lt;/p&gt;

&lt;p&gt;That is too shallow.&lt;/p&gt;

&lt;p&gt;For notification products, Meta cares about whether the notification creates net value or just interrupts people enough to get clicks. A strong answer breaks the system into a funnel:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;eligibility&lt;/li&gt;
&lt;li&gt;send&lt;/li&gt;
&lt;li&gt;delivery&lt;/li&gt;
&lt;li&gt;impression or open&lt;/li&gt;
&lt;li&gt;click&lt;/li&gt;
&lt;li&gt;landing-page engagement&lt;/li&gt;
&lt;li&gt;downstream action&lt;/li&gt;
&lt;li&gt;longer-term retention&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a marketplace notification, downstream actions matter more than raw clicks. Examples from the source include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;listing_view&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;save&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;seller_message&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;offer_sent&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;purchase_intent&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;transaction_proxy&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the product goal is better buyer discovery, then a better primary metric than &lt;code&gt;notification_click_rate&lt;/code&gt; might be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;incremental &lt;code&gt;qualified_listing_views_per_user&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;buyer_seller_message_threads_per_eligible_user&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That framing shows you understand the product mechanism.&lt;/p&gt;

&lt;h3&gt;
  
  
  Guardrails matter more for notifications than people think
&lt;/h3&gt;

&lt;p&gt;Notifications impose an attention cost. Your answer should include guardrails such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;push_opt_out_rate&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;notification_disable_rate&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;app_uninstall_rate&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;hide_report_rate&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;negative_feedback_rate&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;session_depth&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;7d_retention&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;total notification volume per user&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you ignore fatigue, your experiment design looks incomplete.&lt;/p&gt;

&lt;h3&gt;
  
  
  Be precise about eligibility and exposure
&lt;/h3&gt;

&lt;p&gt;Another common failure mode is saying, "compare users who got notifications with users who didn't."&lt;/p&gt;

&lt;p&gt;That comparison is biased. Users who receive notifications are often already more active, have permissions enabled, or have more relevant inventory available.&lt;/p&gt;

&lt;p&gt;A better answer starts with a fixed eligible population, for example users who:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;viewed or saved a marketplace item in the last 7 days&lt;/li&gt;
&lt;li&gt;have push permissions enabled&lt;/li&gt;
&lt;li&gt;have at least one similar listing available&lt;/li&gt;
&lt;/ul&gt;

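&lt;p&gt;Then analyze intent-to-treat (ITT) on randomized eligible users: compare everyone assigned to treatment with everyone assigned to control, whether or not a notification was actually delivered or opened. You can inspect treatment-on-treated later, but only with the right causal caveats.&lt;/p&gt;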

&lt;h3&gt;
  
  
  Watch for cannibalization and spillovers
&lt;/h3&gt;

&lt;p&gt;A similar-listing notification can shift behavior from search, organic feed, saved items, or other notification channels rather than create new demand. So you should measure total marketplace engagement, not only attributed notification clicks.&lt;/p&gt;

&lt;p&gt;If the product has social, household, or marketplace spillovers, say that directly. That is often when an interviewer pushes into cluster randomization.&lt;/p&gt;




&lt;h2&gt;
  
  
  A/B testing answers need an estimand, not just a p-value
&lt;/h2&gt;

&lt;p&gt;Meta interviewers want to hear that you can design an experiment before data exists, not just analyze one afterward.&lt;/p&gt;

&lt;p&gt;Start with the decision and the causal quantity. In plain terms: what launch decision does this test inform, and for whom?&lt;/p&gt;

&lt;p&gt;For many interview prompts, your structure can be:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Define the product change and eligible population
&lt;/li&gt;
&lt;li&gt;Choose a randomization unit
&lt;/li&gt;
&lt;li&gt;Name the primary metric
&lt;/li&gt;
&lt;li&gt;Add guardrails
&lt;/li&gt;
&lt;li&gt;Discuss power, variance, and diagnostics
&lt;/li&gt;
&lt;li&gt;Explain how you'd interpret null or mixed results&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Choose the randomization unit based on interference risk
&lt;/h3&gt;

&lt;p&gt;User-level randomization is often fine for isolated product changes. It is not automatically correct.&lt;/p&gt;

&lt;p&gt;If one user's treatment can affect another user's outcome, then SUTVA may fail. In Meta-style products, that comes up in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;social feeds&lt;/li&gt;
&lt;li&gt;messaging&lt;/li&gt;
&lt;li&gt;ads auctions&lt;/li&gt;
&lt;li&gt;marketplaces&lt;/li&gt;
&lt;li&gt;creator ecosystems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In those cases, you may need cluster, geo, advertiser, page, or marketplace-level randomization.&lt;/p&gt;

&lt;p&gt;If you say, "I'd use user-level randomization if interference is low, otherwise I'd consider cluster or geo designs," that is already much stronger than forcing every problem into a 50/50 user RCT.&lt;/p&gt;

&lt;h3&gt;
  
  
  Power should be discussed at the right level
&lt;/h3&gt;

&lt;p&gt;For repeated notifications or clustered experiments, observations are correlated. You should talk about power at the user or cluster level, not at the event level.&lt;/p&gt;

&lt;p&gt;The source also calls out the design effect for clustered experiments:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;DE = 1 + (m - 1)rho&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;where &lt;code&gt;m&lt;/code&gt; is average cluster size and &lt;code&gt;rho&lt;/code&gt; is intracluster correlation.&lt;/p&gt;

&lt;p&gt;That matters because a huge row count can still translate into a much smaller effective sample size.&lt;/p&gt;
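&lt;p&gt;A quick numeric sketch of the design effect formula above. The row count, cluster size, and intracluster correlation are invented for illustration, and the function names are hypothetical.&lt;/p&gt;

```python
def design_effect(avg_cluster_size: float, icc: float) -> float:
    """DE = 1 + (m - 1) * rho for roughly equal-sized clusters."""
    return 1 + (avg_cluster_size - 1) * icc


def effective_sample_size(n_rows: int, avg_cluster_size: float, icc: float) -> float:
    """Number of independent observations the clustered rows are worth."""
    return n_rows / design_effect(avg_cluster_size, icc)


# Hypothetical experiment: 1,000,000 rows, 50 rows per cluster, rho = 0.05.
de = design_effect(50, 0.05)
ess = effective_sample_size(1_000_000, 50, 0.05)
print(f"design effect: {de:.2f}, effective sample size: {ess:,.0f}")
```

&lt;p&gt;Even a modest intracluster correlation of 0.05 shrinks a million rows to roughly 290,000 effective observations in this example.&lt;/p&gt;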

&lt;h3&gt;
  
  
  CUPED is worth mentioning if the prompt invites depth
&lt;/h3&gt;

&lt;p&gt;For noisy product metrics, pre-experiment covariates can reduce variance. The source mentions CUPED, which adjusts outcomes using pre-period behavior. You do not need to derive it in every answer, but mentioning it in a Meta interview often signals practical experiment experience.&lt;/p&gt;

&lt;p&gt;Use it when the pre-period metric strongly predicts the post-period metric, such as engagement, spend, or retention.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to answer a "similar-listing notifications" question
&lt;/h2&gt;

&lt;p&gt;A solid answer could sound like this:&lt;/p&gt;

&lt;p&gt;First, clarify the product goal. Are you trying to increase discovery, transactions, or re-engagement among users with shopping intent?&lt;/p&gt;

&lt;p&gt;Next, define the eligible population: users who recently viewed or saved an item, have push permissions on, and have relevant similar inventory available.&lt;/p&gt;

&lt;p&gt;Then propose user-level randomization if interference is limited. Treatment users receive similar-listing pushes, control users stay on the current notification policy.&lt;/p&gt;

&lt;p&gt;Pick a primary metric tied to downstream value, like incremental &lt;code&gt;qualified_listing_views_per_user&lt;/code&gt; or &lt;code&gt;buyer_seller_message_threads_per_eligible_user&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Use secondary metrics like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;notification_open_rate&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;save_rate&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;return_sessions&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Add guardrails:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;push_opt_out_rate&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;notification_settings_disable_rate&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;hide_report_rate&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;notifications received per user&lt;/li&gt;
&lt;li&gt;&lt;code&gt;7d_retention&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then say you'd analyze ITT first, check whether gains are incremental versus cannibalized from existing surfaces, and look at heterogeneous effects by intent, notification sensitivity, and inventory density.&lt;/p&gt;

&lt;p&gt;That answer is much closer to what interviewers want than "I'd compare CTR between treatment and control."&lt;/p&gt;




&lt;h2&gt;
  
  
  SQL event log questions are mostly about grain and joins
&lt;/h2&gt;

&lt;p&gt;The SQL side of the interview is less about syntax tricks and more about getting metric definitions right.&lt;/p&gt;

&lt;p&gt;The source's advice is simple and useful:&lt;/p&gt;

&lt;h3&gt;
  
  
  1) Decide the grain first
&lt;/h3&gt;

&lt;p&gt;Know what one row means before you write code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;user-day&lt;/li&gt;
&lt;li&gt;call-day&lt;/li&gt;
&lt;li&gt;impression-day&lt;/li&gt;
&lt;li&gt;country-day&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A lot of mistakes come from skipping this step.&lt;/p&gt;

&lt;h3&gt;
  
  
  2) Be careful with time windows
&lt;/h3&gt;

&lt;p&gt;Use bounded windows like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;event_ts &amp;gt;= start&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;event_ts &amp;lt; end&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

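&lt;p&gt;A half-open window, with an inclusive start and an exclusive end, assigns an event stamped exactly at midnight to one day only. That avoids double counting midnight events.&lt;/p&gt;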

&lt;h3&gt;
  
  
  3) Aggregate before joining when needed
&lt;/h3&gt;

&lt;p&gt;Joining raw event tables too early can multiply rows and inflate clicks, responses, revenue, or duration.&lt;/p&gt;
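&lt;p&gt;The fan-out problem is easy to reproduce. The sketch below uses Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module with two tiny hypothetical tables. The table and column names are assumptions for illustration.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE impressions (user_id INTEGER, item_id INTEGER);
CREATE TABLE clicks (user_id INTEGER, item_id INTEGER);
INSERT INTO impressions VALUES (1, 10), (1, 11), (1, 12);
INSERT INTO clicks VALUES (1, 10), (1, 11);
""")

# Joining raw events on user_id pairs every impression with every click.
inflated_clicks = conn.execute("""
SELECT COUNT(c.item_id)
FROM impressions AS i
JOIN clicks AS c ON i.user_id = c.user_id
""").fetchone()[0]

# Aggregating each table to one row per user first keeps the counts honest.
per_user = conn.execute("""
WITH imp AS (
    SELECT user_id, COUNT(*) AS impressions FROM impressions GROUP BY user_id
),
clk AS (
    SELECT user_id, COUNT(*) AS clicks FROM clicks GROUP BY user_id
)
SELECT imp.impressions, COALESCE(clk.clicks, 0)
FROM imp
LEFT JOIN clk ON imp.user_id = clk.user_id
""").fetchone()

print(inflated_clicks, per_user)
```

&lt;p&gt;The raw join reports 6 clicks for a user who clicked twice, while the pre-aggregated version returns 3 impressions and 2 clicks.&lt;/p&gt;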

&lt;h3&gt;
  
  
  4) Protect ratio calculations
&lt;/h3&gt;

&lt;p&gt;Use safe denominators and be explicit about what happens when the denominator is zero.&lt;/p&gt;
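&lt;p&gt;One common pattern is &lt;code&gt;NULLIF&lt;/code&gt; on the denominator, sketched here with Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module. The &lt;code&gt;user_day&lt;/code&gt; table and its columns are hypothetical. Returning NULL for a zero denominator is one reasonable choice, not the only one.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_day (user_id INTEGER, impressions INTEGER, clicks INTEGER);
INSERT INTO user_day VALUES (1, 10, 2), (2, 0, 0);
""")

# NULLIF turns a zero denominator into NULL, so the ratio is NULL instead of
# an error (or an engine-specific result). Multiplying by 1.0 forces
# floating-point division.
ctr_rows = conn.execute("""
SELECT user_id, clicks * 1.0 / NULLIF(impressions, 0) AS ctr
FROM user_day
ORDER BY user_id
""").fetchall()

print(ctr_rows)
```

&lt;p&gt;Whether a user with zero impressions should be NULL, zero, or excluded depends on the metric definition, so state the choice out loud in the interview.&lt;/p&gt;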

&lt;h3&gt;
  
  
  5) Clarify deduplication rules
&lt;/h3&gt;

&lt;p&gt;If the metric requires one valid event per entity or one response per user, say how you would dedupe, often with &lt;code&gt;ROW_NUMBER()&lt;/code&gt;.&lt;/p&gt;
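&lt;p&gt;A minimal deduplication sketch with &lt;code&gt;ROW_NUMBER()&lt;/code&gt;, run through Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module. It assumes a SQLite build with window function support (3.25 or later), and the &lt;code&gt;responses&lt;/code&gt; table is hypothetical. The rule shown, keeping the latest response per user, is an assumption. The real rule comes from the metric definition.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE responses (user_id INTEGER, response TEXT, event_ts TEXT);
INSERT INTO responses VALUES
    (1, 'yes', '2026-01-01 10:00:00'),
    (1, 'no',  '2026-01-01 10:05:00'),
    (2, 'yes', '2026-01-01 11:00:00');
""")

# Number each user's rows from newest to oldest, then keep only the newest.
latest_responses = conn.execute("""
SELECT user_id, response
FROM (
    SELECT user_id,
           response,
           ROW_NUMBER() OVER (
               PARTITION BY user_id ORDER BY event_ts DESC
           ) AS rn
    FROM responses
) AS ranked
WHERE rn = 1
ORDER BY user_id
""").fetchall()

print(latest_responses)
```

&lt;p&gt;If two rows can share the same timestamp, add a tiebreaker column to the &lt;code&gt;ORDER BY&lt;/code&gt; so the result is deterministic.&lt;/p&gt;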

&lt;p&gt;These are basic ideas, but they come up constantly in product analytics interviews.&lt;/p&gt;




&lt;h2&gt;
  
  
  What candidates most often miss
&lt;/h2&gt;

&lt;p&gt;From the source, the recurring weak spots are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;optimizing for &lt;code&gt;CTR&lt;/code&gt; alone&lt;/li&gt;
&lt;li&gt;being vague about eligibility or exposure&lt;/li&gt;
&lt;li&gt;ignoring interference and repeated treatment&lt;/li&gt;
&lt;li&gt;assuming every experiment should be user-level 50/50&lt;/li&gt;
&lt;li&gt;treating a null result as proof of no effect&lt;/li&gt;
&lt;li&gt;skipping diagnostics like sample ratio mismatch, logging sanity, or pre-period balance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you avoid those, your answers already sound more senior.&lt;/p&gt;




&lt;h2&gt;
  
  
  A better way to use a cheatsheet
&lt;/h2&gt;

&lt;p&gt;Don't memorize lines. Practice turning these patterns into spoken answers.&lt;/p&gt;

&lt;p&gt;Take a prompt like notifications, ads ranking, or call metrics, and force yourself to answer in this order:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;goal&lt;/li&gt;
&lt;li&gt;population&lt;/li&gt;
&lt;li&gt;unit of randomization&lt;/li&gt;
&lt;li&gt;primary metric&lt;/li&gt;
&lt;li&gt;guardrails&lt;/li&gt;
&lt;li&gt;power and inference risks&lt;/li&gt;
&lt;li&gt;interpretation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want more realistic drills, PracHub also has &lt;a href="https://prachub.com/interview-questions?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=backlinks" rel="noopener noreferrer"&gt;interview practice questions here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;And if you're preparing specifically for Meta, the full &lt;a href="https://prachub.com/interview-prep/meta-data-scientist-interview-prep?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=backlinks" rel="noopener noreferrer"&gt;Meta Data Scientist interview prep guide on PracHub&lt;/a&gt; is the better reference because it keeps these topics in one place and ties them to actual interview-style prompts.&lt;/p&gt;

</description>
      <category>interview</category>
      <category>career</category>
      <category>meta</category>
      <category>datascientist</category>
    </item>
    <item>
      <title>Top 50 SQL Interview Questions with Answers (2026)</title>
      <dc:creator>Feng Zhang</dc:creator>
      <pubDate>Tue, 05 May 2026 03:46:04 +0000</pubDate>
      <link>https://dev.to/feng_zhang_cedb4581bee881/top-50-sql-interview-questions-with-answers-2026-27h7</link>
      <guid>https://dev.to/feng_zhang_cedb4581bee881/top-50-sql-interview-questions-with-answers-2026-27h7</guid>
      <description>&lt;p&gt;SQL interviews are predictable in one useful way: the same patterns show up again and again.&lt;/p&gt;

&lt;p&gt;PracHub reviewed 649 SQL interview questions and pulled out the topics that come up most often. The original list, &lt;a href="https://prachub.com/resources/top-50-sql-interview-questions-with-answers-2026?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=backlinks" rel="noopener noreferrer"&gt;"Top 50 SQL Interview Questions with Answers (2026)"&lt;/a&gt;, is a solid map of what companies actually ask, not a textbook walk through SQL syntax.&lt;/p&gt;

&lt;p&gt;If you are preparing for interviews, this is where to focus.&lt;/p&gt;

&lt;h2&gt;
  
  
  Start with joins and window functions
&lt;/h2&gt;

&lt;p&gt;If your prep time is limited, spend it on joins first, then window functions.&lt;/p&gt;

&lt;p&gt;Those two areas show up in almost every SQL interview because they show how you think. Can you combine datasets cleanly? Can you answer analytical questions without writing five nested queries? Can you handle real business logic instead of toy examples?&lt;/p&gt;

&lt;p&gt;If a LEFT JOIN still takes you a minute to think through, stop and drill it until it is automatic.&lt;/p&gt;

&lt;h2&gt;
  
  
  1) Joins: table stakes
&lt;/h2&gt;

&lt;p&gt;These are the questions that should feel routine.&lt;/p&gt;

&lt;p&gt;Typical join questions include:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Find all customers who have never placed an order, usually with a LEFT JOIN and a NULL check&lt;/li&gt;
&lt;li&gt;Find the second highest salary in each department&lt;/li&gt;
&lt;li&gt;Join users and orders to calculate total spend per user&lt;/li&gt;
&lt;li&gt;Find employees whose salary is above their department average&lt;/li&gt;
&lt;li&gt;Use a self-join to find pairs of employees in the same department&lt;/li&gt;
&lt;li&gt;Find customers who placed orders in both January and February&lt;/li&gt;
&lt;li&gt;Show each product and its most recent order date&lt;/li&gt;
&lt;li&gt;LEFT JOIN three tables such as users, orders, and products&lt;/li&gt;
&lt;li&gt;Find users who signed up but never activated&lt;/li&gt;
&lt;li&gt;Join on a date range, such as orders placed within 7 days of signup&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Why interviewers like these questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;They test whether you understand join types&lt;/li&gt;
&lt;li&gt;They expose weak handling of NULLs&lt;/li&gt;
&lt;li&gt;They show whether you can translate business rules into SQL&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A lot of candidates know INNER JOIN and freeze when the problem needs anti-joins, self-joins, or date conditions. That gap matters.&lt;/p&gt;
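
&lt;p&gt;To make the anti-join pattern concrete, here is a minimal sketch using Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module. The &lt;code&gt;customers&lt;/code&gt; and &lt;code&gt;orders&lt;/code&gt; tables and their rows are made up for illustration, and the exact syntax may differ slightly in other databases.&lt;/p&gt;

```python
import sqlite3

# Hypothetical schema and rows, invented for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Bo'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 3, 15.0);
""")

# Anti-join: keep every customer, then keep only rows where no order matched.
never_ordered = conn.execute("""
    SELECT c.customer_id, c.name
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    WHERE o.order_id IS NULL
    ORDER BY c.customer_id
""").fetchall()

# Total spend per customer; COALESCE turns the NULL sum for non-buyers into 0.
spend = conn.execute("""
    SELECT c.name, COALESCE(SUM(o.amount), 0) AS total_spend
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name
    ORDER BY c.customer_id
""").fetchall()

print(never_ordered)  # only Bo has no orders
print(spend)
```

&lt;p&gt;The NULL check goes on a column from the right-hand table that can never be NULL in a real match, here the primary key &lt;code&gt;order_id&lt;/code&gt;. Swapping the LEFT JOIN for an INNER JOIN would silently drop Bo from both results.&lt;/p&gt;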

&lt;h2&gt;
  
  
  2) Window functions: where difficulty jumps
&lt;/h2&gt;

&lt;p&gt;Window functions are where interviews often separate junior and senior candidates.&lt;/p&gt;

&lt;p&gt;You can get pretty far with GROUP BY, but many interview questions need row-level context and aggregate context at the same time. That is what window functions are for.&lt;/p&gt;

&lt;p&gt;Common examples:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Running total of sales by date
&lt;/li&gt;
&lt;li&gt;Top 3 products by revenue in each category with RANK or ROW_NUMBER
&lt;/li&gt;
&lt;li&gt;Month-over-month revenue growth with LAG
&lt;/li&gt;
&lt;li&gt;Moving average of daily active users over 7 days
&lt;/li&gt;
&lt;li&gt;Rank employees by salary within department
&lt;/li&gt;
&lt;li&gt;Difference between each row and the previous row
&lt;/li&gt;
&lt;li&gt;Cumulative percentage of total sales
&lt;/li&gt;
&lt;li&gt;First and last order for each customer with FIRST_VALUE or LAST_VALUE
&lt;/li&gt;
&lt;li&gt;Sessionization, grouping events within 30 minutes of each other
&lt;/li&gt;
&lt;li&gt;Retention, such as percentage of users active 7 days after signup&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These questions test whether you understand partitions, ordering, and window frames. They also test whether you know when a window function is better than a subquery.&lt;/p&gt;

&lt;p&gt;If you want one strong signal for interview readiness, it is this: can you write a correct LAG, ROW_NUMBER, or running total query without trial and error?&lt;/p&gt;
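
&lt;p&gt;As a reference point, a running total and a LAG-based change versus the previous day can be sketched as follows. This assumes a made-up &lt;code&gt;daily_sales&lt;/code&gt; table and uses Python's &lt;code&gt;sqlite3&lt;/code&gt; module, which needs SQLite 3.25 or newer for window functions.&lt;/p&gt;

```python
import sqlite3

# Hypothetical table and values, invented for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE daily_sales (sale_date TEXT PRIMARY KEY, revenue INTEGER);
    INSERT INTO daily_sales VALUES
        ('2026-01-01', 100), ('2026-01-02', 150), ('2026-01-03', 120);
""")

rows = conn.execute("""
    SELECT
        sale_date,
        revenue,
        -- Running total: sum everything from the first row up to the current row.
        SUM(revenue) OVER (
            ORDER BY sale_date
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS running_total,
        -- LAG looks one row back; the first row has no previous row, so it is NULL.
        revenue - LAG(revenue) OVER (ORDER BY sale_date) AS change_vs_prev
    FROM daily_sales
    ORDER BY sale_date
""").fetchall()

for row in rows:
    print(row)
```

&lt;p&gt;Spelling out the frame with &lt;code&gt;ROWS BETWEEN ...&lt;/code&gt; is optional here, but it shows the interviewer that you know a frame exists and what the default would include.&lt;/p&gt;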

&lt;h2&gt;
  
  
  3) CTEs and subqueries: can you break a hard problem into steps?
&lt;/h2&gt;

&lt;p&gt;A lot of SQL interview questions are not hard because of syntax. They are hard because the logic has multiple stages.&lt;/p&gt;

&lt;p&gt;That is where CTEs help. They let you structure a query in chunks that another person can actually read.&lt;/p&gt;

&lt;p&gt;Questions in this group include:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Rewrite a nested subquery as a CTE
&lt;/li&gt;
&lt;li&gt;Build an employee hierarchy with a recursive CTE
&lt;/li&gt;
&lt;li&gt;Find the longest streak of consecutive login days per user
&lt;/li&gt;
&lt;li&gt;Calculate a funnel: signup to activation to first purchase to repeat purchase
&lt;/li&gt;
&lt;li&gt;Find duplicates and keep only the most recent row
&lt;/li&gt;
&lt;li&gt;Build a cohort table by signup month
&lt;/li&gt;
&lt;li&gt;Chain multiple CTEs to calculate a metric step by step
&lt;/li&gt;
&lt;li&gt;Find users whose spending increased every month for 3 straight months
&lt;/li&gt;
&lt;li&gt;Use a correlated subquery to find orders above the average for their product category
&lt;/li&gt;
&lt;li&gt;Pivot rows into columns with a CTE, without using PIVOT&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This category matters because interviewers are watching how you organize a solution.&lt;/p&gt;

&lt;p&gt;Messy SQL is often a sign of messy thinking. A clean chain of CTEs tells the interviewer that you can take a vague analytics question and turn it into a clear sequence of steps.&lt;/p&gt;
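
&lt;p&gt;One way to see what a clean chain of CTEs looks like is the login streak question from the list above. The sketch below is one possible approach, not the only one: it assumes a hypothetical &lt;code&gt;logins&lt;/code&gt; table with ISO date strings and relies on SQLite's &lt;code&gt;date()&lt;/code&gt; function, so other databases would need their own date arithmetic.&lt;/p&gt;

```python
import sqlite3

# Hypothetical table and rows, invented for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE logins (user_id INTEGER, login_date TEXT);
    INSERT INTO logins VALUES
        (1, '2026-03-01'), (1, '2026-03-02'), (1, '2026-03-03'), (1, '2026-03-07'),
        (2, '2026-03-01'), (2, '2026-03-03'), (2, '2026-03-04');
""")

streaks = conn.execute("""
    WITH distinct_days AS (
        -- Step 1: one row per user per day, in case of repeat logins.
        SELECT DISTINCT user_id, login_date FROM logins
    ),
    numbered AS (
        -- Step 2: number each user's login days in date order.
        SELECT user_id, login_date,
               ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY login_date) AS rn
        FROM distinct_days
    ),
    islands AS (
        -- Step 3: consecutive days share the same (date minus row number) anchor.
        SELECT user_id, date(login_date, '-' || rn || ' days') AS anchor
        FROM numbered
    ),
    streak_lengths AS (
        -- Step 4: count the days inside each streak.
        SELECT user_id, anchor, COUNT(*) AS streak_len
        FROM islands
        GROUP BY user_id, anchor
    )
    -- Step 5: keep the longest streak per user.
    SELECT user_id, MAX(streak_len) AS longest_streak
    FROM streak_lengths
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()

print(streaks)
```

&lt;p&gt;Each CTE does one job and has a name that says what it holds, which makes it easy to narrate the solution step by step during the interview.&lt;/p&gt;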

&lt;h2&gt;
  
  
  4) Aggregation: basic, but easy to get wrong
&lt;/h2&gt;

&lt;p&gt;Aggregation questions look simple, then punish sloppy thinking.&lt;/p&gt;

&lt;p&gt;Most people can write &lt;code&gt;GROUP BY customer_id&lt;/code&gt;. The mistakes happen around edge cases, filtering, distinct counts, and post-aggregation conditions.&lt;/p&gt;

&lt;p&gt;Common prompts:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Top 5 customers by total order value
&lt;/li&gt;
&lt;li&gt;Count unique products ordered per month
&lt;/li&gt;
&lt;li&gt;Average order value excluding outliers above the 99th percentile
&lt;/li&gt;
&lt;li&gt;Months where revenue exceeded 1 million
&lt;/li&gt;
&lt;li&gt;Group by category, region, and month
&lt;/li&gt;
&lt;li&gt;Departments with more than 10 employees and average salary above 100k using HAVING
&lt;/li&gt;
&lt;li&gt;Distinct users who performed at least 3 actions in one day
&lt;/li&gt;
&lt;li&gt;Find the mode of a column
&lt;/li&gt;
&lt;li&gt;Conditional aggregation with &lt;code&gt;SUM(CASE WHEN ...)&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Median salary in databases that do not support &lt;code&gt;PERCENTILE_CONT&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is where candidates often mix up &lt;code&gt;WHERE&lt;/code&gt; and &lt;code&gt;HAVING&lt;/code&gt;, forget &lt;code&gt;COUNT(DISTINCT ...)&lt;/code&gt;, or write queries that work only for the happy path.&lt;/p&gt;

&lt;p&gt;If your SQL tends to break on NULLs, ties, or duplicate rows, aggregation questions will expose it fast.&lt;/p&gt;
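
&lt;p&gt;The sketch below puts three of those traps in one query: a row-level filter in &lt;code&gt;WHERE&lt;/code&gt;, a group-level filter in &lt;code&gt;HAVING&lt;/code&gt;, and &lt;code&gt;COUNT(DISTINCT ...)&lt;/code&gt; next to a plain &lt;code&gt;COUNT(*)&lt;/code&gt;. The &lt;code&gt;orders&lt;/code&gt; table and its rows are hypothetical, and the query runs through Python's &lt;code&gt;sqlite3&lt;/code&gt; module.&lt;/p&gt;

```python
import sqlite3

# Hypothetical table and rows, invented for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY, customer_id INTEGER,
        product_id INTEGER, status TEXT, amount INTEGER
    );
    INSERT INTO orders VALUES
        (1, 1, 100, 'paid',      50),
        (2, 1, 100, 'paid',      50),
        (3, 1, 200, 'refunded',  30),
        (4, 2, 100, 'paid',      20),
        (5, 3, 300, 'refunded',  70),
        (6, 2, 200, 'cancelled', 99);
""")

rows = conn.execute("""
    SELECT
        customer_id,
        COUNT(*)                   AS order_rows,
        COUNT(DISTINCT product_id) AS distinct_products,
        SUM(CASE WHEN status = 'paid' THEN amount ELSE 0 END) AS paid_amount
    FROM orders
    WHERE status != 'cancelled'      -- row-level filter, applied before grouping
    GROUP BY customer_id
    -- group-level filter, applied after aggregation
    HAVING SUM(CASE WHEN status = 'paid' THEN amount ELSE 0 END) >= 20
    ORDER BY customer_id
""").fetchall()

print(rows)
```

&lt;p&gt;Customer 1 has three order rows but only two distinct products, which is the gap &lt;code&gt;COUNT(DISTINCT ...)&lt;/code&gt; exists to catch. Customer 3 survives the &lt;code&gt;WHERE&lt;/code&gt; clause but is removed by &lt;code&gt;HAVING&lt;/code&gt; because nothing was paid.&lt;/p&gt;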

&lt;h2&gt;
  
  
  5) Data manipulation and optimization: more common in some roles, still fair game
&lt;/h2&gt;

&lt;p&gt;These show up more in data engineering interviews, but data scientists and analysts see them too.&lt;/p&gt;

&lt;p&gt;Topics include:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;UPDATE a table using values from another table
&lt;/li&gt;
&lt;li&gt;Delete duplicate rows while keeping one copy
&lt;/li&gt;
&lt;li&gt;Insert transformed rows from one table into another
&lt;/li&gt;
&lt;li&gt;Write a MERGE or UPSERT
&lt;/li&gt;
&lt;li&gt;Explain DELETE vs TRUNCATE vs DROP
&lt;/li&gt;
&lt;li&gt;Add an index and explain when it helps or hurts
&lt;/li&gt;
&lt;li&gt;Rewrite a slow query to avoid a full table scan
&lt;/li&gt;
&lt;li&gt;Explain what to look for in a query execution plan
&lt;/li&gt;
&lt;li&gt;Partition a large table by date and explain the tradeoff
&lt;/li&gt;
&lt;li&gt;Handle NULL values correctly in comparisons and aggregations&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This section matters because interviews are not always pure query-writing exercises. Sometimes you need to explain behavior, tradeoffs, or performance.&lt;/p&gt;

&lt;p&gt;A candidate who can write SQL and talk through why a query is slow usually comes across much stronger than someone who can only produce syntax.&lt;/p&gt;
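
&lt;p&gt;If you want to practice talking through a plan, a small experiment like the one below helps. It is a sketch using SQLite's &lt;code&gt;EXPLAIN QUERY PLAN&lt;/code&gt; through Python's &lt;code&gt;sqlite3&lt;/code&gt; module, with a hypothetical &lt;code&gt;orders&lt;/code&gt; table and index name. Other databases expose plans differently, and the wording of SQLite's plan text varies by version.&lt;/p&gt;

```python
import sqlite3

# Hypothetical table with synthetic rows, invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

def plan(sql):
    """Return SQLite's query plan for a statement as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)

query = "SELECT order_id, amount FROM orders WHERE customer_id = 42"

plan_before = plan(query)  # no usable index yet, so expect a scan of the table
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(query)   # expect a search that names idx_orders_customer

print(plan_before)
print(plan_after)
```

&lt;p&gt;The tradeoff to mention out loud: the index speeds up this lookup, but every insert and update on &lt;code&gt;orders&lt;/code&gt; now has to maintain it as well.&lt;/p&gt;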

&lt;h2&gt;
  
  
  How to use this list well
&lt;/h2&gt;

&lt;p&gt;Do not treat these 50 prompts like trivia cards.&lt;/p&gt;

&lt;p&gt;Use them as a prioritization tool:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Start with joins and window functions&lt;/li&gt;
&lt;li&gt;Practice writing answers from scratch, without autocomplete&lt;/li&gt;
&lt;li&gt;Focus on correctness first, then readability&lt;/li&gt;
&lt;li&gt;For each question, know the common failure mode&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A few examples of failure modes worth watching:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Returning duplicate rows after a join&lt;/li&gt;
&lt;li&gt;Using INNER JOIN where a LEFT JOIN is needed&lt;/li&gt;
&lt;li&gt;Filtering aggregated results in &lt;code&gt;WHERE&lt;/code&gt; instead of &lt;code&gt;HAVING&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Confusing &lt;code&gt;RANK()&lt;/code&gt; and &lt;code&gt;ROW_NUMBER()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Mishandling NULL comparisons&lt;/li&gt;
&lt;li&gt;Solving a window function question with a slow, tangled subquery&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last point matters. Interviewers usually care about your approach, not just whether the final query runs.&lt;/p&gt;
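
&lt;p&gt;Two of these failure modes are easy to reproduce on a tiny table. The sketch below assumes a made-up &lt;code&gt;employees&lt;/code&gt; table with a salary tie and one NULL salary, and runs through Python's &lt;code&gt;sqlite3&lt;/code&gt; module (SQLite 3.25 or newer for the window functions).&lt;/p&gt;

```python
import sqlite3

# Hypothetical table and rows, invented for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('Ana', 'eng', 200), ('Bo', 'eng', 200), ('Cy', 'eng', 150), ('Di', 'eng', NULL);
""")

# Ties: ROW_NUMBER always gives unique numbers, RANK leaves gaps, DENSE_RANK does not.
ranked = conn.execute("""
    SELECT name,
           ROW_NUMBER() OVER (ORDER BY salary DESC, name) AS row_num,
           RANK()       OVER (ORDER BY salary DESC)       AS rnk,
           DENSE_RANK() OVER (ORDER BY salary DESC)       AS dense_rnk
    FROM employees
    WHERE salary IS NOT NULL
    ORDER BY row_num
""").fetchall()

# NULLs: "= NULL" is never true, so it matches nothing; "IS NULL" is the correct test.
eq_null = conn.execute("SELECT COUNT(*) FROM employees WHERE salary = NULL").fetchone()[0]
is_null = conn.execute("SELECT COUNT(*) FROM employees WHERE salary IS NULL").fetchone()[0]

print(ranked)
print(eq_null, is_null)
```

&lt;p&gt;For a "second-highest salary" question, the choice between these functions decides whether Ana and Bo share first place or one of them is arbitrarily pushed to second, so it is worth asking the interviewer how ties should be treated.&lt;/p&gt;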

&lt;h2&gt;
  
  
  Practice on real interview-style questions
&lt;/h2&gt;

&lt;p&gt;If you want more than a checklist, PracHub has a broader set of &lt;a href="https://prachub.com/interview-questions?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=backlinks" rel="noopener noreferrer"&gt;SQL interview practice questions&lt;/a&gt; with an in-browser SQL editor. The source article says the platform includes 649 SQL interview questions and lets you filter by difficulty and company.&lt;/p&gt;

&lt;p&gt;That is useful because SQL prep gets better when you move from reading solutions to actually writing them under mild pressure.&lt;/p&gt;

&lt;p&gt;And if you want the full categorized list in one place, go back to the original PracHub post: &lt;a href="https://prachub.com/resources/top-50-sql-interview-questions-with-answers-2026?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=backlinks" rel="noopener noreferrer"&gt;"Top 50 SQL Interview Questions with Answers (2026)"&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The main takeaway is simple. SQL interviews are less random than they look. If you get strong at joins, window functions, CTEs, aggregation, and basic optimization, you are covering most of what interviewers keep asking.&lt;/p&gt;

</description>
      <category>interview</category>
      <category>career</category>
      <category>sql</category>
      <category>interviewquestions</category>
    </item>
    <item>
      <title>System Design 101</title>
      <dc:creator>Feng Zhang</dc:creator>
      <pubDate>Tue, 05 May 2026 03:44:03 +0000</pubDate>
      <link>https://dev.to/feng_zhang_cedb4581bee881/system-design-101-478n</link>
      <guid>https://dev.to/feng_zhang_cedb4581bee881/system-design-101-478n</guid>
      <description>&lt;p&gt;System design is one of those skills people try to speedrun, then realize that it just does not work that way.&lt;/p&gt;

&lt;p&gt;This article is adapted from a &lt;a href="https://prachub.com/resources/system-design-101?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=backlinks" rel="noopener noreferrer"&gt;PracHub post on System Design 101&lt;/a&gt;, and the point is simple: if you want to get good at system design, real work matters more than polished tutorials.&lt;/p&gt;

&lt;p&gt;A lot of interview prep material makes system design look like a set of reusable templates. Some patterns do repeat, but strong interview performance usually comes from having seen real systems, real constraints, and real tradeoffs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real system design experience beats tutorial knowledge
&lt;/h2&gt;

&lt;p&gt;The fastest way to build system design judgment is through work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;building systems yourself&lt;/li&gt;
&lt;li&gt;reading designs from other teams&lt;/li&gt;
&lt;li&gt;seeing what failed in production&lt;/li&gt;
&lt;li&gt;understanding why one approach beat another&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is very different from memorizing a "design Twitter" or "design Uber" walkthrough.&lt;/p&gt;

&lt;p&gt;The source article makes a good point here. The author had led several designs that later showed up as classic interview questions. The value was not that they had seen the question before. It was that they had already gone through the parts most prep content skips:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;implementation details&lt;/li&gt;
&lt;li&gt;tradeoffs between candidate solutions&lt;/li&gt;
&lt;li&gt;hardware assumptions&lt;/li&gt;
&lt;li&gt;load test results&lt;/li&gt;
&lt;li&gt;production pitfalls&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why experienced engineers often sound more convincing in system design interviews. They are not reciting. They are talking about work they have done.&lt;/p&gt;

&lt;h2&gt;
  
  
  Breadth vs depth depends on your level
&lt;/h2&gt;

&lt;p&gt;One useful part of the original post is the distinction between mid-level and senior interviews.&lt;/p&gt;

&lt;h3&gt;
  
  
  If you are mid-level
&lt;/h3&gt;

&lt;p&gt;System design interviews usually test breadth more than depth.&lt;/p&gt;

&lt;p&gt;You can pass without knowing every technology in detail. You do need to propose a reasonable solution, explain your choices, and avoid obvious mistakes. Interviewers are usually looking for sane architecture, good data flow, and awareness of tradeoffs.&lt;/p&gt;

&lt;h3&gt;
  
  
  If you are senior or above
&lt;/h3&gt;

&lt;p&gt;Breadth alone is not enough.&lt;/p&gt;

&lt;p&gt;You need depth too. You should be able to support decisions with experience, data, and a clear explanation of failure modes. If there is a gap in an area that matters to the problem, it can hurt a lot more at senior level than it would at mid-level.&lt;/p&gt;

&lt;p&gt;That also changes how you should grow your career.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to build system design skill through your job
&lt;/h2&gt;

&lt;p&gt;The advice here is practical.&lt;/p&gt;

&lt;p&gt;Early in your career, moving across teams or projects can help you build breadth. You see different architectures, constraints, and patterns. Later, staying longer in a domain helps you build depth. That is where you start to understand the details that separate an okay design from one that holds up under load.&lt;/p&gt;

&lt;p&gt;Over time, a lot of concepts connect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;data modeling affects scaling choices&lt;/li&gt;
&lt;li&gt;workload shape affects storage design&lt;/li&gt;
&lt;li&gt;consistency requirements affect architecture&lt;/li&gt;
&lt;li&gt;cost and capacity affect almost every decision&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your current role gives you none of that, it is fair to ask whether it is the right place for your growth.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to study first
&lt;/h2&gt;

&lt;p&gt;The source recommends a small set of resources and is honest about their limits.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Designing Data-Intensive Applications
&lt;/h3&gt;

&lt;p&gt;DDIA is the foundation.&lt;/p&gt;

&lt;p&gt;People often call it the bible of system design, but a better way to put it is that it is a starter book for distributed data systems. That is still very valuable. Most system design interviews are really about data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what data exists&lt;/li&gt;
&lt;li&gt;how much of it there is&lt;/li&gt;
&lt;li&gt;how it is accessed&lt;/li&gt;
&lt;li&gt;how it is stored&lt;/li&gt;
&lt;li&gt;what integrity guarantees matter&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;DDIA helps you build that mental model.&lt;/p&gt;

&lt;p&gt;It will not hand you interview answers. It is weaker on batch and stream processing, so you may need other material if you want more depth there.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. System Design Primer
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://github.com/donnemartin/system-design-primer" rel="noopener noreferrer"&gt;System Design Primer&lt;/a&gt; is useful for beginners.&lt;/p&gt;

&lt;p&gt;The warning from the source is fair: because it is crowd-sourced, some content has errors. Read it critically. Use it to learn concepts, not as something to memorize word for word.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Classic distributed systems papers
&lt;/h3&gt;

&lt;p&gt;The source specifically calls out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GFS&lt;/li&gt;
&lt;li&gt;MapReduce&lt;/li&gt;
&lt;li&gt;Bigtable&lt;/li&gt;
&lt;li&gt;DynamoDB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you have never read these, they are worth your time. They shaped a lot of what later systems and interview discussions borrow from.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Other books
&lt;/h3&gt;

&lt;p&gt;The source also mentions "Designing Distributed Systems" and books focused on Kafka, Flink, or real-time analytics. The take is measured. They can help fill gaps, but DDIA and classic papers give you the stronger base.&lt;/p&gt;

&lt;h2&gt;
  
  
  Learn from real production cases
&lt;/h2&gt;

&lt;p&gt;One of the best suggestions in the source is to study production systems from large companies.&lt;/p&gt;

&lt;p&gt;If you work at a company with mature infrastructure, read internal design docs from other teams. If you do not, company engineering blogs and conference talks are the next best thing.&lt;/p&gt;

&lt;p&gt;Good sources include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;company tech blogs from firms like Uber and Dropbox&lt;/li&gt;
&lt;li&gt;InfoQ talks&lt;/li&gt;
&lt;li&gt;architecture talks from companies like Google, Meta, and Amazon&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You will not always get full schema details. Companies are careful about that. Still, these materials are closer to how systems are actually built than many interview prep articles.&lt;/p&gt;

&lt;h2&gt;
  
  
  Be selective with popular prep resources
&lt;/h2&gt;

&lt;p&gt;The original post has opinions here, and they are useful.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Grokking is okay for basic concepts and the ID generator example, but the rest is not worth much.&lt;/li&gt;
&lt;li&gt;Alex Xu's first book is too shallow.&lt;/li&gt;
&lt;li&gt;The second book has more content, but quality is uneven.&lt;/li&gt;
&lt;li&gt;The "System Design Interview" YouTube channel has a good rate limiter video, but at least one Top K solution is described as outdated enough to fail interviews.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That may sound harsh, but it matches what many engineers eventually learn: a lot of system design content is polished, simple, and incomplete.&lt;/p&gt;

&lt;h2&gt;
  
  
  What interviews usually care about
&lt;/h2&gt;

&lt;p&gt;Most system design interviews revolve around data.&lt;/p&gt;

&lt;p&gt;A clean way to think about the discussion is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What are the requirements?&lt;/li&gt;
&lt;li&gt;What data do you need to support them?&lt;/li&gt;
&lt;li&gt;What are the size and access patterns of that data?&lt;/li&gt;
&lt;li&gt;How will you store, retrieve, and protect it?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is why so many weak system design answers feel off. They jump straight to components like Kafka, Redis, or sharding without first getting the data model and access patterns right.&lt;/p&gt;

&lt;p&gt;A good interview answer should show:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;reasonable infrastructure choices&lt;/li&gt;
&lt;li&gt;correct data flow&lt;/li&gt;
&lt;li&gt;a clear thought process&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Pattern recognition matters, but only after understanding the problem
&lt;/h2&gt;

&lt;p&gt;You will start to notice that many interview questions share structure.&lt;/p&gt;

&lt;p&gt;The source gives one example: group chat and multiplayer card games can have similar data handling patterns. That is a useful observation. Still, pattern matching only helps if you actually understand the data and requirements. Otherwise you end up forcing the wrong template onto the problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Capacity estimation: interviews vs real work
&lt;/h2&gt;

&lt;p&gt;This distinction is useful.&lt;/p&gt;

&lt;p&gt;At work, capacity planning should be precise enough to support scaling and cost decisions. In interviews, order-of-magnitude estimates are often enough:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GB or TB?&lt;/li&gt;
&lt;li&gt;thousands or millions of QPS?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those estimates shape your technical choices.&lt;/p&gt;

&lt;p&gt;If you are interviewing for senior roles, being able to do more exact back-of-the-envelope math and tie it to infrastructure choices and cost is a strong signal.&lt;/p&gt;
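
&lt;p&gt;As an illustration of what that math can look like, here is a short Python sketch for a hypothetical chat service. Every input number is an assumption picked for the example, not a figure from the source.&lt;/p&gt;

```python
# All inputs are made-up assumptions for a hypothetical chat service.
daily_active_users = 10_000_000
messages_per_user_per_day = 20
bytes_per_message = 500        # payload plus metadata, assumed
peak_to_average_ratio = 3      # assumed traffic spikiness
retention_days = 365
replication_factor = 3

SECONDS_PER_DAY = 86_400
GB = 10**9
TB = 10**12

# Throughput: messages per day, then average and peak writes per second.
messages_per_day = daily_active_users * messages_per_user_per_day
average_write_qps = messages_per_day / SECONDS_PER_DAY
peak_write_qps = average_write_qps * peak_to_average_ratio

# Storage: raw bytes per day, then a year of retained, replicated data.
raw_bytes_per_day = messages_per_day * bytes_per_message
stored_bytes_per_year = raw_bytes_per_day * retention_days * replication_factor

print(f"average writes/s: {average_write_qps:,.0f}")
print(f"peak writes/s:    {peak_write_qps:,.0f}")
print(f"raw data per day: {raw_bytes_per_day / GB:,.0f} GB")
print(f"stored per year:  {stored_bytes_per_year / TB:,.1f} TB")
```

&lt;p&gt;With these assumptions the answer lands in the low thousands of writes per second and roughly a hundred terabytes per year, which is already enough to argue for a partitioned store over a single node.&lt;/p&gt;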

&lt;h2&gt;
  
  
  Case studies worth reviewing
&lt;/h2&gt;

&lt;p&gt;The source recommends examples that do not skip schema design, which is a good filter. If the data model is vague, the rest of the architecture is often weak too.&lt;/p&gt;

&lt;p&gt;Examples called out in the post:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rate limiter, especially the well-known YouTube walkthrough&lt;/li&gt;
&lt;li&gt;Chat application case study&lt;/li&gt;
&lt;li&gt;Job scheduling system case study&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The rate limiter example is considered solid for interviews, but the source notes a few missing angles, like local rate limiters as safeguards and deeper thinking around CPU or memory-based limits.&lt;/p&gt;
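
&lt;p&gt;To give a sense of what a local, in-process safeguard might look like, here is a minimal token bucket sketch in Python. It is an illustration only, not the design from the video or the source post. The class name, parameters, and injected clock are hypothetical, and the sketch is not thread-safe.&lt;/p&gt;

```python
import time

class TokenBucket:
    """In-process token bucket: bursts up to `capacity`, refills at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock              # injectable so tests can control time
        self.tokens = float(capacity)   # start with a full bucket
        self.last = clock()

    def allow(self, cost=1.0):
        """Return True and spend tokens if the request fits, otherwise False."""
        now = self.clock()
        # Refill for the elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# Usage with a fake clock so the behavior is deterministic.
fake_now = [0.0]
bucket = TokenBucket(rate=2, capacity=2, clock=lambda: fake_now[0])

burst = [bucket.allow() for _ in range(3)]  # two requests pass, the third is rejected
fake_now[0] += 0.5                           # half a second refills one token
after_wait = bucket.allow()

print(burst, after_wait)
```

&lt;p&gt;A limiter like this keeps no shared state, so it can still protect a single process when the central rate limiting service is slow or unreachable.&lt;/p&gt;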

&lt;p&gt;The chat and job scheduling writeups are described as good enough for entry-level interviews, with some flaws but stronger than many articles written by people with more authority and less substance.&lt;/p&gt;

&lt;p&gt;If you want prompts to practice with after reading, PracHub also has a set of &lt;a href="https://prachub.com/interview-questions?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=backlinks" rel="noopener noreferrer"&gt;interview questions here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;System design skill comes from accumulated exposure to real systems.&lt;/p&gt;

&lt;p&gt;Books help. Papers help. Interview case studies help. But the biggest jump happens when you build something, operate it, measure it, and learn what broke.&lt;/p&gt;

&lt;p&gt;That is also the standard you should use in interviews. Your answer should sound like something you would actually build at work, not a guess assembled from buzzwords.&lt;/p&gt;

&lt;p&gt;If you want the original version of these ideas, the source post on PracHub is here: &lt;a href="https://prachub.com/resources/system-design-101?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=backlinks" rel="noopener noreferrer"&gt;System Design 101&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>interview</category>
      <category>career</category>
      <category>programming</category>
      <category>tech</category>
    </item>
    <item>
      <title>Most Common Amazon Interview Questions by Role (2026)</title>
      <dc:creator>Feng Zhang</dc:creator>
      <pubDate>Tue, 05 May 2026 03:42:03 +0000</pubDate>
      <link>https://dev.to/feng_zhang_cedb4581bee881/most-common-amazon-interview-questions-by-role-2026-59f0</link>
      <guid>https://dev.to/feng_zhang_cedb4581bee881/most-common-amazon-interview-questions-by-role-2026-59f0</guid>
      <description>&lt;p&gt;Amazon runs a different interview loop than most big tech companies. The technical bar matters, but the behavioral bar is unusually high. Every round, including coding and design, checks for Leadership Principles.&lt;/p&gt;

&lt;p&gt;If you are preparing for Amazon, this role-by-role breakdown from PracHub is a good starting point: &lt;a href="https://prachub.com/resources/most-common-amazon-interview-questions-by-role-2026?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=backlinks" rel="noopener noreferrer"&gt;Most Common Amazon Interview Questions by Role (2026)&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Amazon interview process looks like
&lt;/h2&gt;

&lt;p&gt;The structure is fairly consistent across roles:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Online Assessment (OA)&lt;/strong&gt;&lt;br&gt;
For SDE roles, this is usually 1-2 coding problems. For data roles, expect SQL and analytics-style questions. It is timed, often around 90 minutes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Phone screen&lt;/strong&gt;&lt;br&gt;
Usually one technical question and 1-2 behavioral questions tied to Leadership Principles.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Onsite, usually a virtual loop&lt;/strong&gt;&lt;br&gt;
Expect 4-5 rounds, each around 45-60 minutes. Every round includes at least one behavioral question. One interviewer is the &lt;strong&gt;Bar Raiser&lt;/strong&gt;, a trained interviewer from another team who can veto the hire.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That last point matters. Amazon does not treat behavioral as a warm-up. It is part of the decision in every round.&lt;/p&gt;

&lt;h2&gt;
  
  
  SDE interviews: coding first, behavior in every round
&lt;/h2&gt;

&lt;p&gt;For Software Development Engineer roles, the process is coding-heavy, but behavioral prep is mandatory.&lt;/p&gt;

&lt;h3&gt;
  
  
  What shows up most often in coding rounds
&lt;/h3&gt;

&lt;p&gt;PracHub has 160 Amazon coding questions in its dataset, and the common topics are pretty predictable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Arrays and strings&lt;/li&gt;
&lt;li&gt;Two pointers&lt;/li&gt;
&lt;li&gt;Sliding window&lt;/li&gt;
&lt;li&gt;Trees and graphs&lt;/li&gt;
&lt;li&gt;BFS and DFS&lt;/li&gt;
&lt;li&gt;Lowest common ancestor&lt;/li&gt;
&lt;li&gt;Dynamic programming, usually medium difficulty&lt;/li&gt;
&lt;li&gt;Data structure implementation, such as LRU cache&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One thing that catches people off guard is the framing. Amazon often wraps standard problems in practical business scenarios like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;warehouse optimization&lt;/li&gt;
&lt;li&gt;delivery routing&lt;/li&gt;
&lt;li&gt;inventory management&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The underlying problem may still be a graph traversal or a sliding window question, but the prompt sounds like an operations problem.&lt;/p&gt;

&lt;h3&gt;
  
  
  System design for SDEs
&lt;/h3&gt;

&lt;p&gt;PracHub lists 48 Amazon system design questions. The recurring themes are very Amazon-shaped:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Design an order management system&lt;/li&gt;
&lt;li&gt;Design a product recommendation engine&lt;/li&gt;
&lt;li&gt;Design a delivery tracking system&lt;/li&gt;
&lt;li&gt;Design a pricing system with real-time updates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are not abstract whiteboard exercises. You need to connect technical choices to scale, reliability, latency, and business impact.&lt;/p&gt;

&lt;h3&gt;
  
  
  Behavioral topics that come up again and again
&lt;/h3&gt;

&lt;p&gt;PracHub tracks 122 Amazon behavioral questions, and some Leadership Principles show up far more often than others:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Customer Obsession&lt;/li&gt;
&lt;li&gt;Ownership&lt;/li&gt;
&lt;li&gt;Dive Deep&lt;/li&gt;
&lt;li&gt;Bias for Action&lt;/li&gt;
&lt;li&gt;Deliver Results&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Interviewers explicitly map your answers to these principles. They take notes on what you demonstrated, then compare impressions across the loop. If your examples are vague, you will feel that quickly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data Scientist interviews: SQL, experiments, and product metrics
&lt;/h2&gt;

&lt;p&gt;Amazon Data Scientist interviews have a different balance. You still need strong behavioral answers, but the technical side leans toward analytics, experimentation, and applied ML.&lt;/p&gt;

&lt;p&gt;PracHub's Amazon set includes 65 SQL questions and 71 ML questions. Common examples include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Write a query to calculate customer lifetime value"&lt;/li&gt;
&lt;li&gt;"Design an experiment to test a new recommendation algorithm"&lt;/li&gt;
&lt;li&gt;"How would you detect fraudulent seller accounts?"&lt;/li&gt;
&lt;li&gt;retention analysis&lt;/li&gt;
&lt;li&gt;funnel analysis&lt;/li&gt;
&lt;li&gt;cohort analysis&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What Amazon tends to care about in ML rounds
&lt;/h3&gt;

&lt;p&gt;The ML areas called out in the source are tightly tied to Amazon's product and marketplace model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;recommendation systems&lt;/li&gt;
&lt;li&gt;fraud detection&lt;/li&gt;
&lt;li&gt;demand forecasting&lt;/li&gt;
&lt;li&gt;NLP for review analysis&lt;/li&gt;
&lt;li&gt;search ranking&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is useful because it tells you where to focus. If your prep is centered on generic model trivia, you may miss what Amazon actually asks: applied questions tied to user behavior, marketplace integrity, or retail operations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Product sense matters more than many candidates expect
&lt;/h3&gt;

&lt;p&gt;Amazon DS interviews put real weight on product metrics. You need to explain how success is measured and how you would test changes. That means being comfortable with experiment design, tradeoffs in metrics, and the business meaning behind your analysis.&lt;/p&gt;

&lt;p&gt;If you answer with technical detail but cannot define the right success metric, that is a problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data Engineer interviews: heavy SQL and reliable pipelines
&lt;/h2&gt;

&lt;p&gt;Data Engineer interviews at Amazon are very SQL-heavy. The source is direct about that, and it lines up with what candidates usually report.&lt;/p&gt;

&lt;p&gt;Expect questions around:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;complex SQL on large datasets&lt;/li&gt;
&lt;li&gt;query optimization&lt;/li&gt;
&lt;li&gt;data modeling, such as star schema for e-commerce data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The design side focuses on data systems, not general backend design.&lt;/p&gt;

&lt;h3&gt;
  
  
  Common pipeline design themes
&lt;/h3&gt;

&lt;p&gt;Typical prompts include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Design an ETL pipeline for order data&lt;/li&gt;
&lt;li&gt;Handle late-arriving data&lt;/li&gt;
&lt;li&gt;Design a data quality monitoring system&lt;/li&gt;
&lt;li&gt;Migrate from batch to real-time processing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Amazon cares about scale and reliability here. A clean architecture diagram is not enough. You need to explain what happens when jobs fail, when data arrives late, when retries create duplicates, or when upstream quality drops.&lt;/p&gt;

&lt;p&gt;If you skip failure modes, your answer is incomplete.&lt;/p&gt;
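
&lt;p&gt;One common way to handle the retry and late-data cases is an idempotent load keyed on a business identifier. The sketch below is an assumption-laden illustration, not Amazon's approach: the &lt;code&gt;orders_clean&lt;/code&gt; table, its columns, and the &lt;code&gt;load_batch&lt;/code&gt; helper are hypothetical, and it uses SQLite's UPSERT syntax (SQLite 3.24 or newer) through Python's &lt;code&gt;sqlite3&lt;/code&gt; module.&lt;/p&gt;

```python
import sqlite3

# Hypothetical target table, invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders_clean (
        order_id   INTEGER PRIMARY KEY,
        status     TEXT,
        updated_at TEXT
    )
""")

# Insert new orders; on a key collision, overwrite only if the incoming row is newer.
UPSERT = """
    INSERT INTO orders_clean (order_id, status, updated_at)
    VALUES (?, ?, ?)
    ON CONFLICT(order_id) DO UPDATE SET
        status = excluded.status,
        updated_at = excluded.updated_at
    WHERE excluded.updated_at > orders_clean.updated_at
"""

def load_batch(batch):
    """Apply a batch in one transaction; replays and stale rows leave the table unchanged."""
    with conn:
        conn.executemany(UPSERT, batch)

batch = [(1, "placed", "2026-05-01T10:00:00"), (2, "placed", "2026-05-01T10:05:00")]
load_batch(batch)
load_batch(batch)                                    # simulated retry of the same batch
load_batch([(1, "shipped", "2026-05-02T09:00:00")])  # newer event wins
load_batch([(1, "placed", "2026-05-01T10:00:00")])   # late, older event is ignored

rows = conn.execute(
    "SELECT order_id, status FROM orders_clean ORDER BY order_id"
).fetchall()
print(rows)
```

&lt;p&gt;Because the write is keyed on &lt;code&gt;order_id&lt;/code&gt; and guarded by &lt;code&gt;updated_at&lt;/code&gt;, rerunning a failed job produces the same table instead of duplicate rows. The ISO timestamp strings compare correctly as text only because they share one format and time zone, which is itself an assumption worth stating in an interview.&lt;/p&gt;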

&lt;h2&gt;
  
  
  What applies to every Amazon role
&lt;/h2&gt;

&lt;p&gt;Some prep advice is role-specific. Some is universal.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Prepare 12-15 STAR stories
&lt;/h3&gt;

&lt;p&gt;This is the biggest pattern in Amazon prep. You need a bank of stories mapped to Leadership Principles.&lt;/p&gt;

&lt;p&gt;The source is blunt on this point. It is not optional.&lt;/p&gt;

&lt;p&gt;A lot of candidates prepare hard for coding or SQL, then improvise behaviorals. That is a bad tradeoff for Amazon. Since every round includes behavioral questions, weak stories can sink an otherwise strong loop.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Be precise with metrics
&lt;/h3&gt;

&lt;p&gt;Amazon is data-driven, and interviewers expect specifics. "We improved performance" is weak. "We cut latency by 28%" is useful.&lt;/p&gt;

&lt;p&gt;The same applies to product work, incident response, project delivery, and system design. Use numbers whenever you can. If your example has no measurable result, it will sound unfinished.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Think in terms of the flywheel
&lt;/h3&gt;

&lt;p&gt;This comes up most often in system design and product discussions. Amazon likes reasoning that connects technical choices to business outcomes through reinforcing loops.&lt;/p&gt;

&lt;p&gt;If your design improves delivery speed, does that improve customer trust, which drives more usage and increases operational efficiency? That style of thinking tends to land well in Amazon interviews.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Understand what the Bar Raiser is doing
&lt;/h3&gt;

&lt;p&gt;The Bar Raiser is not there to fill a seat for one team. This person is judging whether you meet Amazon's hiring standard overall.&lt;/p&gt;

&lt;p&gt;That usually means close attention to Leadership Principles, quality of judgment, and consistency across rounds. If one round says you show strong Ownership and another suggests the opposite, that will come up in the final discussion.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I would prep, based on this breakdown
&lt;/h2&gt;

&lt;p&gt;If I were targeting Amazon, I would split prep like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Build a Leadership Principles story bank first&lt;/li&gt;
&lt;li&gt;Practice role-specific technical questions second&lt;/li&gt;
&lt;li&gt;Rehearse answers with numbers, tradeoffs, and clear outcomes&lt;/li&gt;
&lt;li&gt;For design rounds, tie the system back to customer impact and business metrics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I would not prep from random lists alone. Amazon patterns are role-dependent. SDE, DS, and DE loops overlap on behaviorals, but the technical expectations are clearly different.&lt;/p&gt;

&lt;p&gt;If you want to practice against a large role-specific set, PracHub has Amazon questions across coding, behavioral, ML, SQL, and system design here: &lt;a href="https://prachub.com/interview-questions?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=backlinks" rel="noopener noreferrer"&gt;interview questions on PracHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The useful part is the distribution: 160 coding, 122 behavioral, 71 ML, 65 SQL, and 48 system design questions from Amazon. That makes it easier to focus on what your target role is likely to test instead of studying everything equally.&lt;/p&gt;

&lt;p&gt;For the full role-by-role breakdown, go back to the original PracHub post: &lt;a href="https://prachub.com/resources/most-common-amazon-interview-questions-by-role-2026?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=backlinks" rel="noopener noreferrer"&gt;Most Common Amazon Interview Questions by Role (2026)&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>interview</category>
      <category>career</category>
      <category>amazon</category>
      <category>interviewprep</category>
    </item>
    <item>
      <title>Machine Learning Interview Questions: Complete 2026 Guide</title>
      <dc:creator>Feng Zhang</dc:creator>
      <pubDate>Tue, 05 May 2026 03:40:02 +0000</pubDate>
      <link>https://dev.to/feng_zhang_cedb4581bee881/machine-learning-interview-questions-complete-2026-guide-akb</link>
      <guid>https://dev.to/feng_zhang_cedb4581bee881/machine-learning-interview-questions-complete-2026-guide-akb</guid>
      <description>&lt;p&gt;ML interviews are more practical than they were a couple of years ago.&lt;/p&gt;

&lt;p&gt;You still need to know the classic topics: bias-variance tradeoff, regularization, cross-validation, and evaluation metrics. But many interview loops now spend more time on applied questions: how you would build a model for a real product, what features you would choose, how you would evaluate it after launch, and what you would do when offline metrics do not match production behavior.&lt;/p&gt;

&lt;p&gt;This article is adapted from PracHub's &lt;a href="https://prachub.com/resources/machine-learning-interview-questions-guide-2026?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=backlinks" rel="noopener noreferrer"&gt;Machine Learning Interview Questions: Complete 2026 Guide&lt;/a&gt;, which is based on a large set of ML interview questions collected by company and role.&lt;/p&gt;

&lt;h2&gt;
  
  
  What ML interviews actually cover
&lt;/h2&gt;

&lt;p&gt;Based on 583 ML questions on PracHub, the distribution looks roughly like this:&lt;/p&gt;

&lt;h3&gt;
  
  
  Fundamentals, 30-40%
&lt;/h3&gt;

&lt;p&gt;This is still the largest bucket. If your basics are shaky, it shows fast.&lt;/p&gt;

&lt;p&gt;Topics include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bias-variance tradeoff&lt;/li&gt;
&lt;li&gt;Overfitting and regularization, especially L1 vs L2&lt;/li&gt;
&lt;li&gt;Cross-validation strategies&lt;/li&gt;
&lt;li&gt;Evaluation metrics like precision, recall, F1, and AUC-ROC&lt;/li&gt;
&lt;li&gt;Gradient descent and optimization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Interviewers usually do not stop at definitions. If you say "bias is underfitting and variance is overfitting," expect follow-ups. How would you detect each from training and validation behavior? What changes would you try? Why would regularization help?&lt;/p&gt;
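
&lt;p&gt;One way to make the "how would you detect each" follow-up concrete is to compare training and validation error. The sketch below is only an illustration: the function name &lt;code&gt;diagnose_fit&lt;/code&gt; and both thresholds are assumptions chosen for the example, not standard values.&lt;/p&gt;

```python
def diagnose_fit(train_error, val_error, target_error=0.10, gap_tolerance=0.05):
    """Label a model from its training and validation error rates.

    target_error and gap_tolerance are illustrative thresholds (assumptions).
    """
    # High training error suggests the model cannot fit the data: bias.
    high_bias = train_error > target_error
    # A large gap between validation and training error suggests variance.
    high_variance = (val_error - train_error) > gap_tolerance
    if high_bias and high_variance:
        return "high bias and high variance"
    if high_bias:
        return "high bias (underfitting)"
    if high_variance:
        return "high variance (overfitting)"
    return "reasonable fit"


print(diagnose_fit(0.30, 0.32))  # both errors high, small gap
print(diagnose_fit(0.02, 0.20))  # low training error, large gap
```

&lt;p&gt;For the first case the usual levers are a more expressive model or better features. For the second, more data or stronger regularization is the usual starting point, which is also the answer to "why would regularization help?"&lt;/p&gt;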

&lt;h3&gt;
  
  
  Applied ML, 25-30%
&lt;/h3&gt;

&lt;p&gt;This part is where many interviews now feel more like product work than classroom theory.&lt;/p&gt;

&lt;p&gt;Common themes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Feature engineering for a specific problem&lt;/li&gt;
&lt;li&gt;Model selection, and when to use one class of models over another&lt;/li&gt;
&lt;li&gt;Handling imbalanced data&lt;/li&gt;
&lt;li&gt;Missing data strategies&lt;/li&gt;
&lt;li&gt;A/B testing ML models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You might get a prompt like: "Build a churn model for this subscription product." From there, the interviewer wants your full thought process. What is the target? What counts as churn? What data would you collect? Which features are likely to be predictive? What metrics matter to the business?&lt;/p&gt;

&lt;h3&gt;
  
  
  ML system design, 15-20%
&lt;/h3&gt;

&lt;p&gt;This section is hard to avoid for many ML roles.&lt;/p&gt;

&lt;p&gt;Typical prompts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Design a recommendation system&lt;/li&gt;
&lt;li&gt;Design a fraud detection pipeline&lt;/li&gt;
&lt;li&gt;Design a search ranking system&lt;/li&gt;
&lt;li&gt;Design an ad click prediction system&lt;/li&gt;
&lt;li&gt;Explain model serving and monitoring&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not the same as backend system design, though there is overlap. You need to think through the ML pipeline end to end: data ingestion, feature generation, training, model registry, deployment, serving, monitoring, and retraining.&lt;/p&gt;

&lt;h3&gt;
  
  
  Coding, 10-15%
&lt;/h3&gt;

&lt;p&gt;For most ML interviews, coding is not algorithm-heavy.&lt;/p&gt;

&lt;p&gt;Expect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Implementing a simple model from scratch, such as logistic regression or k-means&lt;/li&gt;
&lt;li&gt;Data manipulation with pandas or numpy&lt;/li&gt;
&lt;li&gt;Writing a training loop&lt;/li&gt;
&lt;li&gt;Feature processing code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you only practice LeetCode, this round can still catch you off guard. A lot of candidates are weaker in the kind of code they actually write on the job.&lt;/p&gt;
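
&lt;p&gt;As a rough idea of what "a simple model from scratch" can look like, here is a minimal sketch of logistic regression trained with batch gradient descent in plain Python. The function names, learning rate, epoch count, and toy data are all assumptions made for illustration. An interviewer may expect numpy or a different structure.&lt;/p&gt;

```python
import math


def sigmoid(z):
    # Numerically stable logistic function for positive and negative inputs.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def predict_proba(x, weights, bias):
    # Probability of the positive class for one feature vector.
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic_regression(rows, labels, learning_rate=0.5, epochs=2000):
    """Fit weights and a bias with batch gradient descent on log loss."""
    n_features = len(rows[0])
    n = len(rows)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # For log loss, the gradient with respect to the logit is (p - y).
            error = predict_proba(x, weights, bias) - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Toy data (hypothetical): the label is 1 when the single feature is large.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic_regression(X, y)
print([round(predict_proba(x, w, b), 2) for x in X])
```

&lt;p&gt;The part worth being able to explain out loud is the &lt;code&gt;error&lt;/code&gt; line: the same update rule falls out of differentiating log loss through the sigmoid.&lt;/p&gt;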

&lt;h3&gt;
  
  
  Deep learning, 10-15%
&lt;/h3&gt;

&lt;p&gt;This depends on the role, but deep learning questions are common enough that you should prepare.&lt;/p&gt;

&lt;p&gt;Topics include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Transformers and attention&lt;/li&gt;
&lt;li&gt;CNNs vs RNNs vs Transformers&lt;/li&gt;
&lt;li&gt;Transfer learning and fine-tuning&lt;/li&gt;
&lt;li&gt;LLM-related questions, which are becoming more common in 2026&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For deep learning roles, expect more depth. For general ML roles, interviewers often want a clean explanation of why these architectures differ and where each one fits.&lt;/p&gt;

&lt;h2&gt;
  
  
  Company-specific patterns
&lt;/h2&gt;

&lt;p&gt;The mix changes a lot by company.&lt;/p&gt;

&lt;h3&gt;
  
  
  Amazon
&lt;/h3&gt;

&lt;p&gt;PracHub has 71 ML questions from Amazon, and the pattern is pretty clear. Amazon is heavy on applied ML.&lt;/p&gt;

&lt;p&gt;You may be asked how to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Build a recommendation system for product pages&lt;/li&gt;
&lt;li&gt;Detect fraudulent reviews&lt;/li&gt;
&lt;li&gt;Optimize delivery routing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The style is practical and business-oriented. You need to connect the model to the user problem and the company metric.&lt;/p&gt;

&lt;h3&gt;
  
  
  Meta
&lt;/h3&gt;

&lt;p&gt;Meta has 55 ML questions on PracHub, with a strong focus on ranking, ads, and integrity.&lt;/p&gt;

&lt;p&gt;Expect prompts around:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Content ranking&lt;/li&gt;
&lt;li&gt;Ads ML&lt;/li&gt;
&lt;li&gt;Harmful content detection at scale&lt;/li&gt;
&lt;li&gt;Balancing engagement with user well-being&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These interviews often push on tradeoffs. A model can improve one metric while hurting another. You should be able to talk through those tradeoffs clearly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Google
&lt;/h3&gt;

&lt;p&gt;Google has 36 ML questions on PracHub, and the interviews tend to be more theoretical than Amazon or Meta.&lt;/p&gt;

&lt;p&gt;That usually means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Derivations&lt;/li&gt;
&lt;li&gt;Why an algorithm works&lt;/li&gt;
&lt;li&gt;Mathematical foundations&lt;/li&gt;
&lt;li&gt;ML infrastructure and model serving&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You still need applied thinking, but the bar for explaining the underlying mechanics is usually higher.&lt;/p&gt;

&lt;h2&gt;
  
  
  Questions that keep coming up
&lt;/h2&gt;

&lt;p&gt;Some questions appear across multiple companies with only minor changes in wording.&lt;/p&gt;

&lt;p&gt;These are worth practicing until your explanation feels natural:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Explain the bias-variance tradeoff. How do you diagnose which one your model suffers from?&lt;/li&gt;
&lt;li&gt;When would you use logistic regression over a random forest?&lt;/li&gt;
&lt;li&gt;Your model has high AUC-ROC but low precision. What is going on? What do you do?&lt;/li&gt;
&lt;li&gt;How would you handle a dataset where 1% of examples are positive?&lt;/li&gt;
&lt;li&gt;Design a recommendation system for a specific product. Walk through the full pipeline.&lt;/li&gt;
&lt;li&gt;How do you decide which features to include in your model?&lt;/li&gt;
&lt;li&gt;Explain L1 vs L2 regularization. When would you use each?&lt;/li&gt;
&lt;li&gt;Your model performs well offline but poorly in production. What could cause this?&lt;/li&gt;
&lt;li&gt;How do you A/B test a machine learning model?&lt;/li&gt;
&lt;li&gt;Explain how a transformer works. Why has it replaced RNNs for most NLP tasks?&lt;/li&gt;
&lt;/ol&gt;
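
&lt;p&gt;Questions 3 and 4 are linked. AUC-ROC is built from the true positive rate and false positive rate, neither of which depends on how rare the positive class is, while precision does. A short worked example makes this visible. The operating point below (90% recall, 5% false positive rate) is a hypothetical chosen for illustration, and &lt;code&gt;precision_from_rates&lt;/code&gt; is just a helper name for this sketch.&lt;/p&gt;

```python
def precision_from_rates(prevalence, recall, false_positive_rate):
    """Precision implied by class prevalence, recall (TPR), and FPR."""
    # Fraction of all examples that are true positives.
    true_positives = prevalence * recall
    # Fraction of all examples that are false positives.
    false_positives = (1.0 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Same classifier quality, two different class balances.
print(round(precision_from_rates(0.01, 0.90, 0.05), 3))  # 1% positives
print(round(precision_from_rates(0.50, 0.90, 0.05), 3))  # balanced classes
```

&lt;p&gt;With 1% positives, roughly 15% of flagged examples are real positives. With balanced classes, the same recall and false positive rate give roughly 95%. That gap is why a strong answer moves on to threshold choice, precision-recall curves, and the cost of each error type.&lt;/p&gt;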

&lt;p&gt;If you look at that list, the pattern is obvious. Interviewers are checking a few things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Do you understand the foundations?&lt;/li&gt;
&lt;li&gt;Can you reason through messy real-world modeling decisions?&lt;/li&gt;
&lt;li&gt;Can you think beyond training accuracy and talk about production behavior?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How to prepare without wasting time
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Get sharp on fundamentals
&lt;/h3&gt;

&lt;p&gt;You need to explain core concepts in your own words.&lt;/p&gt;

&lt;p&gt;That means more than memorizing definitions. If someone asks about regularization, you should be able to explain what problem it addresses, how L1 and L2 differ, and what changes you would expect in model behavior. Same for metrics. If an interviewer asks why precision matters more than accuracy in a certain problem, your answer should come quickly.&lt;/p&gt;

&lt;p&gt;A good test is whether you can survive a couple of follow-up questions after your first answer.&lt;/p&gt;
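
&lt;p&gt;For the L1 versus L2 follow-up, a one-coefficient view can help. The sketch below assumes a simplified setting where each coefficient can be shrunk independently (for example, an orthonormal design), and the exact scaling depends on how the penalty is written. The function names and the value of &lt;code&gt;lam&lt;/code&gt; are illustrative assumptions.&lt;/p&gt;

```python
import math


def l1_shrink(coef, lam):
    # Soft-thresholding: minimizer of 0.5 * (w - coef)**2 + lam * abs(w).
    magnitude = max(abs(coef) - lam, 0.0)
    if magnitude == 0.0:
        return 0.0  # small coefficients are set exactly to zero
    return math.copysign(magnitude, coef)


def l2_shrink(coef, lam):
    # Minimizer of 0.5 * (w - coef)**2 + 0.5 * lam * w**2.
    return coef / (1.0 + lam)


unregularized = [3.0, -1.5, 0.4, -0.2]
lam = 0.5
print([l1_shrink(c, lam) for c in unregularized])
print([round(l2_shrink(c, lam), 3) for c in unregularized])
```

&lt;p&gt;In this simplified view, L1 subtracts a fixed amount and zeroes out the small coefficients, which is why it is often described as doing feature selection. L2 scales every coefficient toward zero but leaves none exactly at zero.&lt;/p&gt;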

&lt;h3&gt;
  
  
  2. Practice applied case studies
&lt;/h3&gt;

&lt;p&gt;This is where practical experience shows up.&lt;/p&gt;

&lt;p&gt;Take a business problem and walk through it step by step:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Problem formulation&lt;/li&gt;
&lt;li&gt;Data collection&lt;/li&gt;
&lt;li&gt;Feature engineering&lt;/li&gt;
&lt;li&gt;Model selection&lt;/li&gt;
&lt;li&gt;Evaluation&lt;/li&gt;
&lt;li&gt;Deployment&lt;/li&gt;
&lt;li&gt;Monitoring&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Do not jump straight to "I would use XGBoost" or "I would fine-tune a transformer." Start with the problem definition and constraints. A weaker candidate talks tools first. A stronger one frames the task properly.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Treat ML system design as its own topic
&lt;/h3&gt;

&lt;p&gt;A lot of candidates prepare for theory and forget the pipeline.&lt;/p&gt;

&lt;p&gt;For ML system design, make sure you can talk through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data ingestion&lt;/li&gt;
&lt;li&gt;Feature store&lt;/li&gt;
&lt;li&gt;Training pipeline&lt;/li&gt;
&lt;li&gt;Model registry&lt;/li&gt;
&lt;li&gt;Serving infrastructure&lt;/li&gt;
&lt;li&gt;Monitoring&lt;/li&gt;
&lt;li&gt;Retraining&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You should be able to draw this on a whiteboard or explain it verbally without getting lost. The best answers are structured and realistic.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Practice the coding you actually use in ML work
&lt;/h3&gt;

&lt;p&gt;You probably will not get a LeetCode-hard graph problem.&lt;/p&gt;

&lt;p&gt;You are more likely to get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;pandas and numpy work&lt;/li&gt;
&lt;li&gt;Basic model implementation&lt;/li&gt;
&lt;li&gt;Training loop logic&lt;/li&gt;
&lt;li&gt;Feature transformation code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means your prep should include notebook-style coding, not just algorithm drills.&lt;/p&gt;

&lt;h2&gt;
  
  
  A better way to use question banks
&lt;/h2&gt;

&lt;p&gt;Grinding random questions is not that useful unless you know what pattern each question is testing.&lt;/p&gt;

&lt;p&gt;A better approach is to group your prep by category:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fundamentals&lt;/li&gt;
&lt;li&gt;Applied ML&lt;/li&gt;
&lt;li&gt;System design&lt;/li&gt;
&lt;li&gt;Coding&lt;/li&gt;
&lt;li&gt;Deep learning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then practice answering out loud. For system design and applied ML prompts, force yourself to give complete end-to-end answers.&lt;/p&gt;

&lt;p&gt;If you want a large set of company-tagged practice material, PracHub has a collection of &lt;a href="https://prachub.com/interview-questions?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=backlinks" rel="noopener noreferrer"&gt;ML interview questions&lt;/a&gt; organized by role, company, and difficulty. The same source guide also notes that PracHub has 225 ML system design questions, which is useful because that category is harder to find in one place.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final takeaway
&lt;/h2&gt;

&lt;p&gt;The main shift in ML interviews is that you need both theory and judgment.&lt;/p&gt;

&lt;p&gt;You still have to know the standard concepts. But that is only the baseline. Strong performance now depends on whether you can connect those concepts to product decisions, production constraints, and model behavior after deployment.&lt;/p&gt;

&lt;p&gt;If you want the original breakdown and source data, read PracHub's full &lt;a href="https://prachub.com/resources/machine-learning-interview-questions-guide-2026?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=backlinks" rel="noopener noreferrer"&gt;Machine Learning Interview Questions: Complete 2026 Guide&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>interview</category>
      <category>career</category>
      <category>machinelearning</category>
      <category>interviewprep</category>
    </item>
    <item>
      <title>How to Answer "What is Your Greatest Weakness?" in a Tech Interview</title>
      <dc:creator>Feng Zhang</dc:creator>
      <pubDate>Tue, 05 May 2026 03:38:02 +0000</pubDate>
      <link>https://dev.to/feng_zhang_cedb4581bee881/how-to-answer-what-is-your-greatest-weakness-in-a-tech-interview-4gn1</link>
      <guid>https://dev.to/feng_zhang_cedb4581bee881/how-to-answer-what-is-your-greatest-weakness-in-a-tech-interview-4gn1</guid>
      <description>&lt;p&gt;Most candidates still treat "What is your greatest weakness?" like a trap. In tech interviews, it usually isn't. It's a check for self-awareness and humility. Interviewers want to see whether you can name a real weakness, explain how it affects your work, and show that you manage it with a repeatable process.&lt;/p&gt;

&lt;p&gt;The original &lt;a href="https://prachub.com/resources/how-to-answer-what-is-your-greatest-weakness-in-a-tech-interview?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=backlinks" rel="noopener noreferrer"&gt;PracHub guide&lt;/a&gt; gets this right: a good answer has three parts, and the last one matters most.&lt;/p&gt;

&lt;p&gt;If you answer with "I'm a perfectionist" or "I work too hard," you'll sound rehearsed. If you name a weakness that makes you unqualified for the role, you'll hurt yourself. The sweet spot is a genuine, non-critical weakness plus a concrete system that keeps it from hurting your team.&lt;/p&gt;

&lt;h2&gt;
  
  
  What interviewers are actually testing
&lt;/h2&gt;

&lt;p&gt;At companies with structured interview loops, including FAANG-style processes, this question usually comes down to three things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Self-awareness&lt;/li&gt;
&lt;li&gt;Intellectual humility&lt;/li&gt;
&lt;li&gt;Your ability to respond to feedback&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every engineer has blind spots. The interviewer knows that. What they want to learn is whether you can talk about yours without getting defensive or turning the answer into a humblebrag.&lt;/p&gt;

&lt;p&gt;That means your answer should sound honest, specific, and current. You are not confessing failure for drama points. You are showing that you understand how you work.&lt;/p&gt;

&lt;h2&gt;
  
  
  A simple framework that works
&lt;/h2&gt;

&lt;p&gt;A strong answer is usually 60 to 90 seconds. Longer than that, and you risk rambling.&lt;/p&gt;

&lt;p&gt;Use this three-step structure.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. State the weakness directly
&lt;/h3&gt;

&lt;p&gt;Say what the weakness is in plain language.&lt;/p&gt;

&lt;p&gt;A good opening is:&lt;/p&gt;

&lt;p&gt;"In the past, I have struggled with [specific weakness]."&lt;/p&gt;

&lt;p&gt;Keep it clean. Do not apologize. Do not instantly spin it into a strength.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Explain how it showed up in your work
&lt;/h3&gt;

&lt;p&gt;Next, tie the weakness to real engineering work. This is the part many people skip, and that's what makes the answer sound fake.&lt;/p&gt;

&lt;p&gt;Use a pattern like:&lt;/p&gt;

&lt;p&gt;"When I'm working on [type of task], I tend to [negative action], which causes [negative impact]."&lt;/p&gt;

&lt;p&gt;This shows that you understand the cost of the weakness, not just the label.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Spend most of the answer on your mitigation system
&lt;/h3&gt;

&lt;p&gt;This is the part interviewers care about most.&lt;/p&gt;

&lt;p&gt;Do not say, "I'm working on it." Say what you actually do.&lt;/p&gt;

&lt;p&gt;A useful pattern is:&lt;/p&gt;

&lt;p&gt;"To mitigate this, I now [specific system or action]. Since I started doing that, [positive result]."&lt;/p&gt;

&lt;p&gt;The key word here is system. A calendar rule. A design-doc habit. A review process. A communication trigger. A debugging cutoff. Something concrete.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three examples for software engineers
&lt;/h2&gt;

&lt;p&gt;These examples work because they are believable and process-driven.&lt;/p&gt;

&lt;h3&gt;
  
  
  Junior engineer: getting stuck too long before asking for help
&lt;/h3&gt;

&lt;p&gt;If you are early in your career, a common weakness is trying to solve every bug alone.&lt;/p&gt;

&lt;p&gt;A solid answer sounds like this:&lt;/p&gt;

&lt;p&gt;"My biggest weakness has been staying stuck on a bug for too long before asking for help. Early in my current role, I would spend two or even three days debugging a pipeline issue because I did not want to interrupt senior engineers. I realized that was slowing down the sprint and making the problem more expensive than it needed to be. To fix that, I use a 'One Hour Rule.' If I am blocked for more than an hour, I write down what I tried and post it in Slack with context. That way I am not asking vague questions, but I am also not failing silently. It has improved how quickly I close tickets."&lt;/p&gt;

&lt;p&gt;Why it works: it is honest, not fatal, and the mitigation is specific.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mid-level engineer: over-engineering simple solutions
&lt;/h3&gt;

&lt;p&gt;This one is common for engineers who care a lot about design.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;"In the past, I have had a tendency to over-engineer. On some projects, I would build a more abstract or scalable solution than the requirements justified. That added complexity and slowed delivery on a project where a simpler CRUD implementation would have been enough. To manage that, I now use YAGNI as a hard check before I start coding. I write a short design doc that limits the scope to current business needs, and I ask a peer reviewer to call out any unnecessary abstraction. That has kept my designs more practical without lowering quality."&lt;/p&gt;

&lt;p&gt;Why it works: the weakness is real, but it does not suggest incompetence.&lt;/p&gt;

&lt;h3&gt;
  
  
  Senior or Staff engineer: weak delegation on architecture work
&lt;/h3&gt;

&lt;p&gt;At higher levels, your weaknesses are often about team growth and how work gets distributed.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;"As I moved into a Staff-level role, one weakness I noticed was that I held onto critical architecture work instead of delegating it. I could move fast on those tasks myself, but it created a bottleneck and reduced growth opportunities for mid-level engineers on the team. I changed my process so that I no longer write the first draft of major design docs by default. I assign that draft to another engineer and review it instead. It can take a little longer upfront, but it spreads architectural ownership and removes me as the bottleneck."&lt;/p&gt;

&lt;p&gt;Why it works: it shows maturity, not ego.&lt;/p&gt;

&lt;h2&gt;
  
  
  Four answers that usually fail
&lt;/h2&gt;

&lt;p&gt;Some weaknesses are bad because they sound fake. Others are bad because they raise direct concerns about your ability to do the job.&lt;/p&gt;

&lt;p&gt;Avoid these.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The humblebrag
&lt;/h3&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"I work too hard"&lt;/li&gt;
&lt;li&gt;"I'm a perfectionist"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are transparent. They signal dishonesty or weak self-awareness.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. The fatal flaw
&lt;/h3&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"I hate writing tests"&lt;/li&gt;
&lt;li&gt;"I struggle with basic algorithms"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the weakness cuts into core job skills, it can sink your interview.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. The blame answer
&lt;/h3&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"I get frustrated when teammates write bad code"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This tells the interviewer you may be hard to work with. It suggests low empathy and weak collaboration.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. The fixed-trait answer
&lt;/h3&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"I'm just naturally disorganized"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This fails because it sounds permanent. The interviewer wants to hear a manageable work habit, not a personality verdict with no plan attached.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to find a real weakness to use
&lt;/h2&gt;

&lt;p&gt;If you are not sure what to say, look at past feedback.&lt;/p&gt;

&lt;p&gt;Your performance reviews, 1:1 notes, or manager feedback are usually the best source. Focus on constructive feedback you have actually received, then convert it into the three-part framework.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"You need to communicate more during incidents"&lt;/li&gt;
&lt;li&gt;"You should spend more time on documentation"&lt;/li&gt;
&lt;li&gt;"You sometimes go too deep before aligning on scope"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are useful because they are real and specific. Once you add context and a mitigation system, they become strong interview material.&lt;/p&gt;

&lt;p&gt;That is also why generic interview prep often falls flat. You do not need a clever answer. You need an honest one with some process behind it. If you want more prompts to practice this kind of response, PracHub has a useful list of &lt;a href="https://prachub.com/interview-questions?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=backlinks" rel="noopener noreferrer"&gt;tech interview questions here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Does STAR work here?
&lt;/h2&gt;

&lt;p&gt;You can force this answer into STAR, but it is usually awkward.&lt;/p&gt;

&lt;p&gt;STAR is good for behavioral stories with a clear scenario and outcome. "Greatest weakness" is different. It is about an ongoing pattern in how you work. That is why the simpler structure (confession, context, mitigation) works better.&lt;/p&gt;

&lt;p&gt;It keeps you focused on the present-day system, which is what the interviewer actually wants to hear.&lt;/p&gt;

&lt;h2&gt;
  
  
  A good answer has one job
&lt;/h2&gt;

&lt;p&gt;Your answer does not need to impress anyone with drama or polish. It needs to show that you know your weak spots and that you do not leave them unmanaged.&lt;/p&gt;

&lt;p&gt;That is what makes an answer credible:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The weakness is real&lt;/li&gt;
&lt;li&gt;It is not disqualifying&lt;/li&gt;
&lt;li&gt;You can explain its effect on your work&lt;/li&gt;
&lt;li&gt;You have a concrete process that keeps it under control&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want the original version with the sample answers and breakdown, read the full &lt;a href="https://prachub.com/resources/how-to-answer-what-is-your-greatest-weakness-in-a-tech-interview?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=backlinks" rel="noopener noreferrer"&gt;PracHub post here&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>interview</category>
      <category>career</category>
      <category>programming</category>
      <category>tech</category>
    </item>
    <item>
      <title>How to Answer 'Tell Me About a Time You Failed' in a Tech Interview</title>
      <dc:creator>Feng Zhang</dc:creator>
      <pubDate>Tue, 05 May 2026 03:36:01 +0000</pubDate>
      <link>https://dev.to/feng_zhang_cedb4581bee881/how-to-answer-tell-me-about-a-time-you-failed-in-a-tech-interview-1n05</link>
      <guid>https://dev.to/feng_zhang_cedb4581bee881/how-to-answer-tell-me-about-a-time-you-failed-in-a-tech-interview-1n05</guid>
      <description>&lt;p&gt;Most candidates overthink "Tell me about a time you failed." They assume the safest move is to soften the story, pick a harmless mistake, or package a "failure" that is secretly a strength.&lt;/p&gt;

&lt;p&gt;That usually backfires.&lt;/p&gt;

&lt;p&gt;In software interviews, especially for experienced engineers, a real failure is often better than a polished non-answer. Hiring managers are trying to figure out whether you can own mistakes, respond well under pressure, and put systems in place so the same issue does not happen twice. The best way to answer is like a blameless post-mortem, turned into a clear interview story.&lt;/p&gt;

&lt;p&gt;This article is adapted from PracHub's guide on &lt;a href="https://prachub.com/resources/how-to-answer-tell-me-about-a-time-you-failed-in-a-tech-interview?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=backlinks" rel="noopener noreferrer"&gt;how to answer "Tell me about a time you failed" in a tech interview&lt;/a&gt;, but rewritten for a developer audience here.&lt;/p&gt;

&lt;h2&gt;
  
  
  What interviewers are actually looking for
&lt;/h2&gt;

&lt;p&gt;This question is less about the failure itself and more about your judgment after it.&lt;/p&gt;

&lt;p&gt;They want to know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can you admit a real mistake?&lt;/li&gt;
&lt;li&gt;Did you act quickly when things started going wrong?&lt;/li&gt;
&lt;li&gt;Did you hide, deflect, or blame other people?&lt;/li&gt;
&lt;li&gt;Did you learn something specific?&lt;/li&gt;
&lt;li&gt;Did you add a process or safeguard so the same class of mistake does not repeat?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you say you have never failed, that is a red flag. If you give a fake answer like "I cared too much" or "I worked too hard," that is also a red flag. It suggests low self-awareness, low honesty, or not much experience with meaningful responsibility.&lt;/p&gt;

&lt;p&gt;For senior engineers, real failures are normal. Production issues, bad estimates, wrong technical choices, and delayed escalation all happen in real engineering work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use the blameless post-mortem structure
&lt;/h2&gt;

&lt;p&gt;A strong answer is short, direct, and focused mostly on the lesson and the system change. You should usually keep it under three minutes.&lt;/p&gt;

&lt;p&gt;A simple structure:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Transparent confession
&lt;/h3&gt;

&lt;p&gt;Start with the mistake. Be plain about it.&lt;/p&gt;

&lt;p&gt;Say what happened, what your role was, and what you got wrong. Use "I," not "we," if it was your error.&lt;/p&gt;

&lt;p&gt;Good phrasing sounds like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"I made a mistake in a production deployment..."&lt;/li&gt;
&lt;li&gt;"I failed to estimate the integration work correctly..."&lt;/li&gt;
&lt;li&gt;"I chose the wrong technical direction for that service..."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Do not spend a minute building context before you admit the failure. Lead with it.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Immediate response
&lt;/h3&gt;

&lt;p&gt;Next, explain what you did when the problem became obvious.&lt;/p&gt;

&lt;p&gt;This tells the interviewer whether you are reliable under pressure. The main question is whether you protected users and the team before protecting your ego.&lt;/p&gt;

&lt;p&gt;That can mean:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;rolling back fast&lt;/li&gt;
&lt;li&gt;escalating early&lt;/li&gt;
&lt;li&gt;joining incident response&lt;/li&gt;
&lt;li&gt;resetting expectations with stakeholders&lt;/li&gt;
&lt;li&gt;admitting the estimate was wrong&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Keep this part short. The point is that you responded directly and did not hide the issue.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Systemic fix
&lt;/h3&gt;

&lt;p&gt;This is the part that matters most.&lt;/p&gt;

&lt;p&gt;A weak answer ends after the incident is resolved. A strong answer explains how you fixed the system that allowed the mistake in the first place.&lt;/p&gt;

&lt;p&gt;That system change might be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a new automated test&lt;/li&gt;
&lt;li&gt;a CI/CD check&lt;/li&gt;
&lt;li&gt;a staging improvement&lt;/li&gt;
&lt;li&gt;a design review rule&lt;/li&gt;
&lt;li&gt;a proof-of-concept step before estimation&lt;/li&gt;
&lt;li&gt;a decision framework for architecture&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is what makes your answer sound like engineering instead of apology.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three strong examples
&lt;/h2&gt;

&lt;p&gt;Here are three examples from common software engineering situations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Production outage
&lt;/h3&gt;

&lt;p&gt;A backend engineer could say:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Two years ago, I caused a 15-minute partial outage on our checkout service. I deployed what I thought was a backwards-compatible database schema change, but I missed that an older microservice still depended on strict column ordering. That broke right after deployment.&lt;/p&gt;

&lt;p&gt;As soon as I saw the 500 rate spike in Datadog, I triggered an automated rollback instead of trying to debug it live. I posted in the incident channel that I had caused the issue and focused on restoring service first.&lt;/p&gt;

&lt;p&gt;The bigger problem was that our integration tests were using a mocked database instead of a real schema replica. After the post-mortem, I built a containerized test pipeline that validates schema changes against a production-like clone. Since then, we have not had another deployment issue from that category. The lesson for me was simple: if staging does not match production closely enough, your deployment confidence is fake."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Why this works: the candidate owns the outage, responds fast, and spends most of the answer on the process fix.&lt;/p&gt;

&lt;h3&gt;
  
  
  Missed deadline
&lt;/h3&gt;

&lt;p&gt;A full-stack engineer could say:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I failed to deliver an OAuth integration for a new enterprise client on time. I estimated two weeks because I assumed their Active Directory setup was standard. It was not, and we missed the launch date by more than a month.&lt;/p&gt;

&lt;p&gt;I realized about a week into the sprint that I was blocked, but I made it worse by trying to push through on my own instead of escalating. Once it was clear I would miss the date, I told my manager and the client's solutions architect that my estimate had been wrong and that we needed to reset expectations.&lt;/p&gt;

&lt;p&gt;The lesson was that I was estimating third-party integration work based on documentation, not proof. Since then, I do a short tracer-bullet spike before I commit to a delivery estimate. I use that time to prove the handshake works and the docs are accurate. That small step has made my integration estimates much more reliable."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Why this works: it shows ownership, admits bad judgment, and ends with a specific mechanism that changed future behavior.&lt;/p&gt;

&lt;h3&gt;
  
  
  Wrong technical choice
&lt;/h3&gt;

&lt;p&gt;A senior engineer could say:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I made the wrong foundational choice for a notification service I was leading. I picked MongoDB because write speed mattered most at the time. About a year later, the product needed relational analytics across notification history, and that database choice became expensive technical debt.&lt;/p&gt;

&lt;p&gt;Once the problem was clear, I wrote a technical brief for the engineering director explaining that my original decision no longer fit the business need. I proposed a migration path to PostgreSQL and led the migration work so the rest of the team would not absorb all the disruption.&lt;/p&gt;

&lt;p&gt;What I changed after that was our design process. For architecture decisions that are hard to reverse, like a primary datastore, I now require a 'two-way door' analysis in the design doc, meaning we check whether the decision can be cheaply reversed. If the choice is hard to unwind, it has to be defended against a longer product horizon, not just the immediate sprint."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Why this works: it shows strategic judgment, not just incident handling.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mistakes that will sink your answer
&lt;/h2&gt;

&lt;p&gt;There are three common ways candidates ruin their answer to this question.&lt;/p&gt;

&lt;h3&gt;
  
  
  Shadow blame
&lt;/h3&gt;

&lt;p&gt;Example: "I missed the deadline because QA was slow."&lt;/p&gt;

&lt;p&gt;Even if other people were involved, the interview is about your judgment. Talk about what you could have done differently.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fake failure
&lt;/h3&gt;

&lt;p&gt;Example: "My biggest failure was working too hard."&lt;/p&gt;

&lt;p&gt;Nobody believes this. Pick a real mistake with real consequences.&lt;/p&gt;

&lt;h3&gt;
  
  
  No root-cause fix
&lt;/h3&gt;

&lt;p&gt;If your story ends with "then we fixed production," it is incomplete. The interviewer wants the mechanism you added so the same thing does not happen again.&lt;/p&gt;

&lt;p&gt;That is why the post-mortem framing works so well. It moves the answer from confession to engineering judgment.&lt;/p&gt;

&lt;h2&gt;
  
  
  How much time to spend on each part
&lt;/h2&gt;

&lt;p&gt;A good rule of thumb for splitting your answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;20 to 30 percent on the failure&lt;/li&gt;
&lt;li&gt;20 to 30 percent on the immediate response&lt;/li&gt;
&lt;li&gt;40 to 60 percent on the systemic fix and lesson&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Do not turn this into a five-minute architecture walkthrough. Keep enough detail for the interviewer to understand the stakes, then get to the lesson.&lt;/p&gt;

&lt;h2&gt;
  
  
  What makes a good failure story
&lt;/h2&gt;

&lt;p&gt;A good story is real, professional, and recoverable. It should show that you had enough responsibility to make a meaningful mistake.&lt;/p&gt;

&lt;p&gt;Strong examples include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a deployment that caused a minor outage&lt;/li&gt;
&lt;li&gt;a project you estimated badly&lt;/li&gt;
&lt;li&gt;a blocker you escalated too late&lt;/li&gt;
&lt;li&gt;a technical decision that aged badly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The failure does not need to be dramatic. It does need to be honest.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final advice
&lt;/h2&gt;

&lt;p&gt;Before the interview, write out one story using this format:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What exactly failed?&lt;/li&gt;
&lt;li&gt;What did you do right away?&lt;/li&gt;
&lt;li&gt;What system did you change after the post-mortem?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Then practice saying it out loud until it sounds calm and direct.&lt;/p&gt;

&lt;p&gt;If you want more examples and the original breakdown, PracHub's full post on &lt;a href="https://prachub.com/resources/how-to-answer-tell-me-about-a-time-you-failed-in-a-tech-interview?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=backlinks" rel="noopener noreferrer"&gt;answering "Tell me about a time you failed"&lt;/a&gt; is worth reading. You can also browse &lt;a href="https://prachub.com/interview-questions?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=backlinks" rel="noopener noreferrer"&gt;related interview questions on PracHub&lt;/a&gt; to practice other behavioral prompts in the same style.&lt;/p&gt;

</description>
      <category>interview</category>
      <category>career</category>
      <category>programming</category>
      <category>tech</category>
    </item>
  </channel>
</rss>
