<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: oavoservice-cs</title>
    <description>The latest articles on DEV Community by oavoservice-cs (@oavoservice-cs).</description>
    <link>https://dev.to/oavoservice-cs</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3968037%2F28f2d01e-3c9f-4532-940b-0c280ed9768e.png</url>
      <title>DEV Community: oavoservice-cs</title>
      <link>https://dev.to/oavoservice-cs</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/oavoservice-cs"/>
    <language>en</language>
    <item>
      <title>Meta University Internship VO Assistance: Real Interview Questions Shared</title>
      <dc:creator>oavoservice-cs</dc:creator>
      <pubDate>Thu, 04 Jun 2026 10:08:05 +0000</pubDate>
      <link>https://dev.to/oavoservice-cs/meta-university-shi-xi-vofu-zhu-zhen-ti-fen-xiang-2dd6</link>
      <guid>https://dev.to/oavoservice-cs/meta-university-shi-xi-vofu-zhu-zhen-ti-fen-xiang-2dd6</guid>
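      <description>&lt;p&gt;Meta's internship VO problems are not obscure, but almost every one comes with a follow-up that pushes you to take the brute-force solution to optimal or to rewrite it under a constraint. This post reviews two problems in the style of the real VO (counting string subsequences, and finding duplicates in place), then walks through the 12-week Meta internship timeline so you know which weeks actually decide the return offer.&lt;/p&gt;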

&lt;p&gt;VO 1: Count the words that are subsequences of s&lt;br&gt;
Given a string s and an array of strings words, count how many entries of words are subsequences of s.&lt;/p&gt;

&lt;p&gt;示例：s = "abc", words = ["a", "bb", "acd", "ace"] → 输出 2（"a" 和 "ace" 是子序列）&lt;/p&gt;

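&lt;p&gt;Baseline: two pointers&lt;br&gt;
Scan each word against s with two pointers. When the current characters match, advance both pointers; otherwise advance only the pointer into s. If the word pointer reaches the end of the word, the word is a subsequence.&lt;/p&gt;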

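&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def is_subsequence(word, s):
    i = 0  # pointer into word
    for ch in s:  # the pointer into s always advances
        if i &amp;lt; len(word) and word[i] == ch:
            i += 1  # matched word[i]; move on to the next character of word
    return i == len(word)


def count_subsequences(s, words):
    # True counts as 1, False as 0
    return sum(is_subsequence(w, s) for w in words)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Complexity: O(W · n), where W = len(words) and n = len(s), because every word triggers a full scan of s.&lt;/p&gt;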

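&lt;p&gt;Follow-up: words is very large and s is long. How do you optimize?&lt;br&gt;
Idea: preprocess s once into a charIndex that maps each character to the sorted list of positions where it occurs in s. For each character of a word, binary-search that list for the first occurrence strictly after the previous match.&lt;/p&gt;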

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from bisect import bisect_right
from collections import defaultdict


def build_index(s):
    # maps each character to the ascending list of its positions in s
    idx = defaultdict(list)
    for i, ch in enumerate(s):
        idx[ch].append(i)
    return idx


def is_subseq_fast(word, idx):
    prev = -1  # position in s of the previously matched character
    for ch in word:
        positions = idx.get(ch)
        if not positions:
            return False  # ch never occurs in s
        # index of the first position &amp;gt; prev
        j = bisect_right(positions, prev)
        if j == len(positions):
            return False  # no occurrence of ch after prev
        prev = positions[j]
    return True


def count_subsequences_fast(s, words):
    idx = build_index(s)  # built once, reused for every word
    return sum(is_subseq_fast(w, idx) for w in words)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Complexity: preprocessing is O(n), and a word of length k costs O(k · log n). The total is O(n + m · log n), where m is the total length of all words. When there are many words and s is long, this is far better than the O(W · n) baseline.&lt;/p&gt;

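&lt;p&gt;Talking points: present the two-pointer baseline first, then state the motivation for the optimization ("s is fixed and queried many times, so preprocess it and binary-search"). Interviewers weigh this derivation most heavily.&lt;/p&gt;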
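&lt;p&gt;For comparison, another widely used answer to this follow-up skips binary search entirely; it is not the approach described above. Bucket every word by the next character it is waiting for, then scan s once and advance only the words waiting on the current character. This is a minimal sketch, and the function name count_subsequences_buckets is illustrative. It runs in O(n + m) time, where m is the total length of all words.&lt;/p&gt;

```python
from collections import defaultdict


def count_subsequences_buckets(s, words):
    # waiting[c] holds (word index, position in word) pairs whose next needed char is c
    waiting = defaultdict(list)
    count = 0
    for wi, w in enumerate(words):
        if w:
            waiting[w[0]].append((wi, 0))
        else:
            count += 1  # the empty string is a subsequence of any s
    for ch in s:
        # detach this bucket so words re-queued on ch wait for a later ch in s
        ready = waiting.pop(ch, [])
        for wi, pos in ready:
            pos += 1
            w = words[wi]
            if pos == len(w):
                count += 1  # every character of w has been matched
            else:
                waiting[w[pos]].append((wi, pos))
    return count


print(count_subsequences_buckets("abcde", ["a", "bb", "acd", "ace"]))  # 3
```

&lt;p&gt;Each character of every word is advanced at most once, which is why this bound has no log n factor.&lt;/p&gt;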

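&lt;p&gt;VO 2: Find all duplicate elements in an array, in place&lt;br&gt;
Given an array of length n in which every element x satisfies 0 ≤ x ≤ n-1, find every element that appears more than once.&lt;/p&gt;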

&lt;p&gt;Example: input = [3, 1, 2, 3, 0] → output = [3]&lt;br&gt;
Follow-up: use no extra space, and no recursion or function calls.&lt;/p&gt;

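&lt;p&gt;Core idea: swap each element into its home slot&lt;br&gt;
Element x belongs at index x. Walk the array and swap the current element into its home slot. If the home slot already holds the correct value, the current element is a duplicate.&lt;/p&gt;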

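&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def find_duplicates(array):
    n = len(array)
    res = []
    # Pass 1: swap every value v into its home slot array[v]
    i = 0
    while i &amp;lt; n:
        v = array[i]
        if v != i and array[v] != v:
            array[i], array[v] = array[v], v  # v is now home and never moves again
        else:
            i += 1  # slot i is settled, or v is a duplicate whose home is taken
    # Pass 2: a slot i that does not hold i holds a duplicate value v.
    # Report v once, then overwrite its home slot with n as a "reported" marker
    # (n is out of range, so it cannot be confused with a real value).
    i = 0
    while i &amp;lt; n:
        v = array[i]
        if v != i and v != n and array[v] == v:
            res.append(v)
            array[v] = n
        i += 1
    return res
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Complexity: O(n) time and O(1) extra space, not counting the output list. Note that the input array is modified. The key point is that every swap puts one value into its home slot for good, so there are at most n swaps in total and the while loop does not degrade to O(n²). The second pass deduplicates with an in-place marker instead of scanning res, which keeps it linear.&lt;/p&gt;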
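&lt;p&gt;To see the O(n) swap bound concretely, the sketch below runs only the swap-into-place pass on a copy of the input and counts the swaps. The helper name count_swaps is hypothetical. Each swap settles one value in its home slot permanently, so the count can never exceed n.&lt;/p&gt;

```python
def count_swaps(array):
    """Run the swap-into-place pass on a copy of array; return (swaps, result)."""
    a = list(array)
    n = len(a)
    swaps = 0
    i = 0
    while n > i:
        v = a[i]
        if v != i and a[v] != v:
            a[i], a[v] = a[v], v  # v reaches slot v and is never moved again
            swaps += 1
        else:
            i += 1
    return swaps, a


print(count_swaps([3, 1, 2, 3, 0]))  # (1, [0, 1, 2, 3, 3])
print(count_swaps([1, 2, 3, 4, 0]))  # (4, [0, 1, 2, 3, 4])
```

&lt;p&gt;The second call is the worst shape for a single cycle: five elements all out of place, yet only four swaps.&lt;/p&gt;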

&lt;p&gt;How this meets the follow-up: no extra space (in-place swaps plus an in-place marker), no recursion, and no helper function calls.&lt;/p&gt;
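&lt;p&gt;The prep tips mention marking tricks. For this exact constraint (every value in 0..n-1), one common marking variant avoids swaps altogether. It is an alternative to the swap approach above, not part of the original solution. Add n to array[v] for every occurrence of v, so that array[v] // n becomes the count of v, then undo the marks with % n. This is a minimal sketch, and the name find_duplicates_marking is chosen for illustration.&lt;/p&gt;

```python
def find_duplicates_marking(array):
    """Duplicates for values in [0, n-1], O(n) time and O(1) extra space.

    Temporarily stores counts in the array itself and restores it before returning.
    """
    n = len(array)
    # Mark: slot v accumulates n once per occurrence of v.
    # array[i] % n recovers the original value even after slot i was marked.
    for i in range(n):
        v = array[i] % n
        array[v] += n
    res = []
    for v in range(n):
        if array[v] // n >= 2:  # v occurred at least twice
            res.append(v)
        array[v] %= n  # undo the marks, restoring the original value
    return res


print(find_duplicates_marking([3, 1, 2, 3, 0]))  # [3]
```

&lt;p&gt;Unlike the swap version, this returns duplicates in ascending order and leaves the input unchanged once it returns.&lt;/p&gt;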

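&lt;p&gt;Meta internship timeline: which weeks decide the return offer&lt;br&gt;
Many people don't realize that the Meta internship evaluation effectively covers only the first 10 weeks. Taking a 12-week internship as an example:&lt;/p&gt;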

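&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Weeks&lt;/th&gt;&lt;th&gt;Phase&lt;/th&gt;&lt;th&gt;What happens&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Week 1-3&lt;/td&gt;&lt;td&gt;Onboarding&lt;/td&gt;&lt;td&gt;Get familiar with the environment, get paired with a mentor, start the first task&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Week 5-6&lt;/td&gt;&lt;td&gt;Mid-cycle review&lt;/td&gt;&lt;td&gt;Mid-point feedback that sets the direction for the second half&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Week 10-11&lt;/td&gt;&lt;td&gt;Final decision&lt;/td&gt;&lt;td&gt;The return offer is decided here&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Week 11-12&lt;/td&gt;&lt;td&gt;Wrap-up&lt;/td&gt;&lt;td&gt;Evaluation is over; enjoy the last two weeks&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Key reminder: only your performance in Weeks 1-10 counts toward the internship evaluation. Define your project scope clearly and early, and align with your mentor often. Make sure the Week 5-6 mid-cycle review gives you explicit feedback, then act on it.&lt;/p&gt;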

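&lt;p&gt;Prep tips&lt;br&gt;
Prepare a follow-up for every problem: Meta almost always pushes for an optimization or adds a constraint, so be ready to explain the optimization direction right after the baseline.&lt;br&gt;
Drill in-place operations: swapping into place, fast/slow pointers, and bit-level marking are O(1)-space techniques that come up often.&lt;br&gt;
Explain the derivation while you code: interviewers score the reasoning, not just a passing solution, so explaining why an optimization works matters more than jumping straight to the optimal code.&lt;br&gt;
Communication is scored: Meta values communication. When you are stuck, saying your current thinking out loud often earns you a hint.&lt;/p&gt;

&lt;p&gt;FAQ&lt;/p&gt;

&lt;p&gt;Q1: Is the Meta University VO very different from a regular internship VO? The problem types are similar: a Medium problem plus a follow-up. Meta University leans more toward fundamental data structures, two pointers, and array manipulation.&lt;/p&gt;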

&lt;p&gt;Q2: How many VO rounds are there? Internships usually have 1-2 coding VO rounds (2 problems each), plus a behavioral interview.&lt;/p&gt;

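&lt;p&gt;Q3: Can I use Python? Yes. Meta does not restrict the language, and two-pointer and in-place swap code is concise in Python, which makes it easy to explain your approach.&lt;/p&gt;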

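&lt;p&gt;Q4: If my early performance is mediocre, can I still turn it around? Yes, but start early. The Week 5-6 mid-cycle review is the key chance to course-correct, and adjusting right after that feedback still leaves enough time.&lt;/p&gt;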

&lt;p&gt;Preparing for a Meta internship VO?&lt;/p&gt;

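&lt;p&gt;If you can write the baseline but stall on follow-ups, are not yet fluent with in-place operations, or want a live person to give you real-time cues on interview day, take a look at the full plan: high-frequency problem walkthroughs + timed mocks + &lt;a href="https://oavoservice.com" rel="noopener noreferrer"&gt;VO assistance&lt;/a&gt; + problem-by-problem review.&lt;/p&gt;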

</description>
      <category>vo辅助</category>
      <category>vo代面</category>
      <category>voassist</category>
      <category>面试辅助</category>
    </item>
    <item>
      <title>Duolingo AI Research Engineer Virtual Onsite: Real Interview Case Share</title>
      <dc:creator>oavoservice-cs</dc:creator>
      <pubDate>Thu, 04 Jun 2026 10:06:10 +0000</pubDate>
      <link>https://dev.to/oavoservice-cs/duolingo-ai-research-engineer-virtual-onsite-real-interview-case-share-m6a</link>
      <guid>https://dev.to/oavoservice-cs/duolingo-ai-research-engineer-virtual-onsite-real-interview-case-share-m6a</guid>
      <description>&lt;p&gt;Behind Duolingo's language-learning product sits an engine driven by learning science plus machine learning: which question to surface, how soon a user will forget, and how to retain memory with the least practice are all decided by models. The AI Research Engineer role lives between Research Scientist and ML Engineer—you must read papers and design experiments, yet also turn models into shippable production code.&lt;/p&gt;

&lt;p&gt;This track's virtual onsite is nothing like a standard SDE loop: coding is only one round, while the other three test ML depth, research-design intuition, and the systems skill to embed a model in the product. This article lays out the four-round VO skeletons, the high-frequency follow-ups, and a prep path, based on real feedback from this track.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Duolingo AI Research Engineer VO Overview&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Dimension&lt;/th&gt;&lt;th&gt;Details&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Rounds&lt;/td&gt;&lt;td&gt;4 VO rounds (coding / ML depth / research design / applied system)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Per round&lt;/td&gt;&lt;td&gt;45-60 minutes&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Platform&lt;/td&gt;&lt;td&gt;Video + shared editor (CoderPad-style for coding)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Language&lt;/td&gt;&lt;td&gt;Primarily Python; whiteboard derivations in the ML round&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Focus&lt;/td&gt;&lt;td&gt;Engineering + modeling depth + experiment thinking + product delivery&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Flow&lt;/td&gt;&lt;td&gt;Recruiter → tech phone screen → four VO rounds → team match&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Key insight: the most common failure on this track is not the algorithm itself. It is "can explain the paper yet writes messy code" or "can call the library yet cannot justify the modeling choice." Each round has a distinct focus, and a clear weakness in any single round can be a veto.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Round 1: Coding Implementation (Sequence Sampling)&lt;br&gt;
Problem Statement&lt;br&gt;
Given a vocabulary of n words where word i has weight weights[i] (its frequency), implement a sampler that draws k distinct words without replacement. Each successive draw picks one of the remaining words with probability proportional to its weight. After a single build, the sampler must support efficient repeated sampling.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Approach&lt;br&gt;
The classic algorithm for weighted sampling without replacement is A-Res (Efraimidis-Spirakis). For each element, draw u uniformly from (0, 1), compute key = u^(1/w), and keep the k largest keys. This turns the problem into a weighted top-k, which a single pass over the n elements with a size-k min-heap handles in O(n log k).&lt;/p&gt;

&lt;p&gt;Python Solution&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import heapq
import random


class WeightedSampler:
    def __init__(self, weights):
        self.weights = weights  # the "build" step just keeps the weights

    def sample(self, k):
        # A-Res: key = u^(1/w), keep the k largest keys
        heap = []  # min-heap holding the current top-k (key, idx)
        for i, w in enumerate(self.weights):
            if w &amp;lt;= 0:
                continue  # zero-weight words can never be drawn
            u = random.random()
            key = u ** (1.0 / w)
            if len(heap) &amp;lt; k:
                heapq.heappush(heap, (key, i))
            elif key &amp;gt; heap[0][0]:
                heapq.heapreplace(heap, (key, i))  # evict the smallest key
        # fewer than k indices come back if fewer than k weights are positive
        return [i for _, i in heap]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
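&lt;p&gt;A quick way to sanity-check A-Res is a simulation. With k = 1, the word with the largest key u^(1/w) should win with probability w / sum(weights). The sketch below re-implements the same keying as a free function; the names ares_sample and first_pick_frequencies are illustrative. It uses a seeded random.Random so the run is reproducible.&lt;/p&gt;

```python
import heapq
import random


def ares_sample(weights, k, rng):
    """A-Res: keep the k indices with the largest keys u ** (1 / w)."""
    heap = []  # min-heap of (key, index)
    for i, w in enumerate(weights):
        if w > 0:
            key = rng.random() ** (1.0 / w)
            if k > len(heap):
                heapq.heappush(heap, (key, i))
            elif key > heap[0][0]:
                heapq.heapreplace(heap, (key, i))
    return [i for _, i in heap]


def first_pick_frequencies(weights, trials=100_000, seed=0):
    """Empirical probability that each index is the single (k = 1) draw."""
    rng = random.Random(seed)
    counts = [0] * len(weights)
    for _ in range(trials):
        counts[ares_sample(weights, 1, rng)[0]] += 1
    return [c / trials for c in counts]


# expected to be close to [0.1, 0.2, 0.7]
print(first_pick_frequencies([1, 2, 7]))
```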

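&lt;p&gt;Time complexity: O(n log k) per sample. Space complexity: O(k).&lt;br&gt;
Follow-up: what if the weights update dynamically? Answer: keep the weights in a Fenwick tree over prefix sums and locate each draw by binary search, implemented as a descent over the tree. A weight update and a single draw each cost O(log n). k distinct draws cost O(k log n) if each drawn word's weight is temporarily set to zero and restored afterwards.&lt;/p&gt;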
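&lt;p&gt;Here is a sketch of the dynamic-weights follow-up, assuming items are identified by index 0..n-1. The class FenwickSampler and its methods are illustrative names, not an existing library API. The lowest set bit is computed as ((i ^ (i - 1)) + 1) // 2. find() descends the tree to locate the first index whose prefix sum exceeds a uniform target.&lt;/p&gt;

```python
import random


class FenwickSampler:
    """Weighted sampling with updatable weights over a Fenwick tree (sketch)."""

    def __init__(self, weights):
        self.n = len(weights)
        self.tree = [0.0] * (self.n + 1)  # 1-indexed partial sums
        self.w = [0.0] * self.n           # current weight of each item
        for i, wt in enumerate(weights):
            self.update(i, wt)

    @staticmethod
    def _lowbit(i):
        # value of the lowest set bit of i (e.g. 12 -> 4)
        return ((i ^ (i - 1)) + 1) // 2

    def update(self, i, new_weight):
        """Set item i's weight in O(log n)."""
        delta = new_weight - self.w[i]
        self.w[i] = new_weight
        j = i + 1
        while self.n >= j:
            self.tree[j] += delta
            j += self._lowbit(j)

    def total(self):
        s, j = 0.0, self.n
        while j > 0:
            s += self.tree[j]
            j -= self._lowbit(j)
        return s

    def find(self, target):
        """0-based index of the first item whose prefix sum exceeds target."""
        pos, step = 0, 1
        while self.n >= step * 2:
            step *= 2
        while step > 0:
            nxt = pos + step
            if self.n >= nxt and target >= self.tree[nxt]:
                pos = nxt
                target -= self.tree[nxt]
            step //= 2
        return min(pos, self.n - 1)  # guard against float round-off at the top end

    def sample_one(self, rng=random):
        return self.find(rng.random() * self.total())

    def sample_k(self, k, rng=random):
        """k distinct draws: zero out each drawn item, then restore all of them."""
        drawn, saved = [], []
        for _ in range(k):
            if not self.total() > 0:
                break  # fewer than k items have positive weight
            i = self.sample_one(rng)
            drawn.append(i)
            saved.append(self.w[i])
            self.update(i, 0.0)
        for i, wt in zip(drawn, saved):
            self.update(i, wt)
        return drawn
```

&lt;p&gt;Calling update(i, w) between samples is what the follow-up asks for. A weight change touches O(log n) tree nodes instead of triggering a full rebuild.&lt;/p&gt;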

&lt;ol start="3"&gt;
&lt;li&gt;Round 2: ML Depth (Spaced-Repetition Modeling)&lt;br&gt;
Scenario&lt;br&gt;
Duolingo's core is an SRS (Spaced Repetition System). It predicts the probability that a user still recalls a word after an interval t, and uses that to decide the next review time. The interviewer asks you to design this "memory model" from scratch.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Modeling Approach&lt;br&gt;
The classic baseline is Half-Life Regression (HLR), which models memory decay as exponential:&lt;/p&gt;

&lt;p&gt;$$p = 2^{-t / h}, \quad h = \exp(\theta \cdot x)$$&lt;/p&gt;

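&lt;p&gt;Here h is the half-life, produced by a linear model over features x (past correct count, error count, word difficulty, etc.) and kept positive by the exponential. The loss function that follows fits only the recall probability p, plus L2 regularization. The original HLR formulation also adds a squared-error term on the half-life h.&lt;/p&gt;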

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np


def hlr_loss(theta, X, t, recalled, alpha=0.01):
    # X: (N, d) features; t: (N,) intervals; recalled: (N,) 0/1 outcomes
    h = np.exp(X @ theta)            # half-life, always positive
    p = np.power(2.0, -t / h)        # predicted recall probability
    p = np.clip(p, 1e-6, 1 - 1e-6)   # keep p away from exactly 0 or 1
    # squared error on recall probability + L2 penalty on theta
    loss = np.mean((p - recalled) ** 2) + alpha * np.sum(theta ** 2)
    return loss
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Why not a plain classifier: the interviewer wants to hear that the half-life h is an interpretable quantity that directly drives scheduling. A black-box classifier cannot tell you how long to wait before the next review.&lt;/p&gt;
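&lt;p&gt;Because p = 2^(-t/h), the half-life converts directly into a schedule. Set p equal to a target recall p* and solve for t to get t* = h · log2(1/p*); a word becomes due once p would drop below that target. This is a minimal sketch. The function name and the 0.9 default target are assumptions for illustration, not Duolingo's production values.&lt;/p&gt;

```python
import math


def next_review_interval(half_life, target_recall=0.9):
    """Time until predicted recall 2 ** (-t / h) falls to target_recall.

    Solves 2 ** (-t / h) = p* for t, giving t* = h * log2(1 / p*).
    """
    if not (target_recall > 0.0 and 1.0 > target_recall):
        raise ValueError("target_recall must be in (0, 1)")
    return half_life * math.log2(1.0 / target_recall)


print(next_review_interval(4.0, 0.5))  # 4.0: at p* = 0.5 the interval equals h
```

&lt;p&gt;A longer half-life yields a proportionally longer interval, which is exactly the interpretable knob a black-box classifier lacks.&lt;/p&gt;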

&lt;p&gt;High-frequency follow-ups:&lt;/p&gt;

&lt;p&gt;Cold start (new user/new word)? → Start with word-level priors plus a population mean.&lt;br&gt;
How to evaluate? → Not accuracy, but calibration of recall probability plus MAE on p.&lt;/p&gt;
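&lt;p&gt;One way to make "calibration plus MAE" concrete is sketched below, assuming predicted recall probabilities and observed 0/1 outcomes as NumPy arrays. Calibration is summarized here as an expected calibration error (ECE) over equal-width probability bins. The bin count and the function name recall_metrics are illustrative choices, not something the interview prescribes.&lt;/p&gt;

```python
import numpy as np


def recall_metrics(p_pred, recalled, n_bins=10):
    """Return (MAE, ECE) for predicted recall probabilities vs. 0/1 outcomes."""
    p_pred = np.asarray(p_pred, dtype=float)
    recalled = np.asarray(recalled, dtype=float)
    mae = float(np.mean(np.abs(p_pred - recalled)))
    # assign each prediction to an equal-width bin; p = 1.0 goes into the top bin
    bins = np.minimum((p_pred * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            # weight = share of samples in this bin; gap = |mean prediction - observed rate|
            gap = abs(p_pred[mask].mean() - recalled[mask].mean())
            ece += mask.mean() * gap
    return mae, float(ece)
```

&lt;p&gt;A model that always predicts 0.5 on a 50/50 population is perfectly calibrated (ECE 0) yet has MAE 0.5. This is why the two metrics are reported together.&lt;/p&gt;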

&lt;ol start="4"&gt;
&lt;li&gt;Round 3: Research Design and Experiments (A/B + Causal)&lt;br&gt;
Scenario&lt;br&gt;
"We want to verify whether shipping a new review-scheduling algorithm improves 7-day retention (D7). Design an experiment."&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Breakdown Framework&lt;br&gt;
Metrics: the primary metric is D7 retention. Guardrails are daily lessons and review load, so retention is not bought by piling on practice.&lt;br&gt;
Unit and assignment: randomize by user, not by session, to avoid contamination; run an A/A test before bucketing.&lt;br&gt;
Sample size: derive N per arm from baseline retention and the minimum detectable effect (MDE), with power = 0.8 and α = 0.05 (two-sided).&lt;br&gt;
Novelty effect: a scheduling change carries a novelty effect, so observe for at least 2 weeks instead of judging on day one.&lt;br&gt;
Causal trap: retention is a survivorship-bias minefield. Looking only at active users overstates the effect, so analyze intention-to-treat (ITT) from the point of assignment.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from math import ceil, sqrt


def required_n_per_arm(p0, mde, z_alpha=1.96, z_beta=0.84):
    # approximate sample size per arm for a two-proportion test
    # z_alpha = 1.96: two-sided alpha = 0.05; z_beta = 0.84: power = 0.8
    p1 = p0 + mde
    p_bar = (p0 + p1) / 2
    num = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
           + z_beta * sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2
    return num / (mde ** 2)


# baseline D7 = 40%, want to detect +2 percentage points
print(ceil(required_n_per_arm(0.40, 0.02)))  # round up: roughly 9,500 users per arm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;What the interviewer is really watching: whether you proactively raise guardrail metrics and the novelty effect. Computing the sample size alone is not enough.&lt;/p&gt;
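&lt;p&gt;Once the test has run, the ITT comparison of D7 retention can be sketched as a pooled two-proportion z-test. Every assigned user counts in the denominator, whether or not they stayed active. This assumes a two-sided test with a normal approximation. The function name and the example counts are made up for illustration.&lt;/p&gt;

```python
from math import sqrt
from statistics import NormalDist


def itt_two_proportion_ztest(retained_c, assigned_c, retained_t, assigned_t):
    """Two-sided pooled z-test on D7 retention, ITT style.

    Denominators are all users assigned to each arm, not just active ones.
    Returns (lift, z, p_value).
    """
    p_c = retained_c / assigned_c
    p_t = retained_t / assigned_t
    p_pool = (retained_c + retained_t) / (assigned_c + assigned_t)
    se = sqrt(p_pool * (1 - p_pool) * (1 / assigned_c + 1 / assigned_t))
    z = (p_t - p_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_t - p_c, z, p_value


# hypothetical counts: 40.0% vs 42.0% D7 retention with 10,000 users per arm
lift, z, p = itt_two_proportion_ztest(4000, 10000, 4200, 10000)
print(round(lift, 3), round(z, 2), round(p, 4))
```

&lt;p&gt;Dropping churned users from either denominator would turn this into the survivorship-biased comparison the framework warns against.&lt;/p&gt;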

&lt;ol start="5"&gt;
&lt;li&gt;Round 4: Applied System Design (Online Inference Pipeline)&lt;br&gt;
Scenario&lt;br&gt;
"Deploy the memory model above as a live service: when each user opens the app, return the 20 words to review today within 50ms. There are tens of millions of daily actives."&lt;/li&gt;
&lt;/ol&gt;

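&lt;p&gt;Design Points&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Layer&lt;/th&gt;&lt;th&gt;Approach&lt;/th&gt;&lt;th&gt;Rationale&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Features&lt;/td&gt;&lt;td&gt;Offline batch + light online features&lt;/td&gt;&lt;td&gt;Precompute historical stats offline; assemble only real-time features per request&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Model serving&lt;/td&gt;&lt;td&gt;Light model (linear/small tree), vectorized batching&lt;/td&gt;&lt;td&gt;A 50ms SLA rules out a heavy model scoring word by word&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Scheduling&lt;/td&gt;&lt;td&gt;Maintain each word's "next due time"; only due words enter the candidate pool&lt;/td&gt;&lt;td&gt;Turns "prediction" into a "priority-queue top-k"&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Storage&lt;/td&gt;&lt;td&gt;User-word state in a KV store (half-life, last review time)&lt;/td&gt;&lt;td&gt;Reads and writes are per user, so the data shards naturally&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Offline feedback&lt;/td&gt;&lt;td&gt;Nightly retrain on the day's feedback and update half-lives&lt;/td&gt;&lt;td&gt;Closed loop: review result → update h → influences tomorrow's schedule&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Core trade-off: push heavy computation offline and let the online path only "fetch due words + lightly rank," so it can hold the 50ms budget across tens of millions of daily actives.&lt;/p&gt;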
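&lt;p&gt;The online path ("fetch due words + lightly rank") might look like the sketch below. It assumes the KV store returns, for each word, a precomputed next due time plus the half-life and last review time. The state layout, time units (days), and function name are hypothetical. Ranking uses the HLR estimate p = 2^(-t/h), so the most-forgotten due words come first.&lt;/p&gt;

```python
import heapq


def todays_reviews(state, now, k=20):
    """Pick up to k due words, lowest predicted recall first.

    state: word -> (next_due, half_life, last_review), all in days (hypothetical layout).
    """
    candidates = []
    for word, (next_due, half_life, last_review) in state.items():
        if now >= next_due:  # only due words enter the candidate pool
            p = 2.0 ** (-(now - last_review) / half_life)  # light ranking score
            candidates.append((p, word))
    # top-k by lowest recall probability, without sorting the whole pool
    return [word for _, word in heapq.nsmallest(k, candidates)]


state = {
    "gato": (1.0, 1.0, 0.0),
    "perro": (5.0, 10.0, 0.0),
    "casa": (2.0, 2.0, 0.0),
}
print(todays_reviews(state, now=3.0))  # ['gato', 'casa']
```

&lt;p&gt;The request does O(due words) arithmetic plus an O(d log k) selection, which is the kind of work that fits inside a 50ms budget.&lt;/p&gt;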

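&lt;ol start="6"&gt;
&lt;li&gt;&lt;p&gt;Four-Round Prep Checklist&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Round&lt;/th&gt;&lt;th&gt;Focus&lt;/th&gt;&lt;th&gt;Resources&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Coding&lt;/td&gt;&lt;td&gt;Weighted sampling, streaming top-k, reservoir sampling&lt;/td&gt;&lt;td&gt;LeetCode + probabilistic algorithms&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;ML depth&lt;/td&gt;&lt;td&gt;Memory models, calibration, cold start, interpretability&lt;/td&gt;&lt;td&gt;Duolingo HLR paper + recommender course&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Research design&lt;/td&gt;&lt;td&gt;Full A/B workflow, guardrails, novelty, ITT&lt;/td&gt;&lt;td&gt;Trustworthy Online Experiments&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;System&lt;/td&gt;&lt;td&gt;Online inference, feature pipeline, scheduling queue&lt;/td&gt;&lt;td&gt;ML system design material&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/li&gt;
&lt;/ol&gt;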

&lt;p&gt;Preparing for the Duolingo AI Research Engineer virtual onsite?&lt;/p&gt;

&lt;p&gt;Each of the four rounds probes a different weak spot; the hard part is not one problem but keeping all four dimensions—coding, modeling, experiments, systems—from collapsing. If you want batch-specific VO question reconstructions, focused drilling on memory models and experiment design, or VO assistance / VO proxy real-time pacing support, reach out. Send a screenshot of the job description, and we will break down the rounds first, then build a practice plan.&lt;/p&gt;

&lt;p&gt;Add WeChat Coding0201 now to get Duolingo VO real questions and four-round drilling.&lt;/p&gt;

&lt;p&gt;Contact&lt;br&gt;
WeChat: Coding0201&lt;br&gt;
Email: &lt;a href="mailto:catcstech@gmail.com"&gt;catcstech@gmail.com&lt;/a&gt;&lt;br&gt;
Telegram: @OAVOProxy&lt;/p&gt;

</description>
      <category>vo</category>
      <category>voassist</category>
      <category>面试辅助</category>
      <category>sde面试辅助</category>
    </item>
  </channel>
</rss>
