DEV Community: Yang Goufang

Your AI agent isn't dumb; the context you feed it is the problem

Yang Goufang — Fri, 29 May 2026 07:04:49 +0000

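You ask your agent to add a feature. It confidently writes a clean piece of code, except that it uses a package version you removed six months ago, a directory layout you abandoned long ago, and an auth pattern this repo has never used. It compiles, but it is wrong everywhere that matters.

The first reaction is usually to blame the model: "It's hallucinating again." Maybe. But the more common truth is that, given the information it had, what it did was entirely reasonable. The problem is that all it had was a vague description you tossed into the chat box while distracted.

So this post argues a slightly uncomfortable conclusion: most of the time your agent isn't dumb; the context you feed it is the problem.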

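Why now

For a long time, the hardest part of writing software was getting the code written. That is quietly ceasing to be true. Agents now often produce usable code faster than we can review it. The bottleneck has moved upstream, to stating intent clearly: what exactly you want, what the constraints are, and what counts as good in this particular codebase.

And we are bad at it. We treat the instructions, rules, and project knowledge we feed agents as disposable chat: paste a prompt, get a result, and the prompt is gone forever. We would never treat source code that way; source code gets version control, review, and tests. Patrick Debois (the person who accidentally coined the term "DevOps") makes exactly this point: context is the new code, and it deserves the same engineering discipline. He calls this still-forming approach the Context Development Lifecycle: generate, evaluate, and distribute context the way you would software, and keep observing it in production.

I think the framework is genuinely useful. It is also very early, more a direction than a paved road. So I'll skip the theory and go straight to the parts you can start doing tomorrow.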

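1. Move knowledge out of your head (and your chats) and into files

The highest-leverage step: stop keeping project knowledge in your head and in chat logs, and write it into version-controlled files the agent reads automatically.

Most agent tools support some kind of project instruction file: CLAUDE.md, agent.md, .cursorrules; the name doesn't matter. Run it as a real artifact: commit it, review it in PRs, and let it gradually accumulate the hard facts a new colleague would need on their first day:

# agent.md

## Stack
- Node 20, TypeScript strict mode. No `any`.
- Postgres via Drizzle. But we do NOT use the ORM's built-in migration tool:
  migrations all live in /migrations and run with npm run db:migrate.

## Conventions
- API handlers always return Result<T>; never throw across boundaries.
- Tests use Vitest, live next to the source, and are named *.test.ts.

## Don't
- Don't add new packages without asking.
- Don't touch /legacy: it is frozen and being deleted.

Note that none of these are clever prompt tricks. They are facts, the things you would say to a real new hire. The benefit: you write them once, and every later session starts out already knowing instead of guessing all over again.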

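2. Layer your rules, one job per layer

Don't stuff everything into one giant file. Split context by scope of applicability, the way you split config.

Global rules (apply to everything you do): your personal preferences. "Spell out the trade-offs; don't just agree with me." "Don't add a new package when the standard library will do." These follow you and hold across projects.
Project rules (this repo only): the tech stack, conventions, and landmines. These follow the code.

Keeping the two apart matters because they change at different speeds and for different reasons. Your personal style is relatively stable; a project's architecture keeps changing. Once they are mixed, every time some repo does something odd you have to edit your universal preferences, and that quirk quietly seeps into all your other projects. One file, one job.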

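3. Feed facts, not feelings

When you give the agent something it can verify rather than asking it to recall, hallucination drops noticeably.

"Use the latest React Router" amounts to asking the model to average over every version it saw in training. Switch to "We use React Router 7, data routers only, and these are the three patterns we use: [paste]" and you are giving it ground truth. The more specific and current the source, the less room it has to improvise (make things up).

Specifically:

Pin versions. Write "React 19", not just "React".
For anything fast-moving, paste the actual API or documentation snippet; don't bet on it remembering.
Point at real files: "follow the pattern in src/handlers/users.ts" beats describing the pattern in words.

A verifiable source always beats a confident memory.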

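4. Treat context as a finite resource

Almost everyone trips on this one. The context window is not infinite and, more importantly, bigger does not mean better. Stuffing the whole codebase in does not make the agent smarter; past a certain point it makes things worse: the truly relevant signal gets drowned out, the model can't find the point, and output quality quietly slides.

Watch for these signs: answers start drifting from your conventions, it keeps asking about things you already said, it confidently edits the wrong file. Usually that isn't the model getting dumber; the context has gotten noisy.

What to actually do:

Notice degradation. When a long session starts producing worse output, that is a signal, not bad luck.
Compact, then restart. Condense what really matters (decisions made, current state) into a clean new session. Most tools have a compact mechanism; use it deliberately instead of letting one session drag on for hours.
Don't pre-stuff. Add context when this task needs it, not just in case. A focused window beats a stuffed one.

Think of attention as a budget, and spend it only on what is relevant to this task.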

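5. Tell the agent what your environments look like

Your code doesn't run in just one place. It runs locally, in CI/integration, and in production, and the differences between these environments are often where you get bitten: different environment variables, different feature flags, a real database versus a mock, a secret that exists in one environment but not another.

The agent knows none of this unless you write it down. So write it down:

## Environments
- local: uses Docker Postgres, MOCK_PAYMENTS=true, runs seeded test data.
- staging: uses real Stripe test keys; schema matches prod.
- prod: uses real keys. NEVER run destructive scripts here.
        Migrations must always go through manual approval.

That last line alone could save you from an agent cheerfully running a "cleanup" script against production just because nobody ever told it production is special.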

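6. Fix the seed, not the fruit

This is the habit that makes all the effort above compound.

When the agent gets something wrong, you can fix the output directly: change the code and move on. That fixes this one fruit. But the bad seed is still in the soil, and tomorrow it will grow exactly the same mistake.

The higher-leverage approach is to fix the instruction. Agent used the wrong test framework? Don't just rewrite the tests; add "we use Vitest, not Jest" to agent.md. Agent keeps reaching for a deprecated helper? Add it to the "Don't" list. Every correction becomes permanent, and the same mistake stops recurring in every later session.

Slower in the moment, much faster over a month. You are no longer fixing output; you are improving the thing that produces the output.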

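An honest caveat

None of this is a settled standard yet. There is no npm test you can run on your context files, no widely accepted linter for instructions, and no CI gate that turns red when your agent.md drifts from reality. The Context Development Lifecycle is a useful lens, not a finished toolchain. The tools are still being invented in real time, and some of today's "best practices" will probably look dated a year from now.

But you don't need to wait for the toolchain to mature to capture most of the value. Version-controlled instruction files, layered rules, verifiable facts, a respected context window, plus the discipline of fixing the seed rather than the fruit: all of these can be done today. That is the difference between an agent that keeps fighting you and an agent that feels like it really understands your project.

Your agent is very likely much better than your context allows it to show.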

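A question for you

In your own agent instruction file, which line is currently the most valuable: the fact that made some recurring mistake stop dead? Tell me in the comments; I want to steal the good ones.

This post builds on Patrick Debois's Context Development Lifecycle; his original article, Optimizing Context for AI Coding Agents, is the fuller version of the idea.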

Your AI Agent Isn't Dumb — Your Context Is

Yang Goufang — Fri, 29 May 2026 07:04:42 +0000

You ask your agent to add a feature. It writes clean, confident code — using a library version you ripped out six months ago, a folder layout you abandoned, and an auth pattern you've never used in this repo. The code compiles. It's also wrong in every way that matters.

Your first instinct is to blame the model. "It hallucinated." Maybe. But more often the model did exactly what it should have given what it knew — and what it knew was a vague paragraph you typed into a chat box while distracted.

Here's the uncomfortable thesis: most of the time your agent isn't dumb. Your context is.

Why this matters now

For years the hard part of building software was writing the code. That's quietly stopped being true. Agents now produce working code faster than most of us can review it. The bottleneck moved upstream — to describing intent. Telling the agent what we want, what the constraints are, and what "good" looks like in this codebase.

And we are bad at this. We treat the instructions, rules, and project knowledge we feed agents as throwaway chat. We paste a prompt, get a result, and lose the prompt forever. We'd never treat our actual source code that way — we version it, review it, and test it. Patrick Debois (the guy who accidentally coined "DevOps") has been making this exact argument: context is the new code, and it deserves the same engineering rigor. He calls the emerging discipline the Context Development Lifecycle — generate it, evaluate it, distribute it, observe it in production, just like software.

I think the frame is genuinely useful. It's also early — more a direction than a paved road. So let me skip the theory and give you the parts you can actually do tomorrow.

1. Get knowledge out of your head and into files

The single highest-leverage move: stop holding project knowledge in your head and your chat history, and put it in versioned files the agent reads automatically.

Most agent tools support a project instruction file — CLAUDE.md, agent.md, .cursorrules, whatever yours calls it. Treat it like a real artifact. Commit it. Review it in PRs. Let it accumulate the hard-won facts a new teammate would need:

# agent.md

## Stack
- Node 20, TypeScript strict mode. No `any`.
- Postgres via Drizzle. We do NOT use the ORM's migration tool —
  migrations live in `/migrations` and run via `npm run db:migrate`.

## Conventions
- API handlers return `Result<T>`, never throw across boundaries.
- Tests use Vitest. Co-locate as `*.test.ts` next to the source.

## Don't
- Don't add new dependencies without asking.
- Don't touch `/legacy` — it's frozen and being deleted.

Notice these aren't clever prompts. They're facts — the same things you'd tell a human on day one. The win is that you write them once and every future session starts informed instead of guessing.

2. Layer your rules — each doing one job

Don't cram everything into one giant file. Split context by scope, the way you split config.

Global rules (apply to everything you do): your personal preferences. "Explain trade-offs, don't just agree." "Prefer standard library over new deps." These follow you across projects.
Project rules (this repo only): the stack, the conventions, the landmines. These follow the code.

Keeping them separate matters because they change at different rates and for different reasons. Your personal style is stable; a project's architecture shifts. When you mix them, you end up editing your universal preferences every time one repo does something weird — and that weirdness leaks into every other project. One file, one job.
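The layering above can be sketched in a few lines. This is only an illustration: the two file locations in the usage note and the heading labels are assumptions, not something any particular agent tool prescribes.

```python
from pathlib import Path


def read_layer(path: Path) -> str:
    """Return the file's text, or an empty string if the layer is absent."""
    return path.read_text(encoding="utf-8").strip() if path.is_file() else ""


def compose_context(global_rules: str, project_rules: str) -> str:
    """Join the two layers under labelled headings, global first.

    Each layer keeps its own heading so a reader can tell which file
    a rule came from and therefore which file to edit.
    """
    sections = []
    if global_rules:
        sections.append("# Global rules (follow the person)\n" + global_rules)
    if project_rules:
        sections.append("# Project rules (follow the code)\n" + project_rules)
    return "\n\n".join(sections)
```

A call might look like `compose_context(read_layer(Path.home() / ".agent" / "global.md"), read_layer(Path("agent.md")))`, where both paths are hypothetical. The point is that the layers stay in separate files and only meet at read time.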

3. Feed facts, not vibes

Hallucination drops sharply when you give the agent something checkable instead of asking it to recall.

"Use the latest React Router" invites the model to average over every version it ever saw in training. "We're on React Router 7, data routers only, here are the three patterns we use: [paste]" gives it ground truth. The more specific and current the source, the less room there is to invent.

Concretely:

Pin versions explicitly. "React 19," not "React."
Paste the actual API or doc snippet for anything fast-moving, instead of trusting recall.
Point at real files: "follow the pattern in src/handlers/users.ts" beats describing the pattern in prose.

A checkable source beats a confident memory every time.

4. Treat context as a finite resource

This one trips up almost everyone. The context window is not infinite, and — more importantly — bigger isn't better. Stuffing in your whole codebase doesn't make the agent smarter; past a point it makes it worse. Relevant signal gets buried, the model loses the thread, and output quality quietly degrades.

Watch for the tells: answers that drift from your conventions, repeated questions about things you already established, confident edits to the wrong file. That's usually not the model getting dumber — it's the context getting noisy.

What to actually do:

Notice degradation. When a long session starts producing worse results, that's a signal, not a fluke.
Compact and restart. Summarize what matters — decisions made, current state — into a fresh, clean session. Most tools have a compaction step; use it deliberately instead of letting a session sprawl for hours.
Don't pre-stuff. Add context when it's needed for the task at hand, not "just in case." A focused window beats a full one.

Think of attention as a budget. Spend it on what's relevant to this task.
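To make the budget idea concrete, here is a rough sketch that picks context snippets for a task under a token cap. The four-characters-per-token estimate and the keyword-overlap score are crude stand-ins chosen for illustration; a real setup would use the tool's own tokenizer and a better relevance signal.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snippet:
    name: str
    text: str


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: about four characters per token."""
    return max(1, len(text) // 4)


def relevance(snippet: Snippet, task: str) -> int:
    """Count task words (longer than three letters) that appear in the snippet."""
    words = {w for w in task.lower().split() if len(w) > 3}
    body = snippet.text.lower()
    return sum(1 for w in words if w in body)


def pick_context(snippets: List[Snippet], task: str, budget_tokens: int) -> List[Snippet]:
    """Greedily take the most relevant snippets that still fit the budget."""
    ranked = sorted(snippets, key=lambda s: relevance(s, task), reverse=True)
    chosen, used = [], 0
    for s in ranked:
        cost = estimate_tokens(s.text)
        # Skip anything unrelated to the task or too large for what is left.
        if relevance(s, task) == 0 or used + cost > budget_tokens:
            continue
        chosen.append(s)
        used += cost
    return chosen
```

Unrelated snippets are dropped even when there is room left, which is the "don't pre-stuff" rule expressed as code.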

5. Tell the agent about your environments

Your code doesn't run in one place. It runs locally, in CI/integration, and in production — and those differ in ways that bite. Different env vars, different feature flags, a real database versus a mock, secrets that exist in one place and not another.

The agent knows none of this unless you write it down. So write it down:

## Environments
- local:  uses Docker Postgres, MOCK_PAYMENTS=true, seeded test data.
- staging: real Stripe test keys, mirrors prod schema.
- prod:   real keys. NEVER run destructive scripts here.
           Migrations are gated behind manual approval.

That last line alone can save you from an agent cheerfully running a "cleanup" against production because nobody told it production was special.
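Writing the rule down informs the agent; a script can also refuse to cooperate if the rule is ignored. The guard below is a minimal sketch of that idea, not part of any real tool: the environment names, the `APP_ENV` variable in the usage note, and the `approved` flag are all assumptions made for illustration.

```python
PROTECTED_ENVS = {"prod", "production"}  # assumed names; adjust to your setup


class ProtectedEnvironmentError(RuntimeError):
    """Raised when a destructive action targets a protected environment."""


def check_destructive(env_name: str, destructive: bool, approved: bool = False) -> None:
    """Raise unless the action is safe for env_name.

    Non-destructive actions always pass. Destructive ones pass outside
    protected environments, or inside them only with explicit approval.
    """
    if destructive and env_name.lower() in PROTECTED_ENVS and not approved:
        raise ProtectedEnvironmentError(
            f"refusing destructive action in {env_name!r} without manual approval"
        )
```

A cleanup script could call `check_destructive(os.environ.get("APP_ENV", "local"), destructive=True)` as its first line, so the instruction in the file and the behavior of the script agree.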

6. Fix the seed, not the fruit

This is the habit that makes everything above compound.

When the agent gets something wrong, you can fix the output — edit the code, move on. That fixes this fruit. The bad seed is still in the ground, and tomorrow it grows the same wrong thing again.

The higher-leverage move is to fix the instruction. Agent used the wrong test framework? Don't just rewrite the test — add "we use Vitest, not Jest" to agent.md. Agent keeps reaching for a deprecated helper? Add it to the "don't" list. Each correction becomes permanent, and the same mistake stops recurring across every future session.

It's slower in the moment and dramatically faster over a month. You're not fixing outputs anymore; you're improving the thing that generates outputs.
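Fixing the seed can be as small as appending one bullet to the right section of the instruction file. The helper below is a sketch under the assumption that the file uses `## Section` headings and `- ` bullets, as in the agent.md sample earlier; the function name is made up for this example.

```python
def add_rule(agent_md: str, section: str, rule: str) -> str:
    """Return agent_md with `- rule` appended to the `## section` list."""
    lines = agent_md.splitlines()
    heading = f"## {section}"
    bullet = f"- {rule}"
    if bullet in lines:
        return agent_md  # already recorded; leave the file unchanged
    if heading not in lines:
        # No such section yet: create it at the end of the file.
        if lines and lines[-1].strip():
            lines.append("")
        lines += [heading, bullet]
        return "\n".join(lines) + "\n"
    start = lines.index(heading) + 1
    end = start
    # The section runs until the next `## ` heading or the end of the file.
    while end < len(lines) and not lines[end].startswith("## "):
        end += 1
    # Step back over trailing blank lines so the bullet joins the list.
    insert_at = end
    while insert_at > start and not lines[insert_at - 1].strip():
        insert_at -= 1
    lines.insert(insert_at, bullet)
    return "\n".join(lines) + "\n"
```

The function is idempotent on purpose: recording the same correction twice should not grow the file.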

The honest caveat

None of this is a settled standard. There's no npm test for your context files yet, no agreed-on linter for instructions, no CI gate that fails when your agent.md drifts from reality. The Context Development Lifecycle is a useful lens, not a finished toolchain — the tooling is being invented in real time, and some of today's best practice will look quaint in a year.
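Nothing standard exists, but a narrow, home-grown drift check is easy to sketch for one kind of fact: pinned versions. The example below assumes agent.md mentions a package by its package.json name followed by a major version (for example "React 19") and compares that with the first number in the manifest's version range. Both the convention and the function names are assumptions; anything beyond simple ranges would need a real semver parser.

```python
import json
import re
from typing import List, Optional


def major(spec: str) -> Optional[int]:
    """Extract the first integer from a version range such as '^19.0.0'."""
    m = re.search(r"\d+", spec)
    return int(m.group()) if m else None


def find_version_drift(agent_md: str, package_json: str) -> List[str]:
    """List packages whose major version in agent.md differs from package.json."""
    manifest = json.loads(package_json)
    deps = {**manifest.get("dependencies", {}), **manifest.get("devDependencies", {})}
    problems = []
    for name, spec in deps.items():
        # Match the package name as a whole token, then a major version number.
        pattern = rf"(?<![\w@/-]){re.escape(name)}\s+v?(\d+)\b"
        m = re.search(pattern, agent_md, flags=re.IGNORECASE)
        if m and major(spec) is not None and int(m.group(1)) != major(spec):
            problems.append(f"{name}: agent.md says {m.group(1)}, package.json has {spec}")
    return problems
```

Run in CI, a non-empty result could fail the build, which is a small, local version of the gate this section says is missing.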

But you don't need the mature toolchain to capture most of the value. Versioned instruction files, layered rules, checkable facts, a respected context window, and the discipline to fix the seed instead of the fruit — that's all available today, and it's the difference between an agent that fights you and one that feels like it actually knows your project.

Your agent is probably better than your context is letting it be.

One question for you

What's the single most valuable line currently living in your agent instruction file — the one fact that stopped a recurring mistake cold? Drop it in the comments; I want to steal the good ones.

This builds on Patrick Debois's Context Development Lifecycle — his write-up Optimizing Context for AI Coding Agents is the fuller version of the idea.

AI Weekly — 2026-05-22 to 2026-05-29 | Anthropic's $965B Moment and the Infrastructure Bet

Yang Goufang — Thu, 28 May 2026 23:04:41 +0000

Anthropic closed the largest AI funding round in history — $65 billion at a $965 billion valuation — and dropped Claude Opus 4.8 the same day. Three questions follow: what the money actually buys, what the model actually changes, and whether either matters to the infrastructure layer where enterprise AI is actually decided.

The Funding: What $65B Actually Buys

Anthropic's Series H is not a vote of confidence in a product roadmap. It is a bet on infrastructure positioning (Anthropic raises $65B in Series H funding at $965B post-money valuation - Anthropic; Anthropic Tops OpenAI to Become the World’s Most Valuable A.I. Start-Up - The New York Times). At $965 billion post-money, the company is no longer competing solely in the model layer; it is building the substrate other companies build on.

The timing is deliberate: Claude Opus 4.8 shipped the same day (Introducing Claude Opus 4.8 - Anthropic), and within hours AWS confirmed hosting (Claude Opus 4.8 is now available on AWS - Amazon Web Services (AWS)). What the money buys is multi-cloud distribution, enterprise procurement relationships, and geographic expansion into Korea and Italy (Anthropic appoints KiYoung Choi as Representative Director of Korea ahead of Seoul office opening - Anthropic; Anthropic opens Milan office to support Italian enterprise, research, and developers - Anthropic) before OpenAI's EU footprint matures. Availability on AWS carries genuine weight in enterprise sales cycles: it means Claude can be bought through the procurement frameworks that Fortune 500 IT departments already operate under. That is the infrastructure argument, and it does not require speculating about switching costs: the distribution channel is the switching cost.

For engineering decision-makers: the relevant question is not "is Claude better than GPT?" this week. It is whether Anthropic's infrastructure push — not the model benchmark score — creates durable enterprise relationships that matter in 12–18 months.

Claude Opus 4.8: Capability vs. Distribution

Opus 4.8 ships with claims of improved reasoning and agentic performance (Introducing Claude Opus 4.8 - Anthropic). AWS hosting (Claude Opus 4.8 is now available on AWS - Amazon Web Services (AWS)) means enterprise access through existing procurement relationships, which is a meaningfully different go-to-market than OpenAI's direct API.

No independent benchmark data is available at press time. The capability claims should be treated as vendor statements until third-party evaluation is published. The distribution advantage — AWS customers can provision via existing contracts and compliance frameworks — is concrete today.

Google Rewrites the Search Box

For the first time in 25 years, Google has changed the search interface itself (Powered by A.I., Google Changes Its Search Box for the First Time in 25 Years - The New York Times). Not a ranking tweak. A fundamental UI change driven by AI integration. This received less coverage than the Anthropic funding round.

The practical implication: Google is no longer protecting the ranked-list paradigm internally. The search box is becoming an answer engine, which has downstream effects on SEO-driven businesses, content monetization, and how AI summarization interacts with publisher attribution. If you ship products that depend on Google index crawl patterns, this is a structural signal, not a cosmetic one.

The AGI Timeline Returns

Demis Hassabis said AGI is 3 to 4 years away (Google DeepMind’s Hassabis: AGI is 3 to 4 years away - Sherwood News). This is the same person who said it was "5 to 10 years away" in 2023. The update is presented as increased confidence, not new evidence.

For technical readers: Hassabis is not publishing a methodology. "AGI" remains undefined across the statements made this week — Anthropic, OpenAI, and Google all use it differently. Treat the 3–4 year claim as a narrative instrument, not an engineering forecast.

The Job Displacement Fault Line

An ex-Meta scientist publicly called Anthropic's CEO "wrong" on claims about AI-driven job losses (OpenAI and Anthropic dig in against each other on AI jobs apocalypse - Axios). This is not an academic debate. It is a dispute about what the economic data actually shows, and it is happening at the CEO level, which means it is affecting policy positioning and public affairs strategy.

The uncertainty here cuts both ways. If job displacement is slower than feared, the talent market implications for AI tooling are different than if it accelerates. The Anthropic/OpenAI public disagreement is a proxy for a genuine forecasting failure — nobody has reliable data on this timeline.

One Number Worth Tracking

Company	Valuation	Runway Implied	Notable This Week
Anthropic	$965B	~3–4 years at current burn	Series H close + Opus 4.8 launch
OpenAI	IPO filing pending	Public market dependent	Sam Altman governance friction cited

The governance issue OpenAI faces, which Reuters flagged as the "Sam Altman problem" (Breakingviews - OpenAI’s IPO has a Sam Altman problem - Reuters), is structurally different from Anthropic's position. A private company with a $965B valuation can defer difficult questions about accountability. A public company cannot. This matters for enterprise customers evaluating vendor stability.

Tool, Not Shrine

AlphaProof Nexus solved 9 Erdős problems and proved 44 sequence conjectures for a few hundred dollars in compute (Google Deepmind's AlphaProof Nexus solves decades-old math problems for a few hundred dollars - the-decoder.com; Google DeepMind's AlphaProof Nexus solves 9 Erdős problems and proves 44 sequence conjectures - Crypto Briefing). That is a concrete data point when assessing AI math capability in production workflows. The capability is real. The question, as always, is whether it maps to your actual use case.

OpenAI was named a Leader in enterprise coding agents by Gartner (OpenAI named a Leader in enterprise coding agents by Gartner - OpenAI). This is a marketing data point, not a technical evaluation. It tells you about OpenAI's enterprise sales motion, not relative code quality.

This week: Anthropic has the capital, the model, the distribution, and the international footprint. OpenAI has the IPO and the governance problem. Google has the distribution and is redesigning its core product around AI. None of these are the same bet. Pick which layer you are playing in.

AI Weekly: 2026-05-22 to 2026-05-29 | Pricing Power Shifts: The Structural Signal Behind Anthropic's Valuation Overtaking OpenAI

Yang Goufang — Thu, 28 May 2026 23:02:37 +0000

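This week's most important signal is not any single model release but Anthropic's valuation starting to overtake OpenAI's. When pricing power shifts from the developers who pick models to the platforms that define workflows, the business narrative enters its next chapter.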

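Models and platforms: Claude Opus 4.8 lands on AWS, a signal of where pricing power sits

This week's biggest product news is that Claude Opus 4.8 is officially available on AWS (Claude Opus 4.8 is now available on AWS - Amazon Web Services (AWS)). This is not simply "one more cloud option": AWS is the de facto gatekeeper of enterprise procurement, and entering this channel amounts to an official ticket into large enterprises' compliant procurement processes.

Set this against Anthropic becoming the world's most valuable AI startup, with a valuation past the US$100 billion mark (Anthropic Tops OpenAI to Become the World’s Most Valuable A.I. Start-Up - The New York Times), and the two items have to be read together: paper valuation is a lagging indicator; acceptance into the AWS channel is a leading indicator.

The capability claims for Claude Opus 4.8 itself call for the three-tier distinction of "announced vs. available vs. commercially usable". This time Anthropic released the model directly (Introducing Claude Opus 4.8 - Anthropic) rather than as a customer-limited preview, and the AWS page was updated in step (Claude Opus 4.8 is now available on AWS - Amazon Web Services (AWS)), which indicates availability has reached enterprise delivery standards. No public third-party evaluation data yet exists for a baseline comparison with the previous-generation Opus 4, so engineering teams should not base model selection directly on the launch copy for the new and old models.

In addition, Anthropic simultaneously published a paper on applying coding agents in the social sciences (Coding agents in the social sciences - Anthropic). This is a research-stage case study, not a product release. Between the degree of workflow integration and coverage the paper presents and the stability and tooling support that real enterprise deployment needs, there is still an engineering gap.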

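Institutional expansion: Milan and Seoul, network effects taking shape in European and Asian enterprise markets

Anthropic announced two nearly simultaneous institutional moves this week: a Milan office serving Italian enterprises, research institutions, and developers (Anthropic opens Milan office to support Italian enterprise, research, and developers - Anthropic); and the appointment of KiYoung Choi as Representative Director of Korea, with the Seoul office about to open (Anthropic appoints KiYoung Choi as Representative Director of Korea ahead of Seoul office opening - Anthropic).

These two moves send a message more durable than any model capability announcement:

Europe (Milan): Italy is the EU's third-largest economy and a complex compliance node for enterprise AI procurement under the GDPR framework. A local legal entity shifts the compliance conversation from "overseas service provider" to "locally accountable entity", which is the first threshold for enterprise procurement to enter the compliance process.

Asia (Seoul): South Korea has many high-value B2B use cases for AI integration across the semiconductor supply chain, mobile and consumer electronics manufacturing, and the automotive industry. Opening an office is a necessary condition for market entry, not a sufficient one; real progress depends on follow-up support and API availability commitments.

Side-by-side comparison: Anthropic's strategy of establishing a local presence in major economies and OpenAI's strategy from two years ago of moving toward raising capital on public markets represent two different market penetration models. The former centers on institutional trust, the latter on financial leverage. The valuation gap is now testing which model suits institutional markets better.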

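Altman's IPO problem and OpenAI's governance structure risk

Two in-depth reports this week (Breakingviews - OpenAI’s IPO has a Sam Altman problem - Reuters; The big questions OpenAI’s trillion-dollar IPO filing may finally answer - Fortune) focus on OpenAI's IPO filing and the question of Sam Altman's equity structure. The core tension: Altman holds no equity in OpenAI. That is an unusual structure for a public company, and a variable investors cannot ignore when assessing governance risk.

If, after the IPO, Altman's influence over major company decisions lacks the institutional constraint that comes from an equity basis, the checks provided by outside directors and investors will be more fragile than at a typical technology company. The regulator (the SEC) is bound to ask this question during review (The big questions OpenAI’s trillion-dollar IPO filing may finally answer - Fortune).

From an engineering decision-maker's perspective, the practical implication is this: when you evaluate a system built on the OpenAI API, you are also assuming that the company's governance structure will not change after the IPO in some fundamental way that affects API availability. That assumption is not risk-free.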

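Google's search box gets its first redesign in 25 years: the pace of the core business's counterattack

The New York Times reports that Google's search box has had its first major redesign in 25 years, adding generative AI capabilities (Powered by A.I., Google Changes Its Search Box for the First Time in 25 Years - The New York Times). Compared with the previous cycle (the rushed Bard launch in 2023), this is a formal product integration rather than a response to failure.

The strategic significance of this news is not that "Google finally does AI search"; that was the assessment two years ago. It lies in the timeline: from a panicked emergency response to formal product integration, Google took two years to steady the pace of AI upgrades in its core business. This means AI integration for a high-traffic, wide-reach product like search has entered a stage where it can be engineered and operated, and is no longer just a slogan or an experiment.

For enterprise decision-makers: if your product strategy involves information retrieval, document summarization, or knowledge management, this redesign means "AI-first search" has become a standard feature. Differentiation over the next three years will lie not in whether you have AI search but in who can build higher-value vertical integration.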

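Faith and models: Anthropic's non-technical PR front

This week had an unusual news dimension: Anthropic co-founder Chris Olah publicly commented on Pope Leo XIV's encyclical "Magnifica humanitas" (Anthropic co-founder Chris Olah's remarks on Pope Leo XIV's encyclical "Magnifica humanitas" - Anthropic), and Scientific American then reported that Anthropic has asked religious thinkers to help shape Claude's direction (Anthropic asks religious thinkers to help shape Claude as pope warns about AI - Scientific American).

This is not technical news, but its strategic intent is clear: once the social influence of AI models enters the stage of institutional regulation, the contest over who frames the discourse matters as much as model capability. This framing of "bringing religious thinkers into AI ethics" overlaps with OpenAI's emphasis on safety and alignment but comes at it from a different angle.

A practical observation: this kind of non-technical discourse influences the direction legislation takes. When legislators rely on industry self-regulation for technical details, the vendors that supply the framework gain disproportionate institutional influence. Decision-makers who follow regulatory developments should treat this "ethics diplomacy" as part of enterprise risk assessment.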

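This week side by side

Event	Main implication	Stage
Claude Opus 4.8 lands on AWS	Ticket into compliant enterprise channels secured	Commercially usable
Anthropic's valuation overtakes OpenAI's	Market confirms the institutional-trust business model	Commercial stage
Milan and Seoul offices established	Entry strategy for European and Asian institutional markets launched	Entry stage
Google search box redesign	Search giant completes AI integration of its core business	Available
OpenAI IPO structural risk	Governance issues affect assumptions about long-term API availability	Institutional risk
Papal ethics framework engagement	Competition for regulatory discourse enters a new dimension	Discourse stage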

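Closing

This week's underlying signal is not new model capability. It is that an answer to the question "who controls the workflow" is taking shape: Anthropic's institutional penetration strategy and OpenAI's public-market path are testing two different penetration models, institutional adoption versus pure market logic. Confirming the actual outcome will take at least two more quarters of revenue data.

For engineering decisions: what is most worth tracking this week is not Opus 4.8's benchmarks but the gap in revenue growth between the two companies within the same earnings season; that is the actual starting point for getting into enterprise budget approval processes. Valuation is a lagging indicator; the revenue growth gap is the leading indicator.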

AI Weekly — 2026-05-15 to 2026-05-22 | The Agentic Inflection Is Real, But the Enterprise Gap Is Wider Than Ever

Yang Goufang — Fri, 22 May 2026 07:15:34 +0000

The agentic AI wave is real. The infrastructure for enterprises to actually run it at scale is not — and that gap is now the only story that matters.

Google I/O: Agentic Gemini Ships, Finally

Google's I/O this week delivered the most concrete agentic AI release since the initial ChatGPT wave. Gemini is now persistently aware, can operate in the background without prompts, and maintains memory across sessions (The Gemini app becomes more agentic, delivering proactive, 24/7 help - blog.google). The company called it "the agentic Gemini era" (I/O 2026: Welcome to the agentic Gemini era - blog.google), and for once, the PR framing is accurate rather than aspirational.

The capability shift: Gemini moves from reactive retrieval to proactive task management. It can monitor your calendar, initiate research, draft communications, and execute multi-step workflows without being asked at each step. This is the "always-on AI assistant" that has been promised for two years — delivered at consumer scale by the one company with the distribution to make it stick.

Also notable: Google unveiled Omni, a world model for AI-generated video with advanced temporal consistency (Google debuts new Omni world model at Google I/O with advanced AI video capabilities - mashable.com). Free tier access to new Gemini features was confirmed (Every new tool and AI model from Google I/O you can try for free - mashable.com). These are distinct capability layers (consumer agentic AI shipping now, foundation model research shipping later), and conflating them is exactly the error that leads to overhyped enterprise timelines.

What this means for decision-makers: If your organization has Google Workspace exposure, the agentic Gemini rollout is already in your users' hands. The governance question has arrived before most enterprises had a policy to answer it.

OpenAI: IPO Filing and Internal Realignment

OpenAI was reported to be preparing a confidential IPO filing, possibly as early as Friday, May 22 (OpenAI Could Confidentially File For IPO As Soon As Friday, Report Says - Forbes; OpenAI Prepares to File for an I.P.O. in Coming Weeks - The New York Times). This is not a product announcement; it is a structural inflection point. Going public forces financial disclosure, establishes shareholder accountability, and replaces nonprofit governance turbulence with public-market discipline.

The reported filing plans follow a week of visible internal reorganization: Greg Brockman officially took control of OpenAI's product organization (Greg Brockman Officially Takes Control of OpenAI’s Products in Latest Shake-Up - WIRED), consolidating technical and product sides under one leader after years of governance instability. The Malta partnership, which provides ChatGPT Plus to all citizens (OpenAI and Malta partner to bring ChatGPT Plus to all citizens - OpenAI), functions as both a public-relations anchor and a national-scale deployment test.

Engineering read: IPO filing does not change API capabilities or pricing. But it changes vendor risk calculus. Evaluate your OpenAI contract terms with exit clauses that account for structural ownership changes.

Enterprise AI: Deals, Not Demos

Enterprise deployment announcements this week confirm that 2026 is the year AI integration moved from pilot budgets to operational line items.

Dell + Codex (OpenAI and Dell Technologies partner to bring Codex to hybrid and on-premises enterprise environments - OpenAI): OpenAI bringing Codex to hybrid and on-premises environments addresses the data residency concern that has blocked enterprise AI adoptions in regulated industries. Proprietary codebases no longer need to leave the building.

Databricks + GPT-5.5 (Databricks brings GPT-5.5 to enterprise agent workflows - OpenAI): Enterprise agent workflows now have a native path into the Databricks data lakehouse ecosystem. The integration matters because Databricks is where enterprise data lives: putting GPT-5.5 at the data layer rather than the API layer changes the latency and cost profile for data-intensive workflows.

BMS + Anthropic (BMS taps Anthropic’s Claude for enterprise-wide AI adoption to speed R&D, global workflows - Fierce Pharma): Bristol-Myers Squibb deploying Claude for enterprise-wide R&D workflows is high-stakes validation in one of the most regulated industries. If this succeeds, expect rapid competitive imitation in biotech and healthcare.

Anthropic + Cloudflare Managed Agents (Anthropic's Code with Claude Announces Managed Agents, Proactive Workflows, Capability Curve - infoq.com; Announcing Claude Managed Agents on Cloudflare - The Cloudflare Blog): Claude agents now run on Cloudflare's edge infrastructure. This targets developers who want Anthropic's model quality with Cloudflare's distribution and billing surface, a distinct channel from the Microsoft-centric enterprise IT buying motion.

Microsoft + EY (Microsoft and EY Team to Promote Corporate AI Adoption - PYMNTS.com) and OneStream + Microsoft (OneStream Announces Expanded Strategic Partnership with Microsoft to Scale AI Adoption and Value for the Office of the CFO - PR Newswire): Both announcements reflect the same pattern: professional services and finance software vendors treating Microsoft Copilot as the integration surface. These are ecosystem confirmations rather than AI capability announcements.

The Chip Wall: Geopolitics as Engineering Constraint

Trump stated China has not approved Nvidia AI chip imports, citing China's preference to develop domestic alternatives (Trump says China hasn't approved import of Nvidia AI chips because 'they want to develop their own' - Yahoo). Separately, a startup is reframing Nvidia H100/H200 compute as a "boring, bankable asset" for enterprises (The ‘Price is Right’ for GPUs: The Startup Turning Nvidia Chips Into ‘Boring’ Bankable Assets - The Information), packaging GPU infrastructure as a financial product to derisk enterprise capex decisions.

Practical impact: For AI infrastructure builds outside the US, export restrictions, lead times, and cost uncertainty are active engineering constraints. The startup-as-financial-product model may accelerate enterprise GPU adoption in markets where direct Nvidia procurement is constrained.

AI Weekly: 2026-05-15 to 2026-05-22 | When IPO Rumors Meet a 270,000-Person Deployment

Yang Goufang — Fri, 22 May 2026 01:35:27 +0000

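OpenAI is reported to be filing for an IPO as soon as this week, while Anthropic announced that KPMG's more than 270,000 employees are adopting Claude across the board. The two events point from different angles to the same conclusion: AI commercialization is shifting from "pilot experiments" to "institutional adoption". Whether engineering integration costs and reliability data can withstand a test at this scale is the question that really matters next.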

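OpenAI heads toward a public offering: from research lab to Wall Street

The week's most newsworthy event is that OpenAI may file for an IPO in the near term. Forbes and The New York Times both reported on May 20 that OpenAI has begun preparing confidential documents and could file as soon as Friday (OpenAI Could Confidentially File For IPO As Soon As Friday, Report Says - Forbes; OpenAI Prepares to File for an I.P.O. in Coming Weeks - The New York Times).

An IPO filing in itself does not indicate business maturity. The filing mechanism allows a company to hold confidential discussions with institutional investors before the prospectus is filed publicly, so the timing leaves a lot of room for interpretation. The question that really matters: is OpenAI's revenue growth curve enough to support the valuation private markets have already given it?

From a regulatory-filing standpoint, why does an IPO make sense now? The New York Times reported the same week that Musk's lawsuit against OpenAI had just been thrown out in court, with the jury unanimously finding in OpenAI's favor (Jury throws out Musk's lawsuit against OpenAI and Sam Altman - PBS; As OpenAI Celebrates Court Win Against Musk, Other Challenges Lie Ahead - The New York Times). With the litigation risk removed, one source of uncertainty among the regulatory obstacles to a public offering is gone. But an analysis published the same day by Marcus on AI points out that several of OpenAI's recent headline claims rest on flawed math and urges readers to treat unverified figures with caution (Checking the math behind OpenAI and Anthropic’s latest headlines - Marcus on AI | Substack). Until the actual S-1 filing is visible, valuation narratives deserve a wait-and-see stance.

Another organizational signal: Greg Brockman has officially taken over OpenAI's product line (Greg Brockman Officially Takes Control of OpenAI’s Products in Latest Shake-Up - WIRED), moving from technical co-founder into a product operations role. This is an organizational signal of the company's shift from research-first to commercial execution, not a technical signal.

Event	Reading
IPO filing rumor	Announced (not an official announcement)
Lawsuit thrown out	Commercial risk reduced
Brockman takes over product	Organizational focus shifts; not a technical release

Conclusion: The IPO process is a highly certain direction, but judging actual value has to wait for the S-1 data. Revenue mix and unit economics are the core metrics to watch, not the listing schedule.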

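Anthropic's enterprise deployment at scale arrives

The week's second heavyweight story is Anthropic's strategic alliance with KPMG, covering more than 276,000 KPMG employees worldwide (KPMG integrates Claude across its core business and workforce of more than 276,000 in strategic alliance - Anthropic). The same week, BMS (Bristol Myers Squibb) also announced it is adopting Claude to speed up pharmaceutical R&D and global workflows (BMS taps Anthropic’s Claude for enterprise-wide AI adoption to speed R&D, global workflows - Fierce Pharma).

Does this order of magnitude mean anything? Yes, but with conditions. Coverage of 276,000 people does not mean 276,000 people using AI in production every day. "Strategic alliance" and "enterprise-wide deployment" in corporate press releases are often phased goals rather than a snapshot of the present. Anthropic's own press release does not state actual monthly active users or API call volume, and the actual scale of integration costs is likewise not public.

The Big Four accounting firms' workflows are highly standardized and involve large volumes of document review, tax analysis, and audit coding. If verifiable adoption-rate data emerges from inside KPMG, this will be the case closest so far to the definition of "institutional adoption". Until then, claiming it proves success at scale is treating a vision narrative as fact.

Earlier the same week, Anthropic announced the Managed Agents feature from Code with Claude, supporting proactive workflows and capability-curve tracking (Anthropic's Code with Claude Announces Managed Agents, Proactive Workflows, Capability Curve - infoq.com). A "proactive agent" means the system triggers the next action automatically based on context instead of waiting for a human prompt. The paradigm shift from "tool" to "agent" holds only if the system completes tasks predictably within its authorized scope. This is the deployment form with the highest reliability requirements: a failure at any step interrupts the chain of workflows that depend on it. The current announcement is at the "announced" tier; error-recovery mechanisms in the enterprise API and measured data remain to be verified.

Compared with alternatives: Microsoft in the same period released its Work IQ feature, said to track the efficiency of employees' interactions with AI tools (How Work IQ is supercharging our AI usage at Microsoft - Microsoft), but it is essentially a productivity measurement tool, not AI capability itself. Anthropic focuses on letting AI act proactively; Microsoft focuses on making AI's actions measurable. The two companies are approaching the same problem from different directions.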

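Infrastructure updates: NVIDIA's CPU and the agent arms race at Google I/O

NVIDIA Vera: the first CPU designed for agentic workloads

NVIDIA announced the Vera CPU on May 18, said to be the industry's first CPU designed specifically for AI agent workloads and already being deployed at top AI labs (Vera Arrives: NVIDIA’s First CPU Built for Agents Lands at Top AI Labs - NVIDIA Blog).

A framework for evaluating this news:

Is this "announced" or "available"? NVIDIA press releases usually announce partners at the same time as a chip launch, but volume-production schedules and large-scale supply are independent variables. Vera's current target customers are "AI labs", not ordinary enterprises: shipments are limited initially, and pricing and supply-chain information are not yet transparent.

The specific optimization target for agentic workloads: multi-agent collaboration needs high-frequency, low-latency task scheduling, which differs from the training or inference scenarios that GPUs traditionally accelerate. If NVIDIA can solve the task-scheduling bottleneck at the CPU level, it would matter materially for latency in multi-agent systems, provided measured data can verify the claim rather than marketing material alone.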

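Google Gemini I/O update: proactive 24/7 agents

On the second day of I/O 2026, Google announced that the Gemini app is becoming fully agentic, with proactive, around-the-clock assistance (The Gemini app becomes more agentic, delivering proactive, 24/7 help - blog.google; I/O 2026: Welcome to the agentic Gemini era - blog.google). The concrete definition of "agentic": Gemini can proactively anticipate needs from user behavior patterns and act on them.

A skeptical view from the engineering-deployment side: "24/7 proactive help" is a reasonable feature description in a consumer app, but in enterprise settings there is a reliability gap between "proactive" and "automatically executed". Enterprises don't need a system that "might do things for me"; they need one that "completes tasks predictably within its authorized scope". The current announcement is at the "announced" tier; real deployment requires seeing reliability data and error-recovery mechanisms for the enterprise API, and that information is not currently public.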

晶片競合動態：Amazon 逼近、NVIDIA 仍居核心

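A strategic crack is opening in the relationship between AWS and NVIDIA. digitimes reports that Amazon's in-house AI chips are gradually gaining adoption, but NVIDIA remains the core supplier for AWS data centers (Amazon's AI chip push gains ground as Nvidia remains central to AWS - digitimes). AIMultiple's landscape of AI chip vendors shows the list of NVIDIA competitors continuing to grow, though most do not yet threaten its lead in training and high-performance inference (Top 25+ AI Chip Makers: NVIDIA & Its Competitors - AIMultiple).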

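Vendor	Current position
NVIDIA	Leads training/HPC; Vera targets the agent inference market
AWS (Trainium/Inferentia)	In-house chips adopted internally; cost advantage aimed at specific inference workloads
Other competitors	Mostly concentrated in specific vertical scenarios; no at-scale threat to NVIDIA's position yet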

工程師決策點： 若工作負載是訓練或大規模推論，NVIDIA 生態系仍是最短路徑。若是特定場景的邊緣推論，AWS 自研晶片的成本結構值得評估——不是因為性能超越，而是因為性價比。

一週數據掃描

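Musk lawsuit thrown out (Jury throws out Musk's lawsuit against OpenAI and Sam Altman - PBS; As OpenAI Celebrates Court Win Against Musk, Other Challenges Lie Ahead - The New York Times): the jury ruled unanimously and OpenAI won the case. The dispute over organizational control now has the court's backing, but the right to appeal and other lawsuits are still in progress, so the long-running disputes over IP and business model are not necessarily over.
OpenAI partners with Malta (OpenAI and Malta partner to bring ChatGPT Plus to all citizens - OpenAI): ChatGPT Plus is extended to all Maltese citizens. "National-scale deployment" and "actual usage rate" are two different things, and this case so far qualifies only as the former.
OpenAI partners with Dell (OpenAI and Dell Technologies partner to bring Codex to hybrid and on-premises enterprise environments - OpenAI): Codex moves into hybrid-cloud and on-premises enterprise environments. This is a direct challenge to Azure's exclusive position and a signal that OpenAI is diversifying its enterprise sales.
Microsoft AI chief's 18-month declaration (Microsoft AI chief gives it 18 months—for all white-collar work to be automated by AI - Fortune): the claim is that all white-collar work will be automated by AI within 18 months. The diversity of white-collar work and regulatory constraints mean full automation cannot happen in the short term; claims like this lack a concrete technical roadmap and are closer to PR narrative.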

本文所有事實均可於五分鐘內透過公開來源驗證。沒有引用任何未經驗證的匿名消息或內部文件。

Meta GEM 推薦基石模型深剖：大廠限定的工程地獄與落地取捨

Yang Goufang — Thu, 21 May 2026 01:00:54 +0000

Meta 最近揭露了其廣告推薦中央大腦 GEM（Generative Ads Recommendation Model） 的技術細節。公關數據非常漂亮：Instagram 廣告轉換率提升 5%，Facebook Feed 提升 3%，訓練效率是傳統蒸餾的 2 倍。

但站在做落地判斷的工程師視角，這是一篇典型的「大廠資源秀肌肉」報告。它堆砌了萬卡 GPU、自研 NCCLX 通訊與 Jagged Tensor 算子。以下我們直接扒開架構，評估其真正的工程代價與落地可行性。

一、架構設計：解決稀疏性，但代價是推理延遲（Inference Latency）

推薦系統的核心痛點在於特徵空間極度稀疏（Sparse）與行為序列長度不一。GEM 採用了三個關鍵架構來應對：

Wukong（悟空）非序列特徵交叉：利用可堆疊的因子分解機（Factorization Machines）配合 Cross-layer Attention 連接。
- 取捨評估：這能強迫模型學到深層的特徵互動，但多層注意力機制在線上預估（Online Serving）時會帶來災難性的推理延遲（Inference Latency）。
InterFormer 序列與特徵交替學習：傳統方法會把用戶歷史序列先壓縮成一個 Embedding 向量再傳給下游，這會造成行為訊號嚴重流失。InterFormer 選擇交替進行序列學習與跨特徵交互。
- 取捨評估：保留了完整的行為路徑，代價是記憶體頻寬與 Embedding 儲存開銷暴增。
金字塔平行結構（Pyramid-Parallel Structure）：在處理數千長度的用戶行為序列時，透過堆疊平行模組來降低儲存開銷。
- 取捨評估：這能把複雜度壓下來，但越往金字塔上方傳遞，特徵細節的損耗就越大。
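
To make the "factorization machine" building block mentioned for Wukong concrete, here is a minimal sketch of the classic second-order FM interaction term. It is not Meta's Wukong implementation: the stacking and cross-layer attention are omitted, and the function names, dimensions, and random embeddings are illustrative assumptions. The sketch only shows why FM-style feature crossing is affordable: the pairwise sum can be rewritten to run in time linear in the number of features.

```python
import random

def fm_pairwise_naive(x, V):
    """Sum over all pairs i<j of <V[i], V[j]> * x[i] * x[j], pair by pair."""
    n = len(x)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            total += dot * x[i] * x[j]
    return total

def fm_pairwise_fast(x, V):
    """Same quantity in O(n*k) using
    0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ]."""
    k = len(V[0])
    total = 0.0
    for f in range(k):
        s = 0.0      # running sum of v_if * x_i
        s_sq = 0.0   # running sum of (v_if * x_i)^2
        for xi, vi in zip(x, V):
            t = vi[f] * xi
            s += t
            s_sq += t * t
        total += s * s - s_sq
    return 0.5 * total

# Hypothetical toy input: 6 features, embedding size 3, mostly-zero vector.
random.seed(0)
n_features, k = 6, 3
x = [1.0, 0.0, 0.0, 1.0, 0.5, 0.0]
V = [[random.uniform(-1, 1) for _ in range(k)] for _ in range(n_features)]

naive = fm_pairwise_naive(x, V)
fast = fm_pairwise_fast(x, V)
```

Both functions return the same value up to floating-point error; the fast form is the reason FM layers can be stacked without the pairwise cost growing quadratically.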

💡 落地判斷：

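Meta can afford Wukong and InterFormer because GEM is positioned as an offline foundation model (FM): it is not served online directly, and instead transfers what it learns through knowledge distillation to hundreds of downstream small models (VMs, Vertical Models). If your team does not have this FM-to-VM offline-to-online distillation architecture, running Wukong / InterFormer directly for real-time online recommendation amounts to online suicide.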

二、知識傳遞（Knowledge Transfer）：解決監督過期（Stale Supervision）的妙招

GEM 大模型線下訓練有時間差，而線上 VM 需要應對實時變化的用戶行為。這會導致「監督過期」（Stale Supervision），即 FM 傳給 VM 的知識可能已經過時。

Meta 採用了三個 Post-training 技巧（效率宣稱為傳統蒸餾的 2 倍）：

+------------------------------------------+
|      GEM Foundation Model (Teacher)      |
+------------------------------------------+
                     |
                     v
+------------------------------------------+
|       Student Adapter (即時修正)          | <--- 接入最新 Label 修正偏差
+------------------------------------------+
                     |
      +--------------+--------------+
      | (知識蒸餾)   | (特徵表示)   | (參數共享)
      v              v              v
+------------------------------------------+
|          Vertical Models (VMs)           |
+------------------------------------------+

Student Adapter（學生適配器）：在蒸餾過程中加入輕量級 Adapter，用最新產生的 Ground-truth 標籤實時修正 Teacher 的預估值，拉回時間差造成的預估偏差。
特徵表示學習（Representation Learning）：不只蒸餾機率，更把 Teacher 與 Student 的特徵語意對齊，完全不增加線上推理延遲。
參數共享（Parameter Sharing）：讓小模型直接複用大模型的特定參數，節省運算成本。
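
The Student Adapter idea above can be sketched as a small calibration layer. This is an assumption-heavy illustration, not Meta's method: it models the adapter as an affine correction in logit space (`a * logit(p) + b`), fitted by plain gradient descent on fresh ground-truth labels, and the data is made up so that the stale teacher systematically under-predicts.

```python
import math

def logit(p):
    p = min(max(p, 1e-6), 1 - 1e-6)  # clamp to avoid log(0)
    return math.log(p / (1 - p))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_adapter(teacher_probs, fresh_labels, lr=0.5, steps=3000):
    """Fit (a, b) so that sigmoid(a * logit(p_teacher) + b) matches fresh labels.

    Starts from the identity mapping (a=1, b=0) and minimizes mean log loss.
    """
    a, b = 1.0, 0.0
    zs = [logit(p) for p in teacher_probs]
    n = len(zs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for z, y in zip(zs, fresh_labels):
            err = sigmoid(a * z + b) - y  # gradient of log loss w.r.t. pre-activation
            grad_a += err * z
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def corrected_target(p_teacher, a, b):
    """Teacher prediction after the adapter's staleness correction."""
    return sigmoid(a * logit(p_teacher) + b)

# Hypothetical data: the stale teacher says 0.1 and 0.4,
# but fresh labels show conversion rates of 0.2 and 0.6.
teacher_probs = [0.1] * 50 + [0.4] * 50
fresh_labels = [1] * 10 + [0] * 40 + [1] * 30 + [0] * 20

a, b = fit_adapter(teacher_probs, fresh_labels)
low = corrected_target(0.1, a, b)
high = corrected_target(0.4, a, b)
```

In a real pipeline the corrected values would serve as distillation targets for the VMs, and the adapter would be refitted continuously as new labels arrive, which is exactly why the real-time labeling pipeline becomes the critical dependency.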

💡 落地判斷：

Student Adapter 是整篇報告中最具實用價值的工程細節。任何做離線-線上模型蒸餾的團隊都應該抄這招。它唯一的代價在於你需要一套極度穩定、低延遲的即時標籤回傳（Real-time Labeling Pipeline）。

三、訓練基礎設施：重度定製的軟硬體協同（一般團隊玩不起）

Meta 宣稱在 GPU 數量擴大 16 倍的情況下，跑出 23 倍的有效訓練 FLOPS，MFU（模型 FLOPS 利用率）提升 1.43 倍。
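
A quick arithmetic check shows how the three quoted numbers relate. The assumption here is that every GPU has the same peak FLOPS before and after scaling, so the MFU ratio equals the per-GPU effective-throughput ratio.

```python
gpu_scale = 16.0     # GPU count grew 16x (figure quoted above)
flops_scale = 23.0   # effective training FLOPS grew 23x (figure quoted above)

# With identical hardware, the MFU gain is throughput growth per added GPU.
per_gpu_gain = flops_scale / gpu_scale
```

`per_gpu_gain` comes out at 1.4375, which is consistent with the quoted 1.43x MFU improvement. In other words, the 23x figure and the 1.43x figure are one claim stated twice, not two independent results.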

不要只看這張成績單，背後是重度的工程定製：

2D 稀疏平行（2D Sparse Parallelism）：推薦模型跟 LLM 不同，Dense 部分（Attention/MLP）用 HSDP（混合分片分散式平行），而 Sparse 部分（Embedding Table）則是用資料+模型雙重平行。這對節點間的通訊頻寬要求極端苛刻。
Jagged Tensor 自研算子：用戶的序列長度通常是參差不齊的（Jagged）。傳統做法是 Padding 補零，但這在 GPU 運算上是巨大的浪費。Meta 為此編寫了客製化 GPU Kernel，支援不規則序列的算子融合（Computation Fusion）。
NCCLX 通訊庫：這是 Meta 對 NCCL 的自研分支，核心優化是讓通訊集合不佔用 GPU 的 SM 資源，實現計算與通訊的完全重疊（Overlap）。
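
The padding waste behind the Jagged Tensor point is easy to see on a toy batch. The sketch below uses a flat `values` list plus `offsets`, a common layout for ragged data; it is an illustration in plain Python, not Meta's GPU kernel, and the sequences are made up.

```python
def to_padded(seqs, pad=0):
    """Pad every sequence to the length of the longest one."""
    max_len = max(len(s) for s in seqs)
    return [s + [pad] * (max_len - len(s)) for s in seqs]

def to_jagged(seqs):
    """Store all elements contiguously; offsets[i]:offsets[i+1] is sequence i."""
    values, offsets = [], [0]
    for s in seqs:
        values.extend(s)
        offsets.append(len(values))
    return values, offsets

def jagged_sum(values, offsets):
    """Per-sequence reduction that touches only real elements."""
    return [sum(values[offsets[i]:offsets[i + 1]]) for i in range(len(offsets) - 1)]

# Hypothetical user behavior sequences of uneven length.
seqs = [[3, 1, 4], [1], [5, 9, 2, 6, 5, 3], [5, 8]]

padded = to_padded(seqs)
values, offsets = to_jagged(seqs)

padded_cells = len(padded) * len(padded[0])       # cells a padded kernel processes
real_cells = len(values)                          # cells that carry information
wasted_fraction = 1 - real_cells / padded_cells   # share of compute spent on padding
sums = jagged_sum(values, offsets)
```

On this batch half of the padded cells are zeros. With real behavior sequences, where a few users have thousands of events and most have a handful, the wasted share is typically much larger, which is the motivation for operating on the jagged layout directly.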

💡 落地判斷：

這些優化不是在演算法層面，而是在 CUDA 算子和網絡拓撲級別的重度改造。非巨頭企業根本沒有足夠的編譯工程師與硬體專家來開發和維護這套客製化工具鏈。

四、 YangGF 的落地判斷 (Landing Assessment)

維度	評估結論
已發布 (Released)	是。論文與架構細節已揭露。
可用性 (Available)	否。外部開發者無開源模型可用。
可商用性 (Commercially Viable)	極低（僅限廣告超大廠）。只有在廣告變現流水足以覆蓋萬卡 GPU 維護與研發成本的前提下，ROI 才能轉正。
工程瓶頸	1. 訓練通訊頻寬：Sparse 部分對網路 I/O 要求高。 2. 維護代價：FM-VM 的雙層蒸餾 pipeline 極易因標籤異常而崩潰。 3. 客製化工具鏈：Jagged Tensor 算子與 NCCLX 對基礎設施工程能力要求過高。

🛠️ 給架構師的落地建議：

如果你的團隊不是位列一線的流量巨頭，千萬不要盲目跟風去架構一個千億參數的推薦基礎大模型。
我們應該把錢花在刀口上，吸收 GEM 的實用精髓：

抄 Student Adapter 的思路：在你的蒸餾 Pipeline 中加入時效偏差修正，低成本解決離線模型時效性差的硬傷。
參考 Jagged Tensor 算子融合：最佳化你的長行為序列特徵在 GPU 上的運算效率，避免 padding 浪費算力。

🤔 讀者交流時間

各位在實際推薦系統的落地中，如何處理離線大模型（Teacher）與線上小模型（Student）之間的特徵與標籤時效性（Staleness）偏差？你們有嘗試過類似 Student Adapter 的微調機制嗎？

歡迎在下方留言，我們一起探討。

教宗良十四世首道通諭聚焦 AI：當教會成為房間裡的成年人

Yang Goufang — Mon, 18 May 2026 01:14:59 +0000

來源說明：本文為《The Globe and Mail》Pope Leo's first encyclical to focus on use of artificial intelligence（記者 Nicole Winfield / Associated Press，2026 年 5 月 16 日發布）一文的繁體中文摘譯與個人評論。原文版權歸 The Globe and Mail 與 Associated Press 所有。本文不取代閱讀原文。

一句話摘要

教宗良十四世（Pope Leo XIV）剛簽署他任內第一道通諭（encyclical），主題是人工智慧；梵蒂岡同日宣布成立 AI 研究小組。這份文件刻意選在《新事》（Rerum Novarum）通諭發布 135 週年的同一天簽署——當年那份文件是天主教社會思想的奠基之作，回應的是工業革命；這次回應的是 AI 革命。

為什麼這件事值得關注

天主教會全球約 15 億信眾，這是任何一份倫理宣言的潛在受眾規模。但更值得關注的是時間點與框架選擇：

教宗選擇用「通諭」這個天主教會最正式的教義文件形式來處理 AI，而不是一般演說或書信。通諭具有教導權威，會進入教會的長期社會教學體系。
簽署日期刻意對齊《新事》通諭 1891 年 5 月 15 日的發布日。教宗良十四世顯然把 AI 革命定位為與工業革命同等量級的人類處境變動。
內容方向是「以人性尊嚴為核心的倫理進路」（an ethics-based approach that prioritizes human dignity and peace），預計連結到教會社會教學中已有的勞動、正義、和平等議題框架。

換言之，這不是一份應景的公關發言，而是教會把 AI 議題正式納入其千年神學論述的傳統之中。

教宗良十四世的立場輪廓

文章透露出的他個人立場有幾個鮮明特徵：

數學主修出身、會滑手機——文章特別點出他並非對科技疏離，這讓他的批判更難被歸類為「不懂技術的宗教保守派」。
奧斯定會（Augustinian）背景——他特別憂慮生成式 AI 透過 deepfake 進行假訊息與欺騙的能力，因為「對真理的追求」是奧斯定靈修傳統的核心。
對神父的內部規定——他已警告神父不要用 AI 寫講道辭。
戰爭中的 AI 是紅線——他公開譴責烏克蘭、加薩、黎巴嫩、伊朗等戰場上自動化武器系統的擴散，稱之為「戰爭與新科技關係中的非人化演進，一個毀滅性的螺旋」。

2025 年 6 月對一場 AI 會議的演講中，他承認生成式 AI 對醫療與科學發現的貢獻，但同時提問：AI 對「人類對真理與美的開放性、我們把握現實的獨特能力」可能造成什麼後果？

美國視角：與 Trump 政府的張力

這份通諭的發布幾乎注定會成為梵蒂岡與 Trump 政府之間的新摩擦點。原文點出的對比很尖銳：

教宗：芝加哥出生，強調人性尊嚴與國際倫理規範。
Trump 政府：明確拒絕對 AI 的國際監管努力，並在國內移除減緩 AI 發展的官僚障礙，把 AI 視為國家經濟與安全戰略的核心。

報導時，Trump 正結束中國行，同機的有 Elon Musk（旗下 X 平台搭載 AI 聊天機器人 Grok）與 Nvidia 執行長 Jensen Huang（剛獲准向中國銷售 H200 AI 晶片）。這個畫面本身就是對「以人性尊嚴為核心」的另一種答案。

來自學界的兩段引述（值得記住的框架）

Notre Dame 大學哲學教授 Meghan Sullivan：

「我認為天主教會在很多方面將會成為這些 AI 整合進社會的辯論裡，那個房間裡的成年人。教宗無疑會成為人性尊嚴最有力的倡議者之一。」

University of St. Thomas（Houston）神學教授 Thomas Harmon：

「全世界有將近 15 億天主教徒，光這個數字就值得關注。但除了數字之外，天主教會對於『成為人意味著什麼』有著深厚而細膩的思考傳統。」

「房間裡的成年人」這個說法值得在 AI 倫理討論中記下來——它隱含的判斷是：科技公司在做技術決策，政府在做戰略決策，但有資格代表「整體人類處境」發言的玩家很少，而教會自認是其中一個。

教會 AI 倫理路徑的時間線（個人整理）

把原文散落的事件拉成時間軸，可以看出教廷是有系統地累積 AI 立場，而非臨時起意：

2020：梵蒂岡發起 Rome Call for AI Ethics，IBM、Microsoft、Cisco 等公司簽署。核心原則包含包容性、可問責性、不偏頗、隱私。
2024：教宗方濟各在 G7 高峰會發表 AI 專題演說，呼籲禁止「致命自主武器」（killer robots），並主張需要國際條約來規範 AI。
2024：歐盟通過 AI Act，以風險分級方式規範 AI。
2025：聯合國通過新的 AI 治理架構（在英、韓、法等國 AI 峰會只產生非約束性承諾之後）。
2025/06：教宗良十四世首次對 AI 會議發表演說。
2026/05/15：良十四世簽署 AI 主題通諭。
2026/05/16：梵蒂岡宣布成立內部 AI 研究小組；通諭預計數週內公開發布。

個人評論：通諭會比監管文件更有影響力嗎？

技術圈傾向用「監管 vs 創新」的二元軸來看 AI 治理。但這份通諭如果真的進入教會社會教學的傳統，它的影響時間尺度與機制都不同：

監管文件如 EU AI Act 影響的是合規成本與市場准入。
通諭影響的是幾世代之內，全球 15 億人關於「AI 該如何使用」的道德直覺。

工業革命的歷史證據是：《新事》通諭一個世紀後，仍然是天主教社會教學的引用源頭——對勞動法、最低工資、工會權利的討論至今帶有它的印記。如果良十四世的 AI 通諭擁有類似的耐久性，那麼它的目標受眾就不是 2026 年的政策辯論，而是未來幾代人對 AI 的默認態度。

對 AI 行業而言，這意味著一件事：在「AI 該被允許做什麼」這個問題上，長期取勝的不是 EU 委員會、不是科技公司、也不是國家戰略，而是那些有能力塑造世代直覺的機構。教會是其中之一。

通諭公開發布後值得追蹤幾個具體問題：

致命自主武器的立場：是否延續方濟各的明確禁令？
勞動取代議題：是否會明確區分「AI 補充人類勞動」vs「AI 取代人類勞動」？
環境成本：原文已提到梵蒂岡關注資料中心的能源與用水。通諭會否進入「AI 與被造界完整性」（integrity of creation）的神學框架？
真理與 deepfake：奧斯定靈修對真理的強調，會否轉化為對生成式 AI 內容標示的具體呼籲？

這四個議題會決定通諭是「一份普世性的倫理呼籲」還是「一份對 AI 行業有具體操作含義的文件」。在數週內見分曉。

原文連結：Pope Leo's first encyclical to focus on use of artificial intelligence — The Globe and Mail

AI Weekly — 2026-05-08 to 2026-05-15 | OpenAI the Consultant, Anthropic the Platform — Model Companies Pivot Collectively

Yang Goufang — Fri, 15 May 2026 02:28:18 +0000

One-sentence takeaway this week: OpenAI is becoming a consulting firm, Anthropic is becoming a platform company — both have simultaneously abandoned the "model-as-product" narrative.

Model Companies Pivot Collectively: From API Sales to Institutional Resources

$14 billion: that is the valuation OpenAI has assigned to its newly formed "Deployment Company," reportedly derived from external funding discussions in the same week (OpenAI launches the OpenAI Deployment Company to help businesses build around intelligence - OpenAI; OpenAI launches AI consulting arm valued at $14 billion - Axios). By contrast, the flagship API business that underpins the company's valuation has never received such external validation. The market is telling OpenAI: your most valuable asset isn't the model, it's delivery capability.

On 5/11 OpenAI announced the formation of the Deployment Company; the same day Axios reported the division's valuation. Simultaneously, OpenAI's head of revenue, Dresser, told CNBC that enterprise AI adoption has reached a "tipping point," though he was referring to delivery complexity rather than a demand explosion (OpenAI revenue chief Dresser says enterprise AI adoption is 'at a tipping point' - CNBC). This aligns with the Accenture Federal Services partnership for federal government work, which involves security compliance and legal constraints, and with the Fiserv partnership for financial institutions. Together they form a single narrative: enterprises don't want APIs, they want someone to turn APIs into compliant, reliable, explainable systems (Fiserv Forms Strategic Collaboration with OpenAI to Bring AI to How Fiserv Serves Financial Institutions - Fiserv; Accenture Federal Services and OpenAI Partner to Accelerate Secure AI Adoption Across the Federal Government - Accenture).

Anthropic's response took a different path: ecosystem lock-in.

This week Anthropic released Claude for Small Business (Introducing Claude for Small Business - Anthropic), targeting the small and medium-sized market segment that had previously been overlooked. At the same time, a deeper AWS integration brought Claude Platform, deployed natively through AWS accounts, which plugs into the trust chain of enterprises' existing cloud infrastructure (Introducing Claude Platform on AWS: Anthropic's native platform, through your AWS account - Amazon Web Services (AWS)). More notable still, more than 20 legal-domain connectors and 12 practice-area plugins were released the same week (Anthropic Goes All-In on Legal, Releasing More Than 20 Connectors and 12 Practice-Area Plugins for Claude - LawSites), covering specific use cases such as e-discovery, contract analysis, and regulatory compliance. This isn't general-purpose capability; it is the institutionalization of industry knowledge.

	OpenAI	Anthropic
This week's focus	Consulting services (Deployment Company, $14B valuation)	Ecosystem lock-in (AWS native integration, legal plugins)
Business logic	Turning technology into deliverable projects	Turning models into embeddable workflows
Risk	Gross margin diluted by services business	High migration cost for vertical use cases

Apple × OpenAI Rift: The Dissipation of the Integration Dividend

Bloomberg reported this week that the Apple-OpenAI alliance is deteriorating, potentially heading toward legal dispute (Apple-OpenAI Alliance Frays, Setting Up Possible Legal Fight - Bloomberg.com), and Reuters confirmed that same afternoon that OpenAI is exploring legal options (OpenAI explores legal options against Apple, source says - Reuters).

This rift validates a structural problem: embedding AI into the OS layer does not create durable differentiation. Apple needed differentiation; OpenAI needed distribution. Both parties' interests overlapped during the honeymoon period, but once the distribution problem was solved, Apple discovered it had not gained meaningfully more model capability than its competitors — users want AI itself, not "Apple-branded AI."

Implication for engineering decision-makers: integration does not create moats. When selecting integration partners, the question to ask is "who controls the model iteration cadence," not "whose devices run fastest."

Anthropic's Long-Term Play: The 2028 Scenarios and the Gates Partnership

Anthropic also released two long-term framework documents this week: a $200 million partnership with the Bill Gates Foundation (Anthropic forms $200 million partnership with the Gates Foundation - Anthropic), and the "2028: Two Scenarios for Global AI Leadership" report (2028: Two scenarios for global AI leadership - Anthropic). The former is resource allocation; the latter is narrative positioning.

The specific details of the $200 million Gates Foundation partnership have not been fully disclosed, but given Anthropic's recent cadence of publications on safety and governance, my read is that the funds are primarily directed toward AI safety research and applications in global health and development, not product R&D. This signals that Anthropic is building a narrative framework broader than commercial products: positioning itself as an institutional player capable of dialogue with sovereign nations, foundations, and academia, rather than merely a model API company.

The "2028 Scenarios" report attempts to define the pathways for AI development — a form of narrative positioning, establishing dialogue frameworks ahead of regulators and policymakers. Similar strategies are visible in major vendors that have published AI ethics guidelines, but Anthropic chose the form of a "decade-long prediction" rather than a "principles declaration" — more ambiguous language, but longer reach.

Codex Windows Sandbox: Security Is Not a Feature, It Is a Cost

OpenAI released two technical articles this week on safely deploying Codex in Windows environments (Running Codex safely at OpenAI - OpenAI; Building a safe, effective sandbox to enable Codex on Windows - OpenAI), focused on establishing a secure sandbox environment that lets AI agents execute code within enterprise Windows environments.

There is a fundamental contradiction between the highly privileged state of Microsoft enterprise environments and the unpredictable behavior of AI agents. OpenAI's response is "we will build an isolation layer" — which means that when models enter enterprise workflows, code execution security is no longer optional, it is a rigid requirement factored into deployment costs. Currently, when enterprises evaluate AI vendors, sandbox construction costs are often not listed separately — yet this cost precisely measures the real distance between "model capability" and "delivery capability."

Question for CTOs: If your vendor tells you that you need to build your own sandbox environment to safely use their model, has that engineering cost been accounted for?

Regulatory Pressure: The Legal Vector Is Accelerating

Sam Altman testified this week in the Elon Musk lawsuit (OpenAI's Sam Altman takes the stand to fend off Elon Musk's accusations he 'stole a charity' - NPR), and a wrongful death class action against OpenAI is testing new litigation strategies (Wrongful Death Lawsuits Against OpenAI Test a New Strategy - The New York Times).

The "novelty" of this class action strategy lies in it being neither patent infringement nor breach of contract — it attempts to pursue AI decision consequences through a tort law framework. If this litigation path establishes any degree of precedent in the future, it will have profound implications for product launch decisions across all AI vendors — not just OpenAI.

An Underappreciated Technical Development: Indicator-Based UI Experiments

Google DeepMind published research this week that poses a concrete question, "reimagining the mouse pointer" (Shaping the future of AI interaction by reimagining the mouse pointer - Google DeepMind): in the era of multimodal models, does the human-AI interaction interface still need to conform to the WIMP paradigm (Windows, Icons, Menus, Pointer) designed in the 1960s?

This research remains at the publication stage, at least one major revision cycle away from a shippable product, but it signals Google's long-term bet on perception-driven interface redesign. If whoever redefines the pointer also gets to define the next-generation UI standard, then model capability competition will give way to interface competition, and the authority to set interface standards belongs to institutional players, not pure technical teams. This echoes the moves by OpenAI and Anthropic this week: institutional and integration velocity is catching up with hardware and model release cadence.

This week's conclusion: OpenAI discovered its most valuable asset is not the model but deployment capability; Anthropic discovered its deepest moat is not capability but the speed of encapsulating industry knowledge. Both directions point to the same conclusion: the next bottleneck in AI is not how powerful models become, but who can fastest turn models into indispensable links in workflows.

AI 週報 — 2026-05-08 to 2026-05-15 | OpenAI 做顧問、Anthropic 做生態，模型公司集體轉向

Yang Goufang — Fri, 15 May 2026 02:10:50 +0000

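This week in one sentence: OpenAI is turning into a consulting firm and Anthropic into a platform company; both have, without coordinating, dropped the "model-as-product" story at the same time.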

模型公司集體轉向：從 API 銷售到制度性資源

140 億美元——這是 OpenAI 對其新建「Deployment Company」的估值，聲稱來自同一週的外部融資談判OpenAI launches the OpenAI Deployment Company to help businesses build around intelligence - OpenAI OpenAI launches AI consulting arm valued at $14 billion - Axios。對比之下，支撐這家公司估值基礎的旗艦 API 業務從未獲得過這樣的外部估值肯定。市場在告訴 OpenAI：你最值錢的資產不是模型，是交付能力。

5/11 OpenAI 宣佈成立 Deployment Company，同日 Axios 報導該部門估值。同一時間，OpenAI 營收負責人 Dresser 對 CNBC 表示企業 AI 採用已達「臨界點」，但他所指的不是需求爆發，而是交付複雜度OpenAI revenue chief Dresser says enterprise AI adoption is 'at a tipping point' - CNBC。這與 Accenture Federal Services 的聯邦政府合作（涉及安全合規與法規約束）和 Fiserv 的金融機構合作形成同一敘事：企業要的不是 API，而是有人幫他們把 API 變成合規、可靠、可解釋的系統Fiserv Forms Strategic Collaboration with OpenAI to Bring AI to How Fiserv Serves Financial Institutions - Fiserv Accenture Federal Services and OpenAI Partner to Accelerate Secure AI Adoption Across the Federal Government - Accenture。

Anthropic 的回應則是另一條路：生態鎖定。

本週 Anthropic 發布 Claude for Small BusinessIntroducing Claude for Small Business - Anthropic，瞄準過去被忽略的中小型市場；同時與 AWS 深度整合推出 Claude Platform——原生透過 AWS 帳戶部署，打通了企業既有雲端基礎設施的信任鏈Introducing Claude Platform on AWS: Anthropic’s native platform, through your AWS account - Amazon Web Services (AWS)。更值得注意的是同週的 20+ 法律領域連接器與 12 個實務插件發布Anthropic Goes All-In on Legal, Releasing More Than 20 Connectors and 12 Practice-Area Plugins for Claude - LawSites，覆蓋電子發現、合同分析、監管合規等具體場景。這不是通用能力，這是行業知識的制度化。

	OpenAI	Anthropic
本週重心	顧問服務（Deployment Company, $14B 估值）	生態鎖定（AWS 原生整合、法律插件）
商業邏輯	把技術變成可交付的專案	把模型變成可嵌入的工作流程
風險	毛利率被服務業務稀釋	垂直場景遷移成本高

Apple × OpenAI 裂痕：整合紅利的消散

Bloomberg 本週報導 Apple 與 OpenAI 的聯盟正在惡化，可能走向法律糾紛Apple-OpenAI Alliance Frays, Setting Up Possible Legal Fight - Bloomberg.com，Reuters 午後跟進證實 OpenAI 正在探詢法律選項OpenAI explores legal options against Apple, source says - Reuters。

這個裂痕印證了一個結構性問題：把 AI 整合進 OS 層並不能創造持久的差異化。Apple 需要差異化，OpenAI 需要分發，雙方的利益在蜜月期重疊，但當分發問題解決後，Apple 發現自己並沒有因此獲得比競爭對手更多的模型能力——用戶要的是 AI 本身，不是「蘋果牌 AI」。

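Takeaway for engineering decision-makers: integration does not create a moat. When choosing an integration partner, the question to ask is "who controls the model iteration cadence," not "whose devices run fastest."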

Anthropic 的長期策略：2028 場景與 Gates 合作

Anthropic 本週同時發布兩個長期框架文件：與 Bill Gates 基金會的 2 億美元合作Anthropic forms $200 million partnership with the Gates Foundation - Anthropic，以及「2028：全球 AI 領導力的兩個場景」報告2028: Two scenarios for global AI leadership - Anthropic。前者是資源佈局，後者是話語權佈局。

2 億美元的 Gates Foundation 合作具體內容未完全透明，但結合近期 Anthropic 在安全與治理方向的發文節奏，判斷這筆資金主要流向 AI safety research 與全球健康/發展領域的應用，而不是產品研發。這意味著 Anthropic 正在建立一個比商業產品更寬廣的敘事框架：自己是能與主權國家、基金會、學術界對話的制度性玩家，而不只是一家模型 API 公司。

「2028 場景」報告則試圖定義 AI 發展的路徑框架——這是一種敘事搶位，搶在監管機構和政策制定者之前建立對話框架。類似的策略可見於過去制訂 AI 倫理指南的各大廠，但 Anthropic 選擇用「十年預測」而非「原則宣言」的形式，語言更模糊但射程更遠。

Codex Windows 沙箱：安全不是功能，是成本

OpenAI 本週發布了兩篇關於 Codex 在 Windows 上安全部署的技術文章Running Codex safely at OpenAI - OpenAI Building a safe, effective sandbox to enable Codex on Windows - OpenAI，聚焦於如何建立一個安全的沙箱環境讓 AI 代理在企業 Windows 環境中執行代碼。

微軟企業環境的高度特權狀態與 AI 代理的不可預測行為之間存在根本矛盾。OpenAI 的回應是「我們會建一個隔離層」——這意味著當模型要進入企業工作流程，代碼執行安全不再是可選功能，而是必須計入部署成本的剛性需求。目前企業在評估 AI 供應商時，沙箱建置成本往往未被獨立列出，而這筆成本正好衡量了「模型能力」與「交付能力」之間的真實距離。

對 CTO 的問題：如果你的供應商告訴你需要自建沙箱環境才能安全使用他們的模型，這筆工程成本算進去了嗎？

監管壓力：法律向量正在加速累積

Sam Altman 本週在馬斯克訴訟中作證OpenAI’s Sam Altman takes the stand to fend off Elon Musk’s accusations he ‘stole a charity’ - NPR，以及針對 OpenAI 的 wrongful death 集體訴訟正測試新的訴訟策略Wrongful Death Lawsuits Against OpenAI Test a New Strategy - The New York Times。

集體訴訟策略的「新」在於它不是專利侵權或契約違約，而是嘗試用侵權法框架追究 AI 的決策後果。如果這種訴訟路徑在未來能建立任何程度的先例，將對所有 AI 廠商的產品發布決策產生深遠影響——不只是 OpenAI 受傷。

一個被低估的技術動態：指標式 UI 實驗

本週 Google DeepMind 發布 research 提出「重新想像滑鼠指標」的具體問題Shaping the future of AI interaction by reimagining the mouse pointer - Google DeepMind：在多模態模型時代，人與 AI 的互動介面是否仍然需要服從 1960 年代設計的 WIMP 範式（Windows, Icons, Menus, Pointer）？

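This research is still at the publication stage, at least one major revision cycle away from a usable product, but it hints at Google's long-term bet on the future human-machine interface: perception-driven interface redesign. If whoever redefines the pointer also gets to define the next-generation UI standard, then competition on model capability will give way to competition on interfaces, and the authority to set interface standards belongs to institutional players, not pure technical teams. This echoes the direction OpenAI and Anthropic took this week: institutional and integration velocity is catching up with hardware and model release cadence.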

本週結論：OpenAI 發現自己最值錢的資產不是模型而是部署能力，Anthropic 發現自己最深的護城河不是能力而是行業知識的封裝速度。兩個方向都指向同一個結論：AI 的下一個瓶頸不在於模型有多強，而在於誰能最快把模型變成工作流程中不可繞過的一環。

AI Weekly — 2026-05-08 | MS-OpenAI loosens, and the race moves to control

Yang Goufang — Fri, 08 May 2026 01:51:07 +0000

One-line summary: The most important story of the last two weeks was not another model getting slightly better. It was the Microsoft-OpenAI boundary being redrawn. AWS, FedRAMP, PwC, ChatGPT ads, Claude vertical agents, and Gemini's scene-by-scene expansion all point to the same shift: AI companies are turning model capability into control over deployment, compliance, workflow, cost, and monetization.

1. Microsoft and OpenAI: this is not gossip; it is control

The structural story of this issue is the next phase of the Microsoft-OpenAI partnership. OpenAI published its own note on that next phase (The next phase of the Microsoft OpenAI partnership - OpenAI). CNBC framed the change as OpenAI capping revenue-share payments to Microsoft (OpenAI shakes up partnership with Microsoft, capping revenue share payments - CNBC). The New York Times used the phrase "loosen their partnership" (Microsoft and OpenAI Loosen Their Partnership - nytimes.com). The Wall Street Journal added pressure from another direction: OpenAI reportedly missed key revenue and user targets during its high-stakes IPO sprint (OpenAI Misses Key Revenue, User Targets in High-Stakes Sprint Toward IPO - WSJ). Another NYT headline asked whether OpenAI is falling further behind in the AI race (Is OpenAI Falling Further Behind in the A.I. Race? - nytimes.com).

That is not "Microsoft versus OpenAI" gossip. It changes the practical landing surface. Who controls cloud deployment? Who owns the enterprise contract? Who has limits on model and product IP? Who carries the compute capex? Those questions eventually show up as procurement risk, cross-cloud flexibility, governance posture, and support reliability.

Layer	What changes for enterprise buyers
Commercial share	A cap on revenue sharing suggests weaker economic coupling and more pressure for OpenAI-owned revenue channels
Cloud deployment	A looser partnership makes multi-cloud and direct enterprise deployment more strategically important
Product control	IPO and growth pressure push OpenAI to package model capability into sellable products faster

This is why the rest of the issue should not be read as isolated announcements. AWS, FedRAMP, PwC, ChatGPT ads, and Codex orchestration are all part of the same control-plane response.

2. OpenAI fills in the control plane: cloud, compliance, workflow, ads

OpenAI moved across several fronts in the same window. None of them is just "one more feature."

First: cloud and enterprise deployment. OpenAI announced that its models, Codex, and Managed Agents are coming to AWS OpenAI models, Codex, and Managed Agents come to AWS - OpenAI. For enterprise teams, that is more important than model availability by itself. AWS is where procurement, IAM, network controls, data governance, and cost controls already live. If OpenAI wants less dependence on one cloud partner, multi-cloud availability is not a nice-to-have; it is table stakes.

Second: government and regulated procurement. OpenAI announced FedRAMP Moderate availability OpenAI available at FedRAMP Moderate - OpenAI. FedRAMP is not a capability benchmark. It is a buying threshold. It means the product can enter a subset of public-sector and regulated-enterprise workflows. That is less flashy than a new model, but harder commercially.

Third: finance workflow. OpenAI and PwC announced a collaboration around the office of the CFO OpenAI and PwC collaborate to reimagine the office of the CFO - OpenAI, and PwC separately described an OpenAI-native finance function PwC and OpenAI Build a First-of-Its-Kind OpenAI Native Finance Function - PwC. CFO workflows are not a natural extension of chat. They require permissions, auditability, data lineage, human review, and integration with ERP, reporting, approvals, and risk controls. The question is not whether the model can draft a finance memo. The question is whether it can sit inside the existing chain of accountability.

Fourth: developer orchestration and infrastructure. OpenAI published Symphony, an open-source spec for Codex orchestration An open-source spec for Codex orchestration: Symphony. - OpenAI, and separately discussed supercomputer networking for large-scale AI training Supercomputer networking to accelerate large scale AI training - OpenAI. The former is toolchain control. The latter is infrastructure control. Together, they show OpenAI filling both layers: workflow description on top, training and inference supply underneath.

Fifth: monetization. OpenAI announced new ways to buy ChatGPT ads New ways to buy ChatGPT ads - OpenAI, alongside ad policies Ad policies - OpenAI. This will be read as an "ads in ChatGPT" controversy, but the operational point is sharper: if ChatGPT becomes a measurable, purchasable demand-generation surface, OpenAI is no longer only selling APIs and subscriptions. That changes product incentives, and it raises new questions around data use, brand safety, and governance.

The shared language across these moves is control. OpenAI needs less reliance on one partner and more ownership of deployment, compliance, workflow, and revenue surfaces.

3. Anthropic's vertical-agent week: enterprise motion, with honest cost and reliability signals

Anthropic's two-week pattern is also clear: move Claude out of general chat and into vertical workflows.

Security was the densest push. Claude Security emerged from closed preview with codebase vulnerability scanning Anthropic's Claude Security emerges from closed preview to scan your codebases for vulnerabilities - The New Stack. SecurityWeek framed it as a response to an AI-powered exploit surge Anthropic Unveils Claude Security to Counter AI-Powered Exploit Surge - SecurityWeek, and CRN covered it from an enterprise buying angle Anthropic Launches Claude Security: 5 Things To Know - crn.com. This is a plausible landing zone. Security teams already have triage, scanning, review, and remediation flows. If an agent can attach to repos, tickets, and CI/CD, its value is easier to measure than a general assistant's.

Finance and professional services formed the second line. Anthropic announced agents for financial services Agents for financial services - Anthropic, then announced a new enterprise AI services company with Blackstone, Hellman & Friedman, and Goldman Sachs Building a new enterprise AI services company with Blackstone, Hellman & Friedman, and Goldman Sachs - Anthropic. Read together, Anthropic is not just selling models to finance. It is trying to wrap models in consulting, compliance, governance, and services channels. Slower, but easier to buy.

Creative work and developer workflow formed the third line. Claude for Creative Work Claude for Creative Work - Anthropic and Claude Code Auto Mode with human approval gates Inside Claude Code Auto Mode: Anthropic’s Autonomous Coding System with Human Approval Gates - infoq.com point to the same product philosophy: let the agent do more, but keep explicit human approval points. That is much closer to what enterprises can actually adopt than "full autonomy." Automation is attractive; auditability, interruptibility, and decision traces are mandatory.

The most useful Anthropic signals, however, were negative. Business Insider reported that Anthropic quietly doubled its estimate for what engineers can expect to spend on Claude Code tokens Anthropic quietly doubles its estimate for how much engineers can expect to spend on Claude Code tokens - Business Insider. Fortune reported that Anthropic attributed Claude Code's monthlong decline to engineering missteps after weeks of user backlash Anthropic says engineering missteps were behind Claude Code’s monthlong decline after weeks of user backlash - Fortune. Those belong next to the launches, not in a footnote.

Anthropic signal	Positive read	Cost that still has to be managed
Claude Security	Security triage can enter real workflow	false positives, remediation ownership, CI integration
Financial-services agents	high-value workflows with budget	compliance, data isolation, audit, human review
Claude Code Auto Mode	stronger automation with approval gates	token cost, reliability, rollback, accountability
Claude Code cost/quality issues	honest signal from real usage	agents still hit latency, cost, and stability limits

My read: Anthropic's enterprise strategy is directionally right, but Claude Code's cost and quality swings are the practical warning. Agents are not "turn it on and save headcount." They are workflow components that need SRE-style treatment: observability, quotas, approval gates, and fallbacks.
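
As a rough sketch of what "SRE-style treatment" could look like in code, the wrapper below puts a token quota, an approval gate, a fallback, and an audit log around agent actions. Everything in it is hypothetical: `AgentGuard`, the action names, and the token costs are illustrative assumptions and do not correspond to any vendor's API.

```python
from dataclasses import dataclass, field

@dataclass
class AgentGuard:
    token_budget: int          # hard quota for the session
    needs_approval: set        # action names that require human sign-off
    spent: int = 0
    log: list = field(default_factory=list)  # (action, status) pairs for observability

    def run(self, action, cost, do, fallback, approved=False):
        """Run `do()` only if quota and approval allow it; otherwise use `fallback()`."""
        if self.spent + cost > self.token_budget:
            self.log.append((action, "quota_exceeded"))
            return fallback()
        if action in self.needs_approval and not approved:
            self.log.append((action, "awaiting_approval"))
            return fallback()
        self.spent += cost
        try:
            result = do()
            self.log.append((action, "ok"))
            return result
        except Exception:
            self.log.append((action, "failed"))
            return fallback()

# Hypothetical session: linting is free to run, merging needs a human.
guard = AgentGuard(token_budget=100, needs_approval={"merge_pr"})
r1 = guard.run("lint", 30, lambda: "linted", lambda: "skipped")
r2 = guard.run("merge_pr", 30, lambda: "merged", lambda: "queued for review")
r3 = guard.run("merge_pr", 30, lambda: "merged", lambda: "queued for review", approved=True)
r4 = guard.run("refactor", 50, lambda: "refactored", lambda: "skipped")
statuses = [status for _, status in guard.log]
```

The point of the design is that every outcome, including the refusals, lands in the log, so cost and reliability swings become visible before they turn into user backlash.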

4. Google's test: Gemini has to prove it is more than an everywhere button

Google's two-week story is Gemini being pushed into many surfaces. The market question is simple: which ones become workflows, and which ones are just entry points?

The most technically meaningful item is AlphaEvolve. Google DeepMind described it as a Gemini-powered coding agent scaling impact across fields AlphaEvolve: Gemini-powered coding agent scaling impact across fields - Google DeepMind. Read carefully: if it is a research showcase, it is a technical direction; if it enters internal or external engineering workflows, it becomes product. The key questions are not benchmark numbers. Does it attach to issue trackers, repos, CI, and review policy? Who owns the failure mode?

Cars are another high-value surface. GM said it is bringing Google Gemini to millions of vehicles on the road GM brings Google Gemini to millions of vehicles on the road - General Motors, and Google's own blog said cars with Google built in are about to get smarter thanks to Gemini Your car with Google built-in is about to get smarter, thanks to Gemini - blog.google. In cars, the value is not chat. It is navigation, vehicle state, voice control, and service integration. The constraints are hard: latency, offline behavior, privacy, driver distraction, and liability.

Healthcare is more sensitive. Google DeepMind published research on an AI co-clinician AI co-clinician: researching the path toward AI-augmented care - Google DeepMind. This must be labeled research, not product. Clinical workflows require validation, accountability, data governance, and physician fit. A convincing demo is not enough.

The consumer side is a stack of Gemini app expansion: April's Gemini Drop (Find out what’s new in the Gemini app in April's Gemini Drop. - blog.google), file generation for Google Docs/PDF/Word (Gemini app can now generate Google Docs, PDF, Word, and other files - 9to5Google; You can now easily generate files in Gemini. - blog.google), UK personalization features (Gemini launches new personalisation features in the UK - blog.google), proactive assistance and new voices reportedly in preparation (Gemini app preps ‘Proactive Assistance’ and new Gemini voices - 9to5Google), and hints of usage limits / AI Ultra Lite (Google readies ‘AI Ultra Lite’ plan and explicit ‘usage limits’ for Gemini - 9to5Google). The direction is obvious: Google is making Gemini a daily surface. But surface area is not the same as workflow depth. Repeated work requires data, permissions, audit, rollback, and responsibility.

Finally, Business Insider reported that Google is building an AI agent that could answer OpenClaw (Google Is Building an AI Agent That Could Be Its Answer to OpenClaw - Business Insider), while 9to5Google found traces of a Gemini Agent positioned as a "24/7 digital partner" (Google preps ‘Gemini Agent’ as your ’24/7 digital partner’ - 9to5Google). If it ships, Google will move directly into agent-OS competition. Until then, treat it as a direction signal.

5. Bottom line: model competition is becoming control-plane competition

Put April 25 through May 8 on one board and the main story is not "who won, OpenAI, Anthropic, or Google." The better frame: all three are trying to attach model capability to control planes.

OpenAI's control plane: cloud, public-sector compliance, finance workflow, ads, and agent orchestration.
Anthropic's control plane: security, finance, creative work, and coding agents, with cost and reliability warnings attached.
Google's control plane: existing surfaces — cars, Docs, Gemini app, clinical research, coding agents, and possibly a personal agent.

For engineering decision-makers, the useful takeaways are blunt:

Do not buy model capability alone; inspect deployment control. AWS, FedRAMP, and enterprise-services partnerships are closer to procurement reality than model scores.
Do not treat agents as automatic headcount savings. Claude Code's doubled token-cost estimate and monthlong decline are the counterexample.
Do not confuse an entry point with a workflow. Gemini can appear everywhere and still fail to own repeated work unless it handles data, permissions, audit, rollback, and accountability.
Do not ignore ad surfaces. ChatGPT ads can change OpenAI's product incentives and raise data-use and brand-safety questions.

The sentence to keep: model competition is still there, but the enterprise buying decision will increasingly be shaped by who controls deployment, compliance, cost, workflow, and revenue surfaces.

stance: The 2026-05-08 issue frames AI competition as a shift from model capability to control planes, led by the MS-OpenAI reset and followed by enterprise deployment, vertical agents, and Gemini surfaces.
key_links:
  - https://openai.com/index/next-phase-of-microsoft-partnership/
  - https://openai.com/index/openai-on-aws/
  - https://www.infoq.com/news/2026/05/anthropic-claude-code-auto-mode/
  - https://deepmind.google/blog/alphaevolve-impact/

AI Weekly, 2026-05-08: The MS-OpenAI partnership loosens and the AI race shifts to the control plane

Yang Goufang — Fri, 08 May 2026 01:51:04 +0000

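This week in one sentence: The most important story of the past two weeks is not which model got a little stronger, but that OpenAI and Microsoft have started redrawing the boundaries of their partnership. Everything that follows (AWS, FedRAMP, PwC, ads, Claude's vertical agents, Google's push of Gemini into more settings) looks like a different face of the same thing: AI companies are turning "model capability" into a complete control plane that enterprises can procure, govern, deploy, and pay for.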

1. OpenAI and Microsoft: the most structural change is not gossip, it is control

The most important event in this issue is the OpenAI-Microsoft partnership entering its next phase. OpenAI published its own note on the next phase (The next phase of the Microsoft OpenAI partnership - OpenAI); CNBC's headline says outright that OpenAI is reshaping the partnership and capping revenue share payments (OpenAI shakes up partnership with Microsoft, capping revenue share payments - CNBC); the NYT describes the change with the phrase "loosen their partnership" (Microsoft and OpenAI Loosen Their Partnership - nytimes.com); and the WSJ adds a blow from the other side: in the high-pressure sprint toward an IPO, OpenAI missed some of its revenue and user targets (OpenAI Misses Key Revenue, User Targets in High-Stakes Sprint Toward IPO - WSJ). A second NYT piece asks the question more directly: is OpenAI falling behind in the AI race (Is OpenAI Falling Further Behind in the A.I. Race? - nytimes.com)?

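This is not "Microsoft vs OpenAI" corporate gossip. It directly affects engineering delivery: who controls cloud deployment, who controls enterprise contracts, who gets IP rights to models and products and up to what ceiling, and who carries the compute capital expenditure. All of it comes back to whether customers can procure reliably, deploy across clouds, and fit the model into their existing governance processes.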

The reshuffle can be read in three layers:

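Layer	Practical impact on enterprise customers
Revenue sharing	A cap on revenue share suggests the two companies' interests may be less tightly bound; OpenAI needs more revenue channels of its own
Cloud deployment	With the partnership loosened, OpenAI has more incentive to go multi-cloud and deploy directly to enterprises
Product control	If IPO pressure and growth pressure rise together, OpenAI will package model capability into sellable products faster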

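This is also why this issue cannot look at any single release in isolation. OpenAI's subsequent moves all look like gap-filling: AWS, FedRAMP, PwC, ChatGPT ads, Codex orchestration. The direction is consistent.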

2. OpenAI's gap-filling: from model company to deployment and revenue control plane

OpenAI pushed on several fronts at once in these two weeks, and none of them is just "one more feature."

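The first is cloud and enterprise deployment. OpenAI announced that its models, Codex, and Managed Agents are coming to AWS (OpenAI models, Codex, and Managed Agents come to AWS - OpenAI). For enterprises this matters more than "one more supported model": AWS is the home turf for existing procurement, permissions, networking, data governance, and cost control. If OpenAI wants to reduce its dependence on a single cloud partner, a multi-cloud entry point is a necessity, not a bonus.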

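The second is government and compliance. OpenAI announced it has reached FedRAMP Moderate (OpenAI available at FedRAMP Moderate - OpenAI). FedRAMP is not a capability benchmark; it is a procurement threshold. It means the product can begin to enter the standard processes of part of the public sector and of regulated enterprises. Progress like this lacks the showcase effect of a new model, but it counts for more commercially.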

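The third is workflow and finance. OpenAI is working with PwC to reimagine the office of the CFO (OpenAI and PwC collaborate to reimagine the office of the CFO - OpenAI), and PwC published its own note on an "OpenAI Native Finance Function" (PwC and OpenAI Build a First-of-Its-Kind OpenAI Native Finance Function - PwC). The CFO office is not a natural extension of a chatbot; it demands permissions, audit, data lineage, human review, and system integration. The real question here is not whether the model can write a financial analysis, but whether it can be placed inside the existing chain of ERP, reporting, approvals, and risk control.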

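The fourth is developers and agent orchestration. OpenAI released Symphony, an open-source spec for Codex orchestration (An open-source spec for Codex orchestration: Symphony. - OpenAI), and separately wrote about supercomputer networking for large-scale training (Supercomputer networking to accelerate large scale AI training - OpenAI). The former is toolchain control; the latter is infrastructure control. Taken together, OpenAI is filling in two layers: at the top, making agent workflows describable and orchestratable; at the bottom, making sure training and inference supply can sustain the product cadence.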

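The fifth is revenue surfaces. OpenAI announced new ways to buy ChatGPT ads (New ways to buy ChatGPT ads - OpenAI), alongside ad policies (Ad policies - OpenAI). It is easy to read this as an "ad-ification" controversy, but engineering decision-makers should look at a different point: if ChatGPT becomes a commercial surface that can be targeted, measured, and purchased, OpenAI's product no longer sells only API access or subscriptions; it touches demand generation directly. That changes product priorities, and it changes what enterprise customers will require on data use, brand safety, and governance.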

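What these moves have in common: OpenAI needs to depend less on a single partner and to own more of its deployment, compliance, workflow, and revenue surfaces. It is not doing many unrelated things at once; it is filling in the control plane it has to carry itself now that the partnership has loosened.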

3. Anthropic's vertical-agent week: moving onto the enterprise floor while admitting cost and reliability problems

Anthropic's rhythm over these two weeks is just as clear: push Claude from general-purpose chat into vertical workflows.

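Security is the densest area. Claude Security has emerged from closed preview, pitched at scanning codebases for vulnerabilities (Anthropic's Claude Security emerges from closed preview to scan your codebases for vulnerabilities - The New Stack); SecurityWeek framed the launch as a way to "counter AI-powered exploit surge" (Anthropic Unveils Claude Security to Counter AI-Powered Exploit Surge - SecurityWeek); and CRN put together a rundown from an enterprise-buyer angle (Anthropic Launches Claude Security: 5 Things To Know - crn.com). It is a sensible entry point: security teams already run large volumes of triage, scanning, review, and patching, and if an agent can plug into existing repositories, tickets, and CI/CD, its value is easier to quantify than that of general chat.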

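Finance and professional services are the second line. Anthropic released agents for financial services (Agents for financial services - Anthropic), and in the same week announced a new enterprise AI services company with Blackstone, Hellman & Friedman, and Goldman Sachs (Building a new enterprise AI services company with Blackstone, Hellman & Friedman, and Goldman Sachs - Anthropic). Read together, Anthropic does not just want to sell models to the financial industry; it wants to wrap the model in consulting, compliance, data governance, and professional-services channels. That route is slower, but it also meets less procurement resistance.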

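Creative and developer workflows are the third line. Claude for Creative Work (Claude for Creative Work - Anthropic) and the human approval gates in Claude Code Auto Mode (Inside Claude Code Auto Mode: Anthropic’s Autonomous Coding System with Human Approval Gates - infoq.com) point to the same product philosophy: let the agent do more, but keep explicit human approval points. That looks more like what enterprises will actually buy than "fully automatic" does. Automation is attractive, but being auditable, interruptible, and able to leave a decision trail is what it takes to reach production.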

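But the most noteworthy thing about Anthropic this week is not the polished launches; it is two negative signals. Business Insider reported that Anthropic quietly doubled its estimate of what engineers can expect to spend on Claude Code tokens (Anthropic quietly doubles its estimate for how much engineers can expect to spend on Claude Code tokens - Business Insider); Fortune reported that Anthropic admitted engineering missteps caused a monthlong decline in Claude Code, after weeks of accumulated user backlash (Anthropic says engineering missteps were behind Claude Code’s monthlong decline after weeks of user backlash - Fortune). These two items belong right next to the launches, not buried in a corner.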

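Anthropic signal	Positive reading	Cost that must be faced
Claude Security	Security triage can enter the workflow	False positives, patch ownership, CI integration cost
Financial services agents	Finance has high-value processes	Compliance, data isolation, audit, and human review
Claude Code Auto Mode	Stronger automation that keeps approval gates	Token cost, reliability, rollback, and accountability
Claude Code cost/quality negative signals	The company is willing to admit real problems	Agents still get bottlenecked by latency, cost, and stability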

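The judgment here is straightforward: Anthropic's enterprise strategy is right, but the swings in Claude Code's cost and quality are a reminder that an agent is not yet a tool that "saves headcount the moment you turn it on." It is more like a new workflow component that has to be managed with an SRE mindset: observe it, cap it, put approval gates on it, and give it a fallback.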
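To make the four controls concrete, here is a minimal Python sketch of what "observe, cap, approve, fall back" could look like around a single agent call. Everything in it is hypothetical: `GuardedAgent`, `run_step`, `approve`, and `fallback` are illustrative names, not any vendor's API, and the token counts are whatever your own agent client reports.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class GuardedAgent:
    """Hypothetical wrapper around one agent call; not a real vendor API."""

    run_step: Callable[[str], tuple[str, int]]  # returns (result, tokens used)
    fallback: Callable[[str], str]              # e.g. hand the task to a human queue
    approve: Callable[[str], bool]              # human approval gate for risky tasks
    token_budget: int                           # quota for this guard's lifetime
    tokens_used: int = 0
    log: list[dict] = field(default_factory=list)  # observation trail

    def execute(self, task: str, risky: bool = False) -> str:
        # Approval gate: risky tasks need an explicit yes before anything runs.
        if risky and not self.approve(task):
            self._record(task, "rejected", 0)
            return "rejected"
        # Quota: checked before the call, so the final call may overshoot the budget.
        if self.tokens_used >= self.token_budget:
            self._record(task, "fallback:budget", 0)
            return self.fallback(task)
        try:
            result, tokens = self.run_step(task)
        except Exception as exc:
            # Fallback: any agent failure is routed to the non-agent path.
            self._record(task, f"fallback:{type(exc).__name__}", 0)
            return self.fallback(task)
        self.tokens_used += tokens
        self._record(task, "ok", tokens)
        return result

    def _record(self, task: str, outcome: str, tokens: int) -> None:
        # Observation: every decision leaves an entry that can be audited later.
        self.log.append({"task": task, "outcome": outcome, "tokens": tokens})
```

In use, `run_step` would wrap the real agent client and `approve` would block on a reviewer; the point of the sketch is only that cost, approval, and failure handling live outside the agent, where they can be inspected and changed without touching the model.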

4. Google's stress test: Gemini has to prove it is more than something stuffed into every entry point

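Google's story over these two weeks is about placing Gemini into all kinds of settings, but the market will ask: which of these are real workflows, and which are just entry-point demos?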

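The most technically substantial item is Google DeepMind's AlphaEvolve, whose title calls it a Gemini-powered coding agent and stresses scaling impact across fields (AlphaEvolve: Gemini-powered coding agent scaling impact across fields - Google DeepMind). Announcements like this need careful reading: if it is only a research showcase, it represents a technical direction; only if it can enter internal or external engineering workflows does it represent productization. The question readers should ask is not about benchmarks, but whether it connects to the issue tracker, repo, CI, and review policy, and who is responsible when it gets something wrong.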

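In-vehicle use is another high-value setting. GM announced it is bringing Google Gemini to millions of vehicles on the road (GM brings Google Gemini to millions of vehicles on the road - General Motors), and the Google blog says cars with Google built-in will get smarter thanks to Gemini (Your car with Google built-in is about to get smarter, thanks to Gemini - blog.google). The value of in-car AI is not small talk; it is navigation, vehicle status, voice control, and service integration. The constraints are also hard: latency, offline capability, privacy, driver distraction, and liability. This is a test of whether Gemini can leave the phone UI and enter a physical product.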

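Healthcare is more sensitive still. Google DeepMind published research on an AI co-clinician (AI co-clinician: researching the path toward AI-augmented care - Google DeepMind). Topics like this must be labeled clearly as research, not product. The bar in clinical settings is validation, accountability, data governance, and the physician's workflow, not a demo that looks like a doctor.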

The consumer side is a pile-up of Gemini app features: the April Gemini Drop (Find out what’s new in the Gemini app in April's Gemini Drop. - blog.google), generating Google Docs, PDF, Word, and other files (Gemini app can now generate Google Docs, PDF, Word, and other files - 9to5Google; You can now easily generate files in Gemini. - blog.google), personalisation features (Gemini launches new personalisation features in the UK - blog.google), Proactive Assistance and new voices in preparation (Gemini app preps ‘Proactive Assistance’ and new Gemini voices - 9to5Google), and usage limits plus an AI Ultra Lite plan (Google readies ‘AI Ultra Lite’ plan and explicit ‘usage limits’ for Gemini - 9to5Google). All of these point the same way: Google is turning Gemini into an everyday entry point, but many entry points do not equal deep adoption. The real test is whether users fold it into repeated work rather than trying it once in a while.

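Finally, Business Insider reported that Google is building an AI agent that could be its answer to OpenClaw (Google Is Building an AI Agent That Could Be Its Answer to OpenClaw - Business Insider), and 9to5Google noted signs of a Gemini Agent positioned as a "24/7 digital partner" (Google preps ‘Gemini Agent’ as your ’24/7 digital partner’ - 9to5Google). If this set of reports proves true, Google will enter the agent-OS competition head-on. Until there is an official product, though, it can only be treated as a directional signal.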

5. Conclusion: AI companies are moving from a model race to a control-plane race

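Put the events from 04-25 to 05-08 side by side and the main thread is not "who wins: OpenAI, Anthropic, or Google." The more precise statement is that all three companies are connecting model capability to a control plane.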

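OpenAI's control plane is cloud, government compliance, enterprise finance workflows, ad surfaces, and agent orchestration.
Anthropic's control plane is security, finance, creative work, and coding agents, though token cost and reliability have served it a reminder.
Google's control plane is its existing entry points: cars, Docs, the Gemini app, healthcare research, coding agents, and possibly a personal agent.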

For engineering decision-makers, the most practical judgments this week are:

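Do not buy model capability alone; look at who controls deployment. AWS, FedRAMP, and enterprise-services partnerships sit closer to the procurement floor than model scores do.
Do not treat agents as a promise of headcount savings. Claude Code's doubled token-cost estimate and its month of degraded quality are a good counterexample.
Do not mistake an entry point for a workflow. Google is putting Gemini in more places, but it only counts as real adoption once it is connected to data, permissions, audit, rollback, and a chain of accountability.
Do not underestimate the impact of commercial surfaces. ChatGPT ads will change OpenAI's product priorities and will raise data-use and brand-safety questions.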

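The one sentence most worth remembering from this issue: the model race has not gone away, but what will really change enterprise procurement is who can control deployment, compliance, cost, workflow, and revenue surfaces.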

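stance: The main thread of the 2026-05-08 issue is AI companies shifting from a model-capability race to a control-plane race; the MS-OpenAI reshuffle is the core, and enterprise deployment, vertical agents, and Gemini's expansion into more settings are the follow-on reactions.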
key_links:
  - https://openai.com/index/next-phase-of-microsoft-partnership/
  - https://openai.com/index/openai-on-aws/
  - https://www.infoq.com/news/2026/05/anthropic-claude-code-auto-mode/
  - https://deepmind.google/blog/alphaevolve-impact/
