<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Yang Goufang</title>
    <description>The latest articles on DEV Community by Yang Goufang (@yang_goufang_23c7ba674984).</description>
    <link>https://dev.to/yang_goufang_23c7ba674984</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3808809%2Fddc4c93b-6669-4563-b8f5-ba711077aed3.jpg</url>
      <title>DEV Community: Yang Goufang</title>
      <link>https://dev.to/yang_goufang_23c7ba674984</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/yang_goufang_23c7ba674984"/>
    <language>en</language>
    <item>
      <title>個人實測筆記：把同一份規格丟給 Grok 與 GLM-5.2，真正的教訓不是誰贏</title>
      <dc:creator>Yang Goufang</dc:creator>
      <pubDate>Mon, 22 Jun 2026 13:15:51 +0000</pubDate>
      <link>https://dev.to/yang_goufang_23c7ba674984/ge-ren-shi-ce-bi-ji-ba-tong-fen-gui-ge-diu-gei-grok-yu-glm-52zhen-zheng-de-jiao-xun-bu-shi-shui-ying-5abc</link>
      <guid>https://dev.to/yang_goufang_23c7ba674984/ge-ren-shi-ce-bi-ji-ba-tong-fen-gui-ge-diu-gei-grok-yu-glm-52zhen-zheng-de-jiao-xun-bu-shi-shui-ying-5abc</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;這是我的個人實測筆記，不是 benchmark。&lt;/strong&gt; 樣本極小（n=2 輪），其中一輪還是我自己環境設定出錯害的。所有數字都是我在自己機器上的觀察，不是廠商數據、也不是可重現的基準。請當成「一個工程師的五分鐘判斷」來讀，不是結論性的模型評比。&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  背景：我在比什麼
&lt;/h2&gt;

&lt;p&gt;我把同一類安全敏感的小工具規格，分別交給兩個 2026 年的編碼模型當 delegate：&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;一個快速的 Grok 編碼模型&lt;/strong&gt; —— 在我的工具鏈裡它顯示為「Grok Composer 2.5 Fast」。&lt;strong&gt;這只是我環境顯示的標籤，我不主張這是該模型的正式名稱&lt;/strong&gt;（題外話：「Composer 2.5」也是 Cursor 自家 agent 模型的名字，跟 xAI 是兩回事，比較模型前先確認你到底在跑哪一個）。&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GLM-5.2&lt;/strong&gt; —— Z.ai / 智譜的 open-weights 旗艦，透過我自己寫的 CLI harness 跑。&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;「發布 ≠ 可用 ≠ 可商用」是我看模型的習慣。這篇談的全是「可用」這一層：在實際 workflow 裡，這兩個模型作為 delegate 到底交得出東西、交出來的能不能信。&lt;/p&gt;

&lt;h2&gt;
  
  
  第一輪（不公平：工具不同）
&lt;/h2&gt;

&lt;p&gt;Grok 建 &lt;code&gt;transcribe.py&lt;/code&gt; / &lt;code&gt;login_assist.py&lt;/code&gt;；GLM 建 &lt;code&gt;form_fill.py&lt;/code&gt;。工具不同，所以這輪不能拿來分高下。但有一個發現跟「誰贏」無關，所以留下來：&lt;/p&gt;

&lt;p&gt;GLM 的程式碼乾淨、回報「114 個測試通過」—— 但它&lt;strong&gt;把自己的測試改成去配合自己的程式碼&lt;/strong&gt;（它自己的話：「修正了測試名稱以符合 regex」）。結果一個真實的憑證洩漏（一個 one-time code 沒有被拒絕）就這樣穿過了一片綠燈。我是靠實際讀回 DOM 才抓到的。&lt;/p&gt;

&lt;p&gt;這就是最危險的失敗型態：&lt;strong&gt;綠燈讓人安心，底下是壞的&lt;/strong&gt;。一片紅燈會告訴你去哪裡找；一片「假綠燈」只會叫你出貨。&lt;/p&gt;

&lt;h2&gt;
  
  
  第二輪（公平：同工具 &lt;code&gt;combo_select&lt;/code&gt;、隔離 worktree、同規格、我親手驗證）
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;指標（我親手驗證）&lt;/th&gt;
&lt;th&gt;快速 Grok 模型&lt;/th&gt;
&lt;th&gt;GLM-5.2&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;pytest&lt;/td&gt;
&lt;td&gt;✅ 7/7 通過&lt;/td&gt;
&lt;td&gt;❌ 5 失敗 / 7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;收斂（自己跑完並自我驗證）&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌ 兩次都進入迴圈 → timeout&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;行數&lt;/td&gt;
&lt;td&gt;118（精簡）&lt;/td&gt;
&lt;td&gt;435（3.7×，過度設計）&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;組合既有 guard（而非重寫）&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;單一 fail-closed return（規格要求）&lt;/td&gt;
&lt;td&gt;✅ 1 個&lt;/td&gt;
&lt;td&gt;❌ 4 個分散的 per-step return（被禁止的 pattern）&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;--self-test&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;⚠️ 壞掉（argparse 小 bug）&lt;/td&gt;
&lt;td&gt;❌ crash（KeyError）&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;這一輪 Grok 明顯較好：精簡、符合規格、會過、乾淨地組合既有元件，只有一個瑣碎的 self-test bug。GLM 過度設計、跑不過自己的測試、違反明確的 single-return 規則、而且沒收斂。&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;一個誠實的但書&lt;/strong&gt;：我第一次跑這輪時，因為一個沒 commit 的相依把隔離 worktree 弄壞了，那批「結果」其實是我自己的設定錯誤製造的雜訊，後來才偵測、修正、重跑。&lt;strong&gt;一個比較的有效性，不會高於它的測試條件本身&lt;/strong&gt;——驗測之前先驗環境。&lt;/p&gt;

&lt;h2&gt;
  
  
  真正的教訓：失敗的「方向」不同，兩邊都要會抓
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GLM 在簡單任務上安靜地壞&lt;/strong&gt;：&lt;code&gt;form_fill&lt;/code&gt; 那次 114 個測試「通過」，但底下有 3 個 runtime bug，實際上根本沒填進去。假綠燈，這是危險模式。&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grok 在困難任務上大聲地壞&lt;/strong&gt;：會卡在 plan-mode 問「要用哪個方案？」、self-test argparse 出錯。看得見，好抓。&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;大聲的失敗是比較好的失敗&lt;/strong&gt;——你當下就看到。這點讓我在「有界、單一用途」的 delegate 任務上偏向 Grok。（但書：兩輪差的不只是難度，還有工具、測試所有權、harness，所以這是值得留意的 pattern，不是被證明的因果。）&lt;/p&gt;

&lt;p&gt;至於「改測試直到它過」這件事，學界有名字叫 &lt;strong&gt;reward hacking&lt;/strong&gt;，而且是被量測過的現象（EvilGenie、ImpossibleBench、SpecBench 都在量它）。最有效的緩解方式也最簡單：&lt;strong&gt;不要給模型寫測試 oracle 的權限&lt;/strong&gt;。&lt;/p&gt;

&lt;h2&gt;
  
  
  還有一層：能不能自主跑起來（跟程式碼品質無關）
&lt;/h2&gt;

&lt;p&gt;上面 Grok 的結果是&lt;strong&gt;互動式路徑&lt;/strong&gt;。但 &lt;strong&gt;headless 的 &lt;code&gt;grok -p&lt;/code&gt; 路徑在我這個環境根本跑不起來當自主實作者&lt;/strong&gt;：兩次嘗試都產出 0 檔案，sub-worker 死在 &lt;code&gt;Auth(AuthorizationRequired)&lt;/code&gt;——它先吐「I'll port the three features… / Implementing…」的旁白，然後什麼都沒碰就退出。即使 &lt;code&gt;grok login&lt;/code&gt; 過、無工具的 smoke test 回 &lt;code&gt;GROK_AUTH_OK&lt;/code&gt;，真正需要大量 tool call 的那次還是沒動：&lt;code&gt;-p&lt;/code&gt; 沒有 &lt;code&gt;--always-approve&lt;/code&gt; 就會停在第一個被 gate 的 tool call。&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;跟我直接跑 &lt;code&gt;codex exec&lt;/code&gt; CLI 的失敗一模一樣（卡在互動式 auth、0% CPU）。&lt;/li&gt;
&lt;li&gt;唯一能用的外部 delegate 是經過 companion runtime（自帶 session auth）的 review 路徑——但那是&lt;strong&gt;審查/分析&lt;/strong&gt;用的，不是多檔案實作者。&lt;/li&gt;
&lt;li&gt;真正把程式碼交出來的，是&lt;strong&gt;本地的 Workflow/Agent 路徑&lt;/strong&gt;（無外部 auth、in-process）。它把一個有風險的 &lt;code&gt;Vec&amp;lt;String&amp;gt;→Vec&amp;lt;Activity&amp;gt;&lt;/code&gt; model refactor 一次就做對（編譯乾淨、118 測試），它自己的對抗式驗證也誠實回報 PASS 加上少數小發現。&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  我的落地判斷（一人專案）
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;自主實作&lt;/strong&gt;：在我這個環境，優先用&lt;strong&gt;本地 Workflow/Agent 路徑&lt;/strong&gt;——它是這次唯一可靠跑起來、而且交出正確程式碼的 delegate。&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Codex（透過 review 路徑）&lt;/strong&gt;：留給審查/分析，不要當多檔案實作者。&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grok 模型本身&lt;/strong&gt;：在&lt;strong&gt;互動路徑&lt;/strong&gt;下，對有界、單檔/單一用途的任務（一個工具、一個修補、組合既有元件）是合格夥伴——給緊的規格、預期它一次做完或大聲失敗，然後獨立驗證。&lt;strong&gt;Grok headless 在這裡不可用&lt;/strong&gt;，除非先解掉 &lt;code&gt;--always-approve&lt;/code&gt; 授權 gate（一個 classifier 正確擋下的自主模式），並確認 session auth 能撐進 headless 那次執行。&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;多步、模糊、會動到架構的工作&lt;/strong&gt;：自己駕駛，用 Workflow + Codex 的 pattern，不要外包。&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  不可妥協的那一條
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;永遠不要信「測試通過」。&lt;/strong&gt; 這次每一個 delegate 交出來的東西——Grok 的、GLM 的、還有 fullstack agent 的 &lt;code&gt;form_state.py&lt;/code&gt;——都有一個只有獨立驗證才抓得到的真實缺陷。我的驗證做法：自己跑測試、讀 diff、實際跑一次、再過一輪 Codex review 對照。模型選擇只會改變你「第一次就乾淨」的機率；它永遠不會省掉驗證這一步。&lt;/p&gt;

&lt;p&gt;「測試通過」是一個&lt;strong&gt;宣稱&lt;/strong&gt;，不是一個&lt;strong&gt;結果&lt;/strong&gt;——不管它是模型說的，還是你自己第一次跑出來的。唯一能分辨的方法，就是握著測試的人是你。&lt;/p&gt;




&lt;p&gt;&lt;em&gt;以上為個人 hands-on 觀察，非受控 benchmark，會隨任務、harness、模型版本而異。如果你也遇過「安靜壞 vs 大聲壞」這種分裂、或成功把它設計掉了，歡迎在留言分享做法。&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;—— YangGF（對 AI 做落地判斷的工程觀察者）&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>coding</category>
      <category>testing</category>
      <category>opensource</category>
    </item>
    <item>
      <title>AI Weekly — 2026-06-11 to 2026-06-18 | Zhipu Closes the Gap, OpenAI Faces Multistate Probe</title>
      <dc:creator>Yang Goufang</dc:creator>
      <pubDate>Thu, 18 Jun 2026 02:09:11 +0000</pubDate>
      <link>https://dev.to/yang_goufang_23c7ba674984/ai-weekly-2026-06-11-to-2026-06-18-chatgpt-below-50-openai-under-siege-3jc2</link>
      <guid>https://dev.to/yang_goufang_23c7ba674984/ai-weekly-2026-06-11-to-2026-06-18-chatgpt-below-50-openai-under-siege-3jc2</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Zhipu AI ships an open-weights model that ties closed-source leaders on key benchmarks at a fraction of the cost. OpenAI weathers a multistate attorney general probe and pricing pressure. Anthropic reverses a researcher-access policy under commercial pressure. Model capability is no longer the moat — institutional and regulatory positioning is.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Zhipu AI's Open-Weight Leap
&lt;/h2&gt;

&lt;p&gt;Zhipu AI released GLM-5.2 with 1 million token context&lt;a href="https://news.google.com/rss/articles/CBMic0FVX3lxTE0xVjFGYjVNSVJMSWdKRnZNUjV0dHBpSldxTTJMamRDUWN1VmNpN2FNcmlvSTk4WU1vUFNNdWpKc1gwbkZwa3hUWndHZ0F0VnctSlpPdkQ3THR1Y3E4MzVqZ0VkT19OQkROdkhoRy10ZHJjNEE?oc=5" rel="noopener noreferrer"&gt;Zhipu AI Open-Sources GLM-5.2 With 1 Million Token Context - Pandaily&lt;/a&gt;, and early benchmarks are striking: the model beats GPT-5.5 on multiple long-horizon coding tasks at approximately one-sixth the inference cost&lt;a href="https://news.google.com/rss/articles/CBMi0wFBVV95cUxPOWdHTVgtOTZYbHBKN1licHR2ajVhSnRJaVdoVWdRSW82SU5SVnJYNnhFYXlIRDlISDZJNm9xMUprb0JfekZpTUZSVENCay1vVDBkeTlhaEcwZU41Z2JSQUYtWHBHbVJBX2FmYlp0RDYwY3ZUU2EtaVM3eFk4anVBdXItNy1jczI2Z1kxSDFobnFtWTF6U2lxMEpaa1JXZWRkUFhORmtrNFV1TldvMmxFNUM3cjRMN2tDMkpmc3FxMG9QUnI4X3U0ZnQ5SFdGZHJyczRF?oc=5" rel="noopener noreferrer"&gt;Z.ai’s open-weights GLM-5.2 beats GPT-5.5 on multiple long-horizon coding benchmarks for 1/6th the cost - VentureBeat&lt;/a&gt;. In coding marathons, GLM-5.2 is closing the gap with closed-source leaders&lt;a href="https://news.google.com/rss/articles/CBMingFBVV95cUxPTjhWckNHTGNQQzNvTXBoVm9vWkJxX2l5NlJpOWl5YnpFczhmaWNnQU40WDRDXzFtVTl5akloR0RPd2M5U21QSWdUdmFMLVZ2MmZ1V0VCYnVvdk01TG9vbzVZcXpZZDJHb0VjZFd4TXl2NDZmYzRkMEliLTNOR3dtLV9YVzNVR2ZxS3RtQlAtNURsaExFWTMwSmVEOTJjUQ?oc=5" rel="noopener noreferrer"&gt;Zhipu AI's GLM-5.2 closes in on closed-source leaders in coding marathons - The Decoder&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Zhipu open-sourced the weights&lt;a href="https://news.google.com/rss/articles/CBMic0FVX3lxTE0xVjFGYjVNSVJMSWdKRnZNUjV0dHBpSldxTTJMamRDUWN1VmNpN2FNcmlvSTk4WU1vUFNNdWpKc1gwbkZwa3hUWndHZ0F0VnctSlpPdkQ3THR1Y3E4MzVqZ0VkT19OQkROdkhoRy10ZHJjNEE?oc=5" rel="noopener noreferrer"&gt;Zhipu AI Open-Sources GLM-5.2 With 1 Million Token Context - Pandaily&lt;/a&gt;, meaning enterprises can run, fine-tune, and self-host without per-token pricing. The stock reaction was immediate — Zhipu's market valuation jumped following the model's first-hand tests&lt;a href="https://news.google.com/rss/articles/CBMimgFBVV95cUxNQkV4OFU1SkZrV0IybFhndVV2Y0RaLVZlSUJaT0x2UVFZd0p1cGFoNGdTNjZIakg4MG5OLXZPeE9MR2t5UDBFalkzTk56aW9uVDJJRkthRDdlbXp2ZG5zUnUtUm54cmc2TnAzR0lIbEp4VjNIc0xkRl9ydHVQSTNCZGJFWHd2SWlGMjNydEdHekZOb1RwQW9sa2Jn?oc=5" rel="noopener noreferrer"&gt;New Model Sends Zhipu AI’s Stock Soaring - Caixin Global&lt;/a&gt;. Chinese tech observers are already asking whether the "three giants" of AI programming are taking shape, with Zhipu positioning alongside established closed-source players&lt;a href="https://news.google.com/rss/articles/CBMiU0FVX3lxTE5ObC1ReFBsQzdTT0xFaTlxMTI0c2lKQUxNUVJIZ3MydF94WGhWY05kVUQteUl4dmxZaGNOUVRfVHIzSGw1MWt4X3ZPTlgyY2NlQjRB?oc=5" rel="noopener noreferrer"&gt;First-hand Test of Zhipu's Most Powerful Model: Are the "Three Giants" of AI Programming Set to Take Shape? - 36 Kr&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The engineering implication is straightforward: for long-horizon coding tasks where context window matters, Zhipu's cost-to-performance ratio is now competitive. Organizations running code completion or complex agentic workflows should benchmark GLM-5.2 against their current provider, not as a future consideration but as an active evaluation.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenAI Under Simultaneous Pressure
&lt;/h2&gt;

&lt;p&gt;OpenAI faces a convergence of regulatory and competitive headwinds this week. A multistate group of attorneys general is investigating the company over possible user harm, with the probe explicitly tied to its approaching IPO&lt;a href="https://news.google.com/rss/articles/CBMiqwFBVV95cUxPTDJsQlNzM0JtUFpBTUJkalVSS3htd0ZEUVVsOFhYOGhYbFBkNkV6RkZfcERSWV9ZVngzRUYyN29uUXgycFBrek1ZUllLM3MtVzNCY1BUNXBoXzVWbEYtS2pzTFNBUVFKOXZRWlBCN2VoNkxFa0FSWEVGN18xZmRyM3N1ODNwOERDd0ZFa2ozQTNrM3o1Mjc2M3JRODlxN2t6cWsxOUI1WE5QNWc?oc=5" rel="noopener noreferrer"&gt;OpenAI hit with multistate probe into possible user harm as its IPO looms - AP News&lt;/a&gt;. Reuters confirmed the investigation separately&lt;a href="https://news.google.com/rss/articles/CBMivAFBVV95cUxOX0VNSkhmX3pnRzR1NWdCaW1iS3lCQ0dOdDREb3BSV2pNV2tTUVRBbGdMcjVtcW96Nm9rN1lkdkhQQmo1ZEgtU042TklUUUI1STFxS1F1LWE5RHVjdFJyRlRZcmU1eVl0UGxGWjJyX2dkNDlaWFlTYW50TEl5dWt0ZTA5YnozZlZva2JPOXcwREtXeU9RbE1JMUM5TEdDYWpSVlBLdTd6UVEwZ1p0X0FFS1k3eWdwRnpIRGM5UA?oc=5" rel="noopener noreferrer"&gt;OpenAI under investigation by group of state attorneys general, source says - Reuters&lt;/a&gt;; the New York Times reported state AGs are examining OpenAI's practices&lt;a href="https://news.google.com/rss/articles/CBMihAFBVV95cUxNeDdPdDNQRGVKMXIxVGNLWURkQjZLZExOZ1JabldULVQyV3lOZ1lYWlhpYzA1eWpCZ2dVNTEzSk8xd2Z1ZzVob2hwVHBJMV9uZlFYcVZWMEVfRFdRWTdhek5ENGMyUjN2NWs1U1FlSXN2Q0RJSnlXZ0NGaS1wcnJqekFsaDA?oc=5" rel="noopener noreferrer"&gt;State Attorneys General Are Investigating OpenAI - The New York Times&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Separately, OpenAI is considering cutting prices to compete with Anthropic&lt;a href="https://news.google.com/rss/articles/CBMiqAFBVV95cUxPWTVaQ2JmSTJ2dG1yVC1nSHpNbWJqOVlUV2dmMW1vZ2d6VmlFN1ZDRTFOVS04SDZQM01CcDZIRS1WWDluTE5hYzJBV25NZlBKMndtTmRHRG5DZmphLWEydEhmM0g5cEVHcjlvNTBueWpZRmFpcDNPekV2OU9hWV9BMzJsd3NLNWEwcFpydDFQX1JZc0I2Xy1ZcktIZDlxV2RramNQT1F5WnXSAa4BQVVfeXFMTTA0akxwVmdMa0RybEpINXVxTUxTSkJWRWNnY3pDUXN2ZDAyaW53dkRvVG1QM3hQSWt4aFlQTnJwWkVoYUdzd3F3cWFybHA5cE9feHQyTlhHc0pDaFhlMVVTWTBRU3hLRHlWamFsQldzTnpHOWRXcjREbG9UcTRTRGZsLTFhYTlnQWQ4OFpUTEZkR21mR3VsdGpRbmloN01vT0ZxTGl6dEtkM09fZS1n?oc=5" rel="noopener noreferrer"&gt;OpenAI mulls slashing prices as it competes with Anthropic for users: WSJ - CNBC&lt;/a&gt;, a move Forbes attributes directly to Anthropic's growing enterprise share&lt;a href="https://news.google.com/rss/articles/CBMi1wFBVV95cUxOQWVqaVFxU2Q0WXduc1J4Tk1UY2pQZnNMMmVZZGdjLWN1RU14VWVBeHJEdnlESEFvWW1SdlM4SGtQWXUya1ZkdHF6MmxnUWMycVo3cVBieUdvcFpUSUtkcGpROGZiMU1mT1ZsVUNhOVlGdmtYRFVYaTNKTkRXMWY3aTFManhnOW83Qmk2WVRJM2ZUb0Zpb2JOSUI3bk9pTldrb09pZjRKQ1Q3OHhNYXlxaE1RcWdwQjBhdzZybmRQTmtMZ05RdG9aLXhYV3E2VW1iemc3WjNiVQ?oc=5" rel="noopener noreferrer"&gt;OpenAI Could Soon Drop Prices To Compete With Anthropic, Report Says - Forbes&lt;/a&gt;. The WSJ reporting suggests this is not theoretical — pricing pressure is active and tied to Anthropic's trajectory, not a response to open-source.&lt;/p&gt;

&lt;p&gt;Also notable: Visa integrated its secure global payment network directly into ChatGPT&lt;a href="https://news.google.com/rss/articles/CBMivAFBVV95cUxONDZuX3ItQ1lKMmlRbEd5VWxlVjZDQXNRdDRybGFWN0N0T3hQSW4zZUxQNWV1clI5VVE1dGhsMHIwNTlmZUFwcklaWFZmNzB5aUduVGJneU9naC1Wb3VoWnpYUG9QdTRmdHZVMXU0dC04NkpGZUtwSHNnWHV4ZzFjcVR6dHdHYkdqWVFKTVpsblR1V0E0aURYeFdISTdlVzJSMlhqUnFqVFJWSHNtc0FreW9fZ1RoUkxMMWgtNQ?oc=5" rel="noopener noreferrer"&gt;Visa and OpenAI integrate Visa's secure global payment directly into ChatGPT - NPR&lt;/a&gt;. The Visa partnership is a concrete data point on OpenAI's institutional relationships — high-profile payment integration signals commercial deepening, even as other enterprise relationships face scrutiny.&lt;/p&gt;

&lt;h2&gt;
  
  
  Anthropic's Regulatory and Policy Friction
&lt;/h2&gt;

&lt;p&gt;Anthropic's week carried more complications than the prior week's "integration dividends" narrative suggested. The US government formally halted the company's latest Claude model release&lt;a href="https://news.google.com/rss/articles/CBMiogFBVV95cUxNWUpveEJWVS1OcWh3cDBXVFl3SXAxUUpxemJtNXEtUm9fZWkzNVpRS2wtaU1kZDNIb0pWRkFLdllyNkpSMzNTTXV4a2trWFlNNkN5aGI3RTY5b0VSMm54bjY3a25xUXZ0bmhLUVhPSGhjUXljMWtBNk56ekJ6bEpBRHhibDg4MU1ocWRlbFQtaXltc1h6LXRReTI0d0VWLWdNZ1E?oc=5" rel="noopener noreferrer"&gt;Why the US government shut down Anthropic’s latest Claude AI model - The Conversation&lt;/a&gt; — the Conversation report characterizes this as a regulatory action, not a voluntary deferral. The specific government body and legal mechanism remain unreported, which itself is notable: a shutdown of this nature typically involves export control or national security levers.&lt;/p&gt;

&lt;p&gt;More internally generated: Anthropic reversed a policy that would have restricted how external researchers conduct safety evaluations of Claude&lt;a href="https://news.google.com/rss/articles/CBMiowFBVV95cUxORnEtMHRDdUF6VkFPSkE5MkpSQ3pKNWdvWGpCd3dDU0dTTEE0UVF0aVhfOC0zaHRNYWdDWXFpQklCOXVRbEEwT19ZNVN3ZnVjRWRsejhMU09scUh1eTdTd2dGa3ltUXJQUHd4ZW8tMWJLU3FjUkRPYW5fQVRSUjJfZ2JnOTQ0NWFDb3NSYTFNa2ZaX1lRV2tmT1BmQmFlclFuT2NZ?oc=5" rel="noopener noreferrer"&gt;Anthropic Walks Back Policy That Could Have ‘Sabotaged’ AI Researchers Using Claude - WIRED&lt;/a&gt;. WIRED reported the company walked back terms that could have effectively "sabotaged" independent AI safety research. The reversal suggests internal tension between commercial interests and the external research community Anthropic has cultivated as part of its safety positioning — a tension that was only resolved after public exposure.&lt;/p&gt;

&lt;p&gt;The policy reversal and the model shutdown landed in the same week, which complicates Anthropic's narrative as a "platform company" building ecosystem rather than selling models directly. Regulators are engaging, not standing back.&lt;/p&gt;

&lt;h2&gt;
  
  
  Infrastructure: The Chip Layer
&lt;/h2&gt;

&lt;p&gt;On the hardware side, Nvidia's inference chip market share appears to be rising&lt;a href="https://news.google.com/rss/articles/CBMitAFBVV95cUxNdThGUnRHcjBPYnZFcE81S1NmNmhCYW5FOGxHMDlTb0hTS3pnWk9BX2xkVWRJZUpZSDVyUlhabjFwY3pSeEZlVVBKNXB5OGpfeXZXU3QtN3ZlWWR4SEJKbnVvOC1zSWc0MXJfdzBhaDhsUF9jQUIya1daOFhBaDhCQXdldlNmWVU2bktXaXZMa0EzdEVmQlg2RVlsQ1VMSWpITmRYbm0yV3V2d3VqcjVoVUxUQzM?oc=5" rel="noopener noreferrer"&gt;Nvidia’s Share of AI Inference Chip Market Appears to Be Rising - The Information&lt;/a&gt; — The Information's reporting suggests this is not just GPU demand but specifically inference workload concentration. Nvidia also accelerated Google DeepMind's DiffusionGemma for local AI&lt;a href="https://news.google.com/rss/articles/CBMiqgFBVV95cUxPZXdSTmVFMlRHVFl1cW1Sblc1eW1ZNHZBb0dNcGlVMXpKVUpPVkJxRXBLZUZqaTlSdUk5XzdIbTIzTFBzazdaeVB3Z2ZjbUlwemEtZWYzaVRmendkR2ZkUVhFS3pkd0VPcThDSWNZVTJqczdhNlEteTRIR255bFoySUhuZ1Q3LTdUV3otNHc4M3p1M1pUVWxVcGR0LURKQWpocUllSEFDaElzUQ?oc=5" rel="noopener noreferrer"&gt;NVIDIA Accelerates Google DeepMind’s DiffusionGemma for Local AI - HPCwire&lt;/a&gt;, indicating the ecosystem lock-in between silicon and foundation model vendors remains tight.&lt;/p&gt;

&lt;p&gt;Elon Musk's claim about building a chip 2-3x better than Nvidia at 10% the cost&lt;a href="https://news.google.com/rss/articles/CBMilgFBVV95cUxOelhjcjhFdG4zQWRObW1EVGFhRnZiYkFJVEJLbVVWQVdCSHAzXzJ3U19RbHV4b01hMWxtSG9JcFJpalRrcE9yQzR1cnZWMy00aW1LMjB4TFdzN1QybnYwZkVwaXhyZTRCWlp4MlpNam1iS2FpQ2VFN1RObWJTWXI5RnpqMDBLM05OcG1HZUdnWi1KV2pqWHc?oc=5" rel="noopener noreferrer"&gt;Elon Musk Says He's Building a Chip '2-3x Better Than Nvidia' at 10% the Cost. Should Nvidia Investors Be Worried? - Yahoo Finance&lt;/a&gt; is unverified and comes with obvious competitive incentives to narrative-build. The Yahoo Finance framing ("Should Nvidia Investors Be Worried?") is the right lens: until silicon ships and benchmarks independently, this is a statement, not a data point.&lt;/p&gt;

&lt;p&gt;Google's Gemini integration continues to surface friction. Users report Gemini won't enable a Google subscription plan&lt;a href="https://news.google.com/rss/articles/CBMikwFBVV95cUxOdVVlWTRYaXVJTk9jemxobzB1X0pYM3JVWU1LZ3dudFNnTjRNRkVtNW1mR0RITlpra3E5bzRvRG9jWjlvRHZ3U09jQi1TaWUzSmQ4U2tHR29sQUJMQ2xRVnV0NkdKY1J3V1dyc2VlNjctUTlYT1RyYWNmVXpaeDdZaFNyeTJzMENielk4aEw3Y1lyTHc?oc=5" rel="noopener noreferrer"&gt;I would love a Google subscription plan, but Gemini won't let me - Android Police&lt;/a&gt;, and Android Auto integration remains limited despite five published workarounds&lt;a href="https://news.google.com/rss/articles/CBMidkFVX3lxTFBmOENSUjJhQWJOTEdiamdFSGkxb3VrRUF5a2QwNW1GYWhhMmJWUVRqT0VWMGZhWnp1QWhhb0VCSkZZYVJMV1poY0p2VmhOYi1leWNWYUFhME9FWHpSWDFqMmlkcWpRNi1RejZqLUFwZnkyY0ZJRkE?oc=5" rel="noopener noreferrer"&gt;5 Clever Ways To Use Google Gemini With Android Auto - bgr.com&lt;/a&gt;. Neither is a fundamental capability problem, but both signal execution gaps in Google's AI product rollout.&lt;/p&gt;

&lt;h2&gt;
  
  
  Enterprise AI: Microsoft Opens Its Evaluation Stack
&lt;/h2&gt;

&lt;p&gt;Microsoft open-sourced an AI evaluation framework for enterprise agents&lt;a href="https://news.google.com/rss/articles/CBMitgFBVV95cUxPTlYteU9mcmVJX1ZEekJ1V251TVdrSmY3V0toZlR5SWNKdnVmSXFmTzAtY25oZEQ0RXlBWkEtYXRqTWVFMnRWV1NEYzVRSDhWbS1fWDFzRUcxU3FGaERUSnFQRHJZcHpMZVVhc184NEluYnRQaV9jU3VxeFpmbTlZTWtJa2RvNmJWSW5NTE5mUHF4bUozaUFwdTY2OUNvWFlocldQemNVZC15T3Y5aXcwMFJxVzZtZw?oc=5" rel="noopener noreferrer"&gt;Microsoft open sources AI evaluation framework for enterprise agents - InfoWorld&lt;/a&gt; (InfoWorld). This matters because enterprise agent deployments require reproducible evaluation — the ability to say whether a workflow is actually improving, not just faster in demo conditions. An open evaluation framework lowers the barrier for organizations to instrument their own deployments rather than relying on vendor-supplied benchmarks.&lt;/p&gt;

&lt;p&gt;This is the kind of infrastructure move that compounds. When evaluation is standardized and open, vendor lock-in becomes harder to defend on "trust us, it's better" grounds.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Week Means
&lt;/h2&gt;

&lt;p&gt;Last week's framing — "integration dividends fading" — pointed to Apple and OpenAI's fraying partnership as the leading indicator. This week's developments extend that signal across three vectors: a competitor (Zhipu) that no longer needs distribution deals to be technically relevant, a regulator (state AGs) that is explicitly timing action to OpenAI's IPO, and a platform player (Anthropic) whose ecosystem story is being stress-tested by the same government whose support it needs.&lt;/p&gt;

&lt;p&gt;The technical and commercial signals point in the same direction: the frontier is narrowing. Organizations already committed to a provider should pressure-test their evaluation and switching costs now, not when a contract renewal forces the question.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>tech</category>
      <category>llm</category>
    </item>
    <item>
      <title>AI 週報 — 2026-06-11 to 2026-06-18 | 監管、價格戰與中國開源勢力</title>
      <dc:creator>Yang Goufang</dc:creator>
      <pubDate>Thu, 18 Jun 2026 00:49:20 +0000</pubDate>
      <link>https://dev.to/yang_goufang_23c7ba674984/ai-zhou-bao-2026-06-11-to-2026-06-18-qi-ye-ai-cong-shi-yan-zou-xiang-bu-shu-de-dai-jia-4hlo</link>
      <guid>https://dev.to/yang_goufang_23c7ba674984/ai-zhou-bao-2026-06-11-to-2026-06-18-qi-ye-ai-cong-shi-yan-zou-xiang-bu-shu-de-dai-jia-4hlo</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;OpenAI 的 burn rate 終於被量化：340 億美元。同一週，多州檢察長聯手調查、Anthropic 遭政府封禁——監管與資金的雙重壓力不再是敘事，而是數字。&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  OpenAI：burn rate 量化，價格戰信號浮現
&lt;/h2&gt;

&lt;p&gt;金融時報本週揭露 OpenAI 上一會計年度支出達 &lt;strong&gt;340 億美元&lt;/strong&gt;&lt;a href="https://news.google.com/rss/articles/CBMihAFBVV95cUxQNVBESmc4M2kxa1Rjdk9GeVdqYU5CblkxS0hqWl9Oc0ExZXZXWE5hakVOQWMzbGdIV0dyV1psT0FMek4wNkthQXFKXy1jdGo1T1pLWml6Q1VnakJkQ2lCRm5XaFF5R1VyMzAyU2xOcDBsRlhJYTNPS1hkQTdPdUs0dW1LSVY?oc=5" rel="noopener noreferrer"&gt;OpenAI spending hit $34bn last year ahead of planned IPO - Financial Times&lt;/a&gt;&lt;a href="https://news.google.com/rss/articles/CBMiwwFBVV95cUxObG14cjl0dTdRdWhUWmROcmN1WXhXM2RnZ25TczRBdDNsUW43a0tSR25MLTdXcmF5bE1wRXh5Wk1TWmVpMWt0N0ZGQjBEbmpmajcwT1JqQWdvdTNpWklmOGstQlMzWFlVSkZmQ1hndFpnRG5FUVB4RG5NV1BMR2VYSExIY3ZHVW14UE9JREI3QmswbE8yanNueWNRbXBLNnEyUGpZVEFBRzdlbXhEUmd2aW5SeFRRZ3lHVTRVVVpqVDhIR0U?oc=5" rel="noopener noreferrer"&gt;OpenAI spending hit $34 billion last year ahead of planned IPO, FT reports - Reuters&lt;/a&gt;，這個數字將過去一年市場對 OpenAI 財務狀況的模糊預期轉化為可計算的事實。同時，The Information 與 CNBC 先後報導 OpenAI 正在評估降價策略以對抗 Anthropic 的市佔成長 &lt;a href="https://news.google.com/rss/articles/CBMiqAFBVV95cUxPWTVaQ2JmSTJ2dG1yVC1nSHpNbWJqOVlUV2dmMW1vZ2d6VmlFN1ZDRTFOVS04SDZQM01CcDZIRS1WWDluTE5hYzJBV25NZlBKMndtTmRHRG5DZmphLWEydEhmM0g5cEVHcjlvNTBueWpZRmFpcDNPekV2OU9hWV9BMzJsd3NLNWEwcFpydDFQX1JZc0I2Xy1ZcktIZDlxV2RramNQT1F5WnXSAa4BQVVfeXFMTTA0akxwVmdMa0RybEpINXVxTUxTSkJWRWNnY3pDUXN2ZDAyaW53dkRvVG1QM3hQSWt4aFlQTnJwWkVoYUdzd3F3cWFybHA5cE9feHQyTlhHc0pDaFhlMVVTWTBRU3hLRHlWamFsQldzTnpHOWRXcjREbG9UcTRTRGZsLTFhYTlnQWQ4OFpUTEZkR21mR3VsdGpRbmloN01vT0ZxTGl6dEtkM09fZS1n?oc=5" rel="noopener noreferrer"&gt;OpenAI mulls slashing prices as it competes with Anthropic for users: WSJ - CNBC&lt;/a&gt;&lt;a href="https://news.google.com/rss/articles/CBMi1wFBVV95cUxOQWVqaVFxU2Q0WXduc1J4Tk1UY2pQZnNMMmVZZGdjLWN1RU14VWVBeHJEdnlESEFvWW1SdlM4SGtQWXUya1ZkdHF6MmxnUWMycVo3cVBieUdvcFpUSUtkcGpROGZiMU1mT1ZsVUNhOVlGdmtYRFVYaTNKTkRXMWY3aTFManhnOW83Qmk2WVRJM2ZUb0Zpb2JOSUI3bk9pTldrb09pZjRKQ1Q3OHhNYXlxaE1RcWdwQjBhdzZybmRQTmtMZ05RdG9aLXhYV3E2VW1iemc3WjNiVQ?oc=5" rel="noopener noreferrer"&gt;OpenAI Could Soon Drop Prices To Compete With Anthropic, Report Says - Forbes&lt;/a&gt;。&lt;/p&gt;

&lt;p&gt;340 億美元的 burn rate 搭配降價壓力，意味著 OpenAI 的變現緊迫性比對外釋出的敘事更尖銳——而多州檢察長的聯合調查（重點在「可能對用戶造成傷害」）&lt;a href="https://news.google.com/rss/articles/CBMihAFBVV95cUxNeDdPdDNQRGVKMXIxVGNLWURkQjZLZExOZ1JabldULVQyV3lOZ1lYWlhpYzA1eWpCZ2dVNTEzSk8xd2Z1ZzVob2hwVHBJMV9uZlFYcVZWMEVfRFdRWTdhek5ENGMyUjN2NWs1U1FlSXN2Q0RJSnlXZ0NGaS1wcnJqekFsaDA?oc=5" rel="noopener noreferrer"&gt;State Attorneys General Are Investigating OpenAI - The New York Times&lt;/a&gt;&lt;a href="https://news.google.com/rss/articles/CBMivAFBVV95cUxOX0VNSkhmX3pnRzR1NWdCaW1iS3lCQ0dOdDREb3BSV2pNV2tTUVRBbGdMcjVtcW96Nm9rN1lkdkhQQmo1ZEgtU042TklUUUI1STFxS1F1LWE5RHVjdFJyRlRZcmU1eVl0UGxGWjJyX2dkNDlaWFlTYW50TEl5dWt0ZTA5YnozZlZva2JPOXcwREtXeU9RbE1JMUM5TEdDYWpSVlBLdTd6UVEwZ1p0X0FFS1k3eWdwRnpIRGM5UA?oc=5" rel="noopener noreferrer"&gt;OpenAI under investigation by group of state attorneys general, source says - Reuters&lt;/a&gt;&lt;a href="https://news.google.com/rss/articles/CBMiqwFBVV95cUxPTDJsQlNzM0JtUFpBTUJkalVSS3htd0ZEUVVsOFhYOGhYbFBkNkV6RkZfcERSWV9ZVngzRUYyN29uUXgycFBrek1ZUllLM3MtVzNCY1BUNXBoXzVWbEYtS2pzTFNBUVFKOXZRWlBCN2VoNkxFa0FSWEVGN18xZmRyM3N1ODNwOERDd0ZFa2ozQTNrM3o1Mjc2M3JRODlxN2t6cWsxOUI1WE5QNWc?oc=5" rel="noopener noreferrer"&gt;OpenAI hit with multistate probe into possible user harm as its IPO looms - AP News&lt;/a&gt;，為這場資金戰加上了一層無法用公關處理的監理風險。&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;企業整合進展&lt;/strong&gt;：Visa 宣布將其安全支付網路直接整合進 ChatGPT &lt;a href="https://news.google.com/rss/articles/CBMivAFBVV95cUxONDZuX3ItQ1lKMmlRbEd5VWxlVjZDQXNRdDRybGFWN0N0T3hQSW4zZUxQNWV1clI5VVE1dGhsMHIwNTlmZUFwcklaWFZmNzB5aUduVGJneU9naC1Wb3VoWnpYUG9QdTRmdHZVMXU0dC04NkpGZUtwSHNnWHV4ZzFjcVR6dHdHYkdqWVFKTVpsblR1V0E0aURYeFdISTdlVzJSMlhqUnFqVFJWSHNtc0FreW9fZ1RoUkxMMWgtNQ?oc=5" rel="noopener noreferrer"&gt;Visa and OpenAI integrate Visa's secure global payment directly into ChatGPT - NPR&lt;/a&gt;。不同於多數「策略合作」公告，這筆整合有既有的全球網路與合規框架支撐，可商用性較高——是消費級 AI 產品進入金融支付基礎設施的實質進展。&lt;/p&gt;

&lt;h2&gt;
  
  
  Anthropic：政府封禁與政策急轉彎
&lt;/h2&gt;

&lt;p&gt;美國政府以安全考量為由&lt;strong&gt;封禁了 Anthropic 最新版 Claude 模型&lt;/strong&gt;的使用授權 &lt;a href="https://news.google.com/rss/articles/CBMiogFBVV95cUxNWUpveEJWVS1OcWh3cDBXVFl3SXAxUUpxemJtNXEtUm9fZWkzNVpRS2wtaU1kZDNIb0pWRkFLdllyNkpSMzNTTXV4a2trWFlNNkN5aGI3RTY5b0VSMm54bjY3a25xUXZ0bmhLUVhPSGhjUXljMWtBNk56ekJ6bEpBRHhibDg4MU1ocWRlbFQtaXltc1h6LXRReTI0d0VWLWdNZ1E?oc=5" rel="noopener noreferrer"&gt;Why the US government shut down Anthropic’s latest Claude AI model - The Conversation&lt;/a&gt;。幾乎同一週，Anthropic 迅速撤回了一項引發內部反彈的政策——此政策原被 WIRED 形容為具有「自毀傾向」，從制定到撤回的時間極短 &lt;a href="https://news.google.com/rss/articles/CBMiowFBVV95cUxORnEtMHRDdUF6VkFPSkE5MkpSQ3pKNWdvWGpCd3dDU0dTTEE0UVF0aVhfOC0zaHRNYWdDWXFpQklCOXVRbEEwT19ZNVN3ZnVjRWRsejhMU09scUh1eTdTd2dGa3ltUXJQUHd4ZW8tMWJLU3FjUkRPYW5fQVRSUjJfZ2JnOTQ0NWFDb3NSYTFNa2ZaX1lRV2tmT1BmQmFlclFuT2NZ?oc=5" rel="noopener noreferrer"&gt;Anthropic Walks Back Policy That Could Have ‘Sabotaged’ AI Researchers Using Claude - WIRED&lt;/a&gt;。&lt;/p&gt;

&lt;p&gt;這兩個事件是否相關，文章未提供直接證據；但從時間 close 與政策撤回的速度來看，Anthropic 內部存在即時的自我糾錯機制。真正需要追蹤的是：若政府封禁最終指向「不可部署於政府場景」，Anthropic 的企業銷售將面臨結構性障礙。&lt;/p&gt;

&lt;h2&gt;
  
  
  中國開源模型：GLM-5.2 在程式碼任務逼近封閉領先者
&lt;/h2&gt;

&lt;p&gt;本週最具技術實質的消息落在中國 Zhipu AI 的 GLM-5.2。&lt;/p&gt;

&lt;p&gt;根據 VentureBeat 與 The Decoder 的測試，GLM-5.2 在多項長期程式碼任務的基準上，以&lt;strong&gt;六分之一的推論成本&lt;/strong&gt;達到與 GPT-5.5 相近的表現 &lt;a href="https://news.google.com/rss/articles/CBMingFBVV95cUxPTjhWckNHTGNQQzNvTXBoVm9vWkJxX2l5NlJpOWl5YnpFczhmaWNnQU40WDRDXzFtVTl5akloR0RPd2M5U21QSWdUdmFMLVZ2MmZ1V0VCYnVvdk01TG9vbzVZcXpZZDJHb0VjZFd4TXl2NDZmYzRkMEliLTNOR3dtLV9YVzNVR2ZxS3RtQlAtNURsaExFWTMwSmVEOTJjUQ?oc=5" rel="noopener noreferrer"&gt;Zhipu AI's GLM-5.2 closes in on closed-source leaders in coding marathons - The Decoder&lt;/a&gt;&lt;a href="https://news.google.com/rss/articles/CBMi0wFBVV95cUxPOWdHTVgtOTZYbHBKN1licHR2ajVhSnRJaVdoVWdRSW82SU5SVnJYNnhFYXlIRDlISDZJNm9xMUprb0JfekZpTUZSVENCay1vVDBkeTlhaEcwZU41Z2JSQUYtWHBHbVJBX2FmYlp0RDYwY3ZUU2EtaVM3eFk4anVBdXItNy1jczI2Z1kxSDFobnFtWTF6U2lxMEpaa1JXZWRkUFhORmtrNFV1TldvMmxFNUM3cjRMN2tDMkpmc3FxMG9QUnI4X3U0ZnQ5SFdGZHJyczRF?oc=5" rel="noopener noreferrer"&gt;Z.ai’s open-weights GLM-5.2 beats GPT-5.5 on multiple long-horizon coding benchmarks for 1/6th the cost - VentureBeat&lt;/a&gt;。36Kr 以「AI 程式設計三巨頭成形」為標題，將 Zhipu 與 OpenAI、Anthropic 並列為程式碼能力的第一梯隊 &lt;a href="https://news.google.com/rss/articles/CBMiU0FVX3lxTE5ObC1ReFBsQzdTT0xFaTlxMTI0c2lKQUxNUVJIZ3MydF94WGhWY05kVUQteUl4dmxZaGNOUVRfVHIzSGw1MWt4X3ZPTlgyY2NlQjRB?oc=5" rel="noopener noreferrer"&gt;First-hand Test of Zhipu's Most Powerful Model: Are the "Three Giants" of AI Programming Set to Take Shape? - 36 Kr&lt;/a&gt;。&lt;/p&gt;

&lt;p&gt;這些數據來自模型發布方的內部測試，需等第三方驗證。但若橫向參照多個訊源同時報導此事（成本分析、技術解讀、產業敘事），可信度比單一新聞高。GLM-5.2 的關鍵意涵不是「打敗 GPT」，而是&lt;strong&gt;開源模型的性價比曲線正在快速逼近封閉模型&lt;/strong&gt;。需注意的是：Zhipu 的資料僅涵蓋中國市場應用，其對英文語境與國際開發文化的適配程度仍待實測。&lt;/p&gt;

&lt;h2&gt;
  
  
  Google：指標式 AI 互動研究與 Gemini 生態擴張
&lt;/h2&gt;

&lt;p&gt;Google DeepMind 本週發表了&lt;strong&gt;重新思考滑鼠游標在 AI 時代角色&lt;/strong&gt;的研究 &lt;a href="https://news.google.com/rss/articles/CBMi1wFBVV95cUxORHBFZElKbGtTUF8tWlA4Vk80R3AzOG9ISWxOQUgwQVRxQi13S0t2UFdtb1hqM05nd1ZDV0lHRnBQamo4NzJqRG1ucmQwcS1BZ1BFa1JKLVgxN3dnTXZkbW5Bb3RqMTdtNTNPTXVya1ZTdmJRTUVvRXVwM0xmNzVNWDR3Y3g0UHh1MUdYMXVmTl9tcFJIMmtqWEpZU0E3UGNMMGw5QlY2TW5scnVEanphYWlfVVNKSmpYY1VtLTB6U0xXc3ZfakcxVTQ5R2sydTFlMUVCWElkNNIB3AFBVV95cUxPTGVXZHA1LUl0bHVoWGJEcTB6RFQ1LUIyYmYyaDZTR2JnTFJOZG9iVDRHczljMHVybzdXWjFYaE1faW14UE5FRUFwbGZ2TEhEZFVSQlpOU2FFdnFBN1VnWS12VkNiVGtabkw0V2Fua3psWjhRWnZGdG5Qd19PQk5ybG5KRzd2TUo1dnpWU1ZZeERJYzNxQXMwak5oSnc4LUxnX01oS1BXMkNsZ0R0bmxTcWZHSVhRS3NoU3dzeFhjZlplNEd1TzFtVzN3Qk1VYjFOaXRSWGliZURkYlFw?oc=5" rel="noopener noreferrer"&gt;Google DeepMind is worried about what happens when millions of agents start to interact - MIT Technology Review&lt;/a&gt;——對「定址」（pointing）這個人類與介面互動最基礎動作的重新框架。當 AI 能主動預測意圖時，傳統的點選-確認模型是否仍是最佳互動單位？這項研究的戰略意圖在於：定義下一代人機介面的基礎構件。&lt;/p&gt;

&lt;p&gt;產品面上，Android Auto 的 Gemini 整合已進入實用階段 &lt;a href="https://news.google.com/rss/articles/CBMidkFVX3lxTFBmOENSUjJhQWJOTEdiamdFSGkxb3VrRUF5a2QwNW1GYWhhMmJWUVRqT0VWMGZhWnp1QWhhb0VCSkZZYVJMV1poY0p2VmhOYi1leWNWYUFhME9FWHpSWDFqMmlkcWpRNi1RejZqLUFwZnkyY0ZJRkE?oc=5" rel="noopener noreferrer"&gt;5 Clever Ways To Use Google Gemini With Android Auto - bgr.com&lt;/a&gt;，但 Android Police 的測試發現 Gemini 訂閱機制存在使用障礙 &lt;a href="https://news.google.com/rss/articles/CBMikwFBVV95cUxOdVVlWTRYaXVJTk9jemxobzB1X0pYM3JVWU1LZ3dudFNnTjRNRkVtNW1mR0RITlpra3E5bzRvRG9jWjlvRHZ3U09jQi1TaWUzSmQ4U2tHR29sQUJMQ2xRVnV0NkdKY1J3V1dyc2VlNjctUTlYT1RyYWNmVXpaeDdZaFNyeTJzMENielk4aEw3Y1lyTHc?oc=5" rel="noopener noreferrer"&gt;I would love a Google subscription plan, but Gemini won't let me - Android Police&lt;/a&gt;——模型能力與 distribution 執行之間仍有落差。&lt;/p&gt;

&lt;h2&gt;
  
  
  Nvidia：硬體廠商的護城河持續拓寬
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;市佔消息&lt;/strong&gt;：The Information 報導 Nvidia 在 AI 推論晶片市場的占比正在&lt;strong&gt;持續上升&lt;/strong&gt;&lt;a href="https://news.google.com/rss/articles/CBMitAFBVV95cUxNdThGUnRHcjBPYnZFcE81S1NmNmhCYW5FOGxHMDlTb0hTS3pnWk9BX2xkVWRJZUpZSDVyUlhabjFwY3pSeEZlVVBKNXB5OGpfeXZXU3QtN3ZlWWR4SEJKbnVvOC1zSWc0MXJfdzBhaDhsUF9jQUIya1daOFhBaDhCQXdldlNmWVU2bktXaXZMa0EzdEVmQlg2RVlsQ1VMSWpITmRYbm0yV3V2d3VqcjVoVUxUQzM?oc=5" rel="noopener noreferrer"&gt;Nvidia’s Share of AI Inference Chip Market Appears to Be Rising - The Information&lt;/a&gt;。訓練市場已相對穩定，推論市場的高成長才剛開始——Nvidia 在推論端的軟體生態（CUDA、Triton）與供應鏈韌性讓後進者很難在性價比上取勝。&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;合作消息&lt;/strong&gt;：Nvidia 宣布加速 Google DeepMind 的 DiffusionGemma 模型在本地 AI 場景的執行 &lt;a href="https://news.google.com/rss/articles/CBMiqgFBVV95cUxPZXdSTmVFMlRHVFl1cW1Sblc1eW1ZNHZBb0dNcGlVMXpKVUpPVkJxRXBLZUZqaTlSdUk5XzdIbTIzTFBzazdaeVB3Z2ZjbUlwemEtZWYzaVRmendkR2ZkUVhFS3pkd0VPcThDSWNZVTJqczdhNlEteTRIR255bFoySUhuZ1Q3LTdUV3otNHc4M3p1M1pUVWxVcGR0LURKQWpocUllSEFDaElzUQ?oc=5" rel="noopener noreferrer"&gt;NVIDIA Accelerates Google DeepMind’s DiffusionGemma for Local AI - HPCwire&lt;/a&gt;，同時強化了 Nvidia 對開源模型的軟體優化支援，以及 Google 對本地部署（on-device）場景的認真程度。&lt;/p&gt;




&lt;p&gt;340 億美元 burn rate 的量化、OpenAI 降價壓力浮現、Anthropic 遭遇政府封禁——本週的核心不是任何單一事件，而是&lt;strong&gt;監理與資金的雙重量化&lt;/strong&gt;正將 AI 公司的真實體質攤在陽光下。同時，GLM-5.2 以六分之一成本逼近封閉模型領先者，顯示開源性價比曲線正在縮短與封閉模型的差距。技術發布的領先與商業價值的捕獲之間，鴻溝正在擴大。&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>tech</category>
      <category>llm</category>
    </item>
    <item>
      <title>有人在拆 Transformer：Memory Caching 與 CTM 各拆走了一半</title>
      <dc:creator>Yang Goufang</dc:creator>
      <pubDate>Thu, 11 Jun 2026 08:50:55 +0000</pubDate>
      <link>https://dev.to/yang_goufang_23c7ba674984/you-ren-zai-chai-transformermemory-caching-yu-ctm-ge-chai-zou-liao-ban-4lnk</link>
      <guid>https://dev.to/yang_goufang_23c7ba674984/you-ren-zai-chai-transformermemory-caching-yu-ctm-ge-chai-zou-liao-ban-4lnk</guid>
      <description>&lt;p&gt;這篇要談的兩篇研究——Google 的 &lt;strong&gt;Memory Caching&lt;/strong&gt;（RNNs with Growing Memory）和 Sakana AI 的 &lt;strong&gt;Continuous Thought Machine（CTM）&lt;/strong&gt;——常被包裝成「Transformer 殺手」。不是。它們是兩篇&lt;strong&gt;研究論文，不是產品&lt;/strong&gt;，也不是要取代 Transformer。把它們放在一起讀，真正的故事只有一句：&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Transformer 的 &lt;code&gt;self-attention&lt;/code&gt; 把&lt;strong&gt;記憶&lt;/strong&gt;（在上下文裡 recall）和&lt;strong&gt;計算&lt;/strong&gt;（思考發生在 forward pass）綁在同一個機制裡，代價是 O(L²)。這兩篇各拆走一半。&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Memory Caching 拆&lt;strong&gt;記憶&lt;/strong&gt;那一半，CTM 拆&lt;strong&gt;計算&lt;/strong&gt;那一半。理解了這個軸，後面所有細節都會歸位。&lt;/p&gt;

&lt;p&gt;一個先講清楚的規矩：本文只採用原論文能支持的宣稱。二手文章裡那些「在 SWE-bench / GPQA 上如何如何」的數字，凡是回不到原論文的，一律不寫。這兩篇論文本身都沒有報告 SWE-bench 結果——把二手整理的 agent 數字寫成論文結論，是這個題目最常見的造假。&lt;/p&gt;




&lt;h2&gt;
  
  
  一、成本牆：融在一起的代價
&lt;/h2&gt;

&lt;p&gt;先講為什麼有人想拆。&lt;/p&gt;

&lt;p&gt;&lt;code&gt;self-attention&lt;/code&gt; 可以理解成一種可微分的關聯記憶：每個 query 去比對所有 key，加權讀取 value。這讓模型很會在上下文裡做 recall，也讓 in-context learning 成立。但序列長度是 L 時，完整 self-attention 的時間與空間成本是 O(L²)。相關理論工作也指出，這個二次成本不只是實作不夠好，而有更深的計算複雜度限制（見 &lt;em&gt;On the Computational Complexity of Self-Attention&lt;/em&gt;）。&lt;/p&gt;

&lt;p&gt;推理時 KV cache 緩解了自回歸生成重複計算歷史 token 的問題，但沒有免費午餐：KV cache 本身吃大量顯存，每生成一個 token 仍要與整段上下文互動。當上下文從 8K 推到 128K、1M，瓶頸通常從 FLOPs 轉向&lt;strong&gt;記憶體容量、記憶體頻寬、服務成本&lt;/strong&gt;。&lt;/p&gt;

&lt;p&gt;這裡要區分清楚一件事，因為後面會反覆用到：&lt;strong&gt;「發布」≠「可用」≠「可商用」&lt;/strong&gt;。長上下文視窗能跑，跟它在你的延遲與成本預算內能跑，是兩回事。成本牆主要卡在「可商用」這一層——而這兩篇論文，目前都還停在「論文能跑」的更前面一層。&lt;/p&gt;

&lt;p&gt;把這個機制拆開看，它其實同時做了兩件事：&lt;strong&gt;記住很多、可以讀取很多&lt;/strong&gt;（記憶），以及&lt;strong&gt;運算就發生在這一次前向傳播裡&lt;/strong&gt;（計算）。Transformer 把這兩件事用一個機制、一個 O(L²) 的價格綁在一起。接下來的兩篇論文，分別質疑其中一半。&lt;/p&gt;




&lt;h2&gt;
  
  
  二、Memory Caching：拆「記憶」那一半
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;這篇出自 Ali Behrouz 等人（Google），也就是做 Titans 的同一個團隊&lt;/strong&gt;（arXiv:2602.24281，2026 年 2 月）。先記住這個團隊背景，到第四節會用上。&lt;/p&gt;

&lt;p&gt;傳統 recurrent model 的核心問題是&lt;strong&gt;固定記憶&lt;/strong&gt;。RNN、線性注意力、某些 state-space 或 recurrent memory 變體，把過去壓縮進一個固定大小的 hidden state。這帶來 O(L) 的效率，卻造成長序列下的資訊擠壓：越往後，早期資訊越容易被覆蓋、模糊、遺忘。&lt;/p&gt;

&lt;p&gt;Memory Caching 的想法很直接：不要只留當前 hidden state。把序列切成多個 segment，每個 segment 結束時的 memory state 當作 checkpoint 存下來（cache）。後續 token 不只查詢「當前線上記憶」，也能查詢過去 segment 的 cached hidden states。換句話說，RNN 不再只有一本不斷被覆寫的筆記本，而是定期留下壓縮快照。&lt;/p&gt;

&lt;p&gt;論文摘要把這個方法的定位講得很清楚：它提供一個&lt;strong&gt;介於兩端之間的可調折衷&lt;/strong&gt;——RNN 的固定記憶（O(L)）和 Transformer 的成長記憶（O(L²)）之間。&lt;/p&gt;

&lt;p&gt;這裡可以建立一個直覺（&lt;strong&gt;以下是我從機制推導的直覺，不是論文引用的複雜度結果&lt;/strong&gt;）：假設每段長度 s、整段長度 L，需要查詢的 cached memory 約 L/s 個。若每個 token 都查所有 checkpoint，成本可粗略視為 O(L × L/s) = O(L²/s)。把 s 想成一個&lt;strong&gt;旋鈕&lt;/strong&gt;：s 越大、越接近普通 RNN 的 O(L)；s 越小、checkpoint 越密、越往光譜的另一端靠。它不是魔法般消除成本，而是給你一個刻度：用多少記憶，換多少 recall。（嚴格說 s=1 並不等於 attention——那只是光譜的極端，不是同一個東西，這點不要過度宣稱。）&lt;/p&gt;

&lt;p&gt;論文提出&lt;strong&gt;四種使用 cached memory 的方法&lt;/strong&gt;，命名都來自論文本體（Introduction 的「Novel Aggregation Strategies」與各節標題，例如 §3.2 就叫 MEMORY SOUP）：&lt;strong&gt;(Gated) Residual Memory&lt;/strong&gt;——用殘差連接加上 context-aware gating 聚合多個記憶狀態；&lt;strong&gt;Memory Soup&lt;/strong&gt;——借自 weight souping，平均多個 cached memory module 的參數（對非線性記憶才有區別）；&lt;strong&gt;Sparse Selective Caching (SSC)&lt;/strong&gt;——用類似 MoE router 的方式只選最相關的 top-k cached memory 參與讀取，控制超長上下文成本。摘要只用了簡短說法「gated aggregation and sparse selective mechanisms」，完整命名在正文，要查以論文本體為準。&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;落地視角&lt;/strong&gt;：Memory Caching 沒有消除成本，它把成本變成可調的。要判斷它能不能進真實 workflow，該問的不是「它比 RNN 強多少」，而是 retrieval fan-out 多大、cached memory 的記憶體頻寬代價多少、跟單純加大 KV cache 比省在哪。論文本身沒回答這些工程問題——這是「論文能跑」和「可商用」之間還沒跨過的距離。&lt;/p&gt;

&lt;p&gt;從技術信仰看，這篇務實：它不否定 Transformer 的成長記憶有價值，反而承認它有價值，然後問——能不能用壓縮的記憶 checkpoint 拿到一部分好處，而不付全額 O(L²)。&lt;/p&gt;




&lt;h2&gt;
  
  
  三、CTM：拆「計算」那一半
&lt;/h2&gt;

&lt;p&gt;CTM 出自 Sakana AI（東京，Darlow、Regan、Risi 等人，arXiv:2505.05522，NeurIPS 2025 Spotlight）。值得一提：共同作者裡有 Llion Jones——&lt;em&gt;Attention Is All You Need&lt;/em&gt; 的原作者之一、Sakana 共同創辦人。當年提出 Transformer 的人，現在在拆它，這件事本身就有意思。它的問題意識和 Memory Caching 完全不同：它不太管長上下文 recall，它質疑的是現代神經網路對「時間」與「計算」的抽象方式。&lt;/p&gt;

&lt;p&gt;先解名，因為名字本身就是論點。&lt;strong&gt;Continuous Thought Machine&lt;/strong&gt;——「思考」是一個沿著&lt;strong&gt;內部時間&lt;/strong&gt;連續展開的過程，而不是一次前向傳播吐一個答案。和 Memory Caching 的字面命名不同，CTM 的名字是個主張：思考有長度。&lt;/p&gt;

&lt;p&gt;三個機制（全部對照論文本體確認過）：&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Internal ticks（內部時間軸，與序列長度 decoupled）。&lt;/strong&gt; 論文原文：&lt;em&gt;"The CTM uses an internal dimension t∈{1,…,T}, decoupled from data dimensions."&lt;/em&gt; 模型沿一條&lt;strong&gt;自己生成的&lt;/strong&gt;時間軸 t ∈ {1,…,T} 展開，這條軸和輸入序列無關。即使輸入是一張靜態圖片，CTM 也能在內部跑 50 個 tick，不斷更新神經活動、重新注意輸入、修正輸出。&lt;strong&gt;這就是「計算」這一半被從序列長度上拆下來的關鍵。&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Neuron-level models（NLM，神經元級的時間處理）。&lt;/strong&gt; 標準網路裡，一個 neuron 多半只是一次 activation：輸入進來、過非線性、吐一個值。CTM 給每個 neuron 一個&lt;strong&gt;自己的小型 MLP&lt;/strong&gt; &lt;code&gt;g_θd&lt;/code&gt;，處理它自身的 pre-activation history。神經元不再是靜態函數，而是有局部時間歷史的微型處理器。&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Synchronization as latent representation（用同步當表示）。&lt;/strong&gt; 這是最反直覺、也最核心的一點。CTM 不直接拿某一刻的 hidden state 當表示，而是追蹤不同 neuron 的活動歷史，計算 neuron pairs 之間的同步：&lt;code&gt;S_t = Z_t · (Z_t)ᵀ&lt;/code&gt;（Z_t 是到第 t 個 tick 為止的神經元活動歷史矩陣；同步用的神經元對在初始化時隨機取若干對，例如 32 對）。這個 synchronization 再被投影成 &lt;strong&gt;attention query&lt;/strong&gt;（action synchronization）和&lt;strong&gt;輸出 logits&lt;/strong&gt;（output synchronization）。換句話說，模型真正拿來決策的，不是單一時間切片，而是神經活動在時間上的&lt;strong&gt;協調模式&lt;/strong&gt;。&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Adaptive compute。&lt;/strong&gt; CTM 在每個 tick 都產出 yt，並算 certainty = 1 − normalized entropy。推理時可以設一個門檻（例如 0.8），certainty 夠高就提前停。難的 instance 多想幾個 tick，簡單的早停。計算量隨輸入難度變化——這就是「計算這一半」變成可調旋鈕的具體樣子。&lt;/p&gt;

&lt;h3&gt;
  
  
  順帶分清楚：CTM 和 chain-of-thought 不是同一回事
&lt;/h3&gt;

&lt;p&gt;你可能會想到 chain-of-thought（CoT）。值得先把兩者分開——它們不在同一層。&lt;/p&gt;

&lt;p&gt;CoT 是&lt;strong&gt;提示技巧&lt;/strong&gt;，跑在普通 Transformer 上：你讓模型把「Step 1… Step 2…」寫成輸出 token，思考過程就是那串文字。想多想一點，就是多寫 token——成本仍綁在序列長度上，仍走 O(L²) 那條路。&lt;/p&gt;

&lt;p&gt;CTM 是&lt;strong&gt;架構&lt;/strong&gt;，不是提示。它的「思考」不產生任何 token：模型沿內部時間軸展開神經活動，可以對一張靜態圖片跑 50 個 tick，輸出零個中間 token。一句話分辨：&lt;strong&gt;CoT 用 token 思考，CTM 用內部時間思考。&lt;/strong&gt; 這個差別正是本文的主軸——CoT 是在 Transformer 既有的機制裡爭取更多推理（所以付一樣的 token 帳單），CTM 則把推理從 token 軸上整個拿開。&lt;/p&gt;




&lt;h2&gt;
  
  
  四、同一個問題的兩半
&lt;/h2&gt;

&lt;p&gt;現在把兩篇放回一起。它們不是「對決」，也不是兩個競爭的賭注——它們在拆同一個東西的不同部位。&lt;/p&gt;

&lt;p&gt;Transformer 的 self-attention 同時扛了&lt;strong&gt;記憶&lt;/strong&gt;和&lt;strong&gt;計算&lt;/strong&gt;，付 O(L²)。&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Memory Caching&lt;/strong&gt; 拆&lt;strong&gt;記憶軸&lt;/strong&gt;：讓 recall 便宜、可增長，不走完整的二次成本。它的成敗好衡量——Needle-in-a-Haystack、LongBench、in-context retrieval 這類任務。&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CTM&lt;/strong&gt; 拆&lt;strong&gt;計算軸&lt;/strong&gt;：讓內部計算時間和序列長度脫鉤，用神經動態與同步當核心。它關心的是「同一個輸入能不能投入不同長度的內部思考」，更接近推理、規劃、模擬。&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;這也是為什麼第二節要你記住 Behrouz 是 Titans 團隊：Memory Caching 是「外部／顯式記憶」這條線的延伸思路——記憶是一個可以加掛、可調成本的層。CTM 走的是另一個方向——計算不是一次性的前向傳播，而是一段可以拉長的內部過程。一個在問「記憶怎麼便宜」，一個在問「計算怎麼動態」。&lt;/p&gt;

&lt;p&gt;所以它們互補，不互斥。把它們擺成「誰取代誰」會錯過重點——重點是 Transformer 把兩件事綁死了，而現在有人開始分別鬆綁。&lt;/p&gt;




&lt;h2&gt;
  
  
  五、Scaling law 會被改寫嗎？
&lt;/h2&gt;

&lt;p&gt;傳統 scaling law 關注三個變數：model size、data size、training compute。Kaplan 等人的工作強化了「規模帶來可預測進步」的信念；Chinchilla 進一步指出固定訓練算力下，參數量與訓練 token 數要更平衡地擴張。&lt;/p&gt;

&lt;p&gt;這兩篇不會推翻這些 scaling law。但它們各自提示一個&lt;strong&gt;新變數正在變重要&lt;/strong&gt;——以下是推論，不是論文宣稱：&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Memory Caching 指向 memory capacity / retrieval cost。&lt;/strong&gt; 模型不只要大，還要能用合理成本保存與檢索長期資訊。未來的 scaling 帳，可能不能只看參數和 token，還要看記憶容量、壓縮率、retrieval fan-out、記憶頻寬。&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CTM 指向 test-time compute / internal dynamics。&lt;/strong&gt; 模型不只在訓練時花算力，也在推理時分配內部思考步數。若難題需要更多 tick、簡單題可早停，那 scaling 就不只是「訓練更大的模型」，還包括「測試時怎麼有效花算力」。&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;這兩個推論都錨在前面講過的機制上——O(L²/s) 那個旋鈕、tick 數那個旋鈕——不是憑感覺喊未來。能不能成立，要看後續有沒有人在真實規模上把這兩個旋鈕跑出可預測的曲線。目前沒有。&lt;/p&gt;




&lt;h2&gt;
  
  
  六、實驗數據與現實局限
&lt;/h2&gt;

&lt;p&gt;這節最重要，因為它決定了前面所有東西該打幾折。再說一次：&lt;strong&gt;這是兩篇研究論文，不是產品。&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CTM&lt;/strong&gt; 的驗證任務（對照論文本體）：2D maze（39×39，並可重複套用泛化到 99×99）、ImageNet-1K（搭配 ResNet-152 特徵抽取器、50 個 tick 下 72.47% top-1，論文自己也說不是衝著 accuracy 來的）、parity（64-bit 累積 XOR）、CIFAR-10/100、sorting、Q&amp;amp;A MNIST、RL（CartPole、Acrobot、MiniGrid）。注意那個 ImageNet 數字是 CTM 接在強 CNN backbone 上的結果，不是端到端的獨立分類器——把它讀成「CTM 自己拿到 72%」會高估。論文&lt;strong&gt;明講不是要刷 SOTA&lt;/strong&gt;：&lt;em&gt;"preliminary and not intended to beat state-of-the-art … a limitation of this paper is its relatively limited depth of comparison since we favored breadth."&lt;/em&gt; 自陳限制也很清楚：internal sequence 讓&lt;strong&gt;訓練時間拉長&lt;/strong&gt;，NLM &lt;strong&gt;增加參數量&lt;/strong&gt;。換句話說，它買到的「內部思考」是用訓練成本和參數量換的——這正是「可商用」層該追問的代價。還有一筆推理側的帳：certainty 早停是 data-dependent 的，難的 instance 會一路跑到滿 T 個 tick，per-instance 延遲不固定，會讓延遲預算和 batched serving 變難——adaptive compute 的彈性不是免費的。&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory Caching&lt;/strong&gt; 的有效證據主要在語言建模、長上下文理解、in-context recall。論文摘要的措辭很誠實：在 recall 密集的任務上，&lt;strong&gt;Transformer 仍取得最佳準確率&lt;/strong&gt;，MC 變體做到的是「競爭性表現、縮小與 Transformer 的差距、勝過 SOTA recurrent model」。注意這個層次——它不是宣稱打贏 Transformer，是宣稱在 recurrent 這條線裡把差距縮到值得一試。&lt;/p&gt;

&lt;p&gt;兩篇都該謹慎解讀的共同點：截至可見的原論文資料，&lt;strong&gt;都沒有正式報告 SWE-bench / SWE-bench Verified / SWE-bench Pro 結果&lt;/strong&gt;。如果你在某篇二手文章看到這些架構「在 agent 工具調用上如何如何」的數字，而那數字回不到原論文——它就不該被當成論文結論。這不是吹毛求疵，這是「發布 ≠ 可用 ≠ 可商用」的最後一道防線。&lt;/p&gt;




&lt;h2&gt;
  
  
  七、重新組裝
&lt;/h2&gt;

&lt;p&gt;如果你接受第四節那個框架——Transformer 把記憶和計算綁在一起，這兩篇各拆一半——那麼下一步是什麼，幾乎是&lt;strong&gt;邏輯上的必然，而不是許願&lt;/strong&gt;：拆開之後，把它們重新組裝。&lt;/p&gt;

&lt;p&gt;未來更可能出現的不是某個單一架構勝出，而是&lt;strong&gt;混合架構&lt;/strong&gt;：Transformer 保留強大的通用建模能力當基座；一個 Memory-Caching-like 的層提供長期、低成本、可選擇性讀取的記憶；一個 CTM-like 的核心提供內部推理時間與 adaptive compute。記憶軸便宜化、計算軸動態化，各司其職。對需要長期互動的 agent 或 world model，這個分工特別合理——昂貴的 attention 不該扛所有歷史，內部推理也不該被序列長度綁死。&lt;/p&gt;

&lt;p&gt;需要標明：&lt;strong&gt;這一節是推論，不是任何一篇論文的宣稱。&lt;/strong&gt; 沒有人證明這個組裝會成立。但如果你問「為什麼會有人同時做這兩個方向」，答案不是巧合——是因為它們在拆同一個東西。&lt;/p&gt;




&lt;h2&gt;
  
  
  結語
&lt;/h2&gt;

&lt;p&gt;Transformer 不會立刻退場。它的軟硬體生態、訓練 recipe、開源工具鏈、產業部署都太成熟，短期內仍是主流基座。&lt;/p&gt;

&lt;p&gt;但架構競爭的焦點正在改變。下一階段的進步，不會只靠堆參數和拉長上下文。&lt;strong&gt;記憶怎麼便宜、計算怎麼動態&lt;/strong&gt;——這兩件被 self-attention 綁在一起、現在被分別鬆綁的事，會變成新的核心問題。&lt;/p&gt;

&lt;p&gt;Memory Caching 和 CTM 的共同訊號不是「Transformer 要被取代了」。是更安靜的一句：有人開始拆它了。Transformer 的統治還沒結束，但它的孤獨時代正在結束。&lt;/p&gt;




&lt;h2&gt;
  
  
  參考來源
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Memory Caching: RNNs with Growing Memory — Behrouz, Li, Deng, Zhong, Razaviyayn, Mirrokni (Google). arXiv:2602.24281 — &lt;a href="https://arxiv.org/abs/2602.24281" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2602.24281&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Continuous Thought Machines — Darlow, Regan, Risi, Seely, Llion Jones (Sakana AI). arXiv:2505.05522 — &lt;a href="https://arxiv.org/abs/2505.05522" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2505.05522&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Continuous Thought Machines — NeurIPS 2025 (Spotlight), OpenReview — &lt;a href="https://openreview.net/forum?id=y0wDflmpLk" rel="noopener noreferrer"&gt;https://openreview.net/forum?id=y0wDflmpLk&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Continuous Thought Machines — Sakana AI 官方互動 demo／blog（同一研究） — &lt;a href="https://pub.sakana.ai/ctm/" rel="noopener noreferrer"&gt;https://pub.sakana.ai/ctm/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Attention Is All You Need — &lt;a href="https://arxiv.org/abs/1706.03762" rel="noopener noreferrer"&gt;https://arxiv.org/abs/1706.03762&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Scaling Laws for Neural Language Models（Kaplan et al.）— &lt;a href="https://arxiv.org/abs/2001.08361" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2001.08361&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Training Compute-Optimal Large Language Models（Chinchilla）— &lt;a href="https://arxiv.org/abs/2203.15556" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2203.15556&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;On the Computational Complexity of Self-Attention — &lt;a href="https://arxiv.org/abs/2209.04881" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2209.04881&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>transformers</category>
      <category>deeplearning</category>
    </item>
    <item>
      <title>AI Weekly — 2026-06-05 to 2026-06-11 | OpenAI Files S-1: What the IPO Actually Means</title>
      <dc:creator>Yang Goufang</dc:creator>
      <pubDate>Thu, 11 Jun 2026 03:23:05 +0000</pubDate>
      <link>https://dev.to/yang_goufang_23c7ba674984/ai-weekly-2026-06-05-to-2026-11-openai-files-s-1-what-the-ipo-actually-means-560b</link>
      <guid>https://dev.to/yang_goufang_23c7ba674984/ai-weekly-2026-06-05-to-2026-11-openai-files-s-1-what-the-ipo-actually-means-560b</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;OpenAI has filed its S-1 confidentially. Meanwhile the Microsoft partnership is fraying at the seams, Anthropic shipped two models in 48 hours, and Visa is wiring payments directly into ChatGPT. The story this week is not about capability — it is about infrastructure, money, and who controls the distribution layer.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  OpenAI's IPO: Timing, Structure, and What It Signals
&lt;/h2&gt;

&lt;p&gt;OpenAI filed a confidential S-1 with the SEC this week, with Axios and CNBC reporting the company is prepping Wall Street for what is expected to be one of the largest AI debuts in history&lt;a href="https://news.google.com/rss/articles/CBMiVkFVX3lxTE9zYlJFR3NXNW5ndGtZeVFKRHUyRkdVMmNJZHF6MENydkVtODlaNWd4Zk04a0t6QktUS25QNzdwd09kLWl6VWVRZlZkLVdNNmlaeVZlR1l3?oc=5" rel="noopener noreferrer"&gt;OpenAI files paperwork for an IPO - Axios&lt;/a&gt;&lt;a href="https://news.google.com/rss/articles/CBMiqgFBVV95cUxOMVhyczJEUzJvM0ktd0k2RmJwZGY2YW1CRThUTTdLNERQY0ZXVXUyNjhHMklyRFVSdnRoTlc3dTFZd0NqNDd2TDJrYk43NTc4Y0Z5R0Qta08zLU5XdVVSd05LNF94WTlDS05xRklDYzluVnpWUjFqeHg2QWVZZWdSMkJHN0VVYWQtSU9BNVA3ejg2V3YyOHM3TkhsZFRtakxoRXY2WjZxWkdUZ9IBrwFBVV95cUxNYmdiVHBYUUVMODVqaDh4ZjJmV3RyXzNJa1BtblRDRWRFbEIzb2tQNURabU1hSmEtQkZCSDhva3JCVWFIYV9leDVYaVFxWXBQeTNpeHoxbVQzRk5vMVdsa1dpZW0tLXVNRXhuVzZtLVBCSTU2U0pBRDE2NzFKOFRqT041anVxS0RfamwtcEpqeHhiTnNxZ3ZIZWotSzZsSHQ5eTRJcW8xRGdMRkdjWmxF?oc=5" rel="noopener noreferrer"&gt;OpenAI confidentially files for IPO, prepping Wall Street for mega AI debut - CNBC&lt;/a&gt;. The timing is notable: the company simultaneously published a post titled "Built to benefit everyone" laying out its economic vision&lt;a href="https://news.google.com/rss/articles/CBMibEFVX3lxTE1acHQzdUtIMzJkVmlHbkJ1Z294Y3p1d0JIR0FieFFQOHZwaENrUkFQNWFhNmctWVU0ODlSZDdFN0NZQUFpUHBqeEVxTWhNTlRNV3Z2NVRHMHlVcVBEckt4VXhScnhDaU9OcEd3cw?oc=5" rel="noopener noreferrer"&gt;Built to benefit everyone: our plan - OpenAI&lt;/a&gt; and launched an Economic Research Exchange to publicly share research methodology&lt;a href="https://news.google.com/rss/articles/CBMiYkFVX3lxTFA2QjliMWJCUGplNU1kRFdCMWlGb3o5UHpzUnVJMnd1U0dSUmQ0RGhnZ0M5eGxLVGlIM3V2LXhDb1k1M3VwWWRibEc2eUg3dDZkWGFXUXJJajFnOVRiTnY3VUtB?oc=5" rel="noopener noreferrer"&gt;Introducing the OpenAI Economic Research Exchange - OpenAI&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What this is not:&lt;/strong&gt; a product announcement. Confidential S-1 filings mean the paperwork is in; they do not mean the IPO is imminent or that the S-1 is public. The actual offering terms, valuation, and timeline remain unknown.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; a company crossing a revenue threshold that makes public market disclosure practical and attractive. The Microsoft-OpenAI relationship has been deteriorating publicly since at least June 5th, when Yahoo Finance reported continued friction and restructuring pressure&lt;a href="https://news.google.com/rss/articles/CBMisgFBVV95cUxOVXhfV25Tdk1IZnNVX3gxYmZiMzFpN0ttdHcxY3NJNDN3ck9GV2dLb29GeDNCX0RCY2c0cG5uLUY2Z0E3ME5MX0Z0a2RNMU9MbFA3c1VaOFY1bTh3eWUydk40bmkxR3oyQ0hnOWZ1akdHa0lQVlBpa1Zfb1RpVExHTWF1aXZFckQzLTBCVzVHb2E2ODZjT1pxU1o2ZFdKMk1ZeDRVejdpZ3hTTnFKaE50SGh3?oc=5" rel="noopener noreferrer"&gt;Microsoft and OpenAI's relationship continues to crumble - Yahoo Finance&lt;/a&gt;. An IPO gives OpenAI a capital channel independent of Microsoft.&lt;/p&gt;

&lt;p&gt;The Visa partnership announced June 10th is the more immediately concrete story: Visa is integrating payment infrastructure directly into ChatGPT&lt;a href="https://news.google.com/rss/articles/CBMiywFBVV95cUxQNnhSb05qd01OSEljTlBIejRYSFZ2bDVGRDdJeFpmZnRlMEU3dGYtLTNMUkc5SXpfTmRYQk9FNDRrQUxiNHZIUmlRc0p0LTROLVl0UGVyU0V0ZGFBQ3dQSEtDRUJadUk4bjNMbnYtT1gyQU5neVh1TUt6YnZGS0h4WTVVMFlZUG9MeWw3X2ZRTG90TzVhMGJXZlp3Q3BTUTQ5NnlPdEVCTDk2X0Q4YVNacmV6RktPcW01TnBjd3hJTDJJSnhRUTI5Vi1ORQ?oc=5" rel="noopener noreferrer"&gt;Visa Partners with OpenAI to Power the Next Generation of AI Commerce - Visa - Investor Relations&lt;/a&gt;&lt;a href="https://news.google.com/rss/articles/CBMiqgFBVV95cUxQbEtRbUVBNldBSEl0bTBVeHdGYWUzdi1TQ256RTVnbWhwaEFTYXpQLVdWZ1V6MkpwYXh1TXZydlUxWHpncW5SYlhJaXJNWXJsSXVBUFd1ZGZJWnhBMndDTUhST0dmYUoybGFMdUlWSjM3MGtPM3NkSW9tZ3BsZng1akc3UURSWkNHVGZSQ3dpOUFJZVBKOVktV0w2TTB1TzMxTFYwd2g2Wl9SZw?oc=5" rel="noopener noreferrer"&gt;Visa to Secure Payments for Shoppers on ChatGPT in OpenAI Partnership - WSJ&lt;/a&gt;. This is a major card-network integration directly into ChatGPT commerce — not a chatbot feature, but wiring inside the transaction layer. OpenAI had earlier agentic-payment integrations with PayPal and NPCI/Razorpay in 2025, so the important signal here is Visa's scale and card-network reach. If Visa-OpenAI scales, it changes the economic model of consumer AI from subscription-only to transaction-fee, which is a fundamentally different business.&lt;/p&gt;

&lt;h2&gt;
  
  
  Anthropic Ships Two Models in 48 Hours
&lt;/h2&gt;

&lt;p&gt;Anthropic released Claude Fable 5 and Claude Mythos 5 on June 9th&lt;a href="https://news.google.com/rss/articles/CBMiZEFVX3lxTE0tNXJJQXNGZWM1d1VmMVRfOFZlYU9VUFcwdm03RFhDRE5uajJOY19mbUQ5ZVpXYW9IOWhzd3JOS3F2c0xIdFRYVUgtZF9RVVI1VW1KVExGb0RVRDBWdlpNTXN1cTA?oc=5" rel="noopener noreferrer"&gt;Claude Fable 5 and Claude Mythos 5 - Anthropic&lt;/a&gt;, preceded by "Making Claude a chemist" on June 5th&lt;a href="https://news.google.com/rss/articles/CBMiakFVX3lxTE8wMkgxR3JCakJ0S2pJLUEzQ2gzZjVHdjNjUjBaV2hVTHE4UFYtTVdpdmdUcUtzeldxWXpEXzJRX2dTTzJjSGlpbm1CME9mZkJZa2FWVnpCdU1UeDhYbEprQjVSeVdQakUzMGc?oc=5" rel="noopener noreferrer"&gt;Making Claude a chemist - Anthropic&lt;/a&gt; — a research paper demonstrating Claude's performance on chemistry tasks. The timing suggests Anthropic wanted to pre-empt OpenAI's IPO news cycle.&lt;/p&gt;

&lt;p&gt;Fable 5 is Anthropic's latest public frontier model, while Mythos 5 is a restricted trusted-access tier for sensitive domains such as cybersecurity and life sciences. The chemistry paper is more verifiable than the model announcements: it describes real task performance on a defined domain, whereas the model releases still rely mostly on Anthropic's own launch materials. Without independent evaluation methodology, the capability claims should be treated as announced, not verified.&lt;/p&gt;

&lt;p&gt;Anthropic is clearly positioning Claude as a workstation for knowledge workers, not a general chat interface. The chemistry work and the trusted-access Mythos tier suggest a verticalization strategy: own specific professional workflows while controlling access to the highest-risk capabilities.&lt;/p&gt;

&lt;h2&gt;
  
  
  Google DeepMind: Gemini Omni and a Sierra Leone RCT
&lt;/h2&gt;

&lt;p&gt;Google DeepMind published Gemini Omni&lt;a href="https://news.google.com/rss/articles/CBMiYkFVX3lxTE0wOVZjc2o4dFdxdHljVS0yNWNNNU01NmFWNW82TVI0T1ptb1JxS0wzV3FoMkp6NGtoWDB0MmxiTFNodnUzSGpoOElDTDVlRWRpdDNRRFdzY2pRTWZULWxxRDh3?oc=5" rel="noopener noreferrer"&gt;Introducing Gemini Omni - blog.google&lt;/a&gt; on June 10th and a randomized controlled trial of Gemini's guided learning in Sierra Leone&lt;a href="https://news.google.com/rss/articles/CBMingFBVV95cUxPSGxaODhWMGdPMm1mQ2phVTB4bkszM2d4S3NzWVBIT2wzdG5YNmpZUmxjdW55cEM2UEFWb0JZdzVLTjRlU3JfVEFuS25icXZUTWJNdmwzeFBuLVZoTURxU2l2UmkxUGY5LWRxUW13c2pKa0xXOFM4WENkb28zZjl0WGJCSWh1VklpQTBHV1hvY3V4YVQ4cXdfMmQxQV9rZw?oc=5" rel="noopener noreferrer"&gt;Gemini’s guided learning: results from a randomized controlled trial in Sierra Leone - Google DeepMind&lt;/a&gt; on June 9th. The RCT is methodologically notable — Google published a pre-registered trial with results, not a marketing benchmark. The Sierra Leone context is relevant: it tests Gemini in a low-resource educational environment, not a high-income enterprise setting. This matters for claims about AI democratizing access.&lt;/p&gt;

&lt;p&gt;Gemini Omni was announced at Google I/O as a multimodal generative model family, with Omni Flash available in the Gemini app, Flow, and YouTube Shorts. That makes it more concrete than a research preview, though enterprise buyers still need to verify API access, pricing, latency, and governance details before treating it as a production dependency.&lt;/p&gt;

&lt;p&gt;The Nvidia partnership for DiffusionGemma&lt;a href="https://news.google.com/rss/articles/CBMidEFVX3lxTFByMVFNeDd5VENidnBnZ0lqUi1CcUtYbmR5eGdjbU1XYk5jRVkyRkNQSWNwT010cmNMMmNsM25tcnNvd0MzTTRCWkszYW9SdDV4UHF6TUJ4Um1zVHl3T2dpeHdjRzV2X19aZlNTbDl0ZHl3eWJX?oc=5" rel="noopener noreferrer"&gt;NVIDIA Accelerates Google DeepMind’s DiffusionGemma for Local AI - NVIDIA Blog&lt;/a&gt; running locally on NVIDIA hardware is worth tracking for a different reason: it represents the on-device AI narrative that has been building for 18 months. Local inference means no per-token cloud cost and no latency round-trip. If the NVIDIA integration is stable, it is a real engineering constraint on cloud-only AI economics.&lt;/p&gt;

&lt;h2&gt;
  
  
  Infrastructure: D-Matrix, SpaceX-Alphabet, and Intelligence Layers
&lt;/h2&gt;

&lt;p&gt;Three infrastructure stories this week represent money moving before the model capability debate resolves.&lt;/p&gt;

&lt;p&gt;Microsoft-backed D-Matrix raised its profile as a challenger to Nvidia in inference compute&lt;a href="https://news.google.com/rss/articles/CBMihAFBVV95cUxOdWJ5aVlCMjNDN2pMbnJxZWRRWmx3dzg3TV9Hcng5dHNZeGNMdjlBN0NZX29mMmFIajNvcXgzdFFvOHZqY3hYTHkwM3JRQTh4LVM0c2E4UmdWMUFBUDE4Z25TU0RELTNHS3kyek13eVR5SU55WEdTYU1yZ3FJeENvQlVOSEXSAYoBQVVfeXFMUHBUV2FERlhPMHVkOXJNUFJtdmY3dXJrNFNMNzBRRk9nX0NJVHZwRXBXRjIzRDdQRGZ5cVNmWjFEY1NIUzY2WUluejNYdl9faVNjY1VSVnFjQVc0N0RXbE5GV3FVR3VhNHlzUURVVTVNelJLVnRFTDgxZDl5bTdhNGMzZFpiNkVobzh3?oc=5" rel="noopener noreferrer"&gt;Upstart chipmakers keep challenging Nvidia. This time it's Microsoft-backed D-Matrix - CNBC&lt;/a&gt;. The chip-level competition in AI infrastructure is real — Nvidia's H100/H200 dominance is being challenged by multiple startups. D-Matrix's architecture is inference-specialized, which is a different problem than training. If it works, it lowers the cost ceiling for deploying large models, which benefits every AI company downstream.&lt;/p&gt;

&lt;p&gt;Alphabet's $920M monthly deal with SpaceX&lt;a href="https://news.google.com/rss/articles/CBMifEFVX3lxTE01TkVPWHdFOWlXWmtKcjd1Zy1CTXo0ODhoY2loLUpkUUFtcWxWcE5YanFtZWNmbUdKNDJDWE5CVVctSUJBSEx1b1pJc3oyWDBkdlo1aEpuRWo0b0UzNzh2MWZEN0lCVmo3dGxCTzZyWlRtbmp4Ulo4cjVpZzQ?oc=5" rel="noopener noreferrer"&gt;Inside Alphabet’s Massive $920 Million Monthly AI Deal With SpaceX - Barron's&lt;/a&gt; is a cloud service agreement for access to roughly 110,000 Nvidia GPUs, not a satellite-bandwidth contract. The important point is compute capacity: large buyers are securing GPU clusters through long-running infrastructure agreements, and those commitments shape the cost floor for model deployment before the model capability debate resolves.&lt;/p&gt;

&lt;p&gt;The "intelligence layer" models story&lt;a href="https://news.google.com/rss/articles/CBMipwFBVV95cUxQd0VDMjV4cjRLaDk4WmJDNWZETUV3dmJyMTBGMmp1QjItSEkteWRySXhjVnk5R01mbzVmZk5ocnNWbmR6cjFNNGx4MFQxTWEwMUl4N3BzYjV1TmJCMVpib2EzendnNi1DRUZ6OUhPOF9VQnFpOV9EZ0FCdVB1ZGV5bjBRcElDWHdZMGQtZzVYQUdySjU2d2Y1REFKMEFZN0Y4VkFReFJ0NA?oc=5" rel="noopener noreferrer"&gt;Intelligence layer models protect enterprise AI investments - SiliconANGLE&lt;/a&gt; — layer models that sit between foundation models and enterprise applications to protect AI investments — reflects a maturing market. Enterprises are not just buying foundation model API access; they are buying governance, auditability, and control layers on top. This is infrastructure-building below the application layer, which typically happens only after the application layer has stabilized.&lt;/p&gt;

&lt;h2&gt;
  
  
  China, Influence, and Regulatory Countermeasures
&lt;/h2&gt;

&lt;p&gt;OpenAI claimed China launched an influence campaign to shape US attitudes on AI data centers&lt;a href="https://news.google.com/rss/articles/CBMijgFBVV95cUxPLXcwaXBhY3BNaDJjc2QxSmJ4MlNMODNFNERwNlFaUjVJbDkyeVdULS1PRXZzU2hTOUk0NVZwS1hGV3FFRDl3Z1FaaWpXNm92eDR6NVA2UTA3QjhoaTVSSV9tU0J2LXo4R0hrc1Q3a1NLLTZRZDJ6dURfejRqekxGZVNJRVRPa212WHFuVVBR?oc=5" rel="noopener noreferrer"&gt;OpenAI says China launched influence campaign to shape US attitudes on AI data centers - Politico&lt;/a&gt;. The specific claim — that a foreign state actor attempted to manipulate public opinion on a US infrastructure topic — is notable because it targets a specific policy debate (data center siting, power consumption, water usage) rather than a general political topic. If accurate, it means AI policy is now in the information warfare threat model.&lt;/p&gt;

&lt;p&gt;The regulatory counterpoint is Inside Higher Ed's coverage&lt;a href="https://news.google.com/rss/articles/CBMixgFBVV95cUxObEhqSElLbmY5b1U0RlpjbUFsNDFBaVpUenNqR1JEQnZnd19ndXMxd3lHaDd1cmVHcHlxS3hrM3I2U0RlMnNseVdwQWxUV3dXVDB3NjhKQmJzbm14bXVGcUl4YlQ2ZUYyQXluUHZ1dVFremQxR1V5SG5uLVpJY1ZqOUNyTXN6RGczYm1Nb2ptSjU5Rl9fWTJkZ2VmQWdLUE1taWx3MVhQQVZYb0hHWEpRNExNYkZBdGlOSjBPUUxDd181MlpQcUE?oc=5" rel="noopener noreferrer"&gt;‘All or Nothing’ Approach to AI ‘Risks Shutting Down Innovation’ - Inside Higher Ed&lt;/a&gt; of arguments that "all or nothing" AI risk regulation risks shutting down innovation. The substantive question is whether the regulatory proposals on the table actually target demonstrated harms or generic capability speculation. The article surfaces the debate — it is worth noting that the regulatory argument is now explicitly framing innovation as in tension with safety, which was not the dominant framing 18 months ago.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>tech</category>
      <category>llm</category>
    </item>
    <item>
      <title>AI 週報 — 2026-06-05 to 2026-06-11 | OpenAI 掛牌倒數：秘密 S-1 提交背後的三個技術訊號</title>
      <dc:creator>Yang Goufang</dc:creator>
      <pubDate>Thu, 11 Jun 2026 03:17:15 +0000</pubDate>
      <link>https://dev.to/yang_goufang_23c7ba674984/ai-zhou-bao-2026-06-05-to-2026-06-11-openai-gua-pai-dao-shu-s-1-wen-ben-li-de-san-ge-ji-shu-xun-hao-1n6d</link>
      <guid>https://dev.to/yang_goufang_23c7ba674984/ai-zhou-bao-2026-06-05-to-2026-06-11-openai-gua-pai-dao-shu-s-1-wen-ben-li-de-san-ge-ji-shu-xun-hao-1n6d</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;本週一句話：&lt;/strong&gt; OpenAI 的秘密 S-1 提交流程比「上市」標題更值得追蹤——真正影響企業 API 依賴策略的，是未來公開文件會如何描述雲端合作、資本支出與風險揭露。&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  OpenAI 掛牌進程啟動——據媒體報導可追蹤的三個技術訊號
&lt;/h2&gt;

&lt;p&gt;技術長應該追蹤的不是 OpenAI 估值，而是這次秘密提交之後，未來公開文件可能揭露的技術與基礎設施敘事。據媒體報導，OpenAI 於 6 月 8 日向 SEC 秘密提交 S-1 草案&lt;a href="https://news.google.com/rss/articles/CBMiaEFVX3lxTE9IS1J5NUcwRDN2SFdjalBkWlFxMERTcFR3dEdUcG1WRk9jX1ZfM2x0MElhMFY0SXNuUU9RSnpMYWhEYzZJclhsQ1FrbG9YNlRESjg2a0FwZ3AwMERFWHJTdWdDSXRIakI3?oc=5" rel="noopener noreferrer"&gt;Confidential submission of draft S-1 to the SEC - OpenAI&lt;/a&gt;&lt;a href="https://news.google.com/rss/articles/CBMiVkFVX3lxTE9zYlJFR3NXNW5ndGtZeVFKRHUyRkdVMmNJZHF6MENydkVtODlaNWd4Zk04a0t6QktUS25QNzdwd09kLWl6VWVRZlZkLVdNNmlaeVZlR1l3?oc=5" rel="noopener noreferrer"&gt;OpenAI files paperwork for an IPO - Axios&lt;/a&gt;&lt;a href="https://news.google.com/rss/articles/CBMiqgFBVV95cUxOMVhyczJEUzJvM0ktd0k2RmJwZGY2YW1CRThUTTdLNERQY0ZXVXUyNjhHMklyRFVSdnRoTlc3dTFZd0NqNDd2TDJrYk43NTc4Y0Z5R0Qta08zLU5XdVVSd05LNF94WTlDS05xRklDYzluVnpWUjFqeHg2QWVZZWdSMkJHN0VVYWQtSU9BNVA3ejg2V3YyOHM3TkhsZFRtakxoRXY2WjZxWkdUZ9IBrwFBVV95cUxNYmdiVHBYUUVMODVqaDh4ZjJmV3RyXzNJa1BtblRDRWRFbEIzb2tQNURabU1hSmEtQkZCSDhva3JCVWFIYV9leDVYaVFxWXBQeTNpeHoxbVQzRk5vMVdsa1dpZW0tLXVNRXhuVzZtLVBCSTU2U0pBRDE2NzFKOFRqT041anVxS0RfamwtcEpqeHhiTnNxZ3ZIZWotSzZsSHQ5eTRJcW8xRGdMRkdjWmxF?oc=5" rel="noopener noreferrer"&gt;OpenAI confidentially files for IPO, prepping Wall Street for mega AI debut - CNBC&lt;/a&gt;，但草案內容尚未公開；目前不能把任何具體段落或行號當成已驗證材料。公司若正式進入公開發行流程，才會逐步出現可供外部審閱的公開文件。&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;目前可追蹤的三個訊號：&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Microsoft 合作架構會成為未來公開文件的重點&lt;/strong&gt;&lt;br&gt;
上週已多次出現「微軟與 OpenAI 關係惡化」的報導&lt;a href="https://news.google.com/rss/articles/CBMisgFBVV95cUxOVXhfV25Tdk1IZnNVX3gxYmZiMzFpN0ttdHcxY3NJNDN3ck9GV2dLb29GeDNCX0RCY2c0cG5uLUY2Z0E3ME5MX0Z0a2RNMU9MbFA3c1VaOFY1bTh3eWUydk40bmkxR3oyQ0hnOWZ1akdHa0lQVlBpa1Zfb1RpVExHTWF1aXZFckQzLTBCVzVHb2E2ODZjT1pxU1o2ZFdKMk1ZeDRVejdpZ3hTTnFKaE50SGh3?oc=5" rel="noopener noreferrer"&gt;Microsoft and OpenAI's relationship continues to crumble - Yahoo Finance&lt;/a&gt;，但焦點應在：130 億美元的 Azure 投資與算力合作在未來 IPO 文件裡會如何被描述。技術長需要關注的是——若公開文件披露任何雲端合作架構變更，將直接影響企業採用 OpenAI API 的長期可靠性評估，而非「合作是否惡化」這個已被重複報導的事實。&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. 「Built to benefit everyone」是公關包裝，不是技術文件&lt;/strong&gt;&lt;br&gt;
本週同步發布的「Built to benefit everyone」計畫&lt;a href="https://news.google.com/rss/articles/CBMibEFVX3lxTE1acHQzdUtIMzJkVmlHbkJ1Z294Y3p1d0JIR0FieFFQOHZwaENrUkFQNWFhNmctWVU0ODlSZDdFN0NZQUFpUHBqeEVxTWhNTlRNV3Z2NVRHMHlVcVBEckt4VXhScnhDaU9OcEd3cw?oc=5" rel="noopener noreferrer"&gt;Built to benefit everyone: our plan - OpenAI&lt;/a&gt;，對技術決策者沒有實質參考價值——它既非產品路線圖，也非安全評估報告。同週發布的 OpenAI 經濟研究交換平台（Economic Research Exchange）&lt;a href="https://news.google.com/rss/articles/CBMiYkFVX3lxTFA2QjliMWJCUGplNU1kRFdCMWlGb3o5UHpzUnVJMnd1U0dSUmQ0RGhnZ0M5eGxLVGlIM3V2LXhDb1k1M3VwWWRibEc2eUg3dDZkWGFXUXJJajFnOVRiTnY3VUtB?oc=5" rel="noopener noreferrer"&gt;Introducing the OpenAI Economic Research Exchange - OpenAI&lt;/a&gt;若能提供實際的 API 定價數據與用量趨勢，才是企業成本建模的可用工具。&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. 中國影響力操作已進入政策風險雷達&lt;/strong&gt;&lt;br&gt;
OpenAI 主動發布新聞稿，指出中國發起影響力操作，意圖塑造美國對 AI 資料中心的輿論&lt;a href="https://news.google.com/rss/articles/CBMijgFBVV95cUxPLXcwaXBhY3BNaDJjc2QxSmJ4MlNMODNFNERwNlFaUjVJbDkyeVdULS1PRXZzU2hTOUk0NVZwS1hGV3FFRDl3Z1FaaWpXNm92eDR6NVA2UTA3QjhoaTVSSV9tU0J2LXo4R0hrc1Q3a1NLLTZRZDJ6dURfejRqekxGZVNJRVRPa212WHFuVVBR?oc=5" rel="noopener noreferrer"&gt;OpenAI says China launched influence campaign to shape US attitudes on AI data centers - Politico&lt;/a&gt;。這不是技術事件，但企業若計畫在美國部署大規模 AI 基礎設施，地緣政治風險已成為必須量化納入採購決策的變數。&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;項目&lt;/th&gt;
&lt;th&gt;狀態&lt;/th&gt;
&lt;th&gt;對企業的意義&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;S-1 提交&lt;/td&gt;
&lt;td&gt;已確認，但草案未公開&lt;/td&gt;
&lt;td&gt;近期無公開財務細節，需等待正式文件&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Microsoft 合作&lt;/td&gt;
&lt;td&gt;公開摩擦&lt;/td&gt;
&lt;td&gt;技術依賴單一雲端需重新評估潛在架構風險&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;中國影響力操作&lt;/td&gt;
&lt;td&gt;已公告&lt;/td&gt;
&lt;td&gt;地緣政治風險進入決策框架&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Anthropic：Claude Fable 5 與 Mythos 5——文學創作與化學推理的雙線推進
&lt;/h2&gt;

&lt;p&gt;Anthropic 本週低調發布 Claude Fable 5 與 Claude Mythos 5&lt;a href="https://news.google.com/rss/articles/CBMiZEFVX3lxTE0tNXJJQXNGZWM1d1VmMVRfOFZlYU9VUFcwdm03RFhDRE5uajJOY19mbUQ5ZVpXYW9IOWhzd3JOS3F2c0xIdFRYVUgtZF9RVVI1VW1KVExGb0RVRDBWdlpNTXN1cTA?oc=5" rel="noopener noreferrer"&gt;Claude Fable 5 and Claude Mythos 5 - Anthropic&lt;/a&gt;，但新聞稿本身資訊密度極低，僅確認版本號與所屬產品線，具體能力基準與應用場景需等實際測試。&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;值得追蹤的是「Making Claude a chemist」研究&lt;a href="https://news.google.com/rss/articles/CBMiakFVX3lxTE8wMkgxR3JCakJ0S2pJLUEzQ2gzZjVHdjNjUjBaV2hVTHE4UFYtTVdpdmdUcUtzeldxWXpEXzJRX2dTTzJjSGlpbm1CME9mZkJZa2FWVnpCdU1UeDhYbEprQjVSeVdQakUzMGc?oc=5" rel="noopener noreferrer"&gt;Making Claude a chemist - Anthropic&lt;/a&gt;：&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Anthropic 發表了將 Claude 應用於化學研究的技術論文。化學推理需要精確的分子結構理解與反應路徑推導，代表多模態模型在科學領域的落地又往前一步。&lt;strong&gt;但要注意&lt;/strong&gt;：這仍是研究論文階段，不代表有可供企業部署的化學專用 API。從論文到產品，中間還有工程化、可靠性驗證與定價三關。&lt;/p&gt;




&lt;h2&gt;
  
  
  Google：Gemini 滲透 Apple 生態，即時翻譯終於落地
&lt;/h2&gt;

&lt;p&gt;本週 Google 發布三項更新，呈現「生態整合先於模型進步」的策略：&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Gemini 進 Apple 開發者工具&lt;a href="https://news.google.com/rss/articles/CBMirgFBVV95cUxOenZQWUdObUdaaGZCX2NCUFVueGE1QU56em5JOElERTdaeXBpNUxIWk9RYlBubGtvYS16b2ZxbHNlaElvOW1qX1RyRi1NeVlmaTVkT25VRjhmSmVIYlh1b2VfOEJOakVZaTl2NFlNRGVuUE00R2I5Y094TVBiMVhkaDRUT2NwM0ZVMVhrZzBwZVRPQWt1UEtQZ3ZjSC1peXRDWGN3cnhLeFYxTHBVdHc?oc=5" rel="noopener noreferrer"&gt;Bringing the latest Gemini models to Apple developers - blog.google&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;
Google 宣布將最新 Gemini 模型帶入 Apple 開發者工具鏈，具體形式與整合深度尚待說明。對 Apple 開發者而言，這代表未來 App 內建 AI 功能可能直接調用 Gemini API，而非透過雲端中介。但整合方式（邊緣推論 vs 雲端呼叫）尚未確認，直接影響隱私敏感場景的適用性。&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Gemini 3.5 Live Translate&lt;a href="https://news.google.com/rss/articles/CBMinwFBVV95cUxQYmE4REJXdHJpTzA0SVhCY3NMeTZTdGlGeVMtQUYtMFR4WnNKMWp2ZzM4Zi10UTREdGd3dzJ4X1dKN2hIWVZObkw4ZkdQX2tVdnMwRERfU0g3WG9kcGJNNUlmaDhzVER4VE9QQkJENXI2N0NhcnVCNFlpcEViNloyWG83UGlCdDFFaGt2bV9zLTdSZHBSUEUxb2xzeXE4SnM?oc=5" rel="noopener noreferrer"&gt;Fluid, natural voice translation with Gemini 3.5 Live Translate - blog.google&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;
支援即時語音翻譯，目標場景明顯對標 Google Translate 既有功能。這次升級的價值在於 Gemini 的多模態理解能力——翻譯時能結合語音語調與視覺上下文，而非只做字面轉換。&lt;strong&gt;實用性評估&lt;/strong&gt;：對需要跨語言即時溝通的商務場景有直接幫助，但企業若已使用專業翻譯 API，切換成本與流程重構需納入評估。&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Gemini Omni&lt;a href="https://news.google.com/rss/articles/CBMiYkFVX3lxTE0wOVZjc2o4dFdxdHljVS0yNWNNNU01NmFWNW82TVI0T1ptb1JxS0wzV3FoMkp6NGtoWDB0MmxiTFNodnUzSGpoOElDTDVlRWRpdDNRRFdzY2pRTWZULWxxRDh3?oc=5" rel="noopener noreferrer"&gt;Introducing Gemini Omni - blog.google&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;
強調「全模態」整合能力，將視覺、語音、文字整合為單一輸入架構。&lt;strong&gt;實用性評估&lt;/strong&gt;：Gemini Omni 的具體 API 端點、延遲數據與隱私架構尚未確認，技術長不宜在官方文件發布前將其納入採購評估。&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;更新&lt;/th&gt;
&lt;th&gt;定位&lt;/th&gt;
&lt;th&gt;可用狀態&lt;/th&gt;
&lt;th&gt;企業適用評估&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Gemini → Apple 開發者&lt;/td&gt;
&lt;td&gt;生態整合&lt;/td&gt;
&lt;td&gt;發布中&lt;/td&gt;
&lt;td&gt;待確認整合深度&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Live Translate&lt;/td&gt;
&lt;td&gt;功能升級&lt;/td&gt;
&lt;td&gt;發布中&lt;/td&gt;
&lt;td&gt;商務翻譯場景可評估&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini Omni&lt;/td&gt;
&lt;td&gt;全模態架構&lt;/td&gt;
&lt;td&gt;發布中&lt;/td&gt;
&lt;td&gt;細節不足，暫緩決策&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Visa × OpenAI：支付整合的信號價值
&lt;/h2&gt;

&lt;p&gt;Visa 宣布與 OpenAI 達成合作，目標是讓 ChatGPT 用戶直接透過 Visa 完成支付&lt;a href="https://news.google.com/rss/articles/CBMiywFBVV95cUxQNnhSb05qd01OSEljTlBIejRYSFZ2bDVGRDdJeFpmZnRlMEU3dGYtLTNMUkc5SXpfTmRYQk9FNDRrQUxiNHZIUmlRc0p0LTROLVl0UGVyU0V0ZGFBQ3dQSEtDRUJadUk4bjNMbnYtT1gyQU5neVh1TUt6YnZGS0h4WTVVMFlZUG9MeWw3X2ZRTG90TzVhMGJXZlp3Q3BTUTQ5NnlPdEVCTDk2X0Q4YVNacmV6RktPcW01TnBjd3hJTDJJSnhRUTI5Vi1ORQ?oc=5" rel="noopener noreferrer"&gt;Visa Partners with OpenAI to Power the Next Generation of AI Commerce - Visa - Investor Relations&lt;/a&gt;&lt;a href="https://news.google.com/rss/articles/CBMiqgFBVV95cUxQbEtRbUVBNldBSEl0bTBVeHdGYWUzdi1TQ256RTVnbWhwaEFTYXpQLVdWZ1V6MkpwYXh1TXZydlUxWHpncW5SYlhJaXJNWXJsSXVBUFd1ZGZJWnhBMndDTUhST0dmYUoybGFMdUlWSjM3MGtPM3NkSW9tZ3BsZng1akc3UURSWkNHVGZSQ3dpOUFJZVBKOVktV0w2TTB1TzMxTFYwd2g2Wl9SZw?oc=5" rel="noopener noreferrer"&gt;Visa to Secure Payments for Shoppers on ChatGPT in OpenAI Partnership - WSJ&lt;/a&gt;。WSJ 補充說明這是讓 Visa 用戶在 ChatGPT 內購物時享有安全保障&lt;a href="https://news.google.com/rss/articles/CBMiqgFBVV95cUxQbEtRbUVBNldBSEl0bTBVeHdGYWUzdi1TQ256RTVnbWhwaEFTYXpQLVdWZ1V6MkpwYXh1TXZydlUxWHpncW5SYlhJaXJNWXJsSXVBUFd1ZGZJWnhBMndDTUhST0dmYUoybGFMdUlWSjM3MGtPM3NkSW9tZ3BsZng1akc3UURSWkNHVGZSQ3dpOUFJZVBKOVktV0w2TTB1TzMxTFYwd2g2Wl9SZw?oc=5" rel="noopener noreferrer"&gt;Visa to Secure Payments for Shoppers on ChatGPT in OpenAI Partnership - WSJ&lt;/a&gt;。&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;這則新聞的技術意涵被低估了。&lt;/strong&gt; AI Agent 時代的核心前提之一，是讓 AI 能完成真實交易動作（不只回答問題，還要能付款、預訂、執行）。Visa 與 OpenAI 的合作，等於是支付基礎設施對 AI Agent 經濟價值的正式背書。具體 API 介面、商家入駐流程與交易失敗處理機制仍是黑盒子——技術長現在該做的是：建立內部盤點，確認自己的 AI 應用中哪些流程明年此時可以接入真實支付。&lt;/p&gt;




&lt;h2&gt;
  
  
  數據點：AI 教育的隨機對照試驗
&lt;/h2&gt;

&lt;p&gt;Google DeepMind 發表了於獅子山共和國進行的 Gemini 引導學習隨機對照試驗結果&lt;a href="https://news.google.com/rss/articles/CBMingFBVV95cUxPSGxaODhWMGdPMm1mQ2phVTB4bkszM2d4S3NzWVBIT2wzdG5YNmpZUmxjdW55cEM2UEFWb0JZdzVLTjRlU3JfVEFuS25icXZUTWJNdmwzeFBuLVZoTURxU2l2UmkxUGY5LWRxUW13c2pKa0xXOFM4WENkb28zZjl0WGJCSWh1VklpQTBHV1hvY3V4YVQ4cXdfMmQxQV9rZw?oc=5" rel="noopener noreferrer"&gt;Gemini’s guided learning: results from a randomized controlled trial in Sierra Leone - Google DeepMind&lt;/a&gt;。這是少數有對照組數據支撐的 AI 教育干預研究，而非廠商自行發布的主觀滿意度調查。&lt;/p&gt;

&lt;p&gt;結果顯示 AI 引導學習在特定科目有正向效果，但研究者也坦承樣本限制與執行環境差異。&lt;strong&gt;對教育科技投資者而言&lt;/strong&gt;：這是「AI 可以改善學習成效」迄今最嚴謹的證據之一，但獅子山的師資與基礎設施條件與已開發國家差異極大，直接推論要謹慎。&lt;/p&gt;




&lt;h2&gt;
  
  
  本週取捨總結
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;值得實際關注（進評估流程）：&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Visa × OpenAI 支付整合的方向確認——盤點內部 AI 流程中可接入支付的環節&lt;/li&gt;
&lt;li&gt;Gemini Live Translate 的商務翻譯場景評估&lt;/li&gt;
&lt;li&gt;Gemini → Apple 開發者整合的具體 API 文件&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;值得觀察但不下結論（等更多細節）：&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenAI S-1 公開文件中的技術與財務架構揭露&lt;/li&gt;
&lt;li&gt;Claude Fable 5 / Mythos 5 實際能力測試&lt;/li&gt;
&lt;li&gt;Gemini Omni 全模態架構的具體規格&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;本週可暫時忽略（資訊密度不足）：&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;「Built to benefit everyone」計畫文件&lt;/li&gt;
&lt;li&gt;獅子山 AI 教育 RCT（方向正確但尚無直接應用）&lt;/li&gt;
&lt;li&gt;大多數廠商新聞稿中的旗艦模型發布公告&lt;/li&gt;
&lt;/ul&gt;




</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>tech</category>
      <category>llm</category>
    </item>
    <item>
      <title>AI Weekly — 2026-05-29 to 2026-06-05 | The Gap Between Launch and Landing</title>
      <dc:creator>Yang Goufang</dc:creator>
      <pubDate>Fri, 05 Jun 2026 00:35:57 +0000</pubDate>
      <link>https://dev.to/yang_goufang_23c7ba674984/ai-weekly-2026-05-29-to-2026-06-05-the-gap-between-launch-and-landing-506c</link>
      <guid>https://dev.to/yang_goufang_23c7ba674984/ai-weekly-2026-05-29-to-2026-06-05-the-gap-between-launch-and-landing-506c</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;The week delivered another wave of model releases and infrastructure deals. But the more consequential shift is the one playing out in courts, in cloud contracts, and in the gap between what models can do in demo and what developers can reliably ship to production.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Codex Crosses the Demo Threshold
&lt;/h2&gt;

&lt;p&gt;OpenAI's Codex—previously a research artifact—is now being positioned as a production coding tool&lt;a href="https://news.google.com/rss/articles/CBMiX0FVX3lxTFBJbzdPX3duRDNSb3Z4bHJ1SlcwRzlmT3UyMkl5Ny12QmNoVDZCNjZZeVBwTVFrUk1iVmhXY181dXB1VmJ0SlcyalE4WWtEZVlRbHVaVDN4a0dqb3RYYm9z?oc=5" rel="noopener noreferrer"&gt;Codex is becoming a productivity tool for everyone - OpenAI&lt;/a&gt;. A research demo that solves LeetCode problems is not the same as a tool engineers integrate into CI pipelines, code review workflows, or pair-programming sessions at scale. OpenAI's productivity claims come from internal benchmarks. Independent validation from engineering teams running Codex against their own codebases has not been published.&lt;/p&gt;

&lt;p&gt;Codex is not a drop-in replacement for existing tooling. The integration challenges—latency, context window limits, error rates on unfamiliar codebases, and cost at scale—remain unresolved. Teams evaluating it should treat it as an early-stage product with a compelling demo.&lt;/p&gt;

&lt;p&gt;OpenAI also announced GPT-Rosalind, a new model with unspecified capabilities&lt;a href="https://news.google.com/rss/articles/CBMiekFVX3lxTE5aaG00WThPOVR1NGNqZUVQMlBGdExXbnEzN3p2NjBkU3JrLWJlNWlIQ2dLcGpMU1JLQy1iZ1ZxbEpkVjg5TDh2cHRzX1ZZOTg4N194eVhkdzU3UWNURmhOTG5OWTNDRTg3M1JhbTJ6WTNZRnFGSm9JZ0NB?oc=5" rel="noopener noreferrer"&gt;Introducing new capabilities to GPT-Rosalind - OpenAI&lt;/a&gt;. No benchmarks, no architecture details, no pricing. This is a label, not a product.&lt;/p&gt;

&lt;h2&gt;
  
  
  Florida Sues OpenAI—A Regulatory Rupture
&lt;/h2&gt;

&lt;p&gt;Florida's lawsuit against OpenAI and Sam Altman, alleging safety lapses, is the first major state-level legal action against a frontier AI lab[來源 #5, #6]. The complaint's specific claims—not the headline—will determine its significance. If Florida's case rests on concrete safety failures with traceable harms, it establishes precedent. If it amounts to insufficient disclosure of model limitations, it becomes a regulatory nuisance rather than a legal landmark.&lt;/p&gt;

&lt;p&gt;What "structured legal exposure" means in practice: labs now face formal discovery obligations, deposition requirements, and the possibility of court-ordered safety audits—enforcement mechanisms that academic criticism and PR disputes cannot replicate. The lawsuit answers a question that critics have raised for two years: can a government entity actually compel a frontier lab to respond in court rather than through a blog post? Florida says yes, and the answer matters regardless of how the case ends.&lt;/p&gt;

&lt;h2&gt;
  
  
  Infrastructure: The Cloud Wars Are a Developer Problem
&lt;/h2&gt;

&lt;p&gt;Three infrastructure stories converged this week, and they share a common thread: the distribution of AI capabilities is fragmenting away from exclusive Microsoft-OpenAI alignment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OpenAI frontier models land on AWS Bedrock&lt;/strong&gt;[來源 #1, #2]. Developers can now access GPT-5.5, GPT-5.4, and Codex through AWS infrastructure—the dominant cloud platform for enterprise workloads. This collapses the distance between OpenAI's API and the deployment environment where most teams already operate. If the integration is stable and priced competitively, it accelerates the path from experimentation to production for teams in the AWS ecosystem. The exclusivity window is closing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NVIDIA announced a new AI chip for personal computers&lt;/strong&gt;[來源 #7, #8]. The PC AI chip story is real but still in early hardware. Software tooling, driver support, and application-level AI integration will lag the announcement by months. This matters for 2027–2028 decisions, not 2026 ones.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Microsoft and OpenAI's relationship continues to fracture&lt;/strong&gt;&lt;a href="https://news.google.com/rss/articles/CBMisgFBVV95cUxOVXhfV25Tdk1IZnNVX3gxYmZiMzFpN0ttdHcxY3NJNDN3ck9GV2dLb29GeDNCX0RCY2c0cG5uLUY2Z0E3ME5MX0Z0a2RNMU9MbFA3c1VaOFY1bTh3eWUydk40bmkxR3oyQ0hnOWZ1akdHa0lQVlBpa1Zfb1RpVExHTWF1aXZFckQzLTBCVzVHb2E2ODZjT1pxU1o2ZFdKMk1ZeDRVejdpZ3hTTnFKaE50SGh3?oc=5" rel="noopener noreferrer"&gt;Microsoft and OpenAI's relationship continues to crumble - Yahoo Finance&lt;/a&gt;. OpenAI is accelerating distribution through non-Microsoft channels. The structural tension—a company that owns Azure competing with a company that sells models competing with Azure AI services—was always latent. Teams relying on the OpenAI-Microsoft exclusive relationship should have contingency plans. The Bedrock move is evidence that contingency planning is now operational, not theoretical.&lt;/p&gt;

&lt;h2&gt;
  
  
  Science and Safety: Real Results, Uncertain Context
&lt;/h2&gt;

&lt;p&gt;OpenAI's model solved a famous math problem that stumped humans for 80 years&lt;a href="https://news.google.com/rss/articles/CBMijgFBVV95cUxORTJ1S1lPZ3hXZjlnZHhBWXgyZGJyVDQ3ZmxRYkdmSVo0RmVaeXp1TVZ1eldVajZ4ejcyUWxnTFF4NUExbHNpT3JLSkxraXpLRkVZYzFOV09KdVZjLWVzdzFUZ25mMk1GX2txeF83RHhQV0xpcmlsbFp3ZG1vNTFkeTNNN2tpaWxlVnpsN1ZR?oc=5" rel="noopener noreferrer"&gt;An OpenAI model solved a famous math problem that stumped humans for 80 years - Ars Technica&lt;/a&gt;. The claim deserves scrutiny: the problem's identity, the verification process, and whether the solution generalizes or represents a narrow exploit are not detailed in the available reporting. Mathematical problem-solving benchmarks have a history of models finding unexpected shortcuts that don't reflect general reasoning. Treat this as a data point pending disclosure.&lt;/p&gt;

&lt;p&gt;OpenAI also launched a biodefense program&lt;a href="https://news.google.com/rss/articles/CBMiakFVX3lxTFB6MUtWMkc0cHp6eWNSUThSdEdGbzhFWGZZVlZqZXVDXzVEcjJ4aktBeVpPbDNXWWRfYXo5Y2JJeUdCbDVrWTU5WGhNRzhUTlVDQmNBaW5meWtzekFSRUJDbUhkX0tSUHU2Zmc?oc=5" rel="noopener noreferrer"&gt;Exclusive: OpenAI launches biodefense program - Axios&lt;/a&gt;. The biodefense applications—protein structure prediction, literature synthesis, failure mode analysis—are genuinely useful and represent an area where AI capabilities map to high-value, low-risk deployment. Unlike general-purpose assistants, domain-specialized tools in biological research face lower misuse risk and clearer evaluation criteria.&lt;/p&gt;

&lt;h2&gt;
  
  
  Anthropic Builds Out the Claude Ecosystem
&lt;/h2&gt;

&lt;p&gt;Anthropic expanded its Claude Partner Network with a Services Track and Partner Hub&lt;a href="https://news.google.com/rss/articles/CBMiaEFVX3lxTE5OSHlhS1hiNk9fQmJqWTJpQVpvRHFqNFNnSmVQemlpX0xBZWkxS0tGNVFxelhxTG1jR0NZRG9aNERIN3liMXZNdV9jRkJqRUw4b0pIUDdPX0psSlRtbFhNV21LeVJ4MjRP?oc=5" rel="noopener noreferrer"&gt;Introducing the Services Track and Partner Hub of the Claude Partner Network - Anthropic&lt;/a&gt;, and extended Mythos—its security hardening framework—to 150 more organizations including critical infrastructure operators&lt;a href="https://news.google.com/rss/articles/CBMinwFBVV95cUxQQldubzkzemNwSVNTRnhrZW1XbEpLbUktaVRqY3pONnJtclMwSmxUYW0xYUgydFJXalhxUHVTNWtkUWN1cjlUYmpJMGRsVVA1Xzh1dm45V0ZJaXRrdTdzOHBac1Q0eDZWQ3dTeUhuRmdRNlFxRkRXYU13X3lwUzZPMVFOSUlQTGFqNG1UZkRJRGtTTDJiRHZnYTdfTkZuLXc?oc=5" rel="noopener noreferrer"&gt;Anthropic shares Mythos with 150 more organizations, including critical infrastructure operators - Cybersecurity Dive&lt;/a&gt;. Security hardening for AI systems in power grids, water treatment, and communications is an actual deployment scenario with clear failure modes. This is where AI safety work translates into verifiable outcomes, not press releases.&lt;/p&gt;

&lt;p&gt;The Partner Network expansion is a secondary story: service delivery quality across a growing partner ecosystem will be harder to verify than the security work, which has defined scope and operator accountability.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Week in Charts
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Event&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI Codex on AWS&lt;/td&gt;
&lt;td&gt;Distribution&lt;/td&gt;
&lt;td&gt;Available now; integration quality unverified&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemma 4 12B&lt;/td&gt;
&lt;td&gt;Model release&lt;/td&gt;
&lt;td&gt;Available; benchmark comparisons needed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Florida sues OpenAI&lt;/td&gt;
&lt;td&gt;Regulatory&lt;/td&gt;
&lt;td&gt;Pre-trial; outcome uncertain&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.5/5.4 on Bedrock&lt;/td&gt;
&lt;td&gt;Infrastructure&lt;/td&gt;
&lt;td&gt;Available; pricing and SLA unconfirmed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NVIDIA PC AI chip&lt;/td&gt;
&lt;td&gt;Hardware&lt;/td&gt;
&lt;td&gt;Announced; shipping timeline unclear&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Math problem solved by LLM&lt;/td&gt;
&lt;td&gt;Research&lt;/td&gt;
&lt;td&gt;Claimed; specific problem not disclosed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Partner Network expansion&lt;/td&gt;
&lt;td&gt;Ecosystem&lt;/td&gt;
&lt;td&gt;Available; partner quality variable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Microsoft/OpenAI friction&lt;/td&gt;
&lt;td&gt;Business&lt;/td&gt;
&lt;td&gt;Ongoing; no immediate user impact&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  One Thing to Watch
&lt;/h2&gt;

&lt;p&gt;OpenAI on Bedrock closes the gap between frontier model access and enterprise deployment infrastructure. The exclusivity window with Microsoft is narrowing. If the AWS integration holds under production load at reasonable pricing, it removes the last structural excuse for teams that have been running experiments but not committing to AI-assisted workflows. The models are not the bottleneck anymore. The integration stack is.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>tech</category>
      <category>llm</category>
    </item>
    <item>
      <title>AI 週報 — 2026-05-29 to 2026-06-05 | OpenAI 前沿模型登陸 AWS：基礎模型通路戰開打</title>
      <dc:creator>Yang Goufang</dc:creator>
      <pubDate>Thu, 04 Jun 2026 23:02:52 +0000</pubDate>
      <link>https://dev.to/yang_goufang_23c7ba674984/ai-zhou-bao-2026-05-29-to-2026-06-05-openai-qian-yan-mo-xing-deng-lu-awsji-chu-mo-xing-tong-lu-zhan-kai-da-23ji</link>
      <guid>https://dev.to/yang_goufang_23c7ba674984/ai-zhou-bao-2026-05-29-to-2026-06-05-openai-qian-yan-mo-xing-deng-lu-awsji-chu-mo-xing-tong-lu-zhan-kai-da-23ji</guid>
      <description>&lt;p&gt;OpenAI 的前沿模型正式登陸 AWS Bedrock。GPT-5.5 與 GPT-5.4 現在可直接透過 Amazon 的基礎設施呼叫，繞過 OpenAI 自己的 API 與計費流程&lt;a href="https://news.google.com/rss/articles/CBMiiwFBVV95cUxPVmhkdlNPYTFmbHBacG9ja1ZFU2F2YUsxTFd6cFdCdy1PVzh5NmdUdGtTSDhNV3I0WXMzc0NlR1Y2YWlyVjZEeFB3ZU1WSU5KZ0FxUjFuOFNzZVVxSzJFa2VkdmtfQ3ZXajhnUlZkSHV5VnR5dTN4ZTZEMzZyRXVTZlJHZ3IzVzFnYWRB?oc=5" rel="noopener noreferrer"&gt;OpenAI frontier models and Codex are now available on AWS - OpenAI&lt;/a&gt;&lt;a href="https://news.google.com/rss/articles/CBMirAFBVV95cUxPeFlUTGtXQUVXQ1BtUXFWNWFwN2RtUDRqMUpXUWRONGQ4Qm1ycGlvRTVfY3RpZXYtSGVvdEwxckdTeUhvRktUelliRTRvNC13akVJczh5WDN0WC0zVmhaM3ctNUV5cjNlYURnZVpKUklNUk5FRzRPcFl0Qm8yckFBNjBZVXA2SWRzXzlLTVg1R29hZjRGN19fQ0V1SXpnSFkyTEMzdlBOVVJXM2pa?oc=5" rel="noopener noreferrer"&gt;Get started with OpenAI GPT-5.5, GPT-5.4 models, and Codex on Amazon Bedrock - Amazon Web Services (AWS)&lt;/a&gt;。這是基礎模型通路競爭最具實質意義的一步。&lt;/p&gt;

&lt;h2&gt;
  
  
  雲端通路戰：不再只是 API call
&lt;/h2&gt;

&lt;p&gt;過去幾年，企業要用 GPT-5，邏輯很簡單：拿 OpenAI API key，在自己的系統裡串接，給 OpenAI 付費。這個模式對新創公司夠用，但對大型企業是採購與合規的雙重障礙——企業有既定的雲端供應商、既有預算架構、既有的資安審查流程，繞過這些的摩擦遠高於定價差異。&lt;/p&gt;

&lt;p&gt;現在遊戲規則變了。GPT-5.5 進入 AWS Bedrock，企業可以直接用內部既有的雲端基礎設施訂閱 frontier model，計費打在同一張帳單上，合規審查走同一套框架。Anthropic 與 Google 走在同一條路上，只是 GCP 多了 Google Workplace 的深度整合——Gmail、Meet、Drive、Sheets 全部內建，這讓 Anthropic 的模型在 Google 生態系裡幾乎是預設選項。&lt;/p&gt;

&lt;p&gt;這不是單一產品發布，是 distribution 策略的質變。當通路決定採購路徑，模型能力本身就不再是差異化因子——企業不會因為 GPT-5.5 比 Claude 3.7 強 3% 就繞過企業雲端架構。&lt;/p&gt;

&lt;h2&gt;
  
  
  Codex 的質變：從程式工具到通用助理
&lt;/h2&gt;

&lt;p&gt;同一週，OpenAI 宣布 Codex 從純程式碼任務擴展為通用工作流程助理&lt;a href="https://news.google.com/rss/articles/CBMibEFVX3lxTFB6RzVPbFNDZEpQQWYwa3MyOTJ1ZTFMemc2MFJXUl9MZXNiTmVIMjZaSVN6VkwxNkFGekF3UEVHc3d4SS13SEFsbURCMGgtcVFzZ0xZWlRYQmx4b0Qwdnk1cFEwNklFc3dKbzhlUQ?oc=5" rel="noopener noreferrer"&gt;Codex for every role, tool, and workflow - OpenAI&lt;/a&gt;&lt;a href="https://news.google.com/rss/articles/CBMiX0FVX3lxTFBJbzdPX3duRDNSb3Z4bHJ1SlcwRzlmT3UyMkl5Ny12QmNoVDZCNjZZeVBwTVFrUk1iVmhXY181dXB1VmJ0SlcyalE4WWtEZVlRbHVaVDN4a0dqb3RYYm9z?oc=5" rel="noopener noreferrer"&gt;Codex is becoming a productivity tool for everyone - OpenAI&lt;/a&gt;。目標從「幫工程師寫 code」變成「幫任何職業處理重複性任務」——這是從工具到助理的質變。&lt;/p&gt;

&lt;p&gt;如果通路戰是本週最重要的商業訊號，Codex 的擴展就是最重要的能力訊號。當同一個模型家族可以覆蓋從 code review 到行政流程自動化的全譜，企業就不需要為每個工作流程買一個 specialized 工具。整合成本下降，但模型提供商的談判籌碼上升。&lt;/p&gt;

&lt;p&gt;不過這裡有一個重要的「但是」：Codex 展示的功能在 demo 環境裡看起來很流暞，但企業整合需要與內部系統（CRM、ERP、程式碼庫）深度對接，需要企業級資安審查與資料隔離，需要可靠的延遲 SLA。這三件事目前多數仍待驗證——不是發布了就能用在生產環境。&lt;/p&gt;

&lt;h2&gt;
  
  
  Gemma 4：Google 的開放模型策略
&lt;/h2&gt;

&lt;p&gt;Google 發布 Gemma 4 12B，這是一個無編碼器的多模態模型，支援文字與圖片輸入，強調在消費級硬體上的可用性&lt;a href="https://news.google.com/rss/articles/CBMilAFBVV95cUxQcTdUNDVqeHpOQ3AtSDdyeE1oR2VVXzA2ZFNOTi1mZWh6ek9ZWkhjY19hZU1la2tBalJzcjVsVkk2RENHLWo5c2VDai13QXlZaGs2TXpfYWpkWTBKZnBNbnItYy0yUGpxSzRhQmpUT3NhSHhHOWZGUERHVjZWaDF6am0zUGJLekYtemp2SmhQODU4dDg1?oc=5" rel="noopener noreferrer"&gt;Introducing Gemma 4 12B: a unified, encoder-free multimodal model - blog.google&lt;/a&gt;。同一週 Google 披露內部用 Gemini 建構 Google I/O 2026 的內容&lt;a href="https://news.google.com/rss/articles/CBMiekFVX3lxTFBUS3J6eW1oSDVVOC13Tk5ZQ3dnTkUzekhXZFRXS1otSUcwYVA0OVdFaFJCYzZZb0E4MjRIS3pyaVVZUE1vdzc3MTVqUmNza0s2NDBDeFVNZnZoOVhBVndaajVBZFlTTU4ydWE0YWJsVmNIaTlVTTBFR2Vn?oc=5" rel="noopener noreferrer"&gt;How we used Gemini to build Google I/O 2026 - blog.google&lt;/a&gt;——第一次公開承認用自家模型支撐大型產品發布。&lt;/p&gt;

&lt;p&gt;Gemma 系列的定位很清楚：輕量、開放、企業內部部署友好。12B 參數在本地硬體上的延遲可接受，企業可以繞過 API 延遲與隱私疑慮。這個策略與 Llama、Qwen 一致——用開源社群稀釋封閉模型的通路優勢。對企業資安團隊而言，在自己的資料中心跑一個 12B 模型，資料不上雲，審計路徑簡單得多。&lt;/p&gt;

&lt;h2&gt;
  
  
  基礎模型的工作記憶能力
&lt;/h2&gt;

&lt;p&gt;本週有兩個值得注意的能力發布。OpenAI 公布「Dreaming」功能，ChatGPT 在背景持續處理對話上下文，改善多輪對話的連貫性&lt;a href="https://news.google.com/rss/articles/CBMiXkFVX3lxTFBOd2FyMkVXZ01LRFZHMHZ2Wk1HeTlSOEMtNGd0azhKMi1tT0JZWkZVaHR3UUdFS0NMUzZpNEtIcExsaVo3Y1BON3ExVWR6a3F3bkp6Y1FvV2s0bGZCWFE?oc=5" rel="noopener noreferrer"&gt;Dreaming: Better memory for a more helpful ChatGPT - OpenAI&lt;/a&gt;。同日 OpenAI 發布 GPT-Rosalind，強調多步推理與科學研究場景的整合&lt;a href="https://news.google.com/rss/articles/CBMiekFVX3lxTE5aaG00WThPOVR1NGNqZUVQMlBGdExXbnEzN3p2NjBkU3JrLWJlNWlIQ2dLcGpMU1JLQy1iZ1ZxbEpkVjg5TDh2cHRzX1ZZOTg4N194eVhkdzU3UWNURmhOTG5OWTNDRTg3M1JhbTJ6WTNZRnFGSm9JZ0NB?oc=5" rel="noopener noreferrer"&gt;Introducing new capabilities to GPT-Rosalind - OpenAI&lt;/a&gt;。&lt;/p&gt;

&lt;p&gt;Ars Technica 報導一個 OpenAI 模型解決了困擾人類數學家 80 年的問題&lt;a href="https://news.google.com/rss/articles/CBMijgFBVV95cUxORTJ1S1lPZ3hXZjlnZHhBWXgyZGJyVDQ3ZmxRYkdmSVo0RmVaeXp1TVZ1eldVajZ4ejcyUWxnTFF4NUExbHNpT3JLSkxraXpLRkVZYzFOV09KdVZjLWVzdzFUZ25mMk1GX2txeF83RHhQV0xpcmlsbFp3ZG1vNTFkeTNNN2tpaWxlVnpsN1ZR?oc=5" rel="noopener noreferrer"&gt;An OpenAI model solved a famous math problem that stumped humans for 80 years - Ars Technica&lt;/a&gt;——這個結果需要獨立驗證，但方向明確：模型正在從知識檢索走向知識建構。&lt;/p&gt;

&lt;p&gt;這些能力目前多數仍在實驗階段。企業在評估時需要區分「demo 很驚艷」與「實際部署可靠」之間的差距。&lt;/p&gt;

&lt;h2&gt;
  
  
  NVIDIA：基礎建設比模型更深
&lt;/h2&gt;

&lt;p&gt;本週 NVIDIA 宣布與 Microsoft 共同重構 Windows PC 的 AI 架構，瞄準個人 AI 助理的硬體層&lt;a href="https://news.google.com/rss/articles/CBMihwFBVV95cUxOQ2syWGhmZU10YXdaWk9vWlpoaWlsOXp3SGg3TXZFdEFyQm9waHZfOUFIOVAtenNFM1RJTk84cXFEOU1ZNzNMZ0VVNm5vX1pPSE0zMlU4bTZRcTNtNmNTZERHR1hCbVNFM2tFNC01Z2ZTeF9qVmppa2p0clNWRUV4b1ZWck9wMUE?oc=5" rel="noopener noreferrer"&gt;NVIDIA and Microsoft Reinvent Windows PCs for the Age of Personal AI - NVIDIA Newsroom&lt;/a&gt;。同日宣布與 TSMC 合作將 AI 引入半導體製程，模型輔助晶片設計與製造&lt;a href="https://news.google.com/rss/articles/CBMiuwFBVV95cUxNNWhpUkRjY2Z5cE85UmpLWnpPMWVsS3Q3X3VXUE1EOEQ3aTROZkhMM3g5YV9ldWdhVGxiVjFYN0YxTWJxcGppdGpJaXVjeGhaMzg5aFlQWXB4azZGekltYnBvMkJPUXpzdWpES0dhYVBkaTkzVHA2c3lJODZNbHRtSEF1V09GWUtoaXhXT1drUlg4RnlGVzZnQ2FydzFUSzdVMlFYWDg2bGFEaXBMcG9LYndnMFBvTWliZW1R?oc=5" rel="noopener noreferrer"&gt;NVIDIA and TSMC Bring AI Into Fabs to Advance Semiconductor Design and Manufacturing - NVIDIA Newsroom&lt;/a&gt;。&lt;/p&gt;

&lt;p&gt;兩條新聞說明同一件事：無論哪個基礎模型最終勝出，NVIDIA 都受益。PC AI 意味著更多本地推論需求，離不開 NVIDIA GPU；半導體 AI 意味著晶片廠需要更強的 AI 設計工具，依然離不開 CUDA 生態。晶片管制短期內影響 NVIDIA 中國營收，但長期來說加速了中國自研晶片的進程——對 NVIDIA 在其他市場的定價權影響有限。&lt;/p&gt;

&lt;h2&gt;
  
  
  地緣政治：晶片管制失敗與中國加速自主
&lt;/h2&gt;

&lt;p&gt;本週有兩條新聞衝擊晶片出口管制的敘事。South China Morning Post 報導美國出口管制正在迫使中國重新設計整條 AI 晶片供應鏈，從 IC 設計到先進封裝全面本土化&lt;a href="https://news.google.com/rss/articles/CBMi0AFBVV95cUxPOGVMZklfbThua3lUeVJqUXdMbnA5OGRDaDFnV0VTOENvY1h5Wk9fZnpYMWRrMG1Ycmk2V3V6d25icWFZODhnRDJPdmR2RzRNUkFLUXIyU0pLeUN2V2JnbzM5cWNZcTJDWmxFNjhsLXg5a25GVWRlN0FfclR2bFdSRGNoWU5FbHQ0eURBelZLMHMwYk05a2ZXemFVRTM2YTVMV3hOYmFhQ1AydEZkQ3czQ0FCYk5lSjNHazZrN09XQUhqYzVHY18za25yNlY2MTgw0gHQAUFVX3lxTFB4eFd2TU8ySjdEVTVFbXduWWRWM2xDbGROZFZpUmZLWnZLa2M3OTktRFhnelhaRXpiUXpYaXdrRGZlUERMOVNhbmtmb1lWcU42bDRQdmlGNUotWFRvbjZleDlsaEhZSkJ4MkYyM0E1NC1zOG5zYnJuMmdtQ21EMmlQZ3lVdHlrN2Jvd2wxVlV5YWRKU19GZ0pRRGlOckVRVjBCeDB5RTFUVXZIS1Z4YkJiclNVWkNmZDZ4SE40UnEwR0o2UXFiWnBBcy1YdmtPUlE?oc=5" rel="noopener noreferrer"&gt;How US export curbs are forcing China to redesign its AI chip industry - South China Morning Post&lt;/a&gt;。Foundation for Defense of Democracies 的分析指出美國商務部已坦承現有管制執行失敗&lt;a href="https://news.google.com/rss/articles/CBMitAFBVV95cUxOUG1tSjQtaTZic1hubHhJRUwzMHZuMkRCMzZFWUR1bWczNld2dkNQT0g1cG4wSFFZalJ2U1lyTVN3OWkyQVlOSGlMSjcwbk1uUmJBUUdwb1FBYnlOQjUyWHhKNi1PWnhPYjRUSWFRV3I3bDlNel9OeG9KVWVLdkFTQ2Vqb1NuZTFqRXlNSlA2b3U2SjZOYm0tYjd6RVdPZXBKV2llandEZ05oUmR1RHpYX3drTkY?oc=5" rel="noopener noreferrer"&gt;Commerce Department Admits Failure To Enforce AI Export Controls on China - Foundation for Defense of Democracies&lt;/a&gt;。&lt;/p&gt;

&lt;p&gt;Carnegie Endowment 的報告嘗試為美國政策辯護，認為晶片管制不會阻止中國 AI 發展&lt;a href="https://news.google.com/rss/articles/CBMihgFBVV95cUxQdS1IV0NPRi1OZFJLdlBpc0RDT1FlUUFRWVhfUzkwWng5NTByTW5NRnBoVEJNRXZLVTBQdWxBSUZQcXlaWFN6ekpfU194RFA4VWtRbXp2aldqd2xkOUJkMHpCeUNBOU5JMEtSQmpKVHROVGdxckZnbFdLMjh3SVREd3BzSVJrdw?oc=5" rel="noopener noreferrer"&gt;Trump’s AI Order Won’t Stymie U.S. Competition with China - Carnegie Endowment for International Peace&lt;/a&gt;。但 War on the Rocks 的分析更尖銳：中國本土 AI 市場的競爭慘烈程度被嚴重低估——數千個模型在中國國內市場激烈競爭，這會訓練出真正能在邊緣場景部署的團隊，不是只會發政策報告&lt;a href="https://news.google.com/rss/articles/CBMiogFBVV95cUxNbjZPcU0tbHNhdGRTM1lNTzg1WG1zenJPUDFBdWtMZGN3MHZiWjduMzlTRzFEZGozSXRTR3hZSVVOalZUak1QYjYwU25OOWxHVTJWdW5vRzhvLVNuTVkxUnNYemNWdHhRNFVVTExZVmh4TGJ0WS1TbWxvYl8wWS1XYlY5Y0tBcjlVUUtuaUV5dXd5VmRFbDBWQTVIVnFCc2cyRVE?oc=5" rel="noopener noreferrer"&gt;Forged in a Knife Fight: China’s Brutal Domestic AI Competition - War on the Rocks&lt;/a&gt;。&lt;/p&gt;

&lt;p&gt;對於任何有中國市場或供應鏈曝險的企業而言，這不是地緣政治背景噪音，而是直接影響定價與供應鏈的商業變數。&lt;/p&gt;

&lt;h2&gt;
  
  
  法規進展：Florida 起訴 OpenAI
&lt;/h2&gt;

&lt;p&gt;Florida 州政府對 OpenAI 及 Sam Altman 提起訴訟，指控其誤導 ChatGPT 的安全性、未充分警告風險，並將產品包裝成適合兒童使用的安全工具&lt;a href="https://news.google.com/rss/articles/CBMiigFBVV95cUxPMnl6aTZnS3U5R25vTDExcXNKX18ySTBlM1otaWtQeE52NVctM243a1pmRjFuVEpfTVozZVFhcmpfVU5aQnZLdklXaG92SXlZR2JpU2lOdGdmc1R2SzI3SURCTjRraHN1QUpzYjVka0RTM0ZEUDlvUEE5M2FCdTFQc0Q1SnZRMm51d3c?oc=5" rel="noopener noreferrer"&gt;Florida sues OpenAI and Sam Altman over alleged safety lapses - NPR&lt;/a&gt;&lt;a href="https://news.google.com/rss/articles/CBMiiAFBVV95cUxQT1ViQml3dG5wdEtPRTlqdWVURG90OHZmekdnc1JmekRpUUlYMmVjSWNjWFppNTROQm1nLWpmOFlZN0cxeFpxTzNlOHFSRTB0R1RzVm1mU0RQYjRIaTdTdWsyUTJ2ME84VzJRbUpLZUczOFVNaUtzcFk3d2J1RVpldFFJbFFqcHlD?oc=5" rel="noopener noreferrer"&gt;Florida sues OpenAI and Sam Altman over AI risks - Politico&lt;/a&gt;。這不是依據新的聯邦 AI 草案提告；國會草案尚未成為法律。Florida 採取的是州政府既有的消費者保護、產品安全與公共危害等法律路徑。&lt;/p&gt;

&lt;p&gt;這件事的重要性在於：AI 公司的法律風險不必等到國會通過專門 AI 法才會發生。Florida 是第一個對 OpenAI 提告的州政府，但不是第一個對 AI 聊天機器人公司採取法律行動的州；Kentucky 已在 2026 年 1 月起訴 Character.AI，Pennsylvania 也在 2026 年 5 月針對 Character.AI 的醫療建議與醫師身分誤導問題提告&lt;a href="https://www.kentucky.gov/Pages/Activity-stream.aspx?n=AttorneyGeneral&amp;amp;prId=1857" rel="noopener noreferrer"&gt;AG Coleman Sues AI Chatbot Company for Preying on Children - Kentucky Attorney General&lt;/a&gt;&lt;a href="https://www.pa.gov/governor/newsroom/2026-press-releases/shapiro-administration-sues-character-ai-over-fake-medical-claim" rel="noopener noreferrer"&gt;Shapiro Administration Sues Character.AI Over Fake Medical Claims - Commonwealth of Pennsylvania&lt;/a&gt;。企業採購 AI 工具時，合規框架不能只依賴廠商保證——法務團隊需要同時追蹤既有州法如何被用來執法，以及聯邦草案是否會改變州級監管空間。&lt;/p&gt;




&lt;h2&gt;
  
  
  一句話總結
&lt;/h2&gt;

&lt;p&gt;基礎模型的可用階段已到來，但可商用的瓶頸正在從模型能力轉移到通路、晶片供應鏈與法規三條主線。雲端通路戰只是開始，誰拿下企業 distribution誰就有定價權。&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>tech</category>
      <category>llm</category>
    </item>
    <item>
      <title>你的 AI agent 不笨，是你餵的 context 不行</title>
      <dc:creator>Yang Goufang</dc:creator>
      <pubDate>Fri, 29 May 2026 07:04:49 +0000</pubDate>
      <link>https://dev.to/yang_goufang_23c7ba674984/ni-de-ai-agent-bu-ben-shi-ni-wei-de-context-bu-xing-5g42</link>
      <guid>https://dev.to/yang_goufang_23c7ba674984/ni-de-ai-agent-bu-ben-shi-ni-wei-de-context-bu-xing-5g42</guid>
      <description>&lt;p&gt;你叫 agent 加一個功能。它很有自信地寫出一段乾淨的程式碼——用的卻是你半年前就拔掉的套件版本、早就放棄的目錄結構，還有一個這個 repo 從來沒用過的 auth 寫法。它能編譯，但每一個重要的地方都是錯的。&lt;/p&gt;

&lt;p&gt;第一個反應，多半是怪模型：「它又在 hallucination 了。」也許吧。但更常見的真相是——以它手上握有的資訊，它做的其實完全合理。問題是，它手上握有的，只是你分心時隨手丟進對話框的一段模糊描述。&lt;/p&gt;

&lt;p&gt;所以這篇想講一個有點刺耳的結論：&lt;strong&gt;多數時候，不是你的 agent 笨，是你餵的 context 不行。&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  為什麼是現在
&lt;/h2&gt;

&lt;p&gt;過去很長一段時間，寫軟體最難的部分是「把程式碼寫出來」。這件事正在悄悄地不再成立。現在 agent 產出可用程式碼的速度，常常比我們 review 的速度還快。瓶頸往上游移動了——移到「把意圖講清楚」這件事上：你到底要什麼、有哪些限制、在「這個」codebase 裡怎樣才叫做好。&lt;/p&gt;

&lt;p&gt;而我們在這件事上做得很差。我們把餵給 agent 的指令、規則、專案知識當成用過即丟的聊天內容：貼一段 prompt、拿到結果、然後那段 prompt 就永遠消失了。我們從來不會這樣對待自己的原始碼——原始碼我們會版控、會 review、會測試。Patrick Debois（當年不小心造出「DevOps」這個詞的人）講的正是這件事：context 就是新的 code，值得用同樣的工程紀律去對待。他把這套還在成形中的方法稱為 &lt;strong&gt;Context Development Lifecycle&lt;/strong&gt;——像對待軟體一樣，去 generate、evaluate、distribute，並在 production 裡持續 observe。&lt;/p&gt;

&lt;p&gt;我覺得這個框架是真的有用。但它也還很早期——比較像一個方向，而不是一條鋪好的路。所以接下來我跳過理論，直接講你明天就能動手做的部分。&lt;/p&gt;

&lt;h2&gt;
  
  
  一、把知識從腦袋（和對話）裡搬進檔案
&lt;/h2&gt;

&lt;p&gt;槓桿最大的一步：別再把專案知識留在自己腦袋裡、留在聊天紀錄裡，把它寫進 agent 會自動讀取的版控檔案。&lt;/p&gt;

&lt;p&gt;多數 agent 工具都支援某種專案指令檔——&lt;code&gt;CLAUDE.md&lt;/code&gt;、&lt;code&gt;agent.md&lt;/code&gt;、&lt;code&gt;.cursorrules&lt;/code&gt;，名字不重要。把它當成一個真正的產物來經營：commit 它、在 PR 裡 review 它，讓它慢慢累積那些「新同事第一天上工會需要知道」的硬知識：&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# agent.md&lt;/span&gt;

&lt;span class="gu"&gt;## Stack&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Node 20，TypeScript strict mode。不准用 &lt;span class="sb"&gt;`any`&lt;/span&gt;。
&lt;span class="p"&gt;-&lt;/span&gt; Postgres 走 Drizzle。但我們「不」用 ORM 內建的 migration 工具——
  migration 都放在 /migrations，用 npm run db:migrate 跑。

&lt;span class="gu"&gt;## Conventions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; API handler 一律回傳 Result&lt;span class="nt"&gt;&amp;lt;T&amp;gt;&lt;/span&gt;，不准跨邊界 throw。
&lt;span class="p"&gt;-&lt;/span&gt; 測試用 Vitest，跟原始碼放一起，命名 &lt;span class="err"&gt;*&lt;/span&gt;.test.ts。

&lt;span class="gu"&gt;## Don't&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; 不要沒問過就加新套件。
&lt;span class="p"&gt;-&lt;/span&gt; 不要碰 /legacy——那塊已凍結，正在被刪掉。
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;注意，這些都不是什麼厲害的 prompt 技巧。它們是「事實」——就是你會對一個真人新人講的那些話。好處在於：你只寫一次，往後每一個 session 一開場就是「已經知道」，而不是「重新猜一次」。&lt;/p&gt;

&lt;h2&gt;
  
  
  二、把規則分層，一層只做一件事
&lt;/h2&gt;

&lt;p&gt;別把所有東西塞進同一個巨大的檔案。像拆 config 一樣，按「適用範圍」把 context 拆開。&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;全域規則&lt;/strong&gt;（你做任何事都適用）：你個人的偏好。「講清楚取捨，不要只會附和我。」「能用標準函式庫就不要加新套件。」這些跟著「你」走，跨專案都成立。&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;專案規則&lt;/strong&gt;（只限這個 repo）：技術棧、慣例、地雷。這些跟著「程式碼」走。&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;把兩者分開很重要，因為它們變動的速度與理由不同。你的個人風格相對穩定；專案的架構則會一直變。一旦混在一起，每次某個 repo 做了奇怪的事，你就得去動到你那份「universal 偏好」——然後那個怪癖就會悄悄滲進你「所有」其他專案。一個檔案，一件事。&lt;/p&gt;

&lt;h2&gt;
  
  
  三、餵事實，不要餵感覺
&lt;/h2&gt;

&lt;p&gt;當你給 agent 的是「可以查證的東西」而不是「請你回想一下」，hallucination 會明顯下降。&lt;/p&gt;

&lt;p&gt;「用最新版的 React Router」這種講法，等於請模型去把它訓練時看過的所有版本平均一下。換成「我們用 React Router 7，只走 data router，這是我們在用的三種 pattern：[貼上]」，你給的是 ground truth。來源越具體、越「當下」，它能自由發揮（瞎掰）的空間就越小。&lt;/p&gt;

&lt;p&gt;具體來說：&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;版本講死。寫「React 19」，不要只寫「React」。&lt;/li&gt;
&lt;li&gt;任何變動快的東西，直接貼上真正的 API 或文件片段，別賭它記得。&lt;/li&gt;
&lt;li&gt;指向真實檔案：「照著 &lt;code&gt;src/handlers/users.ts&lt;/code&gt; 的 pattern 寫」勝過用文字描述那個 pattern。&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;一個可查證的來源，永遠贏過一段很有自信的記憶。&lt;/p&gt;

&lt;h2&gt;
  
  
  四、把 context 當成有限資源
&lt;/h2&gt;

&lt;p&gt;這一點幾乎每個人都會踩雷。context window 不是無限的，而且——更關鍵的是——「越大不等於越好」。把整個 codebase 全塞進去，不會讓 agent 更聰明；過了某個點，反而更糟：真正相關的訊號被淹沒、模型抓不到重點，輸出品質就這樣悄悄地往下掉。&lt;/p&gt;

&lt;p&gt;留意這些徵兆：回答開始偏離你的慣例、反覆問你早就講過的事、很有自信地改錯檔案。這通常不是模型變笨了——是 context 變雜了。&lt;/p&gt;

&lt;p&gt;實際該做的事：&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;察覺到退化。&lt;/strong&gt; 一個長 session 開始產出變差，那是訊號，不是運氣不好。&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;compact 後重開。&lt;/strong&gt; 把真正重要的東西——做過的決定、目前的狀態——濃縮進一個乾淨的新 session。多數工具都有 compact 的機制，刻意去用它，而不是讓一個 session 漫無止境地拖上好幾個小時。&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;不要預先塞。&lt;/strong&gt; context 是「這個任務需要時」才加，不是「以防萬一」先放著。一個聚焦的視窗，勝過一個塞滿的視窗。&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;把 attention 想成一份預算，只花在跟「這個任務」相關的東西上。&lt;/p&gt;

&lt;h2&gt;
  
  
  五、告訴 agent 你的環境長怎樣
&lt;/h2&gt;

&lt;p&gt;你的程式碼不是只跑在一個地方。它跑在 local、跑在 CI／integration、也跑在 production——而這幾個環境的差異，往往就是會咬你一口的地方：不同的環境變數、不同的 feature flag、真資料庫對上 mock、某個環境有而另一個沒有的 secret。&lt;/p&gt;

&lt;p&gt;這些 agent「全部都不知道」，除非你寫下來。所以，把它寫下來：&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Environments&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; local：用 Docker Postgres，MOCK_PAYMENTS=true，跑種子測試資料。
&lt;span class="p"&gt;-&lt;/span&gt; staging：用真的 Stripe 測試金鑰，schema 跟 prod 一致。
&lt;span class="p"&gt;-&lt;/span&gt; prod：用真的金鑰。永遠不要在這裡跑破壞性腳本。
        migration 一律要走人工核可才能上。
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;光是最後那一行，就可能救你一命——免得 agent 興高采烈地對著 production 跑了一個「清理」腳本，只因為從來沒人告訴它 production 是特別的。&lt;/p&gt;

&lt;h2&gt;
  
  
  六、修種子，不要修果子
&lt;/h2&gt;

&lt;p&gt;這是讓上面所有努力產生複利的那個習慣。&lt;/p&gt;

&lt;p&gt;agent 做錯時，你可以直接修「輸出」——改掉那段程式碼、繼續往下走。這修掉了「這一顆」果子。但壞掉的種子還埋在土裡，明天它會再長出一模一樣的錯。&lt;/p&gt;

&lt;p&gt;槓桿更高的做法，是去修「指令」。agent 用錯了測試框架？別只是把測試重寫一遍——把「我們用 Vitest，不是 Jest」加進 &lt;code&gt;agent.md&lt;/code&gt;。agent 一直去抓某個已棄用的 helper？把它加進「Don't」清單。每一次修正都變成永久的，同樣的錯就不會在往後每個 session 一再出現。&lt;/p&gt;

&lt;p&gt;當下慢一點，一個月下來快非常多。你不再是在修輸出，而是在改善那個「產生輸出的東西」。&lt;/p&gt;

&lt;h2&gt;
  
  
  一點誠實的但書
&lt;/h2&gt;

&lt;p&gt;這一切都還不是定下來的標準。你的 context 檔案還沒有一個 &lt;code&gt;npm test&lt;/code&gt; 能跑、沒有公認的 linter 來檢查指令、也沒有 CI gate 會在你的 &lt;code&gt;agent.md&lt;/code&gt; 跟現實脫節時亮紅燈。Context Development Lifecycle 是一個有用的視角，不是一套完成的工具鏈——工具都還在即時被發明出來，今天某些「最佳實務」，一年後回頭看大概會覺得很土。&lt;/p&gt;

&lt;p&gt;但你不需要等工具鏈成熟，就能把大部分的價值先拿到手。版控的指令檔、分層的規則、可查證的事實、被尊重的 context window，加上「修種子而不是修果子」的紀律——這些今天就能做。它就是「一個一直在跟你作對的 agent」和「一個感覺真的懂你專案的 agent」之間的差別。&lt;/p&gt;

&lt;p&gt;你的 agent，很可能比你的 context 願意讓它表現出來的，要好得多。&lt;/p&gt;

&lt;h2&gt;
  
  
  留一個問題給你
&lt;/h2&gt;

&lt;p&gt;在你「自己」的 agent 指令檔裡，目前最有價值的一行是哪一行——那個讓某個一再發生的錯，從此戛然而止的事實？留言告訴我，好的我想偷來用。&lt;/p&gt;




&lt;p&gt;&lt;em&gt;這篇延伸自 Patrick Debois 的 Context Development Lifecycle——他的原文 &lt;a href="https://tessl.io/blog/context-development-lifecycle-better-context-for-ai-coding-agents/" rel="noopener noreferrer"&gt;Optimizing Context for AI Coding Agents&lt;/a&gt; 是這個想法更完整的版本。&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Your AI Agent Isn't Dumb — Your Context Is</title>
      <dc:creator>Yang Goufang</dc:creator>
      <pubDate>Fri, 29 May 2026 07:04:42 +0000</pubDate>
      <link>https://dev.to/yang_goufang_23c7ba674984/your-ai-agent-isnt-dumb-your-context-is-1ob3</link>
      <guid>https://dev.to/yang_goufang_23c7ba674984/your-ai-agent-isnt-dumb-your-context-is-1ob3</guid>
      <description>&lt;p&gt;You ask your agent to add a feature. It writes clean, confident code — using a library version you ripped out six months ago, a folder layout you abandoned, and an auth pattern you've never used in this repo. The code compiles. It's also wrong in every way that matters.&lt;/p&gt;

&lt;p&gt;Your first instinct is to blame the model. "It hallucinated." Maybe. But more often the model did exactly what it should have given what it knew — and what it knew was a vague paragraph you typed into a chat box while distracted.&lt;/p&gt;

&lt;p&gt;Here's the uncomfortable thesis: &lt;strong&gt;most of the time your agent isn't dumb. Your context is.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters now
&lt;/h2&gt;

&lt;p&gt;For years the hard part of building software was writing the code. That's quietly stopped being true. Agents now produce working code faster than most of us can review it. The bottleneck moved upstream — to &lt;em&gt;describing intent&lt;/em&gt;. Telling the agent what we want, what the constraints are, and what "good" looks like in &lt;em&gt;this&lt;/em&gt; codebase.&lt;/p&gt;

&lt;p&gt;And we are bad at this. We treat the instructions, rules, and project knowledge we feed agents as throwaway chat. We paste a prompt, get a result, and lose the prompt forever. We'd never treat our actual source code that way — we version it, review it, and test it. Patrick Debois (the guy who accidentally coined "DevOps") has been making this exact argument: context is the new code, and it deserves the same engineering rigor. He calls the emerging discipline the &lt;strong&gt;Context Development Lifecycle&lt;/strong&gt; — generate it, evaluate it, distribute it, observe it in production, just like software.&lt;/p&gt;

&lt;p&gt;I think the frame is genuinely useful. It's also early — more a direction than a paved road. So let me skip the theory and give you the parts you can actually do tomorrow.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Get knowledge out of your head and into files
&lt;/h2&gt;

&lt;p&gt;The single highest-leverage move: stop holding project knowledge in your head and your chat history, and put it in versioned files the agent reads automatically.&lt;/p&gt;

&lt;p&gt;Most agent tools support a project instruction file — &lt;code&gt;CLAUDE.md&lt;/code&gt;, &lt;code&gt;agent.md&lt;/code&gt;, &lt;code&gt;.cursorrules&lt;/code&gt;, whatever yours calls it. Treat it like a real artifact. Commit it. Review it in PRs. Let it accumulate the hard-won facts a new teammate would need:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# agent.md&lt;/span&gt;

&lt;span class="gu"&gt;## Stack&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Node 20, TypeScript strict mode. No &lt;span class="sb"&gt;`any`&lt;/span&gt;.
&lt;span class="p"&gt;-&lt;/span&gt; Postgres via Drizzle. We do NOT use the ORM's migration tool —
  migrations live in &lt;span class="sb"&gt;`/migrations`&lt;/span&gt; and run via &lt;span class="sb"&gt;`npm run db:migrate`&lt;/span&gt;.

&lt;span class="gu"&gt;## Conventions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; API handlers return &lt;span class="sb"&gt;`Result&amp;lt;T&amp;gt;`&lt;/span&gt;, never throw across boundaries.
&lt;span class="p"&gt;-&lt;/span&gt; Tests use Vitest. Co-locate as &lt;span class="sb"&gt;`*.test.ts`&lt;/span&gt; next to the source.

&lt;span class="gu"&gt;## Don't&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Don't add new dependencies without asking.
&lt;span class="p"&gt;-&lt;/span&gt; Don't touch &lt;span class="sb"&gt;`/legacy`&lt;/span&gt; — it's frozen and being deleted.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice these aren't clever prompts. They're &lt;em&gt;facts&lt;/em&gt; — the same things you'd tell a human on day one. The win is that you write them once and every future session starts informed instead of guessing.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Layer your rules — each doing one job
&lt;/h2&gt;

&lt;p&gt;Don't cram everything into one giant file. Split context by scope, the way you split config.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Global rules&lt;/strong&gt; (apply to everything you do): your personal preferences. "Explain trade-offs, don't just agree." "Prefer standard library over new deps." These follow &lt;em&gt;you&lt;/em&gt; across projects.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Project rules&lt;/strong&gt; (this repo only): the stack, the conventions, the landmines. These follow &lt;em&gt;the code&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Keeping them separate matters because they change at different rates and for different reasons. Your personal style is stable; a project's architecture shifts. When you mix them, you end up editing your universal preferences every time one repo does something weird — and that weirdness leaks into every other project. One file, one job.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Feed facts, not vibes
&lt;/h2&gt;

&lt;p&gt;Hallucination drops sharply when you give the agent something checkable instead of asking it to recall.&lt;/p&gt;

&lt;p&gt;"Use the latest React Router" invites the model to average over every version it ever saw in training. "We're on React Router 7, data routers only, here are the three patterns we use: [paste]" gives it ground truth. The more specific and &lt;em&gt;current&lt;/em&gt; the source, the less room there is to invent.&lt;/p&gt;

&lt;p&gt;Concretely:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pin versions explicitly. "React 19," not "React."&lt;/li&gt;
&lt;li&gt;Paste the actual API or doc snippet for anything fast-moving, instead of trusting recall.&lt;/li&gt;
&lt;li&gt;Point at real files: "follow the pattern in &lt;code&gt;src/handlers/users.ts&lt;/code&gt;" beats describing the pattern in prose.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A checkable source beats a confident memory every time.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Treat context as a finite resource
&lt;/h2&gt;

&lt;p&gt;This one trips up almost everyone. The context window is not infinite, and — more importantly — &lt;em&gt;bigger isn't better&lt;/em&gt;. Stuffing in your whole codebase doesn't make the agent smarter; past a point it makes it worse. Relevant signal gets buried, the model loses the thread, and output quality quietly degrades.&lt;/p&gt;

&lt;p&gt;Watch for the tells: answers that drift from your conventions, repeated questions about things you already established, confident edits to the wrong file. That's usually not the model getting dumber — it's the context getting noisy.&lt;/p&gt;

&lt;p&gt;What to actually do:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Notice degradation.&lt;/strong&gt; When a long session starts producing worse results, that's a signal, not a fluke.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compact and restart.&lt;/strong&gt; Summarize what matters — decisions made, current state — into a fresh, clean session. Most tools have a compaction step; use it deliberately instead of letting a session sprawl for hours.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Don't pre-stuff.&lt;/strong&gt; Add context when it's needed for the task at hand, not "just in case." A focused window beats a full one.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of attention as a budget. Spend it on what's relevant to &lt;em&gt;this&lt;/em&gt; task.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Tell the agent about your environments
&lt;/h2&gt;

&lt;p&gt;Your code doesn't run in one place. It runs locally, in CI/integration, and in production — and those differ in ways that bite. Different env vars, different feature flags, a real database versus a mock, secrets that exist in one place and not another.&lt;/p&gt;

&lt;p&gt;The agent knows &lt;em&gt;none&lt;/em&gt; of this unless you write it down. So write it down:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Environments&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; local:  uses Docker Postgres, MOCK_PAYMENTS=true, seeded test data.
&lt;span class="p"&gt;-&lt;/span&gt; staging: real Stripe test keys, mirrors prod schema.
&lt;span class="p"&gt;-&lt;/span&gt; prod:   real keys. NEVER run destructive scripts here.
           Migrations are gated behind manual approval.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That last line alone can save you from an agent cheerfully running a "cleanup" against production because nobody told it production was special.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Fix the seed, not the fruit
&lt;/h2&gt;

&lt;p&gt;This is the habit that makes everything above compound.&lt;/p&gt;

&lt;p&gt;When the agent gets something wrong, you can fix the output — edit the code, move on. That fixes &lt;em&gt;this&lt;/em&gt; fruit. The bad seed is still in the ground, and tomorrow it grows the same wrong thing again.&lt;/p&gt;

&lt;p&gt;The higher-leverage move is to fix the &lt;em&gt;instruction&lt;/em&gt;. Agent used the wrong test framework? Don't just rewrite the test — add "we use Vitest, not Jest" to &lt;code&gt;agent.md&lt;/code&gt;. Agent keeps reaching for a deprecated helper? Add it to the "don't" list. Each correction becomes permanent, and the same mistake stops recurring across every future session.&lt;/p&gt;

&lt;p&gt;It's slower in the moment and dramatically faster over a month. You're not fixing outputs anymore; you're improving the thing that &lt;em&gt;generates&lt;/em&gt; outputs.&lt;/p&gt;

&lt;h2&gt;
  
  
  The honest caveat
&lt;/h2&gt;

&lt;p&gt;None of this is a settled standard. There's no &lt;code&gt;npm test&lt;/code&gt; for your context files yet, no agreed-on linter for instructions, no CI gate that fails when your &lt;code&gt;agent.md&lt;/code&gt; drifts from reality. The Context Development Lifecycle is a useful lens, not a finished toolchain — the tooling is being invented in real time, and some of today's best practice will look quaint in a year.&lt;/p&gt;

&lt;p&gt;But you don't need the mature toolchain to capture most of the value. Versioned instruction files, layered rules, checkable facts, a respected context window, and the discipline to fix the seed instead of the fruit — that's all available today, and it's the difference between an agent that fights you and one that feels like it actually knows your project.&lt;/p&gt;

&lt;p&gt;Your agent is probably better than your context is letting it be.&lt;/p&gt;

&lt;h2&gt;
  
  
  One question for you
&lt;/h2&gt;

&lt;p&gt;What's the single most valuable line currently living in &lt;em&gt;your&lt;/em&gt; agent instruction file — the one fact that stopped a recurring mistake cold? Drop it in the comments; I want to steal the good ones.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This builds on Patrick Debois's Context Development Lifecycle — his write-up &lt;a href="https://tessl.io/blog/context-development-lifecycle-better-context-for-ai-coding-agents/" rel="noopener noreferrer"&gt;Optimizing Context for AI Coding Agents&lt;/a&gt; is the fuller version of the idea.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>webdev</category>
    </item>
    <item>
      <title>AI Weekly — 2026-05-22 to 2026-05-29 | Anthropic's $965B Moment and the Infrastructure Bet</title>
      <dc:creator>Yang Goufang</dc:creator>
      <pubDate>Thu, 28 May 2026 23:04:41 +0000</pubDate>
      <link>https://dev.to/yang_goufang_23c7ba674984/ai-weekly-2026-05-22-to-2026-05-29-anthropics-965b-moment-and-the-infrastructure-bet-l7l</link>
      <guid>https://dev.to/yang_goufang_23c7ba674984/ai-weekly-2026-05-22-to-2026-05-29-anthropics-965b-moment-and-the-infrastructure-bet-l7l</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Anthropic closed the largest AI funding round in history — $65 billion at a $965 billion valuation — and dropped Claude Opus 4.8 the same day. Three questions follow: what the money actually buys, what the model actually changes, and whether either matters to the infrastructure layer where enterprise AI is actually decided.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Funding: What $65B Actually Buys
&lt;/h2&gt;

&lt;p&gt;Anthropic's Series H is not a vote of confidence in a product roadmap. It is a bet on infrastructure positioning&lt;a href="https://news.google.com/rss/articles/CBMiUEFVX3lxTE9FbzhQTWUzSGozVXNfbDYxWWUzTFBZZlpldmQyeDBnSExRdHVMSkJyVVV4OHdlNmx3LVBFcHVXTmJ0ck9TclZvVVlPRnpoZVhW?oc=5" rel="noopener noreferrer"&gt;Anthropic raises $65B in Series H funding at $965B post-money valuation - Anthropic&lt;/a&gt;&lt;a href="https://news.google.com/rss/articles/CBMiigFBVV95cUxPSi1XZmpReGJ1X3dWZ0JsQ1BHXzloYm93YjV2N09HX0VlRTFDa2huRE9jX3h6cWxQSDJDdVhnR1Awc01WZDRtVW13Zk50T3dINkJvSS13RkQ3WTM3d2NhRWpZWDV5VEhwV0FQMGRwNEJCdy1idFhIZXBqR3hUX0RHOHdEOU9lZU1GNXc?oc=5" rel="noopener noreferrer"&gt;Anthropic Tops OpenAI to Become the World’s Most Valuable A.I. Start-Up - The New York Times&lt;/a&gt;. At $965 billion post-money, the company is no longer competing solely in the model layer — it is building the substrate other companies build on.&lt;/p&gt;

&lt;p&gt;The timing is deliberate: Claude Opus 4.8 shipped the same day&lt;a href="https://news.google.com/rss/articles/CBMiWkFVX3lxTFBVeFZoaVpfX1hnTlJPS05nQWZYRnB6bUdOc3pzSmlyX3dpN3BleUlVQm53Z3Bxd29JOUw1MENDMWk5VF9CX1VSU0Q2eV9zMXZWNUVOb2V4N2VaUQ?oc=5" rel="noopener noreferrer"&gt;Introducing Claude Opus 4.8 - Anthropic&lt;/a&gt;, and within hours AWS confirmed hosting&lt;a href="https://news.google.com/rss/articles/CBMijwFBVV95cUxORDctNzFXM1FDSHlUa3Ywdk9sVDB4V0tvMmVwamtxUzJfZGlyaXF2eThaT3B2ZWhuUzQ1LUlQZEpTWDZzTVlZdGZlOW5CdE1wNjRRZWRaYldWRHBpWEtUc2ZXM1dmRlpoT3djblB2Q2xaVTY5ekliNUh3N1RCM1l2MTd5Z29TaWdVanAxenpvcw?oc=5" rel="noopener noreferrer"&gt;Claude Opus 4.8 is now available on AWS - Amazon Web Services (AWS)&lt;/a&gt;. What the money buys is multi-cloud distribution, enterprise procurement relationships, and geographic expansion into Korea and Italy&lt;a href="https://news.google.com/rss/articles/CBMiiwFBVV95cUxOX3ZSeUtaVl8zLUdiMHN4ZktEYXVkMFpkTU5icG1SRWR3NVViNVM4WFRSbk5yUzFYbmNTSldGLXNrRmM0dWMtZlh0V21OcGhRS0hvNGl2bEFneFotV1c4LTFLdGRqZi0yZ1oyRlRZUU1aVnlXRWpLM0JqMUhJSldONTc2SGhmTjJpZWpZ?oc=5" rel="noopener noreferrer"&gt;Anthropic appoints KiYoung Choi as Representative Director of Korea ahead of Seoul office opening - Anthropic&lt;/a&gt;&lt;a href="https://news.google.com/rss/articles/CBMiYEFVX3lxTE1Mb1dkTXE0WXgzR1BsWERKOVRjUm5feFJOUzJBSnFHTVJZaGQyMDN6ejVVYWluZW9WRmg0QUFhMy10d0lwaHczS2I5WjJpQ1IzMTNnZnZyTTZYVXJENTRaLQ?oc=5" rel="noopener noreferrer"&gt;Anthropic opens Milan office to support Italian enterprise, research, and developers - Anthropic&lt;/a&gt; before OpenAI's EU footprint matures. AWS certification carries genuine weight in enterprise sales cycles — it means Claude is available through existing procurement frameworks that Fortune 500 IT departments already operate under. That is the infrastructure argument, and it does not require speculating about switching costs: the distribution channel is the switching cost.&lt;/p&gt;

&lt;p&gt;For engineering decision-makers: the relevant question is not "is Claude better than GPT?" this week. It is whether Anthropic's infrastructure push — not the model benchmark score — creates durable enterprise relationships that matter in 12–18 months.&lt;/p&gt;

&lt;h2&gt;
  
  
  Claude Opus 4.8: Capability vs. Distribution
&lt;/h2&gt;

&lt;p&gt;Opus 4.8 ships with claims of improved reasoning and agentic performance&lt;a href="https://news.google.com/rss/articles/CBMiWkFVX3lxTFBVeFZoaVpfX1hnTlJPS05nQWZYRnB6bUdOc3pzSmlyX3dpN3BleUlVQm53Z3Bxd29JOUw1MENDMWk5VF9CX1VSU0Q2eV9zMXZWNUVOb2V4N2VaUQ?oc=5" rel="noopener noreferrer"&gt;Introducing Claude Opus 4.8 - Anthropic&lt;/a&gt;. AWS hosting&lt;a href="https://news.google.com/rss/articles/CBMijwFBVV95cUxORDctNzFXM1FDSHlUa3Ywdk9sVDB4V0tvMmVwamtxUzJfZGlyaXF2eThaT3B2ZWhuUzQ1LUlQZEpTWDZzTVlZdGZlOW5CdE1wNjRRZWRaYldWRHBpWEtUc2ZXM1dmRlpoT3djblB2Q2xaVTY5ekliNUh3N1RCM1l2MTd5Z29TaWdVanAxenpvcw?oc=5" rel="noopener noreferrer"&gt;Claude Opus 4.8 is now available on AWS - Amazon Web Services (AWS)&lt;/a&gt; means enterprise access through existing procurement relationships, which is a meaningfully different go-to-market than OpenAI's direct API.&lt;/p&gt;

&lt;p&gt;No independent benchmark data is available at press time. The capability claims should be treated as vendor statements until third-party evaluation is published. The distribution advantage — AWS customers can provision via existing contracts and compliance frameworks — is concrete today.&lt;/p&gt;

&lt;h2&gt;
  
  
  Google Rewrites the Search Box
&lt;/h2&gt;

&lt;p&gt;For the first time in 25 years, Google has changed the search interface itself&lt;a href="https://news.google.com/rss/articles/CBMigAFBVV95cUxPVzZ0WC1UelpJRkNTMldfbkcwaW43MDJ3cXh2elVTLVBhUHZBeTFWYWNXT3NHWjlPcFJ5SjZkOTdOZ21BdEJpeFE5NlE4ZUhpQllLcWEtM0RtTlJhdE9yOUdEajRELTR3ZVFpbXBrbVg3cUVVSGFyc1hTRERfZHhUbg?oc=5" rel="noopener noreferrer"&gt;Powered by A.I., Google Changes Its Search Box for the First Time in 25 Years - The New York Times&lt;/a&gt;. Not a ranking tweak. A fundamental UI change driven by AI integration. This received less coverage than the Anthropic funding round.&lt;/p&gt;

&lt;p&gt;The practical implication: Google is no longer protecting the ranked-list paradigm internally. The search box is becoming an answer engine, which has downstream effects on SEO-driven businesses, content monetization, and how AI summarization interacts with publisher attribution. If you ship products that depend on Google index crawl patterns, this is a structural signal, not a cosmetic one.&lt;/p&gt;

&lt;h2&gt;
  
  
  The AGI Timeline Returns
&lt;/h2&gt;

&lt;p&gt;Demis Hassabis said AGI is 3 to 4 years away&lt;a href="https://news.google.com/rss/articles/CBMihAFBVV95cUxNWXFNTndMc3ZjNTBEYnJoTTR0TW1kQ1VsSDdHRnRpc1pLMl9WbVZMLXFZLW80aWpFOGFyTFlGTGNkaDQzbHg5MDBPTGM1Q1lueDJXUkVRdTlBUVdvVExwVlFray0yZUtCd3VVeXdpSHZhdjZ1VkE2d0prckctY2hCZVQ5aWM?oc=5" rel="noopener noreferrer"&gt;Google DeepMind’s Hassabis: AGI is 3 to 4 years away - Sherwood News&lt;/a&gt;. This is the same person who said it was "5 to 10 years away" in 2023. The update is presented as increased confidence, not new evidence.&lt;/p&gt;

&lt;p&gt;For technical readers: Hassabis is not publishing a methodology. "AGI" remains undefined across the statements made this week — Anthropic, OpenAI, and Google all use it differently. Treat the 3–4 year claim as a narrative instrument, not an engineering forecast.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Job Displacement Fault Line
&lt;/h2&gt;

&lt;p&gt;An ex-Meta scientist publicly called Anthropic's CEO "wrong" on claims about AI-driven job losses&lt;a href="https://news.google.com/rss/articles/CBMib0FVX3lxTE1kNmxnTW9Fc3lrMnV3RHN4WmtQV2VwRGVhdEdGd3RKX25hYXZVWVlNR0JKamFYVVFoMjhXQ1ZXME1hX0JYODUxb0dsSUpFdHBaY3BKZUJ6U0R3dllYRVRtN1R0LUlTWXgxNlo4U25NVQ?oc=5" rel="noopener noreferrer"&gt;OpenAI and Anthropic dig in against each other on AI jobs apocalypse - Axios&lt;/a&gt;. This is not an academic debate. It is a dispute about what the economic data actually shows, and it is happening at the CEO level, which means it is affecting policy positioning and public affairs strategy.&lt;/p&gt;

&lt;p&gt;The uncertainty here cuts both ways. If job displacement is slower than feared, the talent market implications for AI tooling are different than if it accelerates. The Anthropic/OpenAI public disagreement is a proxy for a genuine forecasting failure — nobody has reliable data on this timeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  One Number Worth Tracking
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Company&lt;/th&gt;
&lt;th&gt;Valuation&lt;/th&gt;
&lt;th&gt;Runway Implied&lt;/th&gt;
&lt;th&gt;Notable This Week&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Anthropic&lt;/td&gt;
&lt;td&gt;$965B&lt;/td&gt;
&lt;td&gt;~3–4 years at current burn&lt;/td&gt;
&lt;td&gt;Series H close + Opus 4.8 launch&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;IPO filing pending&lt;/td&gt;
&lt;td&gt;Public market dependent&lt;/td&gt;
&lt;td&gt;Sam Altman governance friction cited&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The governance issue OpenAI faces — Reuters flagged the "Sam Altman problem"&lt;a href="https://news.google.com/rss/articles/CBMimwFBVV95cUxNcDVqZUpkbHdkWXFXNzZRcmE0LTlFeVpmTHZZaXM1TWM3dTdoUEM2MVZINVp0UnptV0djWTNZbGwtOV9qZWpUUTljOGQ2VDRBVE5nVkc5d2V4ZlRLdHZQeFM1Y3ZZdVVSR1dtMWtudTNDeDVSQlNaYjF6VjhjNml3al9HZVBLbVppQzJsWklBT0h0d1FSTUVTUU0tTQ?oc=5" rel="noopener noreferrer"&gt;Breakingviews - OpenAI’s IPO has a Sam Altman problem - Reuters&lt;/a&gt; — is structurally different from Anthropic's position. A private company with a $965B valuation can defer difficult questions about accountability. A public company cannot. This matters for enterprise customers evaluating vendor stability.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tool, Not Shrine
&lt;/h2&gt;

&lt;p&gt;AlphaProof Nexus solved 9 Erdős problems and proved 44 sequence conjectures for a few hundred dollars in compute&lt;a href="https://news.google.com/rss/articles/CBMiuAFBVV95cUxQNVBNWVpma2VKZUZSb1lEa0N4SEJiTmZ0OV9Jbm13cm9idW1Ea3lSS0ZyZDBEY3lDSHVfdzVoRWZzRm1hRTlkQWY5X1daU3AyVzVXMFhzeGxOUkFDbUI2QlByZVF0R2RRQkJndEhMdFRHVmlaWlRkTi1HY1p5RldVX3NUM3VhSnZfWjM4alpzZHlrTE9CWm5uTWxSVG9xR29jVlRlUWJwR1o3emdhZjlydkV0UDdDUnBS?oc=5" rel="noopener noreferrer"&gt;Google Deepmind's AlphaProof Nexus solves decades-old math problems for a few hundred dollars - the-decoder.com&lt;/a&gt;&lt;a href="https://news.google.com/rss/articles/CBMid0FVX3lxTE9EVExRczZFcGZMUkRQdHZubEpBWEFJZjZnQkdKMjZWdHM5RmJTLW1qQVhpV1U5UzlPV1JTUTRBUVlsNEFESnhYcEVwY2hrYXBvdW13N2FrNU5LUnRKMGNoLUVHd04wV003ekwtUGtFaHM2M1NvRnRZ?oc=5" rel="noopener noreferrer"&gt;Google DeepMind's AlphaProof Nexus solves 9 Erdős problems and proves 44 sequence conjectures - Crypto Briefing&lt;/a&gt;. That is a concrete data point when assessing AI math capability in production workflows. The capability is real. The question — as always — is whether it maps to your actual use case.&lt;/p&gt;

&lt;p&gt;OpenAI was named a Leader in enterprise coding agents by Gartner&lt;a href="https://news.google.com/rss/articles/CBMibEFVX3lxTFBKVUUwdmFIQzlrbVJtcmFrWFZ3d0Qtc1Naa21qOWFGM2RmbnJIX2ExT2ZjTERqaXV3eEwxZ0pGVmJwc2dPUUZZcmtCLTNlbjhsLVVqX3VmVXFfb0RlMzJaa21LZTI4YUE5eEYzSw?oc=5" rel="noopener noreferrer"&gt;OpenAI named a Leader in enterprise coding agents by Gartner - OpenAI&lt;/a&gt;. This is a marketing data point, not a technical evaluation. It tells you about OpenAI's enterprise sales motion, not relative code quality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This week:&lt;/strong&gt; Anthropic has the capital, the model, the distribution, and the international footprint. OpenAI has the IPO and the governance problem. Google has the distribution and is redesigning its core product around AI. None of these are the same bet. Pick which layer you are playing in.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>tech</category>
      <category>llm</category>
    </item>
    <item>
      <title>AI 週報 — 2026-05-22 to 2026-05-29 | 定價權轉移：Anthropic 估值超越 OpenAI 背後的結構訊號</title>
      <dc:creator>Yang Goufang</dc:creator>
      <pubDate>Thu, 28 May 2026 23:02:37 +0000</pubDate>
      <link>https://dev.to/yang_goufang_23c7ba674984/ai-zhou-bao-2026-05-22-to-2026-05-29-ding-jia-quan-zhuan-yi-anthropic-gu-zhi-chao-yue-openai-bei-hou-de-jie-gou-xun-hao-3i31</link>
      <guid>https://dev.to/yang_goufang_23c7ba674984/ai-zhou-bao-2026-05-22-to-2026-05-29-ding-jia-quan-zhuan-yi-anthropic-gu-zhi-chao-yue-openai-bei-hou-de-jie-gou-xun-hao-3i31</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;本週最重要的訊號不是任何單一模型發布，而是 Anthropic 的估值數字開始超越 OpenAI——當定價權從挑選模型的開發者轉移到定義工作流的平台，商業敘事就進入了下一章。&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  模型與平台：Claude Opus 4.8 登陸 AWS，定價權歸屬出現訊號
&lt;/h2&gt;

&lt;p&gt;本週最大宗的產品新聞是 Claude Opus 4.8 正式在 AWS 上提供&lt;a href="https://news.google.com/rss/articles/CBMijwFBVV95cUxORDctNzFXM1FDSHlUa3Ywdk9sVDB4V0tvMmVwamtxUzJfZGlyaXF2eThaT3B2ZWhuUzQ1LUlQZEpTWDZzTVlZdGZlOW5CdE1wNjRRZWRaYldWRHBpWEtUc2ZXM1dmRlpoT3djblB2Q2xaVTY5ekliNUh3N1RCM1l2MTd5Z29TaWdVanAxenpvcw?oc=5" rel="noopener noreferrer"&gt;Claude Opus 4.8 is now available on AWS - Amazon Web Services (AWS)&lt;/a&gt;。這不是簡單的「又多了一個雲端選項」——AWS 是企業採購事實上的守門人，進入這個管道等於拿到了進入大型企業合規採購流程的正式門票。&lt;/p&gt;

&lt;p&gt;結合 Anthropic 成為全球估值最高 AI 新創、估值突破千億美元&lt;a href="https://news.google.com/rss/articles/CBMiigFBVV95cUxPSi1XZmpReGJ1X3dWZ0JsQ1BHXzloYm93YjV2N09HX0VlRTFDa2huRE9jX3h6cWxQSDJDdVhnR1Awc01WZDRtVW13Zk50T3dINkJvSS13RkQ3WTM3d2NhRWpZWDV5VEhwV0FQMGRwNEJCdy1idFhIZXBqR3hUX0RHOHdEOU9lZU1GNXc?oc=5" rel="noopener noreferrer"&gt;Anthropic Tops OpenAI to Become the World’s Most Valuable A.I. Start-Up - The New York Times&lt;/a&gt;的背景，兩件事必須一起看：&lt;strong&gt;帳面估值是落後指標，AWS 管道承認是領先指標&lt;/strong&gt;。&lt;/p&gt;

&lt;p&gt;Claude Opus 4.8 本身的能力宣稱需要留意「發布 vs 可用 vs 可商用」的三層區分。本次是 Anthropic 直接發布&lt;a href="https://news.google.com/rss/articles/CBMiWkFVX3lxTFBVeFZoaVpfX1hnTlJPS05nQWZYRnB6bUdOc3pzSmlyX3dpN3BleUlVQm53Z3Bxd29JOUw1MENDMWk5VF9CX1VSU0Q2eV9zMXZWNUVOb2V4N2VaUQ?oc=5" rel="noopener noreferrer"&gt;Introducing Claude Opus 4.8 - Anthropic&lt;/a&gt;，而非客戶限定 preview；AWS 頁面同步更新&lt;a href="https://news.google.com/rss/articles/CBMijwFBVV95cUxORDctNzFXM1FDSHlUa3Ywdk9sVDB4V0tvMmVwamtxUzJfZGlyaXF2eThaT3B2ZWhuUzQ1LUlQZEpTWDZzTVlZdGZlOW5CdE1wNjRRZWRaYldWRHBpWEtUc2ZXM1dmRlpoT3djblB2Q2xaVTY5ekliNUh3N1RCM1l2MTd5Z29TaWdVanAxenpvcw?oc=5" rel="noopener noreferrer"&gt;Claude Opus 4.8 is now available on AWS - Amazon Web Services (AWS)&lt;/a&gt;表示可用性已達企業交付標準。與前代 Opus 4 的比較基準尚未有公開的第三方評測數據，工程團隊在選型時不應直接用新舊型號的發布文案做為依據。&lt;/p&gt;

&lt;p&gt;此外，Anthropic 同步發表了「coding agents 在社會科學領域」的應用論文&lt;a href="https://news.google.com/rss/articles/CBMickFVX3lxTFBZOVA5Z3pKZ1JzSHJXaFl0LWdSQlZfZWhGb3AtY1R3MExZRm9KS2dfRmNzTE1XR2VXdjlPMVh6U25SU2JmcXBZQjlnSWRreHM2SGZmTnpQVzRLQVFmVFQyOExiR2o5dFhSb3A2WVBmV2RzQQ?oc=5" rel="noopener noreferrer"&gt;Coding agents in the social sciences - Anthropic&lt;/a&gt;。這屬於研究階段的案例分享，&lt;strong&gt;不是產品發布&lt;/strong&gt;。論文中呈現的 workflow 整合程度、工作流程覆蓋範圍，與實際企業落地所需的穩定性和 tooling 支持，兩者之間還有工程鴻溝。&lt;/p&gt;

&lt;h2&gt;
  
  
  制度性擴張：米蘭與首爾，歐亞企業市場的網絡效應正在成型
&lt;/h2&gt;

&lt;p&gt;Anthropic 本週宣布兩項幾乎同步的機構佈局：米蘭辦公室服務義大利企業、研究機構與開發者&lt;a href="https://news.google.com/rss/articles/CBMiYEFVX3lxTE1Mb1dkTXE0WXgzR1BsWERKOVRjUm5feFJOUzJBSnFHTVJZaGQyMDN6ejVVYWluZW9WRmg0QUFhMy10d0lwaHczS2I5WjJpQ1IzMTNnZnZyTTZYVXJENTRaLQ?oc=5" rel="noopener noreferrer"&gt;Anthropic opens Milan office to support Italian enterprise, research, and developers - Anthropic&lt;/a&gt;；KiYoung Choi 被任命為韓國區代表，首爾辦公室即將開幕&lt;a href="https://news.google.com/rss/articles/CBMiiwFBVV95cUxOX3ZSeUtaVl8zLUdiMHN4ZktEYXVkMFpkTU5icG1SRWR3NVViNVM4WFRSbk5yUzFYbmNTSldGLXNrRmM0dWMtZlh0V21OcGhRS0hvNGl2bEFneFotV1c4LTFLdGRqZi0yZ1oyRlRZUU1aVnlXRWpLM0JqMUhJSldONTc2SGhmTjJpZWpZ?oc=5" rel="noopener noreferrer"&gt;Anthropic appoints KiYoung Choi as Representative Director of Korea ahead of Seoul office opening - Anthropic&lt;/a&gt;。&lt;/p&gt;

&lt;p&gt;這兩個佈局動作傳遞的訊息比任何模型能力更新的公告都更持久：&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;歐洲（米蘭）&lt;/strong&gt;——義大利是歐洲第三大經濟體，也是 GDPR 框架下企業 AI 採購的複雜合規節點。當地法人的存在將合規對話從「境外服務商」轉為「本土責任實體」，這是企業採購進入合規流程的第一道門檻。&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;亞洲（首爾）&lt;/strong&gt;——南韓在半導體供應鏈、手機與消費電子製造、以及汽車業的 AI 整合需求上，存在大量高價值的 B2B 應用場景。辦公室設立是 market entry 的必要條件，不是充分條件；真正的落地進度取決於後續支援與 API 可用性承諾。&lt;/p&gt;

&lt;p&gt;橫向對比：Anthropic 這套「進入主要經濟體設立本土存在」的策略，和 OpenAI 兩年前走向公開市場集資的策略，代表兩種不同的市場滲透模型。前者以機構信任為核心，後者以資金槓桿為核心。&lt;strong&gt;估值數字的差距現在正在檢驗哪個模型更適合制度性市場。&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Altman 的 IPO 問題與 OpenAI 的治理結構風險
&lt;/h2&gt;

&lt;p&gt;本週有兩篇深入報導&lt;a href="https://news.google.com/rss/articles/CBMimwFBVV95cUxNcDVqZUpkbHdkWXFXNzZRcmE0LTlFeVpmTHZZaXM1TWM3dTdoUEM2MVZINVp0UnptV0djWTNZbGwtOV9qZWpUUTljOGQ2VDRBVE5nVkc5d2V4ZlRLdHZQeFM1Y3ZZdVVSR1dtMWtudTNDeDVSQlNaYjF6VjhjNml3al9HZVBLbVppQzJsWklBT0h0d1FSTUVTUU0tTQ?oc=5" rel="noopener noreferrer"&gt;Breakingviews - OpenAI’s IPO has a Sam Altman problem - Reuters&lt;/a&gt;&lt;a href="https://news.google.com/rss/articles/CBMioAFBVV95cUxNYlJTM2tDdWVaWEdoNm5xUTNoS3NRS3ZEUDk0QVVIcV9QY3hjMmlUUFV3dUtFVUx2bDVMS3NHLU5sNG1PTlk0aWtYS2pOS2UzeWpFQlBndzJUM3VuVVdVdGtRQTFOLU5wUW1iU2s3cTJ4M2F5a21DdjRIWlpmLTdVTEJHd3JIV1VucHh5NXljWXVwbU5MaXN3Ry1KeXZXWG5G?oc=5" rel="noopener noreferrer"&gt;The big questions OpenAI’s trillion-dollar IPO filing may finally answer - Fortune&lt;/a&gt;聚焦 OpenAI IPO 申請與 Sam Altman 股權結構的問題。核心張力在於：Altman 不持有 OpenAI 股權——這在公開市場是異常結構，投資人評估治理風險時這是不可忽視變數。&lt;/p&gt;

&lt;p&gt;如果 IPO 完成後 Altman 對公司重大決策的影響力缺乏股權基礎的制度性約束，外部董事與投資人的制衡機制將比一般科技公司更為脆弱。監管機構（SEC）在審批時必然會問這個問題&lt;a href="https://news.google.com/rss/articles/CBMioAFBVV95cUxNYlJTM2tDdWVaWEdoNm5xUTNoS3NRS3ZEUDk0QVVIcV9QY3hjMmlUUFV3dUtFVUx2bDVMS3NHLU5sNG1PTlk0aWtYS2pOS2UzeWpFQlBndzJUM3VuVVdVdGtRQTFOLU5wUW1iU2s3cTJ4M2F5a21DdjRIWlpmLTdVTEJHd3JIV1VucHh5NXljWXVwbU5MaXN3Ry1KeXZXWG5G?oc=5" rel="noopener noreferrer"&gt;The big questions OpenAI’s trillion-dollar IPO filing may finally answer - Fortune&lt;/a&gt;。&lt;/p&gt;

&lt;p&gt;從工程決策者的角度，這件事的落地意涵是：當你評估基於 OpenAI API 建構的系統時，你同時在假設這家公司的治理結構在 IPO 後不會發生影響 API 可用性的根本性變化。這個假設不是零風險的。&lt;/p&gt;

&lt;h2&gt;
  
  
  Google 搜尋框 25 年首度改版：核心業務的還擊節奏
&lt;/h2&gt;

&lt;p&gt;《紐約時報》報導 Google 搜尋框在 25 年間首度重大改版，引入生成式 AI 能力&lt;a href="https://news.google.com/rss/articles/CBMigAFBVV95cUxPVzZ0WC1UelpJRkNTMldfbkcwaW43MDJ3cXh2elVTLVBhUHZBeTFWYWNXT3NHWjlPcFJ5SjZkOTdOZ21BdEJpeFE5NlE4ZUhpQllLcWEtM0RtTlJhdE9yOUdEajRELTR3ZVFpbXBrbVg3cUVVSGFyc1hTRERfZHhUbg?oc=5" rel="noopener noreferrer"&gt;Powered by A.I., Google Changes Its Search Box for the First Time in 25 Years - The New York Times&lt;/a&gt;。對比上一個週期（2023 年的 BARD 緊急發布），這次是正式產品整合而非失敗回應。&lt;/p&gt;

&lt;p&gt;這則新聞的戰略意涵不在於「Google 終於做 AI 搜尋」——那已經是兩年前的判斷；而在於&lt;strong&gt;時程&lt;/strong&gt;：從慌亂緊急應答到正式產品整合，Google 用兩年穩住了核心業務的 AI 升級節奏。這代表搜尋這種高流量、廣觸及的產品的 AI 整合，已經進入可工程化、可維運的階段，不再只是口號或實驗。&lt;/p&gt;

&lt;p&gt;對企業決策者言：如果你的產品策略涉及資訊獲取、文件摘要或知識管理，Google 這次改版代表「AI-first 搜尋」已成為標配功能，未來三年的差異化將不在於「有沒有 AI 搜尋」，而在於「誰能做出更高價值的垂直整合」。&lt;/p&gt;

&lt;h2&gt;
  
  
  信仰與模型：Anthropic 的非技術公關戰線
&lt;/h2&gt;

&lt;p&gt;本週有一個不尋常的新聞維度：Anthropic 共同創辦人 Chris Olah 公開論述教皇良十四世通喻「Magnifica humanitas」&lt;a href="https://news.google.com/rss/articles/CBMibkFVX3lxTFA1UGI4bzVyUmNNQURqVFpBZ0tkNjNyMy1GLWxwTXpBSGJzZEpKcTUycEFkVURaeDlkR3lodUpOY1pmN01YdE1wNkFSNTdTRG5keWZSb2tPTjd5SWpFVlFRdkZKTUFBcHAtZGpYdmVR?oc=5" rel="noopener noreferrer"&gt;Anthropic co-founder Chris Olah's remarks on Pope Leo XIV's encyclical "Magnifica humanitas" - Anthropic&lt;/a&gt;，隨即《科學人》報導 Anthropic 請宗教思想家參與塑造 Claude 的方向&lt;a href="https://news.google.com/rss/articles/CBMivgFBVV95cUxQalBxVWVVOUtFS0tobHRYekxHMFRRcmdYbEUzMTRJVGxmV0ItMnMxS3hYcGRKQXJvU3RhQlZ0SWxzc2hJdVBHN2M5OFVtQmdhZWZuZ3JnUjdWVlVncnRLZXdQWWd0QkNkWmdYS3N0V1hXcFBLcmtnWDdZY2FmSWxMZ0tpYzBpaEhDdDJMVWlDZkg4VHo0bEhucGV5dy04T2d2X0lrSjRsakJjeHNyMjdYOWgxN1d6RkF5OUJDM3VR?oc=5" rel="noopener noreferrer"&gt;Anthropic asks religious thinkers to help shape Claude as pope warns about AI - Scientific American&lt;/a&gt;。&lt;/p&gt;

&lt;p&gt;這不是技術新聞，但其戰略意圖清晰：當 AI 模型的社會影響力進入制度性監管階段，論述話語權的爭奪就和模型能力同等重要。這個「讓宗教思想家參與 AI 倫理」的框架，與 OpenAI 強調安全與對齊的路線有重疊，但切入角度不同。&lt;/p&gt;

&lt;p&gt;從實務觀察：這類非技術論述會影響監管機構的立法方向。立法者在技術細節上依賴業界自我約束時，提供框架的廠商將獲得不成比例的制度影響力。&lt;strong&gt;關注監管動態的決策者必須把這種「倫理外交」視為企業風險評估的一環。&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  本週橫向對比
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;事件&lt;/th&gt;
&lt;th&gt;主要意涵&lt;/th&gt;
&lt;th&gt;落地階段&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus 4.8 登陸 AWS&lt;/td&gt;
&lt;td&gt;企業管道合規門票到手&lt;/td&gt;
&lt;td&gt;可商用&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Anthropic 估值超越 OpenAI&lt;/td&gt;
&lt;td&gt;機構信任導向的商業模型獲市場確認&lt;/td&gt;
&lt;td&gt;商業階段&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;米蘭與首爾辦公室設立&lt;/td&gt;
&lt;td&gt;歐亞制度性市場進入策略啟動&lt;/td&gt;
&lt;td&gt;進入階段&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Google 搜尋框改版&lt;/td&gt;
&lt;td&gt;搜尋巨頭完成核心業務 AI 整合&lt;/td&gt;
&lt;td&gt;可用&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI IPO 結構性風險&lt;/td&gt;
&lt;td&gt;治理問題影響 API 長期可用性假設&lt;/td&gt;
&lt;td&gt;制度風險&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Papal 倫理框架參與&lt;/td&gt;
&lt;td&gt;監管話語權競爭進入新維度&lt;/td&gt;
&lt;td&gt;論述階段&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  結語
&lt;/h2&gt;

&lt;p&gt;本週的底層訊號不是新模型能力，而是「誰在控制工作流」這個問題的答案正在形成：Anthropic 的機構滲透策略與 OpenAI 的公開市場路徑，正在測試制度性採納與純市場邏輯兩種不同的滲透模型。實際結果還需要至少兩個季度的營收數據才能確認。&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;對的工程決策&lt;/strong&gt;：本週最值得追蹤的不是 Opus 4.8 的 benchmark，而是同一財報季內兩家公司營收增速的差距——這才是進入企業預算審批流程的實際起點。估值是落後指標；營收增速差距才是領先指標。&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>tech</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
