<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: member_0af6418a</title>
    <description>The latest articles on DEV Community by member_0af6418a (@member_0af6418a).</description>
    <link>https://dev.to/member_0af6418a</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3922814%2F8702938f-4a0f-4d34-8727-f7d838c522d5.png</url>
      <title>DEV Community: member_0af6418a</title>
      <link>https://dev.to/member_0af6418a</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/member_0af6418a"/>
    <language>en</language>
    <item>
      <title>Don't Use Doubao Only as a Chatbot: 6 Practical AI Workflows for Everyday Users</title>
      <dc:creator>member_0af6418a</dc:creator>
      <pubDate>Tue, 16 Jun 2026 15:59:15 +0000</pubDate>
      <link>https://dev.to/member_0af6418a/dont-use-doubao-only-as-a-chatbot-6-practical-ai-workflows-for-everyday-users-44nl</link>
      <guid>https://dev.to/member_0af6418a/dont-use-doubao-only-as-a-chatbot-6-practical-ai-workflows-for-everyday-users-44nl</guid>
      <description>&lt;p&gt;Most people start using AI by chatting with it.&lt;/p&gt;

&lt;p&gt;That is a reasonable first step. You ask it to rewrite a sentence, draft a short caption, or explain a concept. The result is immediate, so the tool feels useful.&lt;/p&gt;

&lt;p&gt;But if AI stays inside the chat box, it quickly becomes a more fluent search box. Every session starts from zero. Every useful answer disappears into the conversation history. The workflow never compounds.&lt;/p&gt;

&lt;p&gt;This article is not a feature review of Doubao. The more practical question is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How can everyday users connect Doubao to real tasks while still keeping judgment, evidence, and risk boundaries in the right place?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here are six workflows that are useful in daily work and life.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. When You Do Not Know How To Ask, Make It Ask You First
&lt;/h2&gt;

&lt;p&gt;Many weak AI answers start with a weak prompt.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Help me organize these materials.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This may work, but Doubao does not know who the materials are for, what the output should look like, what constraints matter, or what decision you need to make afterward.&lt;/p&gt;

&lt;p&gt;A better prompt is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I have a pile of materials, but I do not know how to organize them.
Ask me 5 questions first, so you can clarify what I need.
Then turn the result into a report outline I can reuse.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The point is not to force AI to answer immediately. The point is to let it help you clarify the task before it generates anything.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Give It A Real Scenario, Not Just A Generic Question
&lt;/h2&gt;

&lt;p&gt;Everyday users often treat AI like search.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;What should I pay attention to in a rental contract?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That question can produce a generic checklist, but the model does not know what kind of contract you have, what you are worried about, or whether your key concern is the deposit, early termination, penalty fees, subletting, or automatic renewal.&lt;/p&gt;

&lt;p&gt;If you have a real situation, say so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I am reviewing a rental contract.
My main concerns are deposit, early termination, penalty fees, and automatic renewal.
First give me a checklist.
If I paste contract text later, answer only based on what I provide.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two things matter here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Describe the actual situation.&lt;/li&gt;
&lt;li&gt;Set the answer boundary early.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Do not let AI invent clauses it has not seen, and do not let a risk scan become a legal conclusion.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. For Images And Screenshots, Separate What It Sees From What It Infers
&lt;/h2&gt;

&lt;p&gt;Tools like Doubao can read manuals, bills, menus, forms, screenshots, and notifications.&lt;/p&gt;

&lt;p&gt;That is useful, but there is a risk: AI can sometimes present an inference as if it were a fact.&lt;/p&gt;

&lt;p&gt;So add this sentence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;First tell me what you can clearly see in the image.
Then tell me what you are inferring.
If something is unclear, say it is unclear. Do not guess.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This forces the answer into two layers: observed information and inferred interpretation.&lt;/p&gt;

&lt;p&gt;That distinction helps you avoid being guided by an answer that sounds fluent but may not be fully grounded in the image.&lt;/p&gt;
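
&lt;p&gt;If you reuse this boundary often, it can live in a small helper. The sketch below is only an illustration: the function names and the &lt;code&gt;SEEN:&lt;/code&gt; / &lt;code&gt;INFERRED:&lt;/code&gt; / &lt;code&gt;UNCLEAR:&lt;/code&gt; line labels are my own assumptions, not a Doubao feature, and the sample reply is hand-written rather than real model output.&lt;/p&gt;

```python
# The three boundary sentences from the prompt above, kept in one place.
BOUNDARY = (
    "First tell me what you can clearly see in the image.\n"
    "Then tell me what you are inferring.\n"
    "If something is unclear, say it is unclear. Do not guess."
)


def with_observation_boundary(question: str) -> str:
    """Append the observed-versus-inferred boundary to any image question."""
    return f"{question.strip()}\n\n{BOUNDARY}"


def split_layers(reply: str) -> dict:
    """Sort reply lines into layers.

    Assumes (hypothetically) that each line starts with one of the labels
    SEEN:, INFERRED:, or UNCLEAR:. Lines without a known label are ignored.
    """
    layers = {"SEEN": [], "INFERRED": [], "UNCLEAR": []}
    for line in reply.splitlines():
        label, sep, text = line.partition(":")
        key = label.strip().upper()
        if sep and key in layers and text.strip():
            layers[key].append(text.strip())
    return layers


# Hand-written sample reply, used only to show the parsing step.
sample_reply = (
    "SEEN: Total due is 128.50\n"
    "INFERRED: This is probably a monthly utility bill\n"
    "UNCLEAR: The due date is cut off"
)
layers = split_layers(sample_reply)
print(with_observation_boundary("What does this bill say?"))
print(layers["UNCLEAR"])
```

&lt;p&gt;Keeping the layers apart makes it easy to review the inferred and unclear items first, since those are the ones most likely to mislead.&lt;/p&gt;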

&lt;h2&gt;
  
  
  4. Use Contract Review Only As Risk Triage
&lt;/h2&gt;

&lt;p&gt;When people hear "AI can read contracts," they often ask:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Can I sign this contract?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is the wrong role for AI.&lt;/p&gt;

&lt;p&gt;Contracts involve legal risk. AI should not make the final decision for you. A safer workflow is to use it for triage: identify suspicious areas, group them, and prepare questions for a professional or the other party.&lt;/p&gt;

&lt;p&gt;You can ask:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Based only on the contract content I provide,
help me identify clauses that may need attention.

Group them by:
1. Fees and penalties
2. Refund, termination, or cancellation terms
3. Automatic renewal or extension
4. Unilateral change clauses
5. Liability limits or disclaimers
6. Questions I should further confirm

If the clause is incomplete or unclear, say you are unsure.
Do not make the final decision for me.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This helps you notice what you might miss. It does not replace a lawyer, a professional advisor, or your own responsibility.&lt;/p&gt;
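
&lt;p&gt;The triage prompt can also be assembled from a fixed list of groups, so the wording stays the same every time. This is a minimal sketch under my own assumptions: &lt;code&gt;TRIAGE_GROUPS&lt;/code&gt; and &lt;code&gt;build_triage_prompt&lt;/code&gt; are hypothetical names, and the function only builds text. It does not call Doubao or any other service.&lt;/p&gt;

```python
# The six groups from the prompt above.
TRIAGE_GROUPS = [
    "Fees and penalties",
    "Refund, termination, or cancellation terms",
    "Automatic renewal or extension",
    "Unilateral change clauses",
    "Liability limits or disclaimers",
    "Questions I should further confirm",
]


def build_triage_prompt(contract_text: str) -> str:
    """Build a risk-triage prompt that is tied to the pasted contract text."""
    text = contract_text.strip()
    if not text:
        # Without the contract text there is nothing to ground the answer in.
        raise ValueError("paste the contract text before asking for triage")
    numbered = "\n".join(
        f"{i}. {name}" for i, name in enumerate(TRIAGE_GROUPS, start=1)
    )
    return (
        "Based only on the contract content I provide,\n"
        "help me identify clauses that may need attention.\n\n"
        f"Group them by:\n{numbered}\n\n"
        "If the clause is incomplete or unclear, say you are unsure.\n"
        "Do not make the final decision for me.\n\n"
        f"Contract content:\n{text}"
    )


# Placeholder clause, written for this example only.
print(build_triage_prompt("Clause 7: The deposit is non-refundable."))
```

&lt;p&gt;Refusing to build the prompt from empty text is a small guard for the same boundary: no provided content, no triage.&lt;/p&gt;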

&lt;h2&gt;
  
  
  5. Do Not Ask Only One AI For Important Questions
&lt;/h2&gt;

&lt;p&gt;The most dangerous AI mistake is not always an obvious failure. Sometimes the answer is wrong, but sounds very confident.&lt;/p&gt;

&lt;p&gt;For important questions, do not ask only one model.&lt;/p&gt;

&lt;p&gt;You can ask Doubao first, then send the answer to another AI system and ask it to challenge the response.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;This is the answer Doubao gave me.
Please review it from the opposite side:
Where is it not rigorous?
Where is evidence missing?
Where is the wording too absolute?
Is there a more cautious version?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The goal is not to make several AI systems argue. The goal is to expose blind spots.&lt;/p&gt;

&lt;p&gt;A second model can often notice missing evidence, overconfident wording, or assumptions that the first answer glossed over.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Turn Every Useful Session Into A Reusable Experience Card
&lt;/h2&gt;

&lt;p&gt;Most people finish an AI conversation and move on.&lt;/p&gt;

&lt;p&gt;That wastes a lot of value.&lt;/p&gt;

&lt;p&gt;If a conversation solved a real problem, ask Doubao to turn it into a reusable card:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Turn this problem-solving process into an experience card I can reuse next time.

Include:
1. The problem scenario
2. The key information I provided
3. How you broke down the problem
4. How I should ask next time
5. A reusable prompt template
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without this step, every AI session starts over. With it, each useful conversation becomes a template for the next one.&lt;/p&gt;
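
&lt;p&gt;If you keep these cards outside the chat, a small structure helps them stay consistent. The sketch below mirrors the five items in the prompt. The &lt;code&gt;ExperienceCard&lt;/code&gt; class, its field names, and the example values are illustrative assumptions, not something Doubao produces in this form.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class ExperienceCard:
    """One reusable card, with a field for each item in the prompt above."""

    scenario: str               # 1. the problem scenario
    key_information: list[str]  # 2. the key information I provided
    breakdown: list[str]        # 3. how the problem was broken down
    how_to_ask: str             # 4. how I should ask next time
    prompt_template: str        # 5. a reusable prompt template

    def render(self) -> str:
        """Render the card as plain text that can be pasted into a note."""
        info = "\n".join(f"- {item}" for item in self.key_information)
        steps = "\n".join(
            f"{i}. {step}" for i, step in enumerate(self.breakdown, start=1)
        )
        return (
            f"Scenario: {self.scenario}\n"
            f"Key information:\n{info}\n"
            f"Breakdown:\n{steps}\n"
            f"Ask next time: {self.how_to_ask}\n"
            f"Prompt template:\n{self.prompt_template}"
        )


# Example values written for illustration only.
card = ExperienceCard(
    scenario="Reviewing a rental contract before signing",
    key_information=["Main concerns: deposit and early termination"],
    breakdown=["Build a checklist", "Scan the pasted clauses", "List open questions"],
    how_to_ask="State my concerns first, then paste the contract text.",
    prompt_template="I am reviewing a rental contract. My main concerns are ...",
)
print(card.render())
```

&lt;p&gt;Plain text is a deliberate choice here: the card can go into any notes app and be pasted back into a new session.&lt;/p&gt;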

&lt;h2&gt;
  
  
  Boundaries Matter
&lt;/h2&gt;

&lt;p&gt;The better AI gets, the less you should treat it as authority.&lt;/p&gt;

&lt;p&gt;I recommend a few fixed rules:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For factual questions, ask for the source, its date, and where the claim originated.&lt;/li&gt;
&lt;li&gt;For image recognition, separate visible information from inference.&lt;/li&gt;
&lt;li&gt;For files and contracts, require answers based only on provided content.&lt;/li&gt;
&lt;li&gt;For medical, legal, financial, signing, job-changing, or high-impact decisions, do not let AI decide for you.&lt;/li&gt;
&lt;li&gt;If it cannot provide a source, treat the answer as a lead, not a conclusion.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AI can help you ask better questions, identify risks, prepare checklists, and compare answers. Final judgment still belongs to evidence, qualified professionals, and your own responsibility.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Everyday users do not need to start with complex agent systems.&lt;/p&gt;

&lt;p&gt;If Doubao is already on your phone, start with one real problem. Turn messy material into a checklist, explain a screenshot, or triage a contract. Ask another AI to challenge the answer, then turn the process into a reusable prompt.&lt;/p&gt;

&lt;p&gt;That is more valuable than casual chatting.&lt;/p&gt;

&lt;p&gt;The point is not how many times you ask AI something. The point is whether AI has entered the problems you actually need to solve.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>career</category>
    </item>
    <item>
      <title>Claude Fable 5 Field Test: Verify AI News Before You React</title>
      <dc:creator>member_0af6418a</dc:creator>
      <pubDate>Sat, 13 Jun 2026 04:20:07 +0000</pubDate>
      <link>https://dev.to/member_0af6418a/claude-fable-5-field-test-verify-ai-news-before-you-react-2d34</link>
      <guid>https://dev.to/member_0af6418a/claude-fable-5-field-test-verify-ai-news-before-you-react-2d34</guid>
      <description>&lt;p&gt;Claude Fable 5 is easy to turn into a familiar headline: the strongest AI model has arrived, and ordinary people are about to lose more work to AI.&lt;/p&gt;

&lt;p&gt;That is not the angle I want to take here.&lt;/p&gt;

&lt;p&gt;After the announcement, I read Anthropic's official launch post and model documentation, then ran a small set of hands-on tests in Claude. My conclusion is not that Fable 5 is unimportant. It is also not that it is already reliable enough to trust blindly.&lt;/p&gt;

&lt;p&gt;The more useful takeaway is this: Fable 5 is worth watching, but the practical skill ordinary users need is the ability to verify AI news before reacting to it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3dub1v2tvqai8sayi66x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3dub1v2tvqai8sayi66x.png" alt="Claude Fable 5 field test cover" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This article walks through four questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What are Claude Fable 5 and Claude Mythos 5?&lt;/li&gt;
&lt;li&gt;What actually changed in this release?&lt;/li&gt;
&lt;li&gt;How should we read benchmark claims without overreading them?&lt;/li&gt;
&lt;li&gt;How can everyday users verify similar AI news in a simple way?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  First: do not call this a full "Claude 5" release
&lt;/h2&gt;

&lt;p&gt;The first thing to get right is the naming.&lt;/p&gt;

&lt;p&gt;This is not simply "Claude 5 is fully released."&lt;/p&gt;

&lt;p&gt;A more precise description is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude Fable 5 is the widely available model for ordinary users and developers.&lt;/li&gt;
&lt;li&gt;Claude Mythos 5 is an invitation-only preview connected to Project Glasswing and trusted partners. It is not broadly available to every user.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Anthropic's model documentation lists the corresponding API IDs: &lt;code&gt;claude-fable-5&lt;/code&gt; and &lt;code&gt;claude-mythos-5&lt;/code&gt;. It also lists a 1M-token context window and up to 128k output tokens.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6nby2pskzikyww8gk6ew.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6nby2pskzikyww8gk6ew.png" alt="Claude model documentation screenshot" width="800" height="492"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That matters because this is not just about smoother chat. The model can take in more material and produce longer, more complete code, reports, and analysis.&lt;/p&gt;

&lt;p&gt;But long context and long output are not the same as guaranteed correctness. They give the model more room to work. The result still needs human review.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real shift: AI is moving from chat toward project execution
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzm7p8j7rtvsjtjp2ri5o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzm7p8j7rtvsjtjp2ri5o.png" alt="AI is moving from chat toward project execution" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The most important shift I see in Fable 5 is not that it can write a nicer paragraph. It is that it feels closer to a model that can carry a long task forward.&lt;/p&gt;

&lt;p&gt;The recurring themes in the official material and external write-ups are long-horizon work, engineering tasks, complex documents, table analysis, and iterative correction.&lt;/p&gt;

&lt;p&gt;Anthropic's launch material includes an engineering migration example involving a large Ruby codebase. Ethan Mollick's field report also describes a model that can take a vague goal, do research, write code, test, and revise. His important caveat is that the output is still imperfect and needs expert review.&lt;/p&gt;

&lt;p&gt;That is why I do not read this release as "another chatbot upgrade."&lt;/p&gt;

&lt;p&gt;The more useful framing is:&lt;/p&gt;

&lt;p&gt;AI is moving from "help me write one thing" toward "help me move a project forward."&lt;/p&gt;

&lt;p&gt;For ordinary users, this does not mean instant replacement. It means your role changes. Instead of asking the tool one sentence at a time, you increasingly need to define the goal, constraints, and acceptance criteria, then inspect whether the work is actually correct.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benchmarks matter, but one table is not the whole story
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frdvcxno5mfn7pp8wv1s5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frdvcxno5mfn7pp8wv1s5.png" alt="Do not read one benchmark table as the whole story" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Some of the numbers in Anthropic's benchmark table are strong.&lt;/p&gt;

&lt;p&gt;For example, the official table reports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SWE-Bench Pro: Fable 5 at 80.3%, GPT 5.5 at 58.6%.&lt;/li&gt;
&lt;li&gt;FrontierCode Diamond: Fable 5 at 29.3%, GPT 5.5 at 5.7%.&lt;/li&gt;
&lt;li&gt;Terminal-Bench 2: Fable 5 at 88.0%, GPT 5.5 + Codex CLI at 83.4%.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5j87jeillczme67veita.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5j87jeillczme67veita.png" alt="Anthropic benchmark table screenshot" width="660" height="726"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These numbers are meaningful signals, especially for engineering and long-horizon tasks. But they should not be converted into a universal claim that Fable 5 beats every other model in every situation.&lt;/p&gt;

&lt;p&gt;Benchmark scope, tools, versioning, and environment all matter.&lt;/p&gt;

&lt;p&gt;For example, the independent &lt;code&gt;terminal-bench@2.1&lt;/code&gt; leaderboard lists Codex CLI + GPT-5.5 at 83.4% +/- 2.2, Claude Code + Claude Opus 4.8 at 78.9% +/- 2.5, and Gemini CLI + Gemini 3.1 Pro at 70.7% +/- 2.9. That independent leaderboard does not currently list Fable 5 directly, so it should not be merged with Anthropic's official table as if they were the same measurement.&lt;/p&gt;
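
&lt;p&gt;The margins on that leaderboard are worth a quick arithmetic check before ranking anything. The sketch below treats each "+/-" value as a simple symmetric range around the score. That is my assumption: I am not asserting whether the leaderboard means a standard error or a confidence interval, and an overlap check is a rough reading aid, not a significance test. The &lt;code&gt;Score&lt;/code&gt; class and &lt;code&gt;ranges_overlap&lt;/code&gt; function are hypothetical names.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    """A leaderboard entry: mean score and its reported +/- margin, in percent."""

    system: str
    mean: float
    margin: float

    @property
    def low(self) -> float:
        return self.mean - self.margin

    @property
    def high(self) -> float:
        return self.mean + self.margin


def ranges_overlap(a: Score, b: Score) -> bool:
    """True when the two [low, high] ranges share at least one point."""
    return min(a.high, b.high) >= max(a.low, b.low)


# Figures as quoted above from the terminal-bench@2.1 leaderboard.
codex = Score("Codex CLI + GPT-5.5", 83.4, 2.2)
claude = Score("Claude Code + Claude Opus 4.8", 78.9, 2.5)
gemini = Score("Gemini CLI + Gemini 3.1 Pro", 70.7, 2.9)

for s in (codex, claude, gemini):
    print(f"{s.system}: {s.low:.1f} to {s.high:.1f}")

print(ranges_overlap(codex, claude))   # True: 81.2 to 85.6 meets 76.4 to 81.4
print(ranges_overlap(claude, gemini))  # False: 76.4 to 81.4 sits above 67.8 to 73.6
```

&lt;p&gt;Under that reading, the top two entries are closer than the headline scores suggest, while the gap to the third entry is clearer. Either way, this only compares entries on the same leaderboard.&lt;/p&gt;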

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fna7mv70k1jfpmbi5rsh4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fna7mv70k1jfpmbi5rsh4.png" alt="Fable 5 and Codex CLI comparison card" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;My read is simple: Fable 5 looks very strong, especially for long tasks, coding, and complex information work. But whenever an AI news item is built around a benchmark screenshot, I want to ask three questions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Is this from the vendor, a third party, or a user test?&lt;/li&gt;
&lt;li&gt;Are the compared systems running under the same conditions?&lt;/li&gt;
&lt;li&gt;Does this benchmark match the task I actually need to do?&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  My field test: strong, but early use can still be uneven
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fokla5sioczabjh28ahve.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fokla5sioczabjh28ahve.png" alt="Fable 5 field test notes" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I did not want to stop at the benchmark table, so I ran a few small tests.&lt;/p&gt;

&lt;p&gt;First, I checked basic availability. I had Fable 5 selected, sent another task, and got &lt;code&gt;Model isn't available&lt;/code&gt;. That is a practical issue ordinary users may hit when a new model has just launched.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F98yzoon4ovx0q2oqecdj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F98yzoon4ovx0q2oqecdj.png" alt="Fable 5 availability issue screenshot" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Second, I continued with Chinese-language tasks. At one point the model returned Japanese content instead of Chinese. I then added a stricter instruction: use simplified Chinese only, and keep each sentence short. After that, I asked it for a one-sentence summary, a video opening, and title options. Those three follow-up tasks returned to Chinese.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbm8293pwc66pryo1gc82.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbm8293pwc66pryo1gc82.png" alt="Fable 5 Japanese output issue screenshot" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These two observations do not mean Fable 5 is weak. A more proportionate conclusion is that early use can be uneven. A single success or failure should not become the whole verdict.&lt;/p&gt;

&lt;p&gt;Third, I uploaded the official benchmark table and asked the model to turn it into a 30-second Chinese voiceover for ordinary viewers. I also asked it to mark which conclusions should not be overread. This worked reasonably well: it extracted the main points and warned that different leaderboards should not be compared too casually.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foy7v0wt93d76atpyo0nd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foy7v0wt93d76atpyo0nd.png" alt="Fable 5 benchmark analysis screenshot" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Fourth, I gave it a video topic, screenshots, and risk constraints. This was closer to a real workflow test. It produced a structure, listed facts to verify, and separated out claims that could be overstated.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpx2xo3cixa5xt9c2ykse.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpx2xo3cixa5xt9c2ykse.png" alt="Fable 5 project planning screenshot" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F89qvaj3yyunbvb8diihu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F89qvaj3yyunbvb8diihu.png" alt="Fable 5 risk checklist screenshot" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F11yuhk02j9qlrvhfmosr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F11yuhk02j9qlrvhfmosr.png" alt="Fable 5 deliverables screenshot" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is where Fable 5 started to feel less like a chat model and more like a working assistant. It could split a messy task into structure, facts, risks, and next steps.&lt;/p&gt;

&lt;p&gt;But that is still not the same as automatic correctness. The structure needs review. The facts need checking. The final output still has to fit the real scenario.&lt;/p&gt;

&lt;h2&gt;
  
  
  One more issue: model restrictions should be visible to users
&lt;/h2&gt;

&lt;p&gt;There was also an important policy controversy around this release.&lt;/p&gt;

&lt;p&gt;Simon Willison wrote about a mechanism that restricted some requests related to frontier model development and was not always visible to users. Engadget later reported that Anthropic adjusted the policy after pushback from the research community, moving toward making those safeguards visible.&lt;/p&gt;

&lt;p&gt;For ordinary users, the lesson is not just about this specific policy. It is that stronger models come with more product-level routing, fallback behavior, and safety restrictions. What you see in the answer may reflect not only model capability, but also product design and policy decisions.&lt;/p&gt;

&lt;p&gt;So instead of only asking "is this model strong?", it is worth asking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In which scenarios is it strong?&lt;/li&gt;
&lt;li&gt;Which tasks trigger restrictions or fallbacks?&lt;/li&gt;
&lt;li&gt;Can the user see when that happens?&lt;/li&gt;
&lt;li&gt;What human checks are still required before trusting the result?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  A simple three-step method for reading AI news
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdpi14rikkkh9v8q80vwv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdpi14rikkkh9v8q80vwv.png" alt="Three-step AI news verification method" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you are an everyday user trying to keep up with AI, I would avoid reacting immediately to words like "strongest," "revolutionary," or "everyone will be replaced."&lt;/p&gt;

&lt;p&gt;Use a simple three-step check instead.&lt;/p&gt;

&lt;p&gt;First, check the official source.&lt;/p&gt;

&lt;p&gt;Read the launch post, model documentation, pricing page, or API docs. Official material is not the full truth, but it anchors the basics: model name, access scope, parameters, limitations, and intended use cases.&lt;/p&gt;

&lt;p&gt;Second, look for real tests.&lt;/p&gt;

&lt;p&gt;A useful test is not just a riddle or a screenshot of a perfect answer. Put the model into a real task: read a table, modify code, draft a plan, analyze a file, or handle a small workflow. Pay attention to failures as much as successes.&lt;/p&gt;

&lt;p&gt;Third, test your own scenario.&lt;/p&gt;

&lt;p&gt;Do not ask whether the model is "the strongest." Ask whether it helps with one task you actually have: summarize meeting notes, review a contract for risky clauses, design a study plan, analyze a spreadsheet, prototype code, or plan content.&lt;/p&gt;

&lt;p&gt;If it reliably improves your own workflow, that is practical value. If it only looks impressive in a news post, you do not need to panic.&lt;/p&gt;

&lt;h2&gt;
  
  
  My takeaway
&lt;/h2&gt;

&lt;p&gt;Claude Fable 5 is worth paying attention to.&lt;/p&gt;

&lt;p&gt;The direction is clear: AI is moving from chat toward project execution. Longer context, longer output, stronger engineering performance, and better complex-document handling all push the user role from direct operator toward goal-setter and reviewer.&lt;/p&gt;

&lt;p&gt;But that does not mean ordinary users should let anxiety drive their decisions.&lt;/p&gt;

&lt;p&gt;The useful habit is to treat AI news as a learning entry point, not an emotional trigger. Check the source, inspect real tests, and try the model in your own scenario. The earlier you build that verification habit, the less likely you are to be dragged around by every new model launch.&lt;/p&gt;

&lt;p&gt;That is the real reason I ran this field test: not to declare one model as the permanent winner, but to build a more rational way to analyze AI news.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Anthropic launch announcement: &lt;a href="https://www.anthropic.com/news/claude-fable-5-mythos-5" rel="noopener noreferrer"&gt;https://www.anthropic.com/news/claude-fable-5-mythos-5&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Anthropic model documentation: &lt;a href="https://docs.anthropic.com/en/docs/about-claude/models/overview" rel="noopener noreferrer"&gt;https://docs.anthropic.com/en/docs/about-claude/models/overview&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Ethan Mollick field report: &lt;a href="https://www.oneusefulthing.org/p/what-it-feels-like-to-work-with-mythos" rel="noopener noreferrer"&gt;https://www.oneusefulthing.org/p/what-it-feels-like-to-work-with-mythos&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Simon Willison's post: &lt;a href="https://simonwillison.net/2026/Jun/10/if-claude-fable-stops-helping-you/" rel="noopener noreferrer"&gt;https://simonwillison.net/2026/Jun/10/if-claude-fable-stops-helping-you/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Engadget report: &lt;a href="https://www.engadget.com/2192004/anthropic-walks-back-policy-sabotaging-research/" rel="noopener noreferrer"&gt;https://www.engadget.com/2192004/anthropic-walks-back-policy-sabotaging-research/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Feishu CLI Hands-on: Letting Codex Enter a Real Office Workflow</title>
      <dc:creator>member_0af6418a</dc:creator>
      <pubDate>Tue, 02 Jun 2026 06:55:46 +0000</pubDate>
      <link>https://dev.to/member_0af6418a/feishu-cli-hands-on-letting-codex-enter-a-real-office-workflow-1l0d</link>
      <guid>https://dev.to/member_0af6418a/feishu-cli-hands-on-letting-codex-enter-a-real-office-workflow-1l0d</guid>
      <description>&lt;p&gt;Feishu now has an official CLI, and I wanted to test a practical question:&lt;/p&gt;

&lt;p&gt;Can an AI agent use it to enter a real office workflow?&lt;/p&gt;

&lt;p&gt;I did not start by manually reading the documentation and turning it into a scripted demo. Instead, I gave the official &lt;code&gt;larksuite/cli&lt;/code&gt; repository to Codex and asked it to figure out what the tool could do, install it, go through the configuration path, wait for human authorization, and then send me a message through Feishu.&lt;/p&gt;

&lt;p&gt;After that, I turned the test into a small recurring workflow: a daily reminder to check SEO and GEO status for our main blog.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faq41rhswvwm4dfdjqk0q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faq41rhswvwm4dfdjqk0q.png" alt="Feishu CLI hands-on" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this is more than a command-line tool
&lt;/h2&gt;

&lt;p&gt;Most CLI tools are useful because they turn repeated clicks into commands.&lt;/p&gt;

&lt;p&gt;Feishu CLI is more interesting because the official README explicitly treats humans and AI agents as users of the tool. In the README I checked on June 1, 2026, the project describes support for messaging, docs, Bitable, spreadsheets, slides, calendar, mail, tasks, meetings, Markdown, and more.&lt;/p&gt;

&lt;p&gt;It also describes 200+ commands and 26 AI Agent Skills.&lt;/p&gt;

&lt;p&gt;That matters because agent-based office automation often gets stuck at the same point:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the agent can understand the task;&lt;/li&gt;
&lt;li&gt;it can generate a plan or script;&lt;/li&gt;
&lt;li&gt;but it has no stable, authorized way to operate the office system.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When office actions become available through a CLI, the agent can move from "understanding" to "executing a bounded action."&lt;/p&gt;

&lt;p&gt;This does not mean full autonomous office work. It means repeated, low-risk, verifiable actions can start becoming workflows.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5r2igw78azzx5nbxxlig.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5r2igw78azzx5nbxxlig.png" alt="Feishu CLI capability surface" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Letting Codex run the setup path
&lt;/h2&gt;

&lt;p&gt;The first step was to let Codex read the official repository.&lt;/p&gt;

&lt;p&gt;It found the &lt;code&gt;larksuite/cli&lt;/code&gt; GitHub repo, read the Chinese README, and noticed that the documentation includes a quick-start path specifically for AI agents.&lt;/p&gt;

&lt;p&gt;The flow looked like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Codex confirmed the install command from the official README.&lt;/li&gt;
&lt;li&gt;It checked the local Node / npm environment.&lt;/li&gt;
&lt;li&gt;It ran the CLI install command.&lt;/li&gt;
&lt;li&gt;It entered the configuration flow.&lt;/li&gt;
&lt;li&gt;It returned authorization links to me.&lt;/li&gt;
&lt;li&gt;I completed authorization in the browser.&lt;/li&gt;
&lt;li&gt;Codex resumed and checked the authorization state.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The important part is the responsibility split.&lt;/p&gt;

&lt;p&gt;The agent can read documentation, run commands, parse output, and prepare the next step. But authorization should stay human-controlled. Once an agent can operate an office system, permissions become a product and security question, not just a convenience feature.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5i809fgq3kw5st00bj9c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5i809fgq3kw5st00bj9c.png" alt="Agent setup and authorization flow" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The first useful loop: a Feishu message comes back
&lt;/h2&gt;

&lt;p&gt;After installation and authorization, I tested the smallest complete loop:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;human intent
-&amp;gt; agent understands the task
-&amp;gt; agent calls the CLI
-&amp;gt; CLI operates Feishu
-&amp;gt; Feishu message returns to the human
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is small, but it is enough to prove the basic workflow path.&lt;/p&gt;

&lt;p&gt;At that point, the question changes from "can this send a message?" to "what repeated office action should be turned into a workflow?"&lt;/p&gt;

&lt;p&gt;I chose a daily SEO / GEO reminder.&lt;/p&gt;

&lt;p&gt;The reminder is not complex. It asks me to check:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Google / Bing index status&lt;/li&gt;
&lt;li&gt;search query and click changes&lt;/li&gt;
&lt;li&gt;whether AI search or large models mention the brand&lt;/li&gt;
&lt;li&gt;Chinese and English article titles, summaries, and links&lt;/li&gt;
&lt;li&gt;whether recent content distribution created new entry points&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is exactly the kind of work that is important but easy to forget. A stable private reminder is more useful than a flashy automation that is too risky to run every day.&lt;/p&gt;

&lt;h2&gt;
  
  
  Start with small private tasks
&lt;/h2&gt;

&lt;p&gt;The official README also includes a safety warning around AI-agent automation: hallucination, uncontrolled execution, and prompt injection are real risks when an agent operates an office platform under a user's authorization.&lt;/p&gt;

&lt;p&gt;That should shape the first workflows.&lt;/p&gt;

&lt;p&gt;My preferred starting point:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;private reminders, not group-wide bots;&lt;/li&gt;
&lt;li&gt;personal todos, not cross-team approvals;&lt;/li&gt;
&lt;li&gt;fixed checklists, not open-ended execution;&lt;/li&gt;
&lt;li&gt;read-only checks before write or delete actions;&lt;/li&gt;
&lt;li&gt;no secrets, tokens, chat IDs, or open IDs in public screenshots, articles, or logs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftfm1tukepqbpomfygag9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftfm1tukepqbpomfygag9.png" alt="Feishu CLI safety boundary" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Agent office automation should not begin with broad permissions. It should begin with low-risk, high-repeat, verifiable actions.&lt;/p&gt;
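
&lt;p&gt;One way to make that boundary concrete is a small review step between the agent's proposal and the CLI call. The sketch below is illustrative only: the action labels and the &lt;code&gt;review_action&lt;/code&gt; helper are hypothetical names I made up for this example, not Feishu CLI commands or part of its API.&lt;/p&gt;

```python
from dataclasses import dataclass

# Hypothetical action labels for illustration; these are NOT Feishu CLI commands.
READ_ONLY_ACTIONS = {"read_calendar", "read_document"}
LOW_RISK_WRITES = {"send_private_reminder", "create_personal_todo"}


@dataclass(frozen=True)
class ProposedAction:
    name: str
    target: str  # "self" for private actions, anything else for a wider audience


def review_action(action):
    """Return 'allow', 'ask_human', or 'deny' for an agent-proposed action."""
    if action.name in READ_ONLY_ACTIONS:
        return "allow"
    if action.name in LOW_RISK_WRITES and action.target == "self":
        return "allow"
    if action.name in LOW_RISK_WRITES:
        # Same action, wider audience: hand the decision back to a person.
        return "ask_human"
    # Anything not on the lists (deletes, approvals, unknown actions) is refused.
    return "deny"


print(review_action(ProposedAction("send_private_reminder", "self")))
```

&lt;p&gt;The design choice is that unknown actions are denied by default, so widening what the agent may do is always an explicit human edit to the lists.&lt;/p&gt;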

&lt;h2&gt;
  
  
  What this means for practical users
&lt;/h2&gt;

&lt;p&gt;Many people still use AI mostly for Q&amp;amp;A, writing, or generating spreadsheet formulas.&lt;/p&gt;

&lt;p&gt;Those are useful, but the bigger shift happens when agents can enter real workflows.&lt;/p&gt;

&lt;p&gt;Feishu CLI is a good example. Once an office platform has a standardized command interface, an agent can help with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;daily metric reminders;&lt;/li&gt;
&lt;li&gt;meeting follow-up summaries;&lt;/li&gt;
&lt;li&gt;document summaries;&lt;/li&gt;
&lt;li&gt;calendar conflict checks;&lt;/li&gt;
&lt;li&gt;repeated spreadsheet updates;&lt;/li&gt;
&lt;li&gt;fixed operational checklists.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these tasks are dramatic. But they are repeated, easy to forget, and valuable when they happen consistently.&lt;/p&gt;

&lt;p&gt;The value of Feishu CLI is not that a command line can replace the Feishu client. Its value is that it gives agents an entry point into the office system that can be installed, authorized, and inspected, and whose actions a human can interrupt when needed.&lt;/p&gt;

&lt;p&gt;Full write-up:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://kunpeng-ai.com/en/blog/feishu-cli-ai-agent-workflow/" rel="noopener noreferrer"&gt;https://kunpeng-ai.com/en/blog/feishu-cli-ai-agent-workflow/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>opensource</category>
      <category>github</category>
    </item>
    <item>
      <title>Opus 4.8, Qwen, DeepSeek, and a Claude Code Failure: What I Could Actually Reproduce</title>
      <dc:creator>member_0af6418a</dc:creator>
      <pubDate>Sun, 31 May 2026 06:34:03 +0000</pubDate>
      <link>https://dev.to/member_0af6418a/opus-48-qwen-deepseek-and-a-claude-code-failure-what-i-could-actually-reproduce-2chl</link>
      <guid>https://dev.to/member_0af6418a/opus-48-qwen-deepseek-and-a-claude-code-failure-what-i-could-actually-reproduce-2chl</guid>
      <description>&lt;p&gt;There is a claim going around that Claude's latest Opus 4.8 may have been distilled from Qwen or DeepSeek.&lt;/p&gt;

&lt;p&gt;That kind of claim spreads quickly, especially when it can be turned into a screenshot or a short clip. I wanted to test the small version of the claim first: if I ask Opus 4.8 what model it is, does it identify itself as Qwen or DeepSeek?&lt;/p&gt;

&lt;p&gt;In my May 30 test, I could not reproduce that behavior.&lt;/p&gt;

&lt;p&gt;But the more useful part of the test happened before the model test even started. Claude Code broke after an upgrade with &lt;code&gt;spawn EBUSY&lt;/code&gt;, and Codex helped diagnose and fix the local Claude Code state.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy4k25vse1y4kchcxu9hg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy4k25vse1y4kchcxu9hg.png" alt="Opus 4.8 Qwen DeepSeek test cover" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The first failure was not the model
&lt;/h2&gt;

&lt;p&gt;I originally planned to open Claude Code, switch to the new Opus 4.8 path, and ask a direct identity question.&lt;/p&gt;

&lt;p&gt;Instead, Claude Code failed after the upgrade with:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;spawn EBUSY
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the kind of problem that is easy to misread. When an AI coding tool fails to start, it is tempting to blame the account, the network, the subscription, or the remote model service.&lt;/p&gt;

&lt;p&gt;Codex pointed in a more local direction: the Claude Code component state looked broken.&lt;/p&gt;

&lt;p&gt;The useful clues were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a parsing problem in an old session file&lt;/li&gt;
&lt;li&gt;a Claude Code executable that appeared to be half-downloaded, locked, or otherwise incomplete&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After cleaning up the local component state, Claude Code ran again.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff5min5w6bqybxrkcitl3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff5min5w6bqybxrkcitl3.png" alt="Claude Code upgrade failed with spawn EBUSY" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is a very normal kind of AI tooling failure. The demo version of AI coding looks smooth. The real version often includes local caches, CLI updates, broken sessions, locked binaries, and confusing error messages.&lt;/p&gt;

&lt;p&gt;If the toolchain is broken, the model has not really been tested yet.&lt;/p&gt;

&lt;h2&gt;
  
  
  Then I tested the identity claim
&lt;/h2&gt;

&lt;p&gt;After Claude Code was working again, I asked Opus 4.8 a direct question:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;What large model are you?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this run, it identified itself as Claude Opus 4.8, developed by Anthropic, and running in the Claude Code environment.&lt;/p&gt;

&lt;p&gt;It did not identify itself as Qwen.&lt;/p&gt;

&lt;p&gt;It did not identify itself as DeepSeek.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fox2g2rbx9xm65z53t0gk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fox2g2rbx9xm65z53t0gk.png" alt="Opus 4.8 identity answer" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The careful conclusion is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;In this test, I did not reproduce Opus 4.8 identifying itself as Qwen or DeepSeek.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is intentionally narrow.&lt;/p&gt;

&lt;p&gt;It does not prove anything broad about training lineage, distillation, data contamination, or evaluation artifacts. A single self-identity answer is not a rigorous method for determining model origin.&lt;/p&gt;

&lt;p&gt;But it does mean I would not treat the stronger viral claim as settled without more reproducible evidence.&lt;/p&gt;

&lt;h2&gt;
  
  
  The practical lesson: keep more than one agent
&lt;/h2&gt;

&lt;p&gt;The most useful part of this test was not the model identity answer. It was the workflow lesson.&lt;/p&gt;

&lt;p&gt;Claude Code broke. Codex helped fix Claude Code.&lt;/p&gt;

&lt;p&gt;That suggests a practical setup for anyone using AI coding tools seriously: keep more than one agent installed.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;If Claude Code fails, ask Codex to inspect logs and local state.&lt;/li&gt;
&lt;li&gt;If Codex hits a confusing error, ask Claude to analyze the message.&lt;/li&gt;
&lt;li&gt;If one toolchain is stuck, use the other agent to keep the diagnosis moving.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is not about declaring one tool better than another. It is about avoiding a single point of failure in your AI workflow.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl1mps6k2onqpgcz97kha.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl1mps6k2onqpgcz97kha.png" alt="Keep more than one coding agent installed" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Do not outsource verification to the timeline
&lt;/h2&gt;

&lt;p&gt;The second lesson is about model rumors.&lt;/p&gt;

&lt;p&gt;Claims like "this model is distilled from that model" or "this model is just a wrapper" are easy to share. They may be worth investigating, but they should not be accepted from a screenshot alone.&lt;/p&gt;

&lt;p&gt;A better habit is to record:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;date and version&lt;/li&gt;
&lt;li&gt;local environment&lt;/li&gt;
&lt;li&gt;exact prompt&lt;/li&gt;
&lt;li&gt;model route or tool context&lt;/li&gt;
&lt;li&gt;screenshots or logs&lt;/li&gt;
&lt;li&gt;the actual output&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then the discussion can move from reaction to evidence.&lt;/p&gt;
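
&lt;p&gt;A minimal sketch of such a record, assuming a plain JSON file kept next to the screenshots is enough. The &lt;code&gt;ModelTestRecord&lt;/code&gt; class, its field names, and the sample values (including the &lt;code&gt;tool_version&lt;/code&gt; placeholder and the file name) are my own illustration, not a standard format.&lt;/p&gt;

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ModelTestRecord:
    date: str
    tool_version: str
    environment: str
    prompt: str
    model_route: str
    output: str
    evidence_files: list = field(default_factory=list)


def to_json(record):
    """Serialize one record so it can be stored next to screenshots and logs."""
    return json.dumps(asdict(record), ensure_ascii=False, indent=2, sort_keys=True)


# Illustrative values only; replace the placeholders with what was actually observed.
record = ModelTestRecord(
    date="2026-05-30",
    tool_version="unknown",  # placeholder: record the real CLI version here
    environment="local Claude Code session",
    prompt="What large model are you?",
    model_route="Claude Code, Opus 4.8",
    output="Identified itself as Claude Opus 4.8, developed by Anthropic.",
    evidence_files=["identity-answer.png"],  # hypothetical file name
)
print(to_json(record))
```

&lt;p&gt;Keeping the exact prompt and the raw output in the same record makes it possible for someone else to rerun the test and compare.&lt;/p&gt;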

&lt;p&gt;That is the direction I want more model tests to take: less team-picking, more reproducible traces.&lt;/p&gt;

&lt;p&gt;Full write-up:&lt;br&gt;
&lt;a href="https://kunpeng-ai.com/en/blog/opus-48-qwen-deepseek-claude-code-codex-test/" rel="noopener noreferrer"&gt;https://kunpeng-ai.com/en/blog/opus-48-qwen-deepseek-claude-code-codex-test/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>llm</category>
      <category>testing</category>
    </item>
    <item>
      <title>CodeWhale accepted our PRs: better coding agents need better harnesses</title>
      <dc:creator>member_0af6418a</dc:creator>
      <pubDate>Fri, 29 May 2026 10:51:41 +0000</pubDate>
      <link>https://dev.to/member_0af6418a/codewhale-accepted-our-prs-better-coding-agents-need-better-harnesses-351n</link>
      <guid>https://dev.to/member_0af6418a/codewhale-accepted-our-prs-better-coding-agents-need-better-harnesses-351n</guid>
      <description>&lt;p&gt;DeepSeek-TUI has gone through an important update. It now has a new name, CodeWhale, and two harness-related PRs from our work have been accepted by the maintainers.&lt;/p&gt;

&lt;p&gt;This is not a flashy product change. It is not a new screen or a new button. A user may open the tool and not notice it immediately.&lt;/p&gt;

&lt;p&gt;But if you have used coding agents on real projects, this kind of change matters. The hard part is not only whether the model can generate code. The agent also needs to know what it changed, why a test failed, and where it should look next.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7dpjzbu4gfcqjns8vxwq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7dpjzbu4gfcqjns8vxwq.png" alt="CodeWhale accepted our harness PRs" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What changed in CodeWhale
&lt;/h2&gt;

&lt;p&gt;The two accepted PRs improve the harness around the agent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PR #1971 exposes &lt;code&gt;apply_patch&lt;/code&gt; preflight metadata, so before the agent edits files, it can see which paths the patch is expected to affect.&lt;/li&gt;
&lt;li&gt;PR #1973 summarizes Cargo failures in tool metadata, so a long failure log can become a shorter signal the agent can reason about.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the model is the brain, the harness is the workbench between that brain and the actual engineering work. A weak workbench leaves the model guessing. A clearer workbench gives it better signals.&lt;/p&gt;

&lt;p&gt;When people discuss AI coding tools, they often start with model capability: is the model stronger, is the context longer, can it write more code automatically?&lt;/p&gt;

&lt;p&gt;Those questions matter. But in day-to-day engineering, another question matters just as much: does the tool turn the task context into something the model can understand, trace, and review?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8kwz9arvgglf7sggstsp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8kwz9arvgglf7sggstsp.png" alt="Two merged CodeWhale PRs" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  These PRs are not about writing more code
&lt;/h2&gt;

&lt;p&gt;The first change is simple: before applying a patch, tell the agent which paths the patch will touch.&lt;/p&gt;

&lt;p&gt;That sounds small, but it affects the next decision. If a patch changes a config file, a test file, and a core logic file, where should the agent inspect first after a failure? If path information is missing, the agent can easily spend time in the wrong place.&lt;/p&gt;
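
&lt;p&gt;The idea can be sketched outside CodeWhale. The example below assumes a unified diff with &lt;code&gt;---&lt;/code&gt; and &lt;code&gt;+++&lt;/code&gt; file headers and uses a hypothetical &lt;code&gt;affected_paths&lt;/code&gt; helper. It is not the code from PR #1971, and the real &lt;code&gt;apply_patch&lt;/code&gt; input format may differ.&lt;/p&gt;

```python
def affected_paths(patch_text):
    """List the files a unified diff would touch, in order of first appearance."""
    lines = patch_text.splitlines()
    paths = []
    # A file header is a '--- ' line immediately followed by a '+++ ' line.
    for old, new in zip(lines, lines[1:]):
        if not (old.startswith("--- ") and new.startswith("+++ ")):
            continue
        for header in (old, new):
            path = header[4:].split("\t")[0].strip()
            if path == "/dev/null":  # marks a created or deleted file
                continue
            if path.startswith(("a/", "b/")):  # conventional git prefixes
                path = path[2:]
            if path not in paths:
                paths.append(path)
    return paths


PATCH = """\
--- a/src/config.rs
+++ b/src/config.rs
@@ -1,2 +1,2 @@
-let retries = 1;
+let retries = 3;
--- /dev/null
+++ b/tests/config_test.rs
@@ -0,0 +1 @@
+// new test file
"""

print(affected_paths(PATCH))
```

&lt;p&gt;With a list like this available before the edit, the agent has a short set of places to inspect first when a later check fails.&lt;/p&gt;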

&lt;p&gt;The second change is about Cargo failure logs.&lt;/p&gt;

&lt;p&gt;Build and test logs can be long. The useful part may be buried inside dozens or hundreds of lines. A human engineer filters out noise almost automatically: error type, likely location, useful hint, next check. An agent that receives one raw blob of log text can be pulled away by noise.&lt;/p&gt;

&lt;p&gt;The value of this change is not that the harness makes decisions for the agent. It organizes the context so the agent can make a better next move.&lt;/p&gt;
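
&lt;p&gt;As a rough illustration of what "a shorter signal" can mean, here is a sketch that reduces Cargo output to error messages, their first source location, and failed test names. The &lt;code&gt;summarize_cargo_log&lt;/code&gt; helper and its output shape are hypothetical. This is not the code from PR #1973, and it only recognizes a few common line shapes.&lt;/p&gt;

```python
import re

ERROR_RE = re.compile(r"^error(?:\[(E\d+)\])?: (.+)$")
LOCATION_RE = re.compile(r"^\s*--> (\S+?):(\d+):(\d+)$")
FAILED_TEST_RE = re.compile(r"^test (\S+) \.\.\. FAILED$")
# Closing lines that only restate that the build failed.
AGGREGATE_PREFIXES = ("could not compile", "aborting due to")


def summarize_cargo_log(log_text):
    """Reduce a Cargo log to error messages, first locations, and failed tests."""
    errors, failed_tests = [], []
    for line in log_text.splitlines():
        error = ERROR_RE.match(line)
        if error:
            if not error.group(2).startswith(AGGREGATE_PREFIXES):
                errors.append({"code": error.group(1), "message": error.group(2), "location": None})
            continue
        location = LOCATION_RE.match(line)
        if location and errors and errors[-1]["location"] is None:
            # Attach only the first location that follows an error line.
            errors[-1]["location"] = f"{location.group(1)}:{location.group(2)}"
            continue
        failed = FAILED_TEST_RE.match(line)
        if failed:
            failed_tests.append(failed.group(1))
    return {"errors": errors, "failed_tests": failed_tests}


LOG = """\
   Compiling demo v0.1.0 (/tmp/demo)
error[E0425]: cannot find value `retries` in this scope
 --> src/main.rs:4:20
  |
4 |     println!("{}", retries);
  |                    ^^^^^^^ not found in this scope
error: could not compile `demo` (bin "demo") due to 1 previous error
"""

print(summarize_cargo_log(LOG))
```

&lt;p&gt;The summary does not decide anything. It only puts the error type and the likely location in front of the agent instead of leaving them buried in the log.&lt;/p&gt;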

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7182mjtkmuvqbmna5mr0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7182mjtkmuvqbmna5mr0.png" alt="A clearer harness reduces guessing" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters for AI replacing work
&lt;/h2&gt;

&lt;p&gt;This also connects to a bigger question: what kind of work is AI actually starting to replace?&lt;/p&gt;

&lt;p&gt;In programming, I do not think the first thing being replaced is complete engineering judgment. Not yet.&lt;/p&gt;

&lt;p&gt;What is easier to automate first is the repeated, fragmented work around engineering judgment: collecting changed-file context, reading long logs, summarizing failure causes, and listing the next possible checks.&lt;/p&gt;

&lt;p&gt;Those tasks are not meaningless. They take attention. But they are not the same as deciding the product goal, choosing the tradeoff, or accepting the risk.&lt;/p&gt;

&lt;p&gt;The important point is that AI does not become useful in a vacuum. It needs an environment that provides clean signals.&lt;/p&gt;

&lt;p&gt;If a tool throws a long log at the model and hopes the model reconstructs all the context, that is mostly a bet on guessing ability. If the tool can say what changed, where the failure is concentrated, and what evidence should guide the next step, the agent becomes more stable.&lt;/p&gt;

&lt;p&gt;So the shift is not "programmers are immediately replaced." A more practical view is that parts of context cleanup, log triage, and first-pass failure analysis are becoming easier to automate.&lt;/p&gt;

&lt;h2&gt;
  
  
  What developers can take from this
&lt;/h2&gt;

&lt;p&gt;For anyone using coding agents, the takeaway is direct: do not only ask whether the model is strong. Ask whether you have given it a proper harness.&lt;/p&gt;

&lt;p&gt;A useful harness should answer questions like these:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Before the agent modifies files, can it know which files may be affected?&lt;/li&gt;
&lt;li&gt;After a test fails, can the failure become a clean signal instead of raw noise?&lt;/li&gt;
&lt;li&gt;Can the next fix continue from evidence instead of starting over?&lt;/li&gt;
&lt;li&gt;Can the system mark where human judgment is still required?&lt;/li&gt;
&lt;li&gt;After the task ends, is there a record that can be reviewed?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These questions are less exciting than "switch to a stronger model." They are also closer to real productivity.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8z6z3d5e1r75dlvbl8f0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8z6z3d5e1r75dlvbl8f0.png" alt="Do not only swap models. Improve the harness." width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The larger lesson
&lt;/h2&gt;

&lt;p&gt;Progress in AI coding tools does not always arrive as a dramatic new feature. Sometimes it is a clearer patch-impact signal, a cleaner failure summary, or a task record that can be reviewed later.&lt;/p&gt;

&lt;p&gt;Those lower-level changes are what help an agent move from answering to doing.&lt;/p&gt;

&lt;p&gt;So when we talk about what AI will replace, it helps to make the question more specific. It is not replacing complete engineering judgment all at once. It is first replacing some repeated context organization, log filtering, and first-pass debugging work.&lt;/p&gt;

&lt;p&gt;The part that remains human is still important: goals, tradeoffs, risk control, and deciding how the tool should fit into the workflow.&lt;/p&gt;

&lt;p&gt;Canonical version:&lt;br&gt;
&lt;a href="https://kunpeng-ai.com/en/blog/codewhale-harness-pr-merged/" rel="noopener noreferrer"&gt;https://kunpeng-ai.com/en/blog/codewhale-harness-pr-merged/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;PRs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/Hmbown/CodeWhale/pull/1971" rel="noopener noreferrer"&gt;https://github.com/Hmbown/CodeWhale/pull/1971&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Hmbown/CodeWhale/pull/1973" rel="noopener noreferrer"&gt;https://github.com/Hmbown/CodeWhale/pull/1973&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>programming</category>
      <category>github</category>
    </item>
    <item>
      <title>Codex + Tencent LKEAP 401: It Was Not the Key</title>
      <dc:creator>member_0af6418a</dc:creator>
      <pubDate>Mon, 18 May 2026 14:00:03 +0000</pubDate>
      <link>https://dev.to/member_0af6418a/codex-tencent-lkeap-401-it-was-not-the-key-3i35</link>
      <guid>https://dev.to/member_0af6418a/codex-tencent-lkeap-401-it-was-not-the-key-3i35</guid>
      <description>&lt;p&gt;I ran into a failure that looked like an authentication problem at first:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;unexpected status 401 Unauthorized
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The setup was Codex against Tencent Cloud LKEAP Token Plan, using an OpenAI-compatible Chat Completions endpoint.&lt;/p&gt;

&lt;p&gt;The important clue was not the &lt;code&gt;401&lt;/code&gt;. It was the final URL:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://api.lkeap.cloud.tencent.com/plan/v3/chat/completions/responses
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That URL shape is already wrong.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq4dldh5d4ghd52p043ir.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq4dldh5d4ghd52p043ir.png" alt="Codex LKEAP protocol path debugging cover" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The config that caused it
&lt;/h2&gt;

&lt;p&gt;The provider entry looked roughly like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[model_providers.custom]&lt;/span&gt;
&lt;span class="py"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"custom"&lt;/span&gt;
&lt;span class="py"&gt;wire_api&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"responses"&lt;/span&gt;
&lt;span class="py"&gt;requires_openai_auth&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;base_url&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"https://api.lkeap.cloud.tencent.com/plan/v3/chat/completions"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The mistake is subtle.&lt;/p&gt;

&lt;p&gt;The Tencent endpoint in this setup is Chat Completions-style:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/plan/v3/chat/completions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But newer Codex versions expect a Responses API-style provider when &lt;code&gt;wire_api = "responses"&lt;/code&gt; is set. So Codex appended &lt;code&gt;/responses&lt;/code&gt; to a base URL that already ended in &lt;code&gt;/chat/completions&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;That produced:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/plan/v3/chat/completions/responses
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb4dtec9f9srgd3gk1bx6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb4dtec9f9srgd3gk1bx6.png" alt="The broken URL revealed the root cause" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At that point, rotating the API key is unlikely to help. The request is already going to the wrong protocol path.&lt;/p&gt;
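
&lt;p&gt;To see how that path can arise, here is a sketch of plain URL joining. The &lt;code&gt;compose_url&lt;/code&gt; helper is hypothetical and only an assumption about how a client might append its wire-protocol path to a configured base URL. It is not Codex's actual implementation.&lt;/p&gt;

```python
def compose_url(base_url, wire_path):
    """Join a provider base URL and a wire-protocol path with exactly one slash."""
    return base_url.rstrip("/") + "/" + wire_path.lstrip("/")


# The base URL from the config above already names a full endpoint.
base_url = "https://api.lkeap.cloud.tencent.com/plan/v3/chat/completions"
print(compose_url(base_url, "responses"))
```

&lt;p&gt;The join itself is correct. The input is the problem: a complete endpoint was configured where a base URL was expected, so the result ends in &lt;code&gt;/chat/completions/responses&lt;/code&gt;.&lt;/p&gt;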

&lt;h2&gt;
  
  
  Why not switch Codex back to Chat Completions?
&lt;/h2&gt;

&lt;p&gt;That was the next thing I checked.&lt;/p&gt;

&lt;p&gt;The newer Codex configuration rejects &lt;code&gt;wire_api = "chat"&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;invalid configuration: `wire_api = "chat"` is no longer supported.
How to fix: set `wire_api = "responses"` in your provider config.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So this is not just a missing slash or a bad base URL. It is a protocol mismatch:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Codex wants to speak Responses API.&lt;/li&gt;
&lt;li&gt;The LKEAP endpoint I was using speaks Chat Completions.&lt;/li&gt;
&lt;li&gt;"OpenAI-compatible" does not automatically mean "compatible with every OpenAI client mode."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last point is the real lesson.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd1p93qkjtdrdc9dlfvsa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd1p93qkjtdrdc9dlfvsa.png" alt="Responses API and Chat Completions are different protocol shapes" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  My debugging order now
&lt;/h2&gt;

&lt;p&gt;For this class of issue, I would check things in this order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Inspect the final request URL.&lt;/li&gt;
&lt;li&gt;Confirm the client's wire protocol.&lt;/li&gt;
&lt;li&gt;Confirm the upstream endpoint shape.&lt;/li&gt;
&lt;li&gt;Only then spend time on keys and permissions.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If the URL contains patterns like these, stop and look at path composition first:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/chat/completions/responses
/v1/v1
/responses/chat/completions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The status code may be a symptom. The URL shape is often the evidence.&lt;/p&gt;
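
&lt;p&gt;The path check can be scripted. Here is a minimal JavaScript sketch; the helper name &lt;code&gt;findPathCompositionIssues&lt;/code&gt; is hypothetical and not part of Codex or LKEAP, and the pattern list only mirrors the three shapes listed above.&lt;/p&gt;

```javascript
// Hypothetical helper: flag URL paths that look like two API shapes glued together.
const SUSPICIOUS_PATTERNS = [
  "/chat/completions/responses",
  "/v1/v1",
  "/responses/chat/completions",
];

function findPathCompositionIssues(requestUrl) {
  // URL is a global in Node.js; only the path portion is inspected.
  const path = new URL(requestUrl).pathname;
  return SUSPICIOUS_PATTERNS.filter((pattern) => path.includes(pattern));
}

console.log(
  findPathCompositionIssues(
    "https://api.lkeap.cloud.tencent.com/plan/v3/chat/completions/responses"
  )
);
```

&lt;p&gt;An empty result does not prove the request is correct. It only rules out the most obvious composition mistakes before you spend time on keys.&lt;/p&gt;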

&lt;h2&gt;
  
  
  Workaround: local protocol adapter
&lt;/h2&gt;

&lt;p&gt;The workaround I worked toward was a small local Node.js proxy.&lt;/p&gt;

&lt;p&gt;Codex calls a local Responses-shaped endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://127.0.0.1:15722/v1/responses
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The local proxy converts that into a Chat Completions request and forwards it to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://api.lkeap.cloud.tencent.com/plan/v3/chat/completions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
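
&lt;p&gt;The translation step could look roughly like the sketch below. This is a simplified illustration, not the actual proxy script: the function names are hypothetical, only plain-text input is handled, and mapping the &lt;code&gt;developer&lt;/code&gt; role to &lt;code&gt;system&lt;/code&gt; is an assumption about what the upstream accepts. Tool calls and streaming events are left out entirely.&lt;/p&gt;

```javascript
// Simplified sketch: map a Responses-style request body onto a
// Chat Completions-style body. Only plain text is handled.
function textFromContent(content) {
  if (typeof content === "string") return content;
  if (!Array.isArray(content)) return "";
  // Responses content parts carry their text in a `text` field.
  return content
    .map((part) => (typeof part?.text === "string" ? part.text : ""))
    .join("");
}

function responsesToChatBody(body, upstreamModel) {
  const messages = [];
  if (typeof body.instructions === "string") {
    messages.push({ role: "system", content: body.instructions });
  }
  if (typeof body.input === "string") {
    messages.push({ role: "user", content: body.input });
  } else if (Array.isArray(body.input)) {
    for (const item of body.input) {
      // Items without a role (for example tool-call items) are skipped here.
      if (typeof item?.role !== "string") continue;
      // Assumption: the upstream may not know the "developer" role.
      const role = item.role === "developer" ? "system" : item.role;
      messages.push({ role, content: textFromContent(item.content) });
    }
  }
  return { model: upstreamModel, messages, stream: body.stream === true };
}
```

&lt;p&gt;The response direction needs the reverse mapping, which is where most of the hardening work sits.&lt;/p&gt;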



&lt;p&gt;The Codex config then points to the local proxy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="py"&gt;model_provider&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"tencent_lkeap_proxy"&lt;/span&gt;
&lt;span class="py"&gt;model&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"glm-5.1"&lt;/span&gt;
&lt;span class="py"&gt;disable_response_storage&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="nn"&gt;[model_providers.tencent_lkeap_proxy]&lt;/span&gt;
&lt;span class="py"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Tencent LKEAP via local Responses proxy"&lt;/span&gt;
&lt;span class="py"&gt;wire_api&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"responses"&lt;/span&gt;
&lt;span class="py"&gt;base_url&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"http://127.0.0.1:15722/v1"&lt;/span&gt;
&lt;span class="py"&gt;requires_openai_auth&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;env_key&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"OPENAI_API_KEY"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The real Tencent key stays in the proxy process environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$&lt;/span&gt;&lt;span class="nn"&gt;env&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nv"&gt;TENCENT_LKEAP_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"REDACTED"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$&lt;/span&gt;&lt;span class="nn"&gt;env&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nv"&gt;TENCENT_LKEAP_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"glm-5.1"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;tools\codex-responses-to-chat-proxy.mjs&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Codex still expects an OpenAI auth variable for this provider shape, so I used a local dummy value:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$&lt;/span&gt;&lt;span class="nn"&gt;env&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"local-proxy-dummy"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The proxy ignores that dummy value and uses &lt;code&gt;TENCENT_LKEAP_API_KEY&lt;/code&gt; upstream.&lt;/p&gt;
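
&lt;p&gt;A minimal sketch of that key handling, assuming the upstream accepts a standard &lt;code&gt;Authorization: Bearer&lt;/code&gt; header (that scheme is an assumption here, and &lt;code&gt;buildUpstreamHeaders&lt;/code&gt; is a hypothetical name):&lt;/p&gt;

```javascript
// Sketch only: the Bearer scheme for the upstream call is an assumption.
function buildUpstreamHeaders(env) {
  const key = env.TENCENT_LKEAP_API_KEY;
  if (!key) {
    throw new Error("TENCENT_LKEAP_API_KEY is not set in the proxy environment");
  }
  // The incoming Authorization header (the dummy OPENAI_API_KEY) is never copied.
  return {
    "Content-Type": "application/json",
    Authorization: `Bearer ${key}`,
  };
}
```

&lt;p&gt;Failing fast when the variable is missing keeps a configuration gap from showing up later as a confusing upstream auth error.&lt;/p&gt;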

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6js438mg8hwx4x29868k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6js438mg8hwx4x29868k.png" alt="Use a local proxy to translate Responses requests to Chat Completions" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Boundary
&lt;/h2&gt;

&lt;p&gt;I would not describe this as a complete drop-in replacement.&lt;/p&gt;

&lt;p&gt;What was verified in the original debugging record:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the failure path was identified&lt;/li&gt;
&lt;li&gt;the local proxy script was created&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;node --check&lt;/code&gt; passed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Still to verify:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;full end-to-end smoke test with a real LKEAP key&lt;/li&gt;
&lt;li&gt;streaming behavior under longer tasks&lt;/li&gt;
&lt;li&gt;Codex tool-call traffic&lt;/li&gt;
&lt;li&gt;cancellation, timeout, and error mapping&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That distinction matters. A protocol adapter can be useful, but it should be treated as an adapter to harden, not as magic compatibility.&lt;/p&gt;

&lt;p&gt;The practical takeaway:&lt;/p&gt;

&lt;p&gt;When an AI coding tool fails against an "OpenAI-compatible" endpoint, do not ask only whether the key is valid. Ask whether the client and server are actually speaking the same API shape.&lt;/p&gt;

&lt;p&gt;Original version:&lt;br&gt;
&lt;a href="https://kunpeng-ai.com/en/blog/codex-lkeap-protocol-path-debugging/" rel="noopener noreferrer"&gt;https://kunpeng-ai.com/en/blog/codex-lkeap-protocol-path-debugging/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>opensource</category>
      <category>github</category>
    </item>
    <item>
      <title>Can Claude Code Still Use DeepSeek? A Windows Test with cc-switch</title>
      <dc:creator>member_0af6418a</dc:creator>
      <pubDate>Fri, 15 May 2026 13:35:06 +0000</pubDate>
      <link>https://dev.to/member_0af6418a/can-claude-code-still-use-deepseek-a-windows-test-with-cc-switch-2ahl</link>
      <guid>https://dev.to/member_0af6418a/can-claude-code-still-use-deepseek-a-windows-test-with-cc-switch-2ahl</guid>
      <description>&lt;p&gt;A lot of older third-party Claude model routes have become unreliable. I tested a narrower path on Windows: Claude Code through &lt;code&gt;cc-switch&lt;/code&gt; to DeepSeek.&lt;/p&gt;

&lt;p&gt;Important boundary first: &lt;code&gt;cc-switch&lt;/code&gt; is not a Claude jailbreak, and it is not a universal adapter for every coding agent. It mainly helps with the Claude Code provider route. Codex cannot use this path directly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr7to1itq7tl4nujj0eem.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr7to1itq7tl4nujj0eem.png" alt="Claude Code + DeepSeek tested" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What cc-switch actually solves
&lt;/h2&gt;

&lt;p&gt;It reduces manual config drift.&lt;/p&gt;

&lt;p&gt;Instead of hand-editing model name, base URL, and API key every time, you keep them as named providers and switch between them.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fizcypr6v9awh3k0pg4du.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fizcypr6v9awh3k0pg4du.png" alt="Install cc-switch" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The package I used:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;npm&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;install&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-g&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;@&lt;/span&gt;&lt;span class="nx"&gt;adithya-13/cc-switch&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Windows traps
&lt;/h2&gt;

&lt;p&gt;Two details mattered in my test.&lt;/p&gt;

&lt;p&gt;First, PowerShell may block the npm-generated &lt;code&gt;.ps1&lt;/code&gt; shim. When that happens, try:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;cmd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;/c&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;cc-switch&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Second, do not save the provider JSON config with a BOM. I hit a JSON parsing failure that disappeared after saving the config as UTF-8 without BOM.&lt;/p&gt;
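
&lt;p&gt;To check whether a BOM is the culprit in your own file, a small JavaScript sketch like the one below can help. It is a hypothetical helper for inspecting a config yourself, not a description of how &lt;code&gt;cc-switch&lt;/code&gt; reads its files.&lt;/p&gt;

```javascript
// Hypothetical helper: tolerate a UTF-8 byte order mark before parsing JSON.
function parseJsonIgnoringBom(text) {
  // After decoding as UTF-8, a BOM shows up as the single code unit U+FEFF.
  const clean = text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
  return JSON.parse(clean);
}
```

&lt;p&gt;If plain &lt;code&gt;JSON.parse&lt;/code&gt; fails on the file content but this helper succeeds, the file was saved with a BOM.&lt;/p&gt;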

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxdjydnarz7mtqalajqlz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxdjydnarz7mtqalajqlz.png" alt="Windows traps" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Verification
&lt;/h2&gt;

&lt;p&gt;I would not call the route ready until:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Active: DeepSeek
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;is visible, and the doctor check passes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhyjrm53g242zohz6ts3b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhyjrm53g242zohz6ts3b.png" alt="Active DeepSeek verification" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Only then did I restart Claude Code and test a small task.&lt;/p&gt;

&lt;h2&gt;
  
  
  Takeaway
&lt;/h2&gt;

&lt;p&gt;This path is useful if you already use Claude Code and want a cleaner DeepSeek provider setup on Windows.&lt;/p&gt;

&lt;p&gt;It is not a general solution for every agent. The practical value is narrower: make the provider state visible, avoid hand-edited config drift, and keep Windows shell/encoding issues from masquerading as model failures.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>opensource</category>
      <category>github</category>
    </item>
    <item>
      <title>Windows agents keep freezing: lessons from an OpenClaw merge and a Hermes maintainer reply</title>
      <dc:creator>member_0af6418a</dc:creator>
      <pubDate>Wed, 13 May 2026 14:11:49 +0000</pubDate>
      <link>https://dev.to/member_0af6418a/windows-agents-keep-freezing-lessons-from-an-openclaw-merge-and-a-hermes-maintainer-reply-5hm4</link>
      <guid>https://dev.to/member_0af6418a/windows-agents-keep-freezing-lessons-from-an-openclaw-merge-and-a-hermes-maintainer-reply-5hm4</guid>
      <description>&lt;p&gt;Windows is a hard place to run long-lived AI coding agents.&lt;/p&gt;

&lt;p&gt;The failure mode is often quiet. A terminal window may still be open. A process may still exist. But halfway through a task, the gateway stops responding, memory search breaks, or a background service silently exits.&lt;/p&gt;

&lt;p&gt;This post summarizes two concrete trails from recent work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenClaw merged a fix for a transient Windows file-lock problem in memory index swaps.&lt;/li&gt;
&lt;li&gt;Hermes did not merge our Windows gateway helper PR, but a maintainer clarified how this work fits into a broader Windows support plan.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  OpenClaw: transient file locks during memory index swaps
&lt;/h2&gt;

&lt;p&gt;PR:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/openclaw/openclaw/pull/76024" rel="noopener noreferrer"&gt;https://github.com/openclaw/openclaw/pull/76024&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;During memory reindexing, OpenClaw swaps SQLite index files. On Windows, &lt;code&gt;fs.rename&lt;/code&gt; can fail when a file is briefly held by the system, antivirus software, an indexer, or another process.&lt;/p&gt;

&lt;p&gt;The errors can look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;EBUSY
EPERM
EACCES
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From the user side, the symptom may be vague: memory search fails, or an agent task gets stuck.&lt;/p&gt;

&lt;p&gt;The merged fix is intentionally narrow. It adds bounded retries around the atomic index swap path. It does not rewrite the memory system or turn every failure into a retry.&lt;/p&gt;
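
&lt;p&gt;To make the idea concrete, here is a sketch of a bounded retry around a rename. It is not the code from the merged PR: the function name, attempt count, and delay are illustrative assumptions, and the rename function is passed in (for example &lt;code&gt;fs.promises.rename&lt;/code&gt;).&lt;/p&gt;

```javascript
// Sketch of a bounded retry around a rename; not the code from the merged PR.
const RETRYABLE_CODES = new Set(["EBUSY", "EPERM", "EACCES"]);

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// `renameFn` is injected, for example `fs.promises.rename` from node:fs.
async function renameWithRetry(renameFn, from, to, maxAttempts = 5, delayMs = 50) {
  for (let attempt = 1; ; attempt += 1) {
    try {
      await renameFn(from, to);
      return attempt; // number of attempts that were needed
    } catch (err) {
      const retryable = RETRYABLE_CODES.has(err?.code);
      // Give up on non-transient errors or once the attempt budget is spent.
      if (!retryable || attempt >= maxAttempts) throw err;
      await sleep(delayMs * attempt);
    }
  }
}
```

&lt;p&gt;The two exits are the point: unknown error codes fail immediately, and known transient ones stop after a fixed number of tries.&lt;/p&gt;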

&lt;p&gt;That is the kind of stability work that tends to matter in real agent use.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cleanup paths matter too
&lt;/h2&gt;

&lt;p&gt;Related OpenClaw trail:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/openclaw/openclaw/pull/59137" rel="noopener noreferrer"&gt;https://github.com/openclaw/openclaw/pull/59137&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This was not our original PR. We contributed a focused follow-up around cleanup ordering: close the temporary database before trying to remove temporary SQLite files.&lt;/p&gt;

&lt;p&gt;On Windows, that detail matters. If the file handle is still open, the cleanup path can fail even if the main logic is correct.&lt;/p&gt;
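
&lt;p&gt;The ordering can be sketched as follows. This shows the sequence only, under assumptions: &lt;code&gt;db&lt;/code&gt; and &lt;code&gt;removeFile&lt;/code&gt; are injected stand-ins rather than OpenClaw's actual objects, and &lt;code&gt;removeFile&lt;/code&gt; is expected to tolerate a missing file.&lt;/p&gt;

```javascript
// Sketch of the ordering only; `db` and `removeFile` are injected stand-ins.
async function cleanupTempIndex(db, removeFile, tempPath) {
  // 1. Release the handle first, otherwise Windows may refuse the delete.
  await db.close();
  // 2. Then remove the database file and its SQLite sidecar files.
  for (const suffix of ["", "-wal", "-shm"]) {
    await removeFile(tempPath + suffix);
  }
}
```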

&lt;h2&gt;
  
  
  Hermes: gateway lifecycle on Windows
&lt;/h2&gt;

&lt;p&gt;PR:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/NousResearch/hermes-agent/pull/15846" rel="noopener noreferrer"&gt;https://github.com/NousResearch/hermes-agent/pull/15846&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Hermes proposal focused on a safer Windows gateway lifecycle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;start through a user-level Scheduled Task;&lt;/li&gt;
&lt;li&gt;avoid relying on a visible PowerShell or CMD window;&lt;/li&gt;
&lt;li&gt;track runtime state;&lt;/li&gt;
&lt;li&gt;keep logs;&lt;/li&gt;
&lt;li&gt;provide best-effort restart behavior.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The PR was closed and not merged.&lt;/p&gt;

&lt;p&gt;The maintainer response was still useful: Hermes needs a consolidated Windows design rather than a set of piecemeal native-Windows PRs. The work was catalogued into the internal Windows support plan and may inform a later consolidated PR.&lt;/p&gt;

&lt;h2&gt;
  
  
  The lesson: do not trust the window
&lt;/h2&gt;

&lt;p&gt;A visible terminal window is not a health check.&lt;/p&gt;

&lt;p&gt;For Windows agents, the boring pieces matter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;background startup;&lt;/li&gt;
&lt;li&gt;health checks;&lt;/li&gt;
&lt;li&gt;logs;&lt;/li&gt;
&lt;li&gt;bounded retries;&lt;/li&gt;
&lt;li&gt;rollback paths;&lt;/li&gt;
&lt;li&gt;small upgrade tests before using a new version for real work.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AI-agent demos usually focus on model behavior. Real usage eventually runs into process lifecycle, file locks, environment inheritance, and recovery.&lt;/p&gt;

&lt;p&gt;Those are not side issues. They decide whether the agent can keep working.&lt;/p&gt;

&lt;p&gt;The canonical version with the image cards and evidence trail is here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://kunpeng-ai.com/en/blog/openclaw-hermes-windows-agent-stability-evidence-trail/" rel="noopener noreferrer"&gt;https://kunpeng-ai.com/en/blog/openclaw-hermes-windows-agent-stability-evidence-trail/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>github</category>
      <category>ai</category>
      <category>programming</category>
    </item>
    <item>
      <title>DeepSeek TUI on Windows: A Practical Look at a Terminal-Native Coding Agent</title>
      <dc:creator>member_0af6418a</dc:creator>
      <pubDate>Mon, 11 May 2026 03:35:30 +0000</pubDate>
      <link>https://dev.to/member_0af6418a/deepseek-tui-on-windows-a-practical-look-at-a-terminal-native-coding-agent-3odj</link>
      <guid>https://dev.to/member_0af6418a/deepseek-tui-on-windows-a-practical-look-at-a-terminal-native-coding-agent-3odj</guid>
      <description>&lt;p&gt;DeepSeek TUI is an open-source terminal-native coding agent for DeepSeek models.&lt;/p&gt;

&lt;p&gt;Project:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Hmbown/DeepSeek-TUI" rel="noopener noreferrer"&gt;https://github.com/Hmbown/DeepSeek-TUI&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At first glance, it is tempting to describe it as "DeepSeek's Claude Code-like tool." That comparison is useful, but only up to a point.&lt;/p&gt;

&lt;p&gt;The more interesting point is this: DeepSeek TUI is not just a terminal chat interface. It is trying to bring the model closer to the actual engineering workspace, where files, shell commands, Git diffs, diagnostics, tool calls, and recovery workflows all matter.&lt;/p&gt;

&lt;p&gt;I tested it on Windows and ran into one practical issue: the tool installed correctly, but the traditional PowerShell window flickered when launching the TUI. Switching to Windows Terminal fixed the problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Short Version
&lt;/h2&gt;

&lt;p&gt;DeepSeek TUI is worth watching because it combines several capabilities that a serious coding agent needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;file reading, search, and editing;&lt;/li&gt;
&lt;li&gt;shell command execution;&lt;/li&gt;
&lt;li&gt;Git context and diffs;&lt;/li&gt;
&lt;li&gt;MCP integration;&lt;/li&gt;
&lt;li&gt;LSP diagnostics;&lt;/li&gt;
&lt;li&gt;session resume;&lt;/li&gt;
&lt;li&gt;workspace snapshots and rollback;&lt;/li&gt;
&lt;li&gt;sub-agent workflows;&lt;/li&gt;
&lt;li&gt;token, cache, and cost visibility.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That makes it closer to an engineering tool than a simple Q&amp;amp;A interface.&lt;/p&gt;

&lt;p&gt;The Windows caveat is also straightforward: if the TUI flickers or fails to render correctly in a legacy console, try Windows Terminal before assuming the install or API key is broken.&lt;/p&gt;

&lt;h2&gt;
  
  
  What DeepSeek TUI Is
&lt;/h2&gt;

&lt;p&gt;The official quickstart is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; deepseek-tui
deepseek &lt;span class="nt"&gt;--version&lt;/span&gt;
deepseek &lt;span class="nt"&gt;--model&lt;/span&gt; auto
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On first launch, DeepSeek TUI prompts for a DeepSeek API key. You can also configure it ahead of time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;deepseek auth &lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;--provider&lt;/span&gt; deepseek
deepseek auth status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The project also documents other installation paths, including Scoop on Windows, Cargo, GitHub releases, and Docker images.&lt;/p&gt;

&lt;h2&gt;
  
  
  My Windows Test: Installed, Then Flickered
&lt;/h2&gt;

&lt;p&gt;The installation itself was uneventful:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install the package globally.&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;deepseek&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Configure the API key.&lt;/li&gt;
&lt;li&gt;Launch the TUI again.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The problem appeared after that. In the traditional PowerShell window, the interface kept flickering and did not enter a stable usable state.&lt;/p&gt;

&lt;p&gt;This is the kind of issue that is easy to misdiagnose. The first instinct is to reinstall the package, rotate the API key, or assume the npm package is broken.&lt;/p&gt;

&lt;p&gt;In this case, the more likely cause was terminal rendering compatibility.&lt;/p&gt;

&lt;p&gt;Modern TUI tools depend on terminal behavior such as ANSI control sequences, cursor refresh, keyboard events, pane rendering, clipboard handling, and sometimes mouse interaction. Legacy console environments can be less reliable here than Windows Terminal.&lt;/p&gt;

&lt;p&gt;After switching to Windows Terminal, DeepSeek TUI launched normally.&lt;/p&gt;
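
&lt;p&gt;For a quick preflight before blaming the install, a sketch like this can tell you which kind of host the shell is running in. It relies on the &lt;code&gt;WT_SESSION&lt;/code&gt; environment variable that Windows Terminal sets for the shells it hosts; the function name and return labels are hypothetical, and this check is not part of DeepSeek TUI.&lt;/p&gt;

```javascript
// Hypothetical preflight: guess whether the shell runs inside Windows Terminal.
function describeTerminal(platform, env) {
  if (platform !== "win32") return "not-windows";
  // Windows Terminal sets WT_SESSION for the shells it hosts.
  return env.WT_SESSION ? "windows-terminal" : "other-windows-console";
}

console.log(describeTerminal(process.platform, process.env));
```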

&lt;h2&gt;
  
  
  Why This Category Matters
&lt;/h2&gt;

&lt;h3&gt;
  
  
  It moves the model into the workspace
&lt;/h3&gt;

&lt;p&gt;In a web chat workflow, the model is far away from the project.&lt;/p&gt;

&lt;p&gt;You copy code into the chat. You paste errors back. You run commands manually. You summarize diffs. You decide which files matter.&lt;/p&gt;

&lt;p&gt;A terminal-native coding agent changes that boundary. It can inspect the workspace, read files, run commands, review diffs, and continue from real project state.&lt;/p&gt;

&lt;h3&gt;
  
  
  Code generation is not enough
&lt;/h3&gt;

&lt;p&gt;A coding agent should not only write code. It should help answer operational engineering questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which files are involved?&lt;/li&gt;
&lt;li&gt;What changed?&lt;/li&gt;
&lt;li&gt;Did tests or checks run?&lt;/li&gt;
&lt;li&gt;What does the Git diff show?&lt;/li&gt;
&lt;li&gt;Can the workspace be recovered if the change is wrong?&lt;/li&gt;
&lt;li&gt;Are diagnostics fed back into the next repair step?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;DeepSeek TUI's file operations, shell tools, Git context, session recovery, workspace snapshots, and LSP diagnostics all point in that direction.&lt;/p&gt;

&lt;h3&gt;
  
  
  MCP expands the tool boundary
&lt;/h3&gt;

&lt;p&gt;DeepSeek TUI supports MCP. Its documentation describes both directions: it can load MCP servers from &lt;code&gt;~/.deepseek/mcp.json&lt;/code&gt;, and it can also run as an MCP server.&lt;/p&gt;

&lt;p&gt;That matters because real engineering work is not limited to local files. Teams often need databases, browsers, internal docs, issue trackers, deployment systems, or private utilities.&lt;/p&gt;

&lt;h3&gt;
  
  
  LSP diagnostics help close the loop
&lt;/h3&gt;

&lt;p&gt;Generating code is only the first step.&lt;/p&gt;

&lt;p&gt;A developer still needs type errors, lint results, compiler output, and test failures to flow back into the next edit.&lt;/p&gt;

&lt;p&gt;DeepSeek TUI's LSP diagnostic support is important because it helps the agent enter a repair loop: edit, inspect diagnostics, fix, and verify again.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical Windows Recommendations
&lt;/h2&gt;

&lt;p&gt;If you are testing DeepSeek TUI on Windows, I would start with this sequence:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install Node.js.&lt;/li&gt;
&lt;li&gt;Install Windows Terminal.&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;npm install -g deepseek-tui&lt;/code&gt; inside Windows Terminal.&lt;/li&gt;
&lt;li&gt;Check the install with &lt;code&gt;deepseek --version&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Launch with &lt;code&gt;deepseek --model auto&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Configure the API key when prompted.&lt;/li&gt;
&lt;li&gt;If the interface flickers, switch terminals before reinstalling.&lt;/li&gt;
&lt;li&gt;Start in a disposable test project.&lt;/li&gt;
&lt;li&gt;Review Git diff and command output after each task.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Final Take
&lt;/h2&gt;

&lt;p&gt;DeepSeek TUI is not just a chat wrapper. It is an open-source attempt to make DeepSeek useful inside a terminal-native engineering workflow.&lt;/p&gt;

&lt;p&gt;Its combination of files, shell, Git, MCP, LSP diagnostics, session recovery, snapshots, sub-agents, and operating modes gives it the shape of a real coding agent.&lt;/p&gt;

&lt;p&gt;The project is still moving quickly, so the experience will vary by platform and terminal. My Windows issue was real, but not severe: Windows Terminal solved it.&lt;/p&gt;

&lt;p&gt;For developers watching the open-source coding-agent space, DeepSeek TUI is worth testing.&lt;/p&gt;

&lt;p&gt;Original version:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://kunpeng-ai.com/en/blog/deepseek-tui-windows-terminal-coding-agent/" rel="noopener noreferrer"&gt;https://kunpeng-ai.com/en/blog/deepseek-tui-windows-terminal-coding-agent/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Project:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Hmbown/DeepSeek-TUI" rel="noopener noreferrer"&gt;https://github.com/Hmbown/DeepSeek-TUI&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Related workflow thinking:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/kunpeng-ai-lab/agent-collaboration-sop" rel="noopener noreferrer"&gt;https://github.com/kunpeng-ai-lab/agent-collaboration-sop&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>programming</category>
      <category>github</category>
    </item>
  </channel>
</rss>
