DEV Community: member_0af6418a

Don't Use Doubao Only as a Chatbot: 6 Practical AI Workflows for Everyday Users

member_0af6418a — Tue, 16 Jun 2026 15:59:15 +0000

Most people start using AI by chatting with it.

That is a reasonable first step. You ask it to rewrite a sentence, draft a short caption, or explain a concept. The result is immediate, so the tool feels useful.

But if AI stays inside the chat box, it quickly becomes a more fluent search box. Every session starts from zero. Every useful answer disappears into the conversation history. The workflow never compounds.

This article is not a feature review of Doubao. The more practical question is:

How can everyday users connect Doubao to real tasks while still keeping judgment, evidence, and risk boundaries in the right place?

Here are six workflows that are useful in daily work and life.

1. When You Do Not Know How To Ask, Make It Ask You First

Many weak AI answers start with a weak prompt.

For example:

Help me organize these materials.

This may work, but Doubao does not know who the materials are for, what the output should look like, what constraints matter, or what decision you need to make afterward.

A better prompt is:

I have a pile of materials, but I do not know how to organize them.
Ask me 5 questions first, so you can clarify what I need.
Then turn the result into a report outline I can reuse.

The point is not to force AI to answer immediately. The point is to let it help you clarify the task before it generates anything.

2. Give It A Real Scenario, Not Just A Generic Question

Everyday users often treat AI like search.

For example:

What should I pay attention to in a rental contract?

That question can produce a generic checklist, but it does not know what kind of contract you have, what you are worried about, or whether your key concern is deposit, early termination, penalty fees, subletting, or automatic renewal.

If you have a real situation, say so:

I am reviewing a rental contract.
My main concerns are deposit, early termination, penalty fees, and automatic renewal.
First give me a checklist.
If I paste contract text later, answer only based on what I provide.

Two things matter here:

Describe the actual situation.
Set the answer boundary early.

Do not let AI invent clauses it has not seen, and do not let a risk scan become a legal conclusion.

3. For Images And Screenshots, Separate What It Sees From What It Infers

Tools like Doubao can read manuals, bills, menus, forms, screenshots, and notifications.

That is useful, but there is a risk: AI can sometimes present an inference as if it were a fact.

So add this sentence:

First tell me what you can clearly see in the image.
Then tell me what you are inferring.
If something is unclear, say it is unclear. Do not guess.

This forces the answer into two layers: observed information and inferred interpretation.

That distinction helps you avoid being guided by an answer that sounds fluent but may not be fully grounded in the image.

4. Use Contract Review Only As Risk Triage

When people hear "AI can read contracts," they often ask:

Can I sign this contract?

That is the wrong role for AI.

Contracts involve legal risk. AI should not make the final decision for you. A safer workflow is to use it for triage: identify suspicious areas, group them, and prepare questions for a professional or the other party.

You can ask:

Based only on the contract content I provide,
help me identify clauses that may need attention.

Group them by:
1. Fees and penalties
2. Refund, termination, or cancellation terms
3. Automatic renewal or extension
4. Unilateral change clauses
5. Liability limits or disclaimers
6. Questions I should further confirm

If the clause is incomplete or unclear, say you are unsure.
Do not make the final decision for me.

This helps you notice what you might miss. It does not replace a lawyer, a professional advisor, or your own responsibility.

5. Do Not Ask Only One AI For Important Questions

The most dangerous AI mistake is not always an obvious failure. Sometimes the answer is wrong, but sounds very confident.

For important questions, do not ask only one model.

You can ask Doubao first, then send the answer to another AI system and ask it to challenge the response.

Example:

This is the answer Doubao gave me.
Please review it from the opposite side:
Where is it not rigorous?
Where is evidence missing?
Where is the wording too absolute?
Is there a more cautious version?

The goal is not to make several AI systems argue. The goal is to expose blind spots.

A second model can often notice missing evidence, overconfident wording, or assumptions that the first answer glossed over.

6. Turn Every Useful Session Into A Reusable Experience Card

Most people finish an AI conversation and move on.

That wastes a lot of value.

If a conversation solved a real problem, ask Doubao to turn it into a reusable card:

Turn this problem-solving process into an experience card I can reuse next time.

Include:
1. The problem scenario
2. The key information I provided
3. How you broke down the problem
4. How I should ask next time
5. A reusable prompt template

Without this step, every AI session starts over. With it, each useful conversation becomes a template for the next one.

Boundaries Matter

The better AI gets, the less you should treat it as authority.

I recommend a few fixed rules:

For factual questions, ask for source, date, and origin.
For image recognition, separate visible information from inference.
For files and contracts, require answers based only on provided content.
For medical, legal, financial, signing, job-changing, or high-impact decisions, do not let AI decide for you.
If it cannot provide a source, treat the answer as a lead, not a conclusion.

AI can help you ask better questions, identify risks, prepare checklists, and compare answers. Final judgment still belongs to evidence, qualified professionals, and your own responsibility.

Conclusion

Everyday users do not need to start with complex agent systems.

If Doubao is already on your phone, start with one real problem: turn messy material into a checklist, explain a screenshot, triage a contract, ask another AI to challenge an answer, and then turn the process into a reusable prompt.

That is more valuable than casual chatting.

The point is not how many times you ask AI something. The point is whether AI has entered the problems you actually need to solve.

Claude Fable 5 Field Test: Verify AI News Before You React

member_0af6418a — Sat, 13 Jun 2026 04:20:07 +0000

Claude Fable 5 is easy to turn into a familiar headline: the strongest AI model has arrived, and ordinary people are about to lose more work to AI.

That is not the angle I want to take here.

After the announcement, I read Anthropic's official launch post and model documentation, then ran a small set of hands-on tests in Claude. My conclusion is not that Fable 5 is unimportant. It is also not that it is already reliable enough to trust blindly.

The more useful takeaway is this: Fable 5 is worth watching, but the practical skill ordinary users need is the ability to verify AI news before reacting to it.

This article walks through four questions:

What are Claude Fable 5 and Claude Mythos 5?
What actually changed in this release?
How should we read benchmark claims without overreading them?
How can everyday users verify similar AI news in a simple way?

First: do not call this a full "Claude 5" release

The first thing to get right is the naming.

This is not simply "Claude 5 is fully released."

A more precise description is:

Claude Fable 5 is the widely available model for ordinary users and developers.
Claude Mythos 5 is an invitation-only preview connected to Project Glasswing and trusted partners. It is not broadly available to every user.

Anthropic's model documentation lists the corresponding API IDs: claude-fable-5 and claude-mythos-5. It also lists a 1M-token context window and up to 128k output tokens.

That matters because this is not just about smoother chat. The model can take in more material and produce longer, more complete code, reports, and analysis.

But long context and long output are not the same as guaranteed correctness. They give the model more room to work. The result still needs human review.

The real shift: AI is moving from chat toward project execution

The most important shift I see in Fable 5 is not that it can write a nicer paragraph. It is that it feels closer to a model that can carry a long task forward.

The recurring themes in the official material and external write-ups are long-horizon work, engineering tasks, complex documents, table analysis, and iterative correction.

Anthropic's launch material includes an engineering migration example involving a large Ruby codebase. Ethan Mollick's field report also describes a model that can take a vague goal, do research, write code, test, and revise. His important caveat is that the output is still imperfect and needs expert review.

That is why I do not read this release as "another chatbot upgrade."

The more useful framing is:

AI is moving from "help me write one thing" toward "help me move a project forward."

For ordinary users, this does not mean instant replacement. It means your role changes. Instead of asking the tool one sentence at a time, you increasingly need to define the goal, constraints, and acceptance criteria, then inspect whether the work is actually correct.

Benchmarks matter, but one table is not the whole story

Some of the numbers in Anthropic's benchmark table are strong.

For example, the official table reports:

SWE-Bench Pro: Fable 5 at 80.3%, GPT 5.5 at 58.6%.
FrontierCode Diamond: Fable 5 at 29.3%, GPT 5.5 at 5.7%.
Terminal-Bench 2: Fable 5 at 88.0%, GPT 5.5 + Codex CLI at 83.4%.

These numbers are meaningful signals, especially for engineering and long-horizon tasks. But they should not be converted into a universal claim that Fable 5 beats every other model in every situation.

Benchmark scope, tools, versioning, and environment all matter.

For example, the independent terminal-bench@2.1 leaderboard lists Codex CLI + GPT-5.5 at 83.4% +/- 2.2, Claude Code + Claude Opus 4.8 at 78.9% +/- 2.5, and Gemini CLI + Gemini 3.1 Pro at 70.7% +/- 2.9. That independent leaderboard does not currently list Fable 5 directly, so it should not be merged with Anthropic's official table as if they were the same measurement.

My read is simple: Fable 5 looks very strong, especially for long tasks, coding, and complex information work. But whenever an AI news item is built around a benchmark screenshot, I want to ask three questions:

Is this from the vendor, a third party, or a user test?
Are the compared systems running under the same conditions?
Does this benchmark match the task I actually need to do?

My field test: strong, but early use can still be uneven

I did not want to stop at the benchmark table, so I ran a few small tests.

First, I checked basic availability. I had Fable 5 selected, sent another task, and got Model isn't available. That is a practical issue ordinary users may hit when a new model has just launched.

Second, I continued with Chinese-language tasks. At one point the model returned Japanese content instead of Chinese. I then added a stricter instruction: use simplified Chinese only, and keep each sentence short. After that, I asked it for a one-sentence summary, a video opening, and title options. Those three follow-up tasks returned to Chinese.

These two observations do not mean Fable 5 is weak. A more proportional conclusion is that early use can be uneven. A single success or failure should not become the whole verdict.

Third, I uploaded the official benchmark table and asked the model to turn it into a 30-second Chinese voiceover for ordinary viewers. I also asked it to mark which conclusions should not be overread. This worked reasonably well: it extracted the main points and warned that different leaderboards should not be compared too casually.

Fourth, I gave it a video topic, screenshots, and risk constraints. This was closer to a real workflow test. It produced a structure, listed facts to verify, and separated out claims that could be overstated.

This is where Fable 5 started to feel less like a chat model and more like a working assistant. It could split a messy task into structure, facts, risks, and next steps.

But that is still not the same as automatic correctness. The structure needs review. The facts need checking. The final output still has to fit the real scenario.

One more issue: model restrictions should be visible to users

There was also an important policy controversy around this release.

Simon Willison wrote about a restriction mechanism related to some frontier model-development requests that was not always visible to users. Engadget later reported that Anthropic adjusted the policy after pushback from the research community, moving toward making those safeguards visible.

For ordinary users, the lesson is not just about this specific policy. It is that stronger models come with more product-level routing, fallback behavior, and safety restrictions. What you see in the answer may reflect not only model capability, but also product design and policy decisions.

So instead of only asking "is this model strong?", it is worth asking:

In which scenarios is it strong?
Which tasks trigger restrictions or fallbacks?
Can the user see when that happens?
What human checks are still required before trusting the result?

A simple three-step method for reading AI news

If you are an everyday user trying to keep up with AI, I would avoid reacting immediately to words like "strongest," "revolutionary," or "everyone will be replaced."

Use a simple three-step check instead.

First, check the official source.

Read the launch post, model documentation, pricing page, or API docs. Official material is not the full truth, but it anchors the basics: model name, access scope, parameters, limitations, and intended use cases.

Second, look for real tests.

A useful test is not just a riddle or a screenshot of a perfect answer. Put the model into a real task: read a table, modify code, draft a plan, analyze a file, or handle a small workflow. Pay attention to failures as much as successes.

Third, test your own scenario.

Do not ask whether the model is "the strongest." Ask whether it helps with one task you actually have: summarize meeting notes, review a contract for risky clauses, design a study plan, analyze a spreadsheet, prototype code, or plan content.

If it reliably improves your own workflow, that is practical value. If it only looks impressive in a news post, you do not need to panic.

My takeaway

Claude Fable 5 is worth paying attention to.

The direction is clear: AI is moving from chat toward project execution. Longer context, longer output, stronger engineering performance, and better complex-document handling all push the user role from direct operator toward goal-setter and reviewer.

But that does not mean ordinary users should let anxiety drive their decisions.

The useful habit is to treat AI news as a learning entry point, not an emotional trigger. Check the source, inspect real tests, and try the model in your own scenario. The earlier you build that verification habit, the less likely you are to be dragged around by every new model launch.

That is the real reason I ran this field test: not to declare one model as the permanent winner, but to build a more rational way to analyze AI news.

Sources

Anthropic launch announcement: https://www.anthropic.com/news/claude-fable-5-mythos-5
Anthropic model documentation: https://docs.anthropic.com/en/docs/about-claude/models/overview
Ethan Mollick field report: https://www.oneusefulthing.org/p/what-it-feels-like-to-work-with-mythos
Simon Willison's post: https://simonwillison.net/2026/Jun/10/if-claude-fable-stops-helping-you/
Engadget report: https://www.engadget.com/2192004/anthropic-walks-back-policy-sabotaging-research/

Feishu CLI Hands-on: Letting Codex Enter a Real Office Workflow

member_0af6418a — Tue, 02 Jun 2026 06:55:46 +0000

Feishu now has an official CLI, and I wanted to test a practical question:

Can an AI agent use it to enter a real office workflow?

I did not start by manually reading the documentation and turning it into a scripted demo. Instead, I gave the official larksuite/cli repository to Codex and asked it to figure out what the tool could do, install it, go through the configuration path, wait for human authorization, and then send me a message through Feishu.

After that, I turned the test into a small recurring workflow: a daily reminder to check SEO and GEO status for our main blog.

Why this is more than a command-line tool

Most CLI tools are useful because they turn repeated clicks into commands.

Feishu CLI is more interesting because the official README explicitly treats humans and AI agents as users of the tool. In the README I checked on June 1, 2026, the project describes support for messaging, docs, Bitable, spreadsheets, slides, calendar, mail, tasks, meetings, Markdown, and more.

It also describes 200+ commands and 26 AI Agent Skills.

That matters because agent-based office automation often gets stuck at the same point:

the agent can understand the task;
it can generate a plan or script;
but it has no stable, authorized way to operate the office system.

When office actions become available through a CLI, the agent can move from "understanding" to "executing a bounded action."

This does not mean full autonomous office work. It means repeated, low-risk, verifiable actions can start becoming workflows.

Letting Codex run the setup path

The first step was to let Codex read the official repository.

It found the larksuite/cli GitHub repo, read the Chinese README, and noticed that the documentation includes a quick-start path specifically for AI agents.

The flow looked like this:

Codex confirmed the install command from the official README.
It checked the local Node / npm environment.
It ran the CLI install command.
It entered the configuration flow.
It returned authorization links to me.
I completed authorization in the browser.
Codex resumed and checked the authorization state.

The important part is the responsibility split.

The agent can read documentation, run commands, parse output, and prepare the next step. But authorization should stay human-controlled. Once an agent can operate an office system, permissions become a product and security question, not just a convenience feature.

The first useful loop: a Feishu message comes back

After installation and authorization, I tested the smallest complete loop:

human intent
-> agent understands the task
-> agent calls the CLI
-> CLI operates Feishu
-> Feishu message returns to the human

That is small, but it is enough to prove the basic workflow path.

At that point, the question changes from "can this send a message?" to "what repeated office action should be turned into a workflow?"

I chose a daily SEO / GEO reminder.

The reminder is not complex. It asks me to check:

Google / Bing index status
search query and click changes
whether AI search or large models mention the brand
Chinese and English article titles, summaries, and links
whether recent content distribution created new entry points

This is exactly the kind of work that is important but easy to forget. A stable private reminder is more useful than a flashy automation that is too risky to run every day.

Start with small private tasks

The official README also includes a safety warning around AI-agent automation: hallucination, uncontrolled execution, and prompt injection are real risks when an agent operates an office platform under a user's authorization.

That should shape the first workflows.

My preferred starting point:

private reminders, not group-wide bots;
personal todos, not cross-team approvals;
fixed checklists, not open-ended execution;
read-only checks before write or delete actions;
no secrets, tokens, chat IDs, or open IDs in public screenshots, articles, or logs.

Agent office automation should not begin with broad permissions. It should begin with low-risk, high-repeat, verifiable actions.

What this means for practical users

Many people still use AI mostly for Q&A, writing, or generating spreadsheet formulas.

Those are useful, but the bigger shift happens when agents can enter real workflows.

Feishu CLI is a good example. Once an office platform has a standardized command interface, an agent can help with:

daily metric reminders;
meeting follow-up summaries;
document summaries;
calendar conflict checks;
repeated spreadsheet updates;
fixed operational checklists.

None of these tasks are dramatic. But they are repeated, easy to forget, and valuable when they happen consistently.

The value of Feishu CLI is not that a command line can replace the Feishu client. Its value is that it gives agents an office-system entry point that can be installed, authorized, checked, executed, and interrupted by a human when needed.

Full write-up:

https://kunpeng-ai.com/en/blog/feishu-cli-ai-agent-workflow/

Opus 4.8, Qwen, DeepSeek, and a Claude Code Failure: What I Could Actually Reproduce

member_0af6418a — Sun, 31 May 2026 06:34:03 +0000

There is a claim going around that Claude's latest Opus 4.8 may have been distilled from Qwen or DeepSeek.

That kind of claim spreads quickly, especially when it can be turned into a screenshot or a short clip. I wanted to test the small version of the claim first: if I ask Opus 4.8 what model it is, does it identify itself as Qwen or DeepSeek?

In my May 30 test, I could not reproduce that behavior.

But the more useful part of the test happened before the model test even started. Claude Code broke after an upgrade with spawn EBUSY, and Codex helped diagnose and fix the local Claude Code state.

The first failure was not the model

I originally planned to open Claude Code, switch to the new Opus 4.8 path, and ask a direct identity question.

Instead, Claude Code failed after the upgrade with:

spawn EBUSY

This is the kind of problem that is easy to misread. When an AI coding tool fails to start, it is tempting to blame the account, the network, the subscription, or the remote model service.

Codex pointed in a more local direction: the Claude Code component state looked broken.

The useful clues were:

an old session file parsing problem
a Claude Code executable that appeared to be half-downloaded, locked, or otherwise incomplete

After cleaning up the local component state, Claude Code ran again.

This is a very normal kind of AI tooling failure. The demo version of AI coding looks smooth. The real version often includes local caches, CLI updates, broken sessions, locked binaries, and confusing error messages.

If the toolchain is broken, the model has not really been tested yet.

Then I tested the identity claim

After Claude Code was working again, I asked Opus 4.8 a direct question:

What large model are you?

In this run, it identified itself as Claude Opus 4.8, developed by Anthropic, and running in the Claude Code environment.

It did not identify itself as Qwen.

It did not identify itself as DeepSeek.

The careful conclusion is:

In this test material, I did not reproduce Opus 4.8 identifying itself as Qwen or DeepSeek.

That is intentionally narrow.

It does not prove anything broad about training lineage, distillation, data contamination, or evaluation artifacts. A single self-identity answer is not a rigorous method for determining model origin.

But it does mean I would not treat the stronger viral claim as settled without more reproducible evidence.

The practical lesson: keep more than one agent

The most useful part of this test was not the model identity answer. It was the workflow lesson.

Claude Code broke. Codex helped fix Claude Code.

That suggests a practical setup for anyone using AI coding tools seriously: keep more than one agent installed.

For example:

If Claude Code fails, ask Codex to inspect logs and local state.
If Codex hits a confusing error, ask Claude to analyze the message.
If one toolchain is stuck, use the other agent to preserve diagnostic momentum.

This is not about declaring one tool better than another. It is about avoiding a single point of failure in your AI workflow.

Do not outsource verification to the timeline

The second lesson is about model rumors.

Claims like "this model is distilled from that model" or "this model is just a wrapper" are easy to share. They may be worth investigating, but they should not be accepted from a screenshot alone.

A better habit is to record:

date and version
local environment
exact prompt
model route or tool context
screenshots or logs
the actual output

Then the discussion can move from reaction to evidence.

That is the direction I want more model tests to take: less team-picking, more reproducible traces.

Full write-up:

https://kunpeng-ai.com/en/blog/opus-48-qwen-deepseek-claude-code-codex-test/

CodeWhale accepted our PRs: better coding agents need better harnesses

member_0af6418a — Fri, 29 May 2026 10:51:41 +0000

DeepSeek-TUI has gone through an important update. It now has a new name, CodeWhale, and two harness-related PRs from our work have been accepted by the maintainers.

This is not a flashy product change. It is not a new screen or a new button. A user may open the tool and not notice it immediately.

But if you have used coding agents on real projects, this kind of change matters. The hard part is not only whether the model can generate code. The agent also needs to know what it changed, why a test failed, and where it should look next.

What changed in CodeWhale

The two accepted PRs improve the harness around the agent:

PR #1971 exposes apply_patch preflight metadata, so before the agent edits files, it can see which paths the patch is expected to affect.
PR #1973 summarizes Cargo failures in tool metadata, so a long failure log can become a shorter signal the agent can reason about.

If the model is the brain, the harness is the workbench between that brain and the engineering scene. A weak workbench leaves the model guessing. A clearer workbench gives it better signals.

When people discuss AI coding tools, they often start with model capability: is the model stronger, is the context longer, can it write more code automatically?

Those questions matter. But in day-to-day engineering, another question matters just as much: does the tool turn the task scene into something the model can understand, trace, and review?

These PRs are not about writing more code

The first change is simple: before applying a patch, tell the agent which paths the patch will touch.

That sounds small, but it affects the next decision. If a patch changes a config file, a test file, and a core logic file, where should the agent inspect first after a failure? If path information is missing, the agent can easily spend time in the wrong place.

The second change is about Cargo failure logs.

Build and test logs can be long. The useful part may be buried inside dozens or hundreds of lines. A human engineer filters out noise almost automatically: error type, likely location, useful hint, next check. An agent that receives one raw blob of log text can be pulled away by noise.

The value of this change is not that the harness makes decisions for the agent. It organizes the scene so the agent can make a better next move.

Why this matters for AI replacing work

This also connects to a bigger question: what kind of work is AI actually starting to replace?

In programming, I do not think the first thing being replaced is complete engineering judgment. Not yet.

What is easier to automate first is the repeated, fragmented work around engineering judgment: collecting changed-file context, reading long logs, summarizing failure causes, and listing the next possible checks.

Those tasks are not meaningless. They take attention. But they are not the same as deciding the product goal, choosing the tradeoff, or accepting the risk.

The important point is that AI does not become useful in a vacuum. It needs an environment that provides clean signals.

If a tool throws a long log at the model and hopes the model reconstructs all the context, that is mostly a bet on guessing ability. If the tool can say what changed, where the failure is concentrated, and what evidence should guide the next step, the agent becomes more stable.

So the shift is not "programmers are immediately replaced." A more practical view is that parts of context cleanup, log triage, and first-pass failure analysis are becoming easier to automate.

What developers can take from this

For anyone using coding agents, the takeaway is direct: do not only ask whether the model is strong. Ask whether you have given it a proper harness.

A useful harness should answer questions like these:

Before the agent modifies files, can it know which files may be affected?
After a test fails, can the failure become a clean signal instead of raw noise?
Can the next fix continue from evidence instead of starting over?
Can the system mark where human judgment is still required?
After the task ends, is there a record that can be reviewed?

These questions are less exciting than "switch to a stronger model." They are also closer to real productivity.

The larger lesson

Progress in AI coding tools does not always arrive as a dramatic new feature. Sometimes it is a clearer patch-impact signal, a cleaner failure summary, or a task scene that can be reviewed later.

Those lower-level changes are what help an agent move from answering to doing.

So when we talk about what AI will replace, it helps to make the question more specific. It is not replacing complete engineering judgment all at once. It is first replacing some repeated context organization, log filtering, and first-pass debugging work.

The part that remains human is still important: goals, tradeoffs, risk control, and deciding how the tool should fit into the workflow.

Canonical version:
https://kunpeng-ai.com/en/blog/codewhale-harness-pr-merged/

PRs:

Codex + Tencent LKEAP 401: It Was Not the Key

member_0af6418a — Mon, 18 May 2026 14:00:03 +0000

I ran into a failure that looked like an authentication problem at first:

unexpected status 401 Unauthorized

The setup was Codex against Tencent Cloud LKEAP Token Plan, using an OpenAI-compatible Chat Completions endpoint.

The important clue was not the 401. It was the final URL:

https://api.lkeap.cloud.tencent.com/plan/v3/chat/completions/responses

That URL shape is already wrong.

The config that caused it

The provider entry looked roughly like this:

[model_providers.custom]
name = "custom"
wire_api = "responses"
requires_openai_auth = true
base_url = "https://api.lkeap.cloud.tencent.com/plan/v3/chat/completions"

The mistake is subtle.

The Tencent endpoint in this setup is Chat Completions-style:

/plan/v3/chat/completions

But newer Codex expects a Responses API-style provider when wire_api = "responses" is used. So Codex appended /responses to a base URL that already ended in /chat/completions.

That produced:

/plan/v3/chat/completions/responses

At that point, rotating the API key is unlikely to help. The request is already going to the wrong protocol path.

Why not switch Codex back to Chat Completions?

That was the next thing I checked.

The newer Codex configuration rejects wire_api = "chat":

invalid configuration: `wire_api = "chat"` is no longer supported.
How to fix: set `wire_api = "responses"` in your provider config.

So this is not just a missing slash or a bad base URL. It is a protocol mismatch:

Codex wants to speak Responses API.
The LKEAP endpoint I was using speaks Chat Completions.
"OpenAI-compatible" does not automatically mean "compatible with every OpenAI client mode."

That last point is the real lesson.

My debugging order now

For this class of issue, I would check things in this order:

Inspect the final request URL.
Confirm the client's wire protocol.
Confirm the upstream endpoint shape.
Only then spend time on keys and permissions.

If the URL contains patterns like these, stop and look at path composition first:

/chat/completions/responses
/v1/v1
/responses/chat/completions

The status code may be a symptom. The URL shape is often the evidence.

Workaround: local protocol adapter

The workaround I tested toward was a small local Node.js proxy.

Codex calls a local Responses-shaped endpoint:

http://127.0.0.1:15722/v1/responses

The local proxy converts that into a Chat Completions request and forwards it to:

https://api.lkeap.cloud.tencent.com/plan/v3/chat/completions

The Codex config then points to the local proxy:

model_provider = "tencent_lkeap_proxy"
model = "glm-5.1"
disable_response_storage = true

[model_providers.tencent_lkeap_proxy]
name = "Tencent LKEAP via local Responses proxy"
wire_api = "responses"
base_url = "http://127.0.0.1:15722/v1"
requires_openai_auth = true
env_key = "OPENAI_API_KEY"

The real Tencent key stays in the proxy process environment:

$env:TENCENT_LKEAP_API_KEY="REDACTED"
$env:TENCENT_LKEAP_MODEL="glm-5.1"
node tools\codex-responses-to-chat-proxy.mjs

Codex still expects an OpenAI auth variable for this provider shape, so I used a local dummy value:

$env:OPENAI_API_KEY="local-proxy-dummy"

The proxy ignores that dummy value and uses TENCENT_LKEAP_API_KEY upstream.

Boundary

I would not describe this as a complete drop-in replacement.

What was verified in the original debugging record:

the failure path was identified
the local proxy script was created
node --check passed

Still to verify:

full end-to-end smoke test with a real LKEAP key
streaming behavior under longer tasks
Codex tool-call traffic
cancellation, timeout, and error mapping

That distinction matters. A protocol adapter can be useful, but it should be treated as an adapter to harden, not as magic compatibility.

The practical takeaway:

When an AI coding tool fails against an "OpenAI-compatible" endpoint, do not ask only whether the key is valid. Ask whether the client and server are actually speaking the same API shape.

Original version:
https://kunpeng-ai.com/en/blog/codex-lkeap-protocol-path-debugging/

Can Claude Code Still Use DeepSeek? A Windows Test with cc-switch

member_0af6418a — Fri, 15 May 2026 13:35:06 +0000

A lot of older third-party Claude model routes have become unreliable. I tested a narrower path on Windows: Claude Code through cc-switch to DeepSeek.

Important boundary first: cc-switch is not a Claude jailbreak, and it is not a universal adapter for every coding agent. It mainly helps with the Claude Code provider route. Codex cannot use this path directly.

What cc-switch actually solves

It reduces manual config drift.

Instead of hand-editing model name, base URL, and API key every time, you keep them as named providers and switch between them.

The package I used:

npm install -g @adithya-13/cc-switch

Windows traps

Two details mattered in my test.

First, PowerShell may block the npm-generated .ps1 shim. When that happens, try:

cmd /c cc-switch

Second, do not save the provider JSON config with a BOM. I hit a JSON parsing failure that disappeared after saving the config as UTF-8 without BOM.

Verification

I would not call the route ready until:

Active: DeepSeek

is visible, and the doctor check passes.

Only then did I restart Claude Code and test a small task.

Takeaway

This path is useful if you already use Claude Code and want a cleaner DeepSeek provider setup on Windows.

It is not a general solution for every agent. The practical value is narrower: make the provider state visible, avoid hand-edited config drift, and keep Windows shell/encoding issues from masquerading as model failures.

Windows agents keep freezing: lessons from an OpenClaw merge and a Hermes maintainer reply

member_0af6418a — Wed, 13 May 2026 14:11:49 +0000

Windows is a hard place to run long-lived AI coding agents.

The failure mode is often quiet. A terminal window may still be open. A process may still exist. But halfway through a task, the gateway stops responding, memory search breaks, or a background service silently exits.

This post summarizes two concrete trails from recent work:

OpenClaw merged a fix for a transient Windows file-lock problem in memory index swaps.
Hermes did not merge our Windows gateway helper PR, but a maintainer clarified how this work fits into a broader Windows support plan.

OpenClaw: transient file locks during memory index swaps

PR:

https://github.com/openclaw/openclaw/pull/76024

During memory reindexing, OpenClaw swaps SQLite index files. On Windows, fs.rename can fail when a file is briefly held by the system, antivirus software, an indexer, or another process.

The errors can look like:

EBUSY
EPERM
EACCES

From the user side, the symptom may be vague: memory search fails, or an agent task gets stuck.

The merged fix is intentionally narrow. It adds bounded retries around the atomic index swap path. It does not rewrite the memory system or turn every failure into a retry.

That is the kind of stability work that tends to matter in real agent use.

Cleanup paths matter too

Related OpenClaw trail:

https://github.com/openclaw/openclaw/pull/59137

This was not our original PR. We contributed a focused follow-up around cleanup ordering: close the temporary database before trying to remove temporary SQLite files.

On Windows, that detail matters. If the file handle is still open, the cleanup path can fail even if the main logic is correct.

Hermes: gateway lifecycle on Windows

PR:

https://github.com/NousResearch/hermes-agent/pull/15846

The Hermes proposal focused on a safer Windows gateway lifecycle:

start through a user-level Scheduled Task;
avoid relying on a visible PowerShell or CMD window;
track runtime state;
keep logs;
provide best-effort restart behavior.

The PR was closed and not merged.

The maintainer response was still useful: Hermes needs a consolidated Windows design rather than a set of piecemeal native-Windows PRs. The work was catalogued into the internal Windows support plan and may inform a later consolidated PR.

The lesson: do not trust the window

A visible terminal window is not a health check.

For Windows agents, the boring pieces matter:

background startup;
health checks;
logs;
bounded retries;
rollback paths;
small upgrade tests before using a new version for real work.

AI-agent demos usually focus on model behavior. Real usage eventually runs into process lifecycle, file locks, environment inheritance, and recovery.

Those are not side issues. They decide whether the agent can keep working.

The canonical version with the image cards and evidence trail is here:

https://kunpeng-ai.com/en/blog/openclaw-hermes-windows-agent-stability-evidence-trail/

DeepSeek TUI on Windows: A Practical Look at a Terminal-Native Coding Agent

member_0af6418a — Mon, 11 May 2026 03:35:30 +0000

DeepSeek TUI is an open-source terminal-native coding agent for DeepSeek models.

Project:

https://github.com/Hmbown/DeepSeek-TUI

At first glance, it is tempting to describe it as "DeepSeek's Claude Code-like tool." That comparison is useful, but only up to a point.

The more interesting point is this: DeepSeek TUI is not just a terminal chat interface. It is trying to bring the model closer to the actual engineering workspace, where files, shell commands, Git diffs, diagnostics, tool calls, and recovery workflows all matter.

I tested it on Windows and ran into one practical issue: the tool installed correctly, but the traditional PowerShell window flickered when launching the TUI. Switching to Windows Terminal fixed the problem.

The Short Version

DeepSeek TUI is worth watching because it combines several capabilities that a serious coding agent needs:

file reading, search, and editing;
shell command execution;
Git context and diffs;
MCP integration;
LSP diagnostics;
session resume;
workspace snapshots and rollback;
sub-agent workflows;
token, cache, and cost visibility.

That makes it closer to an engineering tool than a simple Q&A interface.

The Windows caveat is also straightforward: if the TUI flickers or fails to render correctly in a legacy console, try Windows Terminal before assuming the install or API key is broken.

What DeepSeek TUI Is

The official quickstart is:

npm install -g deepseek-tui
deepseek --version
deepseek --model auto

On first launch, DeepSeek TUI prompts for a DeepSeek API key. You can also configure it ahead of time:

deepseek auth set --provider deepseek
deepseek auth status

The project also documents other installation paths, including Scoop on Windows, Cargo, GitHub releases, and Docker images.

My Windows Test: Installed, Then Flickered

The installation itself was uneventful:

Install the package globally.
Run deepseek.
Configure the API key.
Launch the TUI again.

The problem appeared after that. In the traditional PowerShell window, the interface kept flickering and did not enter a stable usable state.

This is the kind of issue that is easy to misdiagnose. The first instinct is to reinstall the package, rotate the API key, or assume the npm package is broken.

In this case, the more likely cause was terminal rendering compatibility.

Modern TUI tools depend on terminal behavior such as ANSI control sequences, cursor refresh, keyboard events, pane rendering, clipboard handling, and sometimes mouse interaction. Legacy console environments can be less reliable here than Windows Terminal.

After switching to Windows Terminal, DeepSeek TUI launched normally.

Why This Category Matters

It moves the model into the workspace

In a web chat workflow, the model is far away from the project.

You copy code into the chat. You paste errors back. You run commands manually. You summarize diffs. You decide which files matter.

A terminal-native coding agent changes that boundary. It can inspect the workspace, read files, run commands, review diffs, and continue from real project state.

Code generation is not enough

A coding agent should not only write code. It should help answer operational engineering questions:

Which files are involved?
What changed?
Did tests or checks run?
What does the Git diff show?
Can the workspace be recovered if the change is wrong?
Are diagnostics fed back into the next repair step?

DeepSeek TUI's file operations, shell tools, Git context, session recovery, workspace snapshots, and LSP diagnostics all point in that direction.

MCP expands the tool boundary

DeepSeek TUI supports MCP. Its documentation describes both directions: it can load MCP servers from ~/.deepseek/mcp.json, and it can also run as an MCP server.

That matters because real engineering work is not limited to local files. Teams often need databases, browsers, internal docs, issue trackers, deployment systems, or private utilities.

LSP diagnostics help close the loop

Generating code is only the first step.

A developer still needs type errors, lint results, compiler output, and test failures to flow back into the next edit.

DeepSeek TUI's LSP diagnostic support is important because it helps the agent enter a repair loop: edit, inspect diagnostics, fix, and verify again.

Practical Windows Recommendations

If you are testing DeepSeek TUI on Windows, I would start with this sequence:

Install Node.js.
Install Windows Terminal.
Run npm install -g deepseek-tui inside Windows Terminal.
Check the install with deepseek --version.
Launch with deepseek --model auto.
Configure the API key when prompted.
If the interface flickers, switch terminals before reinstalling.
Start in a disposable test project.
Review Git diff and command output after each task.

Final Take

DeepSeek TUI is not just a chat wrapper. It is an open-source attempt to make DeepSeek useful inside a terminal-native engineering workflow.

Its combination of files, shell, Git, MCP, LSP diagnostics, session recovery, snapshots, sub-agents, and operating modes gives it the shape of a real coding agent.

The project is still moving quickly, so the experience will vary by platform and terminal. My Windows issue was real, but not severe: Windows Terminal solved it.

For developers watching the open-source coding-agent space, DeepSeek TUI is worth testing.

Original version:

https://kunpeng-ai.com/en/blog/deepseek-tui-windows-terminal-coding-agent/

Project:

https://github.com/Hmbown/DeepSeek-TUI

Related workflow thinking:

https://github.com/kunpeng-ai-lab/agent-collaboration-sop